Lots of discussion about SB-1047, the new gpt2-chatbot on lmsys, and extending Llama-3-8B to 1M context, but otherwise no clear top story emerges. You can check out the WebSim/WorldSim podcast as Nous Research gets ready to relaunch it after briefly taking it down due to security issues.
Table of Contents
[TOC]
AI Reddit Recap
Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!
Advances in AI Models and Capabilities
- Yann LeCun predicts shift to AR interfaces with AI assistants: In /r/singularity, Yann LeCun says that in 10-15 years we will interact with intelligent assistants via AR glasses and bracelets instead of smartphones.
- Dolphin-2.9 model released based on Llama-3: In /r/LocalLLaMA, a new Dolphin-2.9 model based on Llama-3 was released, potentially fixing quality issues of the previous version.
- PixArt Sigma achieves Stable Diffusion 3.0 level with 0.6B parameters: In /r/singularity, the PixArt Sigma model achieves Stable Diffusion 3.0 level performance with only 0.6B parameters, complete prompt adherence, and can be used locally.
- Transformers can use meaningless filler tokens for algorithmic tasks: In /r/LocalLLaMA and /r/MachineLearning, it was shown that transformers can use meaningless filler tokens like '...' in place of a chain of thought to solve algorithmic tasks, requiring specific dense supervision to converge.
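A toy illustration of the filler-token idea; the task and prompt format below are assumptions for demonstration, not the paper's actual setup:

```python
# Toy illustration of filler tokens standing in for a chain of thought.
# The task (does some triple sum to zero?) and formats are assumptions.
cot_prompt = (
    "Q: Do three numbers in [2, 7, -9, 4] sum to zero?\n"
    "Reasoning: 2 + 7 = 9; 9 + (-9) = 0, so yes.\n"  # explicit chain of thought
    "A: yes"
)
filler_prompt = (
    "Q: Do three numbers in [2, 7, -9, 4] sum to zero?\n"
    "Reasoning: " + ". " * 20 + "\n"                  # meaningless filler tokens
    "A: yes"
)
# The reported finding: with dense, task-specific supervision, models trained
# on filler "reasoning" can match CoT accuracy on certain algorithmic tasks,
# suggesting the extra tokens buy computation, not content.
print(cot_prompt, filler_prompt, sep="\n\n")
```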
Applications of AI
- AI-generated restaurant reviews can pass Turing test: In /r/MachineLearning and /r/singularity, a new study finds that AI-generated restaurant reviews can pass a Turing test, fooling both humans and AI detectors.
- Uber uses graph algorithms and learned embeddings for ETA prediction: In /r/MachineLearning, it was shared that Uber uses a 2-layer approach combining graph algorithms and learned embeddings to predict ETAs.
- Coca-Cola and Microsoft announce 5-year AI partnership: In /r/singularity, it was announced that The Coca-Cola Company and Microsoft are entering a 5-year partnership to accelerate cloud and generative AI initiatives.
Deploying and Optimizing AI Models
- Llama-3 70B model can run on 4GB GPU with AirLLM: In /r/LocalLLaMA, it was shown that the Llama-3 70B model can be run on a single 4GB GPU using AirLLM optimization techniques, without quantization or compression, but is very slow (a rough sketch of the idea follows this list).
- Mistral.rs is fast LLM inference platform: In /r/singularity, Mistral.rs was introduced as a fast LLM inference platform with quantization, device support, and OpenAI API compatibility.
- Challenges moving LLMs from prototype to production: In /r/MachineLearning, a survey found that only 5% of LLMs make it from prototype to production, especially in enterprise settings, due to various challenges.
- EXL2 and GGUF quantization of Llama models compared: In /r/LocalLLaMA, EXL2 quantization of Llama-3 was found to perform the same as latest GGUF quantization in terms of perplexity vs model size, with both Llama-3 and Llama-2 degrading more with quantization compared to full precision.
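The AirLLM result above relies on layer-by-layer execution: only one transformer block is resident in VRAM at a time, which is both why it fits in 4GB and why it is slow. A rough, framework-agnostic sketch of the idea (not AirLLM's actual API):

```python
# Sketch of layer-by-layer inference: one block's weights live on the GPU at
# a time, traded against heavy CPU/GPU traffic (hence the slowness).
import torch

def run_layer_by_layer(layers, hidden, device="cuda"):
    """`layers` is a list of CPU-resident nn.Module blocks; `hidden` is the
    embedded input. Each block is moved in, applied, then evicted."""
    for layer in layers:
        layer.to(device)                  # load one block's weights into VRAM
        hidden = layer(hidden.to(device))
        layer.to("cpu")                   # evict before touching the next one
        torch.cuda.empty_cache()          # release the freed VRAM immediately
    return hidden

layers = [torch.nn.Linear(4096, 4096) for _ in range(8)]  # stand-in blocks
out = run_layer_by_layer(layers, torch.randn(1, 4096))
print(out.shape)
```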
Concerns and Challenges
- Eric Schmidt warns about AI agents communicating in own language: In /r/singularity, Eric Schmidt said that we should unplug computers if AI agents start talking to each other in a language we can't understand, which already happened with Facebook chatbots in 2017.
- OpenAI overcharged user, ignoring billing limit: In /r/OpenAI, a user reported being overcharged by OpenAI, which did not respect their set billing limit, potentially leading to a class action lawsuit.
- California bill SB-1047 could impact open source AI: In /r/StableDiffusion, concerns were raised that California bill SB-1047, if passed, could negatively impact open source AI efforts.
AI Twitter Recap
all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.
Prompt Engineering Techniques and Applications
- Reasoning and Multi-Step Problem Solving: @cwolferesearch outlines recent prompt engineering research for reasoning tasks, including zero-shot CoT prompting (a minimal example follows this list), selecting CoT exemplars based on complexity, progressive refinement of rationales, and decomposing complex tasks into sub-tasks.
- Tool Usage and API Integration: @cwolferesearch highlights research on teaching LLMs to leverage external tools and APIs, such as text-based APIs, natural language programs composed of tool calls, and code execution in sandboxed environments.
- Optimizing Context Window Usage: @cwolferesearch discusses studies on the impact of context window properties, such as the negative effects of irrelevant context, attention biases towards the beginning/end of prompts, and strategies for selecting optimal few-shot exemplars.
- Improving LLM-Assisted Writing: @cwolferesearch covers techniques for enhancing LLM-generated writing, such as outline generation and iterative filling, using smaller LLMs to generate "directional stimuli", and iteratively increasing information density in summaries.
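Zero-shot CoT, from the first item, is the simplest of these techniques: append a trigger phrase with no exemplars. A minimal example using the standard OpenAI SDK (model choice and question are illustrative; requires OPENAI_API_KEY in the environment):

```python
# Minimal zero-shot CoT example: no exemplars, just a reasoning trigger.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "A bat and a ball cost $1.10 total; the bat costs $1.00 "
                   "more than the ball. How much is the ball?\n"
                   "Let's think step by step.",  # the zero-shot CoT trigger
    }],
)
print(resp.choices[0].message.content)  # expected conclusion: $0.05
```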
Emerging Abilities and Scaling Laws in Large Language Models
- Emergent Abilities and Pretraining Loss: @_jasonwei discusses a paper that plots emergent abilities against pretraining loss, showing linear correlations for some benchmarks and emergent behavior at specific loss thresholds for others. Pretraining loss is suggested as a better metric than compute for comparing models.
- Potential Upper Bounds on Function Approximation: @jxmnop shares insights from a paper showing that vastly different architectures can produce identical performance at the same parameter count, suggesting we may be close to the upper bound of approximating functions given a certain amount of compute.
- Limitations and Potential Walls for Language Models: @bindureddy argues that language models may soon hit a wall due to the limits of human language, reasoning, and the inability to surpass a certain level on benchmarks like MMLU despite increased compute or data.
Advancements in Vision-Language Models and Video Understanding
- PLLaVA: Parameter-free LLaVA Extension to Videos: @_akhaliq introduces PLLaVA, which extends the LLaVA framework to video dense captioning without requiring extensive paired data. The approach leverages pre-trained 2D diffusion models and a pooling strategy to achieve state-of-the-art performance on video question-answering and captioning tasks.
- HaLo-NeRF: Learning Geometry-Guided Semantics: @_akhaliq presents HaLo-NeRF, a system that connects neural representations of landmark scenes with text descriptions to enable fine-grained understanding and localization of semantic regions. The approach harnesses vision-and-language models adapted for 3D-compatible segmentation and volumetric scene representation.
Techniques for Efficient Training and Deployment of Large Language Models
- FP6 Quantization for Efficient LLM Inference: @rohanpaul_ai shares a paper on using six-bit quantization (FP6) to reduce the size of LLMs while preserving model quality across various applications and model sizes. The paper introduces TC-FPx, a GPU kernel design scheme supporting floating-point weights at various quantization bit-widths, enabling practical performance improvements during LLM inference.
- Proxy-Tuning: Efficient Customization of Large LMs: @rohanpaul_ai explains Proxy-Tuning, a lightweight decoding-time algorithm that achieves the result of directly tuning a large LM by using smaller tuned LMs to shift the original predictions (the core logit arithmetic is sketched after this list). This approach allows for efficient customization of large, potentially proprietary LMs through decoding-time guidance.
- Parameter-Efficient Sparsity Crafting for Instruction Tuning: @rohanpaul_ai discusses a paper proposing Parameter-Efficient Sparsity Crafting (PESC), which converts dense models into sparse Mixture-of-Experts (MoE) models for efficient instruction tuning. PESC inserts adapters into each expert, updating only the adapter parameters, significantly reducing computational costs and memory requirements while achieving state-of-the-art performance.
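The core of Proxy-Tuning is one line of logit arithmetic: steer the large base model by the difference between a small tuned "expert" and its untuned counterpart. A hedged sketch (vocabulary size and tensors are placeholders, and a real decoder would apply this at every step):

```python
# Proxy-Tuning's per-step logit shift, in isolation. All three models must
# share a vocabulary; the (expert - antiexpert) delta approximates what full
# fine-tuning would have done to the base model's distribution.
import torch

def proxy_tuned_logits(base_logits, expert_logits, antiexpert_logits):
    return base_logits + (expert_logits - antiexpert_logits)

V = 32_000  # shared vocab size (assumption)
base, expert, anti = torch.randn(V), torch.randn(V), torch.randn(V)
next_token = torch.argmax(proxy_tuned_logits(base, expert, anti))
print(next_token.item())
```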
Regulations and Policy
- California Bill 1047 Details: @nearcyan shared details on California Bill 1047 which has been fast-tracked. The bill covers all models made with 10^26 flops or similar performance, requires developers to assert models are safe under penalty of perjury, and creates a Frontier Model Division to report to.
- Concerns with California SB-1047: @jeremyphoward expressed concerns that California SB-1047, the "Safe and Secure Innovation for Frontier Artificial Intelligence Models Act", could do great harm to startups, American innovation, open source, and safety. The bill imposes overly broad definitions, misunderstands dual use, has restrictive requirements, and disincentivizes openness.
AI Discord Recap
A summary of Summaries of Summaries
1. Advancements in Large Language Models (LLMs) and AI Capabilities
- Llama 3 has been extended to support a 1M token context window, showcasing the progress in handling longer sequences. Tutorials demonstrate using Retrieval-Augmented Generation (RAG) with Llama 3 and integrating it with web browsing capabilities via Langchain and Groq (a minimal sketch of the RAG pattern follows this list).
- Microsoft's Phi-3, the next generation of fast and capable models, has been openly released, amassing over 6K votes on the leaderboard. Discussions explore tokenizer changes in Llamafied versions for better chat application performance.
- Snowflake Arctic, an enterprise-focused LLM, aims to provide cost-effective AI solutions for businesses, pushing the frontiers of enterprise AI adoption.
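The RAG tutorials referenced in the first item all implement the same small pattern: embed documents, retrieve the nearest ones, and prepend them to the prompt. A framework-free sketch with placeholder embeddings (not the tutorials' actual Langchain stack):

```python
# Framework-free RAG skeleton: retrieve by embedding similarity, then stuff
# the retrieved context into the prompt. embed() is a stand-in, not a model.
import numpy as np

def embed(text: str) -> np.ndarray:          # placeholder embedding function
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(384)

docs = ["Llama 3 supports long contexts.", "Groq serves LLMs at high speed."]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]

question = "How fast is Groq?"
prompt = f"Context:\n{chr(10).join(retrieve(question))}\n\nQuestion: {question}"
print(prompt)  # this augmented prompt is what gets sent to the LLM
```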
2. Model Optimization, Quantization, and Efficiency Techniques
- Extensive discussions around quantization techniques like 4-bit LoRA and 4-bit QLoRA, with debates on their effects on model performance based on training extent. Binary Quantization is explored for creating smaller indexes for similarity searches (a short sketch follows this list).
- DeepSpeed's FP6 quantization promises quantized inference with similar throughput, generating excitement for improved efficiency.
- Researchers present CPU-optimized LLMs capable of generating Python code using a Chain-of-Thought prompt method, highlighting the pursuit of efficient, low-cost models.
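Binary Quantization, from the first item, keeps one bit per embedding dimension, so similarity search reduces to XOR-and-popcount over packed bytes. A minimal sketch (illustrative, not any specific library's implementation):

```python
# Binary Quantization for similarity search: sign-quantize embeddings to one
# bit per dimension, then rank candidates by Hamming distance.
import numpy as np

def binarize(embs: np.ndarray) -> np.ndarray:
    """Pack the sign bit of each dimension into bytes; a 1024-dim float32
    vector shrinks from 4096 bytes to 128 bytes."""
    return np.packbits(embs > 0, axis=-1)

def hamming_search(query: np.ndarray, index: np.ndarray, k: int = 5):
    """XOR the packed codes, then count differing bits per candidate."""
    dists = np.unpackbits(index ^ query, axis=-1).sum(axis=-1)
    return np.argsort(dists)[:k]

embs = np.random.randn(10_000, 1024).astype(np.float32)   # toy corpus
index = binarize(embs)
q = binarize(np.random.randn(1, 1024).astype(np.float32))
print(hamming_search(q, index))  # indices of the 5 nearest neighbors
```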
3. Open-Source AI Development and Community Collaboration
- The Eleuther community compares LLM performance, discusses emergent abilities, and shares research on topics like redundant neural circuits and adversarial prompting against LLMs.
- OpenAccess AI Collective delves into fine-tuning strategies, quantization methods, and tokenization challenges, with members sharing insights from repositories like axolotl and FastChat.
- The LlamaIndex community explores techniques like multi-hop retrieval, knowledge graphs for long-term memory, and shares resources like an AWS workshop on LLM app development patterns.
4. Ethical Concerns and Regulatory Challenges in AI Development
- LAION faces restrictions due to EU laws, limiting access to public compute clusters and prompting researchers to gravitate towards more active communities with ongoing experimentation.
- Discussions around the proposed California SB-1047 bill and its potential harm to startups, open-source AI development, and American innovation, underscoring regulatory challenges.
5. Misc
- CUDA C++ claims the spotlight: A YouTube lecture on CUDA C++ llm.cpp delves into optimizing LLM training, with promises of cleaner and faster code. Support materials and related discussions suggest significant performance improvements and readiness for scaling LLMs to gpt-large sizes.
- Intel's oneAPI spreads its wings: Intel's oneAPI garners attention for offering a unified programming model across CPUs, GPUs, and FPGAs. Enthusiasm bubbles up for the upcoming Battlemage GPU lineup, and the oneAPI ecosystem welcomes contributions for cross-vendor support, with developer resources on GitHub and announcements over Codeplay's official press release.
- Machine Learning gig at InstaDeep: InstaDeep is on the hunt for Machine Learning Engineers versed in high performance ML, Bio AI, and custom CUDA kernels. They offer a stimulating environment and multiple positions for problem solvers ready to make real-world impacts, with applications open on the InstaDeep job portal.
- AMD stokes the competitive fires: Discussions revolve around the AMD Instinct MI300X's potential for server environments and ROCm's current state, with links to product pages and rental options hinting at a heated rivalry with NVIDIA. ROCm support and comparisons suggest AMD's focus on greater accessibility and performance enhancement for developers.
- Triton and PyTorch Forge Ahead: GitHub repositories such as unsloth and attorch emerge as treasure troves for those seeking Triton and PyTorch integrations. While flash-attn 2.5.8 earned compatibility accolades with PyTorch 2.3.0, discussions on optimal CUDA tensor indexing techniques and tensor gradient calculations in Triton reinforce the community's drive for efficiency.
PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord
- Phi 3 Integration an Unsloth Triumph: Unsloth AI now supports Phi 3, delivering twice the speed with half the memory usage. Enthusiasts can explore the Colab notebook for detailed guidance.
- Bilingual Model Makes a Splash: Thermostatic introduced NeuralTranslate_v0.2_GGUF, a bi-directional English-Spanish translation model that preserves Mistral's reasoning without overfitting, all available on Hugging Face.
- GPU Optimization Chatter: The AI community debates best practices for minimizing VRAM usage, sharing insights on manual layer pruning and discussing offloading techniques with code examples from Kolibrify's GitHub repository (a hedged offloading sketch follows this list).
- Dataset Dexterity: A tip for merging raw text and chat datasets to improve fine-tuning outcomes was shared, alongside a notion to use larger datasets for base models and smaller ones for instruct models. There's also mention of offloading parts of language models to reduce inference memory, as explained with code in a GitHub repository.
- Future Functionality Features: Suggestions for Unsloth AI included automatic optimization of hyperparameters like batch size and learning rate. Meanwhile, a community member humorously anticipated the addition of a cake-baking feature upon training completion.
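For the offloading discussion above, one common way to trade VRAM for CPU RAM is Accelerate's device_map; the memory budget and model choice below are illustrative assumptions, not recommendations from the thread:

```python
# Hedged sketch: spill layers that don't fit in VRAM to CPU RAM at load time
# via device_map + max_memory (requires the accelerate package).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"   # any causal LM checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",                        # fill the GPU first...
    max_memory={0: "6GiB", "cpu": "24GiB"},   # ...then spill to CPU RAM
)
tok = AutoTokenizer.from_pretrained(model_id)
inputs = tok("Hello", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```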
CUDA MODE Discord
CUDA C++ claims the spotlight: A YouTube lecture on CUDA C++ llm.cpp delves into optimizing LLM training, with promises of cleaner and faster code. Support materials and related discussions suggest significant performance improvements and readiness for scaling LLMs to gpt-large sizes.
Intel's oneAPI spreads its wings: Intel's oneAPI garners attention for offering a unified programming model across CPUs, GPUs, and FPGAs. Enthusiasm bubbles up for the upcoming Battlemage GPU lineup, and the oneAPI ecosystem welcomes contributions for cross-vendor support, with developer resources on GitHub and announcements over Codeplay's official press release.
Machine Learning gig at InstaDeep: InstaDeep is on the hunt for Machine Learning Engineers versed in high performance ML, Bio AI, and custom CUDA kernels. They offer a stimulating environment and multiple positions for problem solvers ready to make real-world impacts, with applications open on the InstaDeep job portal.
AMD stokes the competitive fires: Discussions revolve around the AMD Instinct MI300X's potential for server environments and ROCm's current state, with links to product pages and rental options hinting at a heated rivalry with NVIDIA. ROCm support and comparisons suggest AMD's focus on greater accessibility and performance enhancement for developers.
Triton and PyTorch Forge Ahead: GitHub repositories such as unsloth and attorch emerge as treasure troves for those seeking Triton and PyTorch integrations. While flash-attn 2.5.8 earned compatibility accolades with PyTorch 2.3.0, discussions on optimal CUDA tensor indexing techniques and tensor gradient calculations in Triton reinforce the community's drive for efficiency.
Perplexity AI Discord
Slow Pro Search Annoys Users: Perplexity AI's Pro Search users are complaining of increased search times, lamenting that searches are taking up to 90 seconds across all engines, affecting the web client but not the mobile app.
Claude 3 Opus Chat: To Subscribe or Not?: Members debate the merit of subscribing to Claude 3 Opus chat, with some users reporting positive experiences, although no specific comparative features with the API version have been discussed.
New AI Model Anticipation: There's keen interest in the potential integration of WizardLM 2 and LLama-3 70B Sonar Large 32k models into Perplexity AI, with users noting they may outperform existing models on specific tasks.
Frustrations Over Opus Daily Limits: Perplexity users are voicing frustration over a 50 queries per 24 hours cap on Opus, calling for greater transparency and lamenting perceived degradation in quality.
Billing Blues and API Queries: Users are expressing issues with billing, citing being charged despite expecting a free trial, and seeking the right channels for enterprise API discussions. Meanwhile, questions about single-turn conversation guidelines with online LLMs, Harpa configuration, and model accessibility on third-party platforms like make.com are stirring up technical curiosity.
Stability.ai (Stable Diffusion) Discord
Forge Forgets Functions: Trouble with SDXL and Forge UI is boiling over; users report issues with image previews and express concerns over the potential abandonment of Forge. Workarounds include delving into GitHub issues and tweaking startup flags like --no-gradio-queue.
Release Radar - Stable Diffusion 3.0: The AI engineering community eagerly awaits the launch of Stable Diffusion 3, triggered by hints from a CivitAI newsletter pointing to an end-of-May release. Anticipation is mixed with skepticism about open weight availability and comparisons with Pony Diffusion V7, discussed in a Civitai article.
Cashing in on AI Art: Discussions on monetizing AI-generated art revealed that NSFW creators are outperforming SFW artists in marketplaces like Civitai. Brainstorming ensued on potentially lucrative trends such as AI girlfriend apps, alongside a noted indifference towards fine-tuning efforts for models like Stable Cascade.
Toolbelt Expansion: Engineers swapped tips on AI model training tools beyond AUTOMATIC1111, spotlighting dreambooth and kohya_ss for custom training, while also contemplating the ethical quandary of using artist names in datasets.
Enigmatic Enquiries Enlighten: Inquisitive interactions ranged from exploring text-to-speech solutions to diving into model fine-tuning specifics. The discussion sometimes took a lighter turn with humorous comments about virtual "graphics card downloads" and idle curiosity about Stable Diffusion's ability to visualize without explicit prompts.
LM Studio Discord
A New Challenger for VRAM: Discussions underscore the importance of VRAM for LLM operations, with 16GB as the minimal baseline and aspiration for the 32GB VRAM club stirring excitement. The performance gains from using Nvidia's contemporary GPUs and the feasibility of models split across multiple cards, potentially streamlined by NVLink, were also key points.
LLM Leapfrog: The Meta-Llama-3-8B-Instruct-Q5_K_M.gguf model is earning praise for its performance on an M1 MacBook Pro. Users are advised to consider quantization types when running models to ensure compatibility with their hardware, and resources for local model deployment and instructions are deemed helpful, with pointers to tools like LM Studio and Groq API.
The Quirks of Model Behavior: Users encountered various version-related issues, such as phi-3 mini models outputting nonsense after an update to LM Studio version 0.2.21, and handling crashes in LM Studio since recent updates. Concerns about LLama 8b models rambling and the need to restrict reliance on integrated graphics for dedicated GPU utilization were also highlighted.
Bots, Books, and Bugs: Integrating Discord bots with LLM models for message retrieval and Wikipedia searches has gained traction. Meanwhile, navigating the capacity to run models like Stanford's Octopus v2 on mobile or PC devices surfaced as a complex issue, and LLama 3 models are suspected of "hallucinating" current event knowledge, given their lack of internet access.
ROCm Hiccups: Users battling with LM Studio ROCm's limitations discovered that it doesn't support RX 6700, which provokes thoughts on HIP SDK compatibility and potential workarounds such as those implemented by KoboldAI. Additionally, a server error within the platform sparked dialogues, but no resolution was reported.
Nous Research AI Discord
- Snowflake Arctic Unveils Cost-Efficient AI Solutions: The Snowflake AI Research Team launched Snowflake Arctic, an LLM aimed at providing cost-efficient enterprise AI solutions, amidst other less-contextualized YouTube video shares.
- Intel and Logitech Augment AI Offerings: Intel's CEO highlighted AI's growth potential during their quarterly results, as shown in a YouTube video, while Logitech introduced an AI Prompt Builder for more fluent ChatGPT interactions, demo video available.
- Emerging Trends in AI Quantization and Model Architectures: Hugging Face hosts binary-siglip-text and binary-siglip-vision, demonstrating efficient embeddings, with discussions also encompassing speculations around OpenAI's naming schemes and the introduction of DeepSpeed FP6 quantization for improved throughput.
- LLM Discussion: Performance Issues and Legal Confusion: Users report LLaMA-3's EOS token generation issues, which link to stopping criteria solutions on GitHub, while Cohere's licensing for command-r models stirs debates over commercial code usage, and frustrations are aired about a gpt2-chatbot, mistakenly associated with GPT-4 capabilities.
- Data, Documentation, and Development through AI Community Collaboration: Technical contributions include generating multi-hop literature data, using pydantic models for ideation, and refining graph representations of LLM outputs. Anna's Blog provided information on WorldCat data scraping and utilization in literature comprehension datasets.
- Web and World Simulation Tools Garner Interest: The Nous Research community gears up for worldsim testing with free invites, and reveals experiences with various web simulation tools, such as companion-based AI, documented at websim example, and long conversations, indicating a growing interest in AI's conversational stability potential.
HuggingFace Discord
- Community Constructs Computer Vision Course: A new community-built computer vision course is live on HuggingFace, covering machine learning principles in the field using models from their ecosystem.
- Model Showcase and Updates: The newly announced multilingual Qwen1.5-110B-Chat model supports a 32K context length and other improvements; its details can be found on its model page. Additionally, the link to the "Qwen1.5-110B" model has been corrected and can now be accessed on HuggingFace and the associated blog post.
- Creative Solutions and Collaborations Encouraged: Amidst various technical inquiries, members sought creative problem-solving ranging from undisclosed Gradio issues to LLM performance optimizations based on hardware constraints, specifically mentioning that 32 GB of RAM should suffice for many tasks. There's also a push to identify and improve image classification or object recognition models for practical applications like pinball game scoring systems.
- Model and Space Innovations Abound: Various models and spaces surfaced, including a Sentence Transformer model for semantic search tasks with a context length of 16,384 (BEE-spoke-data) and a Minecraft Skin Generator using a stable diffusion model (Stable Diffusion Finetuned Minecraft Skin Generator). The Instant Video space by KingNish leverages ByteDance's AnimateDiff Lightning model for quick text-to-video creation (Instant Video).
- Explorations in Diffusion and AI Advertisement Detection: Participants exchanged best practices for precise object generation, incorporating tools like the IP-Adapter in diffusion models for enhanced image prompting, and addressing color consistency issues across platforms. Conversations also navigated toward evaluating YOLO classifiers for improved accuracy and performance in various applications.
OpenAI Discord
- ChatGPT Gets a Memory Upgrade: ChatGPT Plus users can now save conversational context using the newly introduced Memory feature, though availability is still limited, excluding users in Europe and Korea.
- Exploring AI's Relation to Consciousness: The community engaged in intense debates over whether AI could exhibit consciousness, with discussions venturing into the philosophical domain, comparing AI's experience of the temporal with continuous human consciousness, and the perception of self in neural networks.
- Model Comparisons Spark Discussions: Technical discussions emphasized the strengths and weaknesses of various AI models, benchmarking ChatGPT, Claude 3 Opus, and Gemini 1.5, and acknowledging that while command-R Plus and Llama3-70b may fall behind GPT-4, they represent leaps of progress in their own right.
- Prompts as Competitive Sport: Members proposed the idea of prompt competitions, both paid and for play, to sharpen skills and enhance community engagement, highlighting the potential for emerging qualities in LLMs that cannot be predicted by simply scaling up smaller models.
- API Ups and Downs Noted: Engineers discussed various operational issues, from rate limits on custom GPT uses and backend errors at "https://chat.openai.com/backend-api/gizmos/" to concerns about performance and availability of GPT-4's features like memory and voice control.
Eleuther Discord
Exploring the Limits of Model Size: Engineers debate the effective cutoff for model parameters, seeking a point where further addition offers negligible returns. In a bid for efficiency, the criterion has shifted towards focusing on non-embedding parameters, potentially finding a sweet spot under 200 million.
Multilingual Hurdles in The Pile: The Pile's dataset limitations were highlighted, indicating a lack of multilingual representation which might impact model training and performance, particularly in languages like German. Additionally, while comparing models like GPT-NeoX and Megatron, discussions centered on NeoX's user-centric quality improvements.
Stability or Speed? The Model Serving Conundrum: Technical discussions have surfaced regarding discrepancies in model serving speeds, such as between Mixtral and Llama models at Fireworks.ai; considerations included batching size and hardware specifics as potential factors.
Refusal's Single Neuronal Pointer: The AI Alignment Forum presented a discovery that refusal mechanisms in LLMs might hinge on a solitary direction within network layers. This spurred discussions about orthogonalization and fine-tuning possibilities for refusal behavior.
Pull Request Perils and Pipeline Woes: Members expressed concerns about CLA signing issues and failing checks on GitHub pull requests, with some conversations dwelling on the stagnation of specific branches. Questions were raised about the adaptability of evaluation prompts to different models' finetuning needs, with suggestions for custom functions to handle diversity.
OpenRouter (Alex Atallah) Discord
- Two-Step Price Hike for Soliloquy 8B: The Soliloquy 8B model transitioned to a paid usage model at $0.1 per 1M tokens, followed by a further increase to $0.2 per 1M tokens. The rates reflect OpenRouter LLC's policy changes and are documented on the model's OpenRouter page.
- Claude's Checkup: Users troubleshooting Claude models found that they max out at a generation of 4k tokens with a capability to read up to 200k tokens, and that proper API settings can optimize responses (a hedged API sketch follows this list). Relevant documentation can be found here.
- WLM-2 Hosting Huddle: A detailed analysis of WLM-2 hosting costs led to the conclusion that profitability hinges on factors like GPU efficiency and the off-chance revenue from idle resources.
- Quiet Arrival of FireLLaVA: FireLLaVA, an open multimodal model boasting swift initialization, has quietly entered the OpenRouter suite. It's a significant addition for developers given its non-proprietary nature and can be explored on OpenRouter's page.
- Frontend Frustrations Find Frugality: A quest for a budget-friendly frontend to allow family members to access OpenRouter services without individual OpenAI accounts inspired recommendations for using free-tier offerings like Vercel, or economical VPS like Contabo.
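For the Claude limits noted above, the relevant API setting is the generation cap. A hedged sketch against OpenRouter's OpenAI-compatible endpoint (the model slug and key are placeholders):

```python
# Capping Claude's generation length through OpenRouter's OpenAI-compatible
# API; the context window (~200k tokens) is separate from this output cap.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key (placeholder)
)
resp = client.chat.completions.create(
    model="anthropic/claude-3-opus",  # example slug; check OpenRouter's list
    max_tokens=4096,                  # generation maxes out around 4k tokens
    messages=[{"role": "user", "content": "Summarize this thread."}],
)
print(resp.choices[0].message.content)
```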
OpenAccess AI Collective (axolotl) Discord
- WizardLM Stays Magical: Contrary to whispers, Microsoft's WizardLM models have not vanished; rather, updates were made by the WizardLM team, ensuring continued public access to the repository.
- The Fine Art of Model Fine-Tuning: Discussions contrasted fine-tuning domain-specific language models against using Retrieval-Augmented Generation (RAG), with references made to the medically-focused LLM paper and the usage of llama-pro methodology as seen in fsdp_qlora.
- Quantization Quandaries and Tokenization Tactics: Considerable chatter surrounded tokenization challenges, requiring the latest fastchat formatter for models like LLaMA-3; meanwhile, the community grappled with understanding quantization methods like 4-bit LoRA and 4-bit QLoRA through discussions and a Twitter thread, revealing a sensitivity to quantization based on the extent of model training.
- AI's Need for Space and Speed: A stark reminder that a full fine-tune (FFT) with DeepSpeed ZeRO-3 could gobble up to 167GB of RAM, even on 2x24GB GPUs, setting off discussions on memory management techniques like torchtune, the perplexing observation of high disk space usage, and the utility of PEFT models for efficiency in fine-tuning neural networks.
- GPU Scaling Secrets and FSDP Mechanics: The collective cornered the topic of GPU scaling, exchanging insights on the fine details of micro batch sizes, gradient aggregation, and the use of Fully Sharded Data Parallelism (FSDP) and ZeRO Stage 3 for model loading across GPUs, all critical for the effective use of hardware resources.
Modular (Mojo 🔥) Discord
- Mojo Gets Modular: Modular's standard library, modularml/mojo, saw a 23% increase in commits post open-sourcing, signaling heightened contribution activity.
- Multimodal Search Empowered by MAX: A blog post by Modular revealed the MAX Engine outshines both PyTorch eager and ONNX runtime in benchmarks, excelling in multimodal search involving textual and visual data.
- Modular Tweets Curated: Key tweets from Modular were highlighted, spanning updates and announcements, with links including Tweet 1, Tweet 2, Tweet 3, and Tweet 4.
- Advancements and Issues in Mojo Land: Key discussions covered converting Python to Mojo, memory allocation optimizations, and matrix slicing in Mojo. Importing challenges in the standard library were tackled, and nightly compiler updates continue to roll out, catching issues like file handle lifetime management.
- Performance Pursuits Proliferate: From investigations into dictionary performance to SIMD optimizations for error-correction algorithms, the community delved into efficiency enhancements. The compact-dict library was mentioned as a potential speed booster, and __copyinit__ usage was debated, exemplified in a listed Gist.
LlamaIndex Discord
AWS and Llama Index Sit Down to Code: A workshop with AWS to demonstrate 3 patterns for LLM app development emphasizes data ingestion with S3 and embeddings with AWS Bedrock.
Security Spotlight on ML Podcast: The latest mlsecops podcast features the co-founder of Llama Index discussing LLM-based application futures and data security, including tools like LlamaParse and LlamaCloud.
RAG Under the Microscope: Marco Bertelli's 9-part RAG tutorial series paves the road for any prototype to hit the production stage with a delineation of vital architectural components.
Multistep Quest for Improved RAG Reasoning: A methodology enhancing RAG involves a multi-hop retrieval process, combining Llama Index and Cohere reranking, which sharpens context awareness and minimizes hallucinations, as discussed in this post.
Remember All with memary: Unveiling memary, a long-term memory framework using knowledge graphs, which promises to expand memory capabilities in autonomous agents supplemented by LLMs, explained in this tweet.
OpenInterpreter Discord
Flask and Keys: An OpenInterpreter member encountered issues when running a Flask server and discussed workarounds like setting a dummy api_key and modifying pydantic configurations to resolve namespace conflicts.
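A minimal sketch of the two workarounds described above; the local server URL and field names are illustrative assumptions:

```python
# Workaround 1: local OpenAI-compatible servers usually ignore the key, but
# the client still requires one to be set, so a dummy value suffices.
# Workaround 2: pydantic v2 reserves the "model_" prefix; clearing
# protected_namespaces resolves the namespace-conflict warnings.
from openai import OpenAI
from pydantic import BaseModel, ConfigDict

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

class CompletionRequest(BaseModel):
    model_config = ConfigDict(protected_namespaces=())  # allow model_* fields
    model_name: str   # would otherwise trigger the protected-namespace warning
    prompt: str
```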
Hardware Hurdles Surmounted: The absence of Groq integration with OpenInterpreter prompted discussions, citing a pull request #1238 aimed at adding support. There were also questions around the use of devices like the Rabbit r1 with OpenInterpreter, focusing on the systemâs language and voice command capabilities.
Anticipating the Heavy: Eager anticipation bubbles around the so-called 01 Heavy device despite no concrete release details, while a custom 3D project for OpenInterpreter garners attention and a member teases an upcoming discussion on the timeline for 01 Light.
Community Code Crusade: Members actively shared progress and assistance requests for projects associated with OpenInterpreter. This includes the llm-switcher, and potential Groq API implementations, encouraging community contributions.
Open AI Ethics Discourse: A conversation sparked around the ethical implications of AI abilities like file modification, particularly in reference to Microsoft's capabilities, with the implicit suggestion that OpenInterpreter could be crafted to be more aligned with diverse user needs.
Latent Space Discord
Berkeley Benchmarks Function Call Skills: The Berkeley Function Calling Leaderboard serves as a new measure, periodically updating to benchmark how effectively Language Models (LLMs) call functions in real-world scenarios.
Laying Down the Law with LLM Limitations: An exploration into the confines of LLMs highlights their inability to prevent "goal drift", with details provided in a Strangeloopcanon article, emphasizing areas for potential improvement.
Swyx Keeps the Pod Waves Flowing: A shout-out to a new podcast episode from swyxio might capture the audience's interest; details shared via a tweet.
Elevating the Mix with Mixture of Depths: The new Expert Choice Routing transformer layer, introduced in a recent paper, aims for faster convergence and better processing of longer sequences and is stirring up discussions; engineers can take a look at the paper here.
Linux Video Sharing Level-Up: Vesktop appears to be the hot topic for Linux users seeking better video sharing experiences on Discord, with its performance and compatibility improvements detailed on the GitHub repository.
LAION Discord
- LAION's Compute Conundrum: EU regulations are impeding LAION's ability to utilize public compute clusters, prompting researchers to shift their attention towards more active research communities with ongoing experimentation.
- Terminus Group Draws in Diverse Experts: The Terminus Research Group, an informal collective, recently welcomed the "pixart guy," signaling a trend of burgeoning communities rich in cross-disciplinary talent.
- Pursuing the Aesthetics of AI: LAION-Aesthetics aims to quantify visual appeal using machine learning models, with their open-source code accessible on GitHub for public collaboration and use.
- Quantization Conundrum Raises Eyebrows: Discord members examined a Reddit post on LLM benchmark inconsistencies across precision levels, casting the spotlight on the testing procedures and inherent unpredictability in LLM performances.
- Token Generation Rate Talks: AI engineers discussed the token generation speeds on advanced GPUs for varying models and configurations, sharing that selecting effective tools like exllama and TabbyAPI can enhance overall performance.
- VAST Interest Peaks Among Engineers: Members delved into the potential of the omni-modality foundation model and dataset, VAST, expressing interest in its capabilities by soliciting use-cases and tips for fine-tuning.
- Emerging Research Stirs Excitement: A newly published research paper grabbed attention with its novel proposals for more efficient large model inference and layer management, sparking conversations on its practical applications.
- Graph Integration into LLMs Explored: Inquiries about amalgamating graph data structures with LLMs triggered exchanges on techniques and literature for enriching language models with non-sequential data.
- Fine-Tuning Frustrations on Medical Mistral: Challenges in fine-tuning Mistral models for medical text generation surfaced, focusing on excessive sequence generation and the utility of padding protocols to assuage these issues.
- Eleuther Expertise Exchange Encouraged: Members suggested consulting the Eleuther server for expert guidance in LLM fine-tuning, generating interest in this hub of specialized knowledge.
Cohere Discord
Engines Revving Up for AI-Enhanced Browsers: AI enthusiasts debated the merits of Tavily and Brave Search API as search engine tools for integration with AI, discussing price points and efficiency while addressing rate limitations Brave Search API Info and exploring Tavily API Info.
Cohere Toolkit Love: The community showed appreciation for Cohere's open-source toolkit, benefiting from its prebuilt components to expedite the deployment of RAG applications Cohere Toolkit on GitHub.
Squashing Bugs and Deployment Dilemmas: Technical roadblocks such as sqlite3 errors when using cohere-toolkit locally and deployment challenges on Azure surfaced, with shared solutions found in various GitHub resources.
Customizing and Fine-Tuning Queries: Questions around the specifics of model fine-tuning and the boundaries of Cohere's free trial API arose, prompting discussions of model availability and detailed terms.
Command-r Shines in Multi-Language Support: Command-r's effectiveness with non-English languages was acknowledged, plus inquiries into its commercial use specs sparked discussions, suggesting avenues through contacting Cohere's sales team or using AWS Sagemaker.
tinygrad (George Hotz) Discord
- Formula Flexibility in Tinygrad: Discussion around tinygrad focused on creating mathematical formulas through basic primitive operations and emphasizing the importance of constructing a dependency graph for efficient gradient calculations and hardware utilization in AI modeling.
- Tinygrad's Dynamic Enhancements Await: Members shared excitement for the upcoming tinygrad 0.9 release, anticipating new features that could further improve AI model training, and discussed ongoing work on handling dynamic testing and symbolic shapes to enhance operation flexibility.
- Proposing a Learning Path for Tinygrad Enthusiasts: For those eager to dive into tinygrad's intricacies, members recommended starting with MicroGrad and MiniTorch, then proceeding through the tinygrad codebase. This aims to solidify foundational concepts for better contributions to tinygrad's development.
- Kernel Optimization Insights: A member highlighted optimization techniques such as loop unrolling, while sharing detailed technical writeups and guides to understand the inner workings of tinygrad's kernel optimizations, particularly targeting AI performance boosts.
- Hybrid Model Harmony Highlighted: There was mention of successful integration between tinygrad and PyTorch, utilizing nn.module to combine features of both frameworks into a hybrid model, demonstrating the potential synergy in AI tooling.
Interconnects (Nathan Lambert) Discord
Bold Moves for Newsletter Growth: Members weighed the pros and cons of cross-promoting with Semafor, debating potential audience growth against the risk of diminishing brand value with unwanted plugs.
Phi-3 and Arena Gather Steam, OLMo Training Insights Offered: Microsoft's unveiling of Phi-3 and Arena's milestone of 800K votes sparked discussions, as did a seminar on Open Language Model training, which left the audience desiring deeper insights.
RLHF Nuances and Ghost Attention's Diminished Glow: Engineers dissected the nuanced performance of Reinforcement Learning from Human Feedback (RLHF), touched on KTO's promise, and debated the fading significance of Ghost Attention, once thought to be crucial for maintaining long conversation consistency in LLaMA 2 models.
OpenELM Triumphs, Encouraging Progressive AI Ideals: Conversations centered around OpenELM's performance surpassing OLMo, reflected on the community's development ethos of continuous improvement, and underscored the educational value of open models.
AGI - A Philosophical Conundrum: There's an ongoing dialogue about the subjective nature of AGI, with members appreciating posts that ignite thoughtful considerations on the topic.
LangChain AI Discord
AI Integration Queries and Challenges: Engineers requested guidance on prompt integration and reported issues with AzureSearchVectorStoreRetriever being incompatible with async operations, hinting at possibly wrapping sync functions in async for compatibility (a minimal sketch follows). There's also confusion within the community regarding the Gemini 1.5 Pro model; it works exclusively with VertexAI, as demonstrated with successful ChatVertexAI implementations.
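A minimal sketch of the sync-in-async wrapper mentioned above, using asyncio.to_thread; the retriever stand-in is a placeholder, not LangChain's actual class:

```python
# Wrap a blocking retriever call so it can be awaited: asyncio.to_thread runs
# the sync function in a worker thread without blocking the event loop.
import asyncio

def sync_retrieve(query: str) -> list[str]:
    # stands in for a blocking call like get_relevant_documents(query)
    return [f"doc for {query}"]

async def aretrieve(query: str) -> list[str]:
    return await asyncio.to_thread(sync_retrieve, query)

print(asyncio.run(aretrieve("vector search")))
```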
LLM Deployments and Observability Preferences: Discussions unfolded around different deployment approaches, including Hugging Face versus the OpenAI API; security considerations were mentioned with respect to bypassing LangChain for direct SQL Server connections. There was also debate on effective observability tools for LLMs, like Arize Phoenix and Langfuse, highlighting a slight preference toward self-hosted options.
Galactic API Giveaway and AI Job-Hunters: GalaxyAI is providing free API access, boasting compatibility with premium models such as GPT-4 and GPT-3.5-turbo. Separately, a GitHub repository introduced Genai-Job-Agents, a Langchain/Langgraph-based agent for streamlining job searches and CV optimisation.
AI Tutorials Amass: A suite of tutorials surfaced, including "Local RAG agent with LLaMA3 and Langchain" and "Llama 3 Web Browsing Agent with Langchain and Groq," addressing the design and implementation of RAG systems and web browsing capabilities. A captcha issue was flagged when trying to access a potentially useful Amazon book on NLP and LLMs, but the underlying material was not dismissed.
Reviving the RAG, Ride the Llama: Insights from sharing channels reveal advancements in Retrieval-Augmented Generation (RAG) implemented with LLaMA3, underpinning the creation of AI-driven web UI for applications, and interactive avatars for customer Q&As, expanding the horizons of interactive AI utilization across various platforms.
Mozilla AI Discord
- Segmentation Fault in Llama: Engineers are facing a segmentation fault when running llamafile, especially on Modal Labs platforms while using files like Phi-3-mini-128k-instruct.F16.llamafile. This issue has been widely reported among users attempting to integrate various llamafiles.
- Memory Reporting Woes in htop: A notable bug in htop misrepresents shared memory usage on Linux, which could affect how AI engineers perceive memory demands during intensive model operations.
- Get Your Update to Llamafile v0.8.1: The release of llamafile v0.8.1 promises support for the Phi-3 Mini 4k, fixes GPU module crash issues, and provides bundled NVIDIA + AMD shared objects for Ubuntu, thus potentially smoothing out some persistent wrinkles for engineers.
- Unraveling Quirks in LLM Output: Anomalous outputs with parentheses and line breaks have been observed by users operating LLMs like Llama3 70B and Mistral via llamafile, sparking conversations about the consistency and idiosyncrasies of model behaviors.
- Optimizing Llamafile for Peak Performance: There's a shared interest in optimizing GPU usage with llamafile, where users exchanged tips on maximizing system RAM utility. Clarity is sought on identifying if a model runs on GPU or CPU, along with managing the llamafile-generated endless output.
AI Stack Devs (Yoko Li) Discord
AI Companion Radar: Faraday and Amica Catch the Eye: Faraday and Amica garnered attention for their position as AI companion apps that prioritize data privacy, where Faraday can operate locally thanks to llama.cpp, and Amica offers self-hosting and cloud services with enhanced features. Both apps introduce a new angle on AI relationships, promoting user privacy, with Faraday receiving a nod for its month-long performance and Amica as an emerging contender.
Bedtime Stories Win Big: Creative design with AI NPC characters by the participants of the Rosebud AI Sleep Game Jam led to notable entries, with Bedtime Negotiation standing out and winners announced via Twitter. A new game jam focusing on Education and AI is up next, with details available on Twitter.
A Town Called Addictive: AI Town was celebrated for its addictive quality in a Twitter post, inspiring ideas for a developer-centric simulation. LLM-powered NPC models and infrastructure enhancements were shared, with a repository on GitHub and a model hub on Huggingface, despite a broken API access link, and feedback was solicited for these NPC advancements.
Map Quest for AI Town: Debate on map handling for AI Town surfaced with suggestions ranging from using static assets to reduce bandwidth, to optimizing the original file reading method for maps. A YouTube tutorial titled "100% Local 'AI Town' with Llama 3 AGENTS!!!" was promoted, delivering a how-to for those eager to dive into their local setup.
Character Crafting Challenges: Dialogue around the development of NPC characters led to a promise for a detailed blog post. Discussions pinpointed the effort to compress model output, minimize model calls, and address issues found with generalist instruct-models like GPT-3.5 or Mistral.
DiscoResearch Discord
DiscoResearch Delves into Router Coefficient Mysteries: Engineers discuss inconsistencies in router_aux_loss_coef between versions of Mixtral (0.02 for Mixtral-8x7B-Instruct-v0.1 versus 0.001 for Mixtral-8x22B-Instruct-v0.1), suggesting the potential need for a higher loss_coef in smaller experts.
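For reference, the coefficient in question is an ordinary Hugging Face config field; the values below simply mirror the two shipped configs discussed:

```python
# Where router_aux_loss_coef lives: it weights the Mixture-of-Experts
# load-balancing auxiliary loss during training.
from transformers import MixtralConfig

cfg_8x7b = MixtralConfig(router_aux_loss_coef=0.02)    # Mixtral-8x7B-Instruct-v0.1
cfg_8x22b = MixtralConfig(router_aux_loss_coef=0.001)  # Mixtral-8x22B-Instruct-v0.1
print(cfg_8x7b.router_aux_loss_coef, cfg_8x22b.router_aux_loss_coef)
```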
Initialization Inconsistencies Spark GPU Conversations: The DiscoLM_German_7b_v1 model encounters slow initiation times on HPCs compared to local machines; inference times improved from over 12 minutes to 10 seconds after loading the model to GPUs.
Speed Humps Ahead for Model Loading: Attempts to improve DiscoLM_German_7b_v1 load times using low_cpu_mem_usage=True have failed, sparking suggestions that the model may be bottlenecked by slow storage drives.
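One commonly suggested variant, shown here as an assumption rather than something confirmed to help in the thread, is to stream weights directly onto the GPU at load time:

```python
# Hedged sketch: low_cpu_mem_usage streams weights instead of materializing a
# full CPU copy, and device_map places them on the GPU as they load. If the
# bottleneck is the storage drive, neither flag will fix it.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "DiscoResearch/DiscoLM_German_7b_v1",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,   # avoid a full CPU-RAM materialization
    device_map="cuda:0",      # place weights on the GPU during load
)
```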
Downloading German with Gusto: The gguf model reaches 1500 downloads in two days, showing a strong demand for German language models within the community.
Tokenizing for Chit-Chat: Questions arise about changes to tokenizer configurations in Phi-3 Llamafied German models intended for chat application optimization, while a newly created Phi-3 MoE model has emerged for experiments but needs further training.
Alignment Lab AI Discord
- AI Tackles Tough Topics: There was a discussion regarding the application of Llama 3 for assessing topic complexity with reports of effective outcomes. This indicates ongoing exploration into AI capabilities for content assessment.
Skunkworks AI Discord
Python Code Gen Breakthrough with CPU-Optimized LLMs: A new study presents CPU-optimized language models capable of generating Python code, suggesting a Chain-of-Thought prompt method to improve model outcomes, outlined in the paper "Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation".
Binary Quantization Buzz in HaystackDB: Discussions revolve around the HaystackDB repository potentially using 2bit embeddings, with further clarification that Binary Quantization assists in efficiency by creating smaller indexes for similarity searches.
Trouble Training LLaMA-3 to Finish Up: A member experienced issues with LLaMA-3 models during fine-tuning, as models are not generating the End Of Sentence (EOS) token, impacting model performance where completion is critical.
Snowflake Arctic Chills Enterprise AI Costs: A video introduced Snowflake Arctic, a large language model designed for enterprise applications focusing on cost-effective AI solutions for businesses.
RAG-nificent Demonstrations with LLaMA3: Tutorial videos were shared, showcasing the use of Retrieval-Augmented Generation (RAG) with LLaMA3 in local environments through Langchain, as well as a session on implementing web browsing with LLaMA 3, Langchain, and Groq hardware here.
LLM Perf Enthusiasts AI Discord
Gamma Seeking AI Engineer: Gamma, highlighted by a16z and boasting over 10 million users, is looking to hire an AI engineer for prompt engineering, evaluations, and fine-tuning of text and image models. The role is pivotal in their content creation tools expansion, and the company prides itself on its growth, achieved with minimal team size and substantial funding, indicating a robust business model and significant market impact.
Spot the AI Talent: Candidates can apply for the AI engineer position at Gamma, set in the heart of San Francisco with a requirement of on-site collaboration thrice a week. This opportunity is for those keen on pushing the boundaries of large language models (LLMs) and can be explored further at Gamma's career page.
GPT Sleuthing: Speculation arose around gpt2-chatbot, which is suspected by some to be a leaked version of GPT-4.5, triggered by discussions around a tweet by @phill__1 regarding its sophisticated domain knowledge. Community members simply responded with enthusiasm, acknowledging the bot's quality.
A Tweet of Approval: The community expressed a succinct sentiment that the gpt2-chatbot is "good," suggesting a community consensus on the bot's impressive performance, which hints at its potential and future capabilities in the field.
Datasette - LLM (@SimonW) Discord
- Code-Gen Goes Custom: Discussion about enhancing code-generation included the idea of custom grammar implementation to prevent syntax errors, emphasizing a model-specific option that could improve semantic accuracy.
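One concrete form of this idea is llama.cpp-style GBNF grammars, which constrain sampling so the output cannot leave the grammar. A hedged sketch via llama-cpp-python; the toy grammar and model path are placeholders:

```python
# Constrained decoding with a grammar: tokens that would violate the grammar
# are masked out at sampling time, so syntax errors become impossible.
from llama_cpp import Llama, LlamaGrammar

grammar = LlamaGrammar.from_string(r'''
root   ::= "def " ident "()" ":" "\n    return " number
ident  ::= [a-z_]+
number ::= [0-9]+
''')

llm = Llama(model_path="model.gguf")  # placeholder path to a local GGUF model
out = llm("Write a tiny function:", grammar=grammar, max_tokens=32)
print(out["choices"][0]["text"])
```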
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Unsloth AI (Daniel Han) ▷ #general (912 messages🔥🔥🔥):
- Unsloth Supports Phi 3 Release: Phi 3 is now officially supported by Unsloth, offering 2x faster speed & 50% less memory usage. Users can find the detailed Colab notebook here.
- Unsloth Performance Enhancements: Phi 3 can be finetuned using 4-bit precision with the Unsloth framework, accommodating limitations on VRAM. Users are experimenting with various finetuning flows combining SFT, DPO, and ORPO to enhance model performance.
- Checkpoints Management in Finetuning: Users can create checkpoints during finetuning with Unsloth to save progress and avoid overfitting; to do so, modify the training arguments accordingly and resume from the desired checkpoint (a hedged sketch follows this list).
- Usage of Colab and Alternatives Dissected: Users discuss the limitations of Google Colab's paid version due to runtime disconnections and explore alternative services like TensorDock that offer more affordable and reliable GPU access for model training.
- Technical Difficulties with GGUF Conversion: There are ongoing issues with converting models to GGUF format even when the Unsloth framework is used locally. Users are encouraged to upgrade Unsloth and possibly recompile llama.cpp to resolve quantization failures.
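A hedged sketch of the checkpoint/resume flow described above; hyperparameter values and the toy dataset are illustrative, not the thread's settings:

```python
# Periodic checkpoints via TrainingArguments, plus resume on a later run.
from unsloth import FastLanguageModel
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit", max_seq_length=2048, load_in_4bit=True
)
dataset = Dataset.from_dict({"text": ["### Q: hi\n### A: hello"] * 64})  # toy data

args = TrainingArguments(
    output_dir="outputs",
    save_strategy="steps",
    save_steps=50,                 # write a checkpoint every 50 steps
    save_total_limit=2,            # keep only the two most recent checkpoints
    max_steps=200,
    per_device_train_batch_size=2,
)
trainer = SFTTrainer(model=model, tokenizer=tokenizer, train_dataset=dataset,
                     dataset_text_field="text", args=args)
# First run: trainer.train(). After an interruption, resume with:
trainer.train(resume_from_checkpoint=True)  # picks up the latest checkpoint
```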
Links mentioned:
- Google Colaboratory: no description found
- Tweet from RomboDawg (@dudeman6790): My gift to the world. Train llama-3-8b on any dataset with 1,500 lines or less (about) with free google colab tier (all code provided in model card. Using (Unsloth + Galore + Qlora) Qalore if you will...
- Expanding Model Context and Creating Chat Models with a Single Click: no description found
- rombodawg/test_dataset_Codellama-3-8B · Hugging Face: no description found
- unsloth/llama-3-8b · Hugging Face: no description found
- Google Colaboratory: no description found
- Google Colaboratory: no description found
- generation_config.json · unsloth/llama-3-8b-Instruct-bnb-4bit at main: no description found
- How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study: Meta's LLaMA family has become one of the most powerful open-source Large Language Model (LLM) series. Notably, LLaMA3 models have recently been released and achieve impressive performance across ...
- config.json · unsloth/llama-3-8b-Instruct-bnb-4bit at main: no description found
- The Office Pam Beesly GIF - The Office Pam Beesly How Would One Do That - Discover & Share GIFs: Click to view the GIF
- A Not Found Error Has Occurred! - TensorDock: A Not Found Error Has Occurred! - TensorDock. Deploy GPUs in seconds and save 80%. No contracts, no commitments. Secure and reliable. Easy with TensorFlow and PyTorch. Start with only $5.
- How to keep processes running after ending ssh session?: Let's say I launch a bunch of processes from a ssh session. Is it possible to terminate the ssh session while keeping those processes running on the remote machine?
- DiscoResearch/DiscoLM_German_7b_v1 · Hugging Face: no description found
- Google Colaboratory: no description found
- unsloth/Phi-3-mini-4k-instruct-bnb-4bit · Hugging Face: no description found
- Wow GIF - Wow - Discover & Share GIFs: Click to view the GIF
- Home: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- Support Unsloth AI on Ko-fi! ❤️. ko-fi.com/unsloth: Support Unsloth AI On Ko-fi. Ko-fi lets you support the people and causes you love with small donations
- Home: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- no title found: no description found
- The PC Reborn - Introducing Snapdragon X Plus: The PC Reborn: Introducing Snapdragon X Plus, the newest platform within the Snapdragon X series. Equipped with cutting-edge technologies to deliver powerful ...
- LLAMA-3 🦙: EASIET WAY To FINE-TUNE ON YOUR DATA 🚀: Learn how to fine-tune the latest llama3 on your own data with Unsloth. Discord: https://discord.com/invite/t4eYQRUcXB Buy me a Coffee: https://ko-fi.com...
- GitHub - PKU-YuanGroup/Machine-Mindset: An MBTI Exploration of Large Language Models: An MBTI Exploration of Large Language Models. Contribute to PKU-YuanGroup/Machine-Mindset development by creating an account on GitHub.
- GitHub - unslothai/hyperlearn: 2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.: 2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old. - unslothai/hyperlearn
- Is Success Luck or Hard Work?: In a competitive world, tiny advantages can make all the difference. Get 10% off Snatoms with code 'giveluck' in the US: https://ve42.co/USA or International...
- GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- How LangChain and ChatGPT plugins are getting attacked by this bug: Insecure Output Handling on LLMs deals with injecting poisonous data during the training phase. In this article, we will be focusing on real-world scenarios, practical demos, and prevention mechanisms...
- botbot-ai/CabraLlama3-8b at main: no description found
- arthrod/cicerocabra at main: no description found
- schedulefree optimizers by winglian · Pull Request #30079 · huggingface/transformers: What does this PR do? integrates meta's https://github.com/facebookresearch/schedule_free for adamw & sgd https://twitter.com/aaron_defazio/status/1776320004465582331 Before submitting This ...
- GitHub - ggerganov/llama.cpp: LLM inference in C/C++: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
- runtime is less than 10 hours for colab pro + User · Issue #3451 · googlecolab/colabtools: I am a google colab pro + user. I could run my work for 24 continuous hours in January 2023. However, since the beginning of February, my job times out after running for less than 10 hours. Althoug...
- Tutorial: How to convert HuggingFace model to GGUF format · ggerganov/llama.cpp · Discussion #2948: Source: https://www.substratus.ai/blog/converting-hf-model-gguf-model/ I published this on our blog but though others here might benefit as well, so sharing the raw blog here on Github too. Hope it...
- llama : improve BPE pre-processing + LLaMA 3 and Deepseek support by ggerganov · Pull Request #6920 · ggerganov/llama.cpp: Continuing the work in #6252 by @dragnil1 This PR adds support for BPE pre-tokenization to llama.cpp Summary The state so far has been that for all BPE-based models, llama.cpp applied a default pre...
Unsloth AI (Daniel Han) ▷ #random (55 messages🔥🔥):
- Dataset Combination Hack: A conversation suggests merging raw text and chat datasets to improve results, hinting at a potential approach for fine-tuning models.
- Notebook and Fine-tuning Tips Revealed: The Unsloth AI community shares a repository link with notebooks for fine-tuning language models, along with a specific Colab notebook for text completion tasks.
- Colab Out of Memory (OOM) Solutions: A helpful snippet of code was shared to alleviate Colab's OOM issues, suggesting the use of `torch.cuda.empty_cache()` and `gc.collect()` in a loop (a minimal sketch appears after this list).
- Peer-to-Peer Sharing Promoted: A user announces the creation of an open community to discuss the latest in Multimodal AI, providing a link to follow them on various social platforms.
- Support for New Model in Unsloth AI: There is excitement about the Phi 3 model now being supported, as revealed by a user who provided a link to a Discord channel for a relevant Colab (link not accessible outside Discord).
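For reference, here is a minimal sketch of the cache-clearing loop described above; the actual snippet from the channel was not preserved, so the surrounding training loop is illustrative:

```python
import gc
import torch

# Toy stand-ins for the real model and data; only the last three lines
# of the loop body are the OOM mitigation under discussion.
model = torch.nn.Linear(256, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
batches = [torch.randn(32, 256, device="cuda") for _ in range(10)]

for x in batches:
    loss = model(x).square().mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    del loss                    # drop Python references that pin tensors
    gc.collect()                # collect any remaining cyclic references
    torch.cuda.empty_cache()    # return cached CUDA blocks to the driver
```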
Links mentioned:
- Out of memory - Wikipedia: no description found
- Google Colaboratory: no description found
- Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data: Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy re...
- OpenMultiModal: Community to explore and collaborate on multimodal AI
- GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
Unsloth AI (Daniel Han) ▷ #help (506 messages🔥🔥🔥):
- Troubleshooting Compilation Issues: Users discussed errors while compiling code, specifically mentioning llama.cpp not being in the correct folder, and resolved the issue by following the correct installation instructions.
- Support Queries and Update Requests: Discussions about Unsloth AI's support for different models such as Llava and Qwen revealed that they are not currently supported. Users suggested improvements like a feature to truncate from a specific part of chat templates. Colab notebook installation instructions were updated following an xformers update.
- Dataset Format and Fine-Tuning Inquiry: A user sought clarification on whether their dataset format is correct for fine-tuning and which exact Llama 3 model from Unsloth should be used for training on code. It was clarified that a larger dataset suits the base model, while smaller datasets go well with instruct models (the usual loading setup is sketched after this list).
- GPU Usage for Unsloth Pro: A user asked about the benefits of Unsloth Pro with one or more RTX 4090 GPUs. They were informed that the benefits multiply with additional GPUs.
- Duplicate Python Installation Issues: Discussions highlighted installation problems, including a case where a user had two Python versions installed, causing dependency issues. This was resolved by adjusting the Python version and removing the older one.
- Finetuning Llama with Code: Questions about finetuning Llama 3 proceeded with guidance for a user who wanted to finetune Llama on Svelte code. They were advised on using the base model and its distinctions from the instruct variant.
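For readers landing on the same base-vs-instruct question, a condensed sketch of the usual Unsloth loading and LoRA setup follows; the model name matches Unsloth's published 4-bit Llama 3, while the dataset file and hyperparameters are placeholders:

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# 4-bit base model (per the advice above, base suits larger datasets).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # hypothetical file
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes each record carries a pre-formatted "text" field
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2, max_steps=60),
)
trainer.train()
```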
Links mentioned:
- Google Colaboratory: no description found
- Google Colaboratory: no description found
- Ollama: Get up and running with large language models.
- Google Colaboratory: no description found
- Google Colaboratory: no description found
- Google Colaboratory: no description found
- Docker: no description found
- xtuner/llava-llama-3-8b-v1_1 · Hugging Face: no description found
- no title found: no description found
- Load: no description found
- Models: no description found
- Home: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- Quantization: no description found
- Unsloth AI | Finetune Llama 3 & Mistral LLMs: Unslow finetuning for AI and LLMs. Get faster with Unsloth. Open-source.
- Qwen/CodeQwen1.5-7B-Chat · Hugging Face: no description found
- I got unsloth running in native windows. · Issue #210 · unslothai/unsloth: I got unsloth running in native windows, (no wsl). You need visual studio 2022 c++ compiler, triton, and deepspeed. I have a full tutorial on installing it, I would write it all here but I'm on mob...
- GitHub - ollama/ollama: Get up and running with Llama 3, Mistral, Gemma, and other large language models.: Get up and running with Llama 3, Mistral, Gemma, and other large language models. - ollama/ollama
- GitHub - janhq/jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM): Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM) - janhq/jan
- Conda installation detailed instructions · Issue #73 · unslothai/unsloth: I'm trying to follow the instructions for installing unsloth in a conda environment, the problem is that the conda gets stuck when running the install lines. I've tried running it twice, both ...
- GitHub - ggerganov/llama.cpp: LLM inference in C/C++: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
Unsloth AI (Daniel Han) ▷ #showcase (74 messages🔥🔥):
- Unveiling Kolibrify for Curriculum Learning: Kolibrify, a project designed for curriculum training of instruction-following LLMs with Unsloth, has been shared. It's described as useful for LLM fine-tuning and rapid prototyping.
- Thermostatic Releases Bilingual Translation Model: A new version of Thermostatic's bidirectional English-Spanish translation model, NeuralTranslate_v0.2_GGUF, has been published; it is said to retain Mistral's native reasoning capabilities without overfitting.
- Scoped Skilled Agents in AI's Future: @timelordraps predicts a 6-month roadmap in which AI advancements yield highly capable small models, token-efficient pre-training, and self-expanding, self-spawning subagents, leading to recursive self-improvement by November.
- Token-Efficient Clone Project Underway: @timelordraps is optimizing a devin clone for token efficiency and is currently troubleshooting it on a simple snake game, with plans to test other use cases and integrate with image models.
- Llama Community Hub Announced: The newly launched llama-hub serves as a community platform for sharing and discussing models and use cases involving llama models. The official Unsloth llama-3-8b-bnb-4bit has been posted for community access.
Links mentioned:
- no title found: no description found
- winglian/llama-3-8b-256k-PoSE · Hugging Face: no description found
- Thermostatic/NeuralTranslate_v0.2_GGUF · Hugging Face: no description found
- xtuner/llava-phi-3-mini · Hugging Face: no description found
- vonjack/Phi-3-mini-4k-instruct-LLaMAfied at main: no description found
- GitHub - oKatanaaa/kolibrify: Curriculum training of instruction-following LLMs with Unsloth: Curriculum training of instruction-following LLMs with Unsloth - oKatanaaa/kolibrify
- GitHub - TimeLordRaps/timelord: Save you time.: Save you time. Contribute to TimeLordRaps/timelord development by creating an account on GitHub.
Unsloth AI (Daniel Han) ▷ #suggestions (119 messages🔥🔥):
- Enhancing Unsloth's Autotuning: A user suggested that Unsloth AI should automatically optimize values like batch size and learning rate based on model and dataset specifics. Another member humorously proposed that Unsloth should also bake a cake post-training, which aligns with it being on the roadmap, while a third person shared thoughts on implementation.
- Manual Layer Pruning Debate: The conversation covered the intricacies of manually pruning layers in models, with one user suggesting replacing the `forward` method to "skip" parts of layers (a toy sketch follows this list). There was an extended discussion on whether to remove entire decoder blocks or focus on MLP (multi-layer perceptron) components for SNR (Signal-to-Noise Ratio) optimization, touching on different strategies for minimizing model size and VRAM footprint.
- VRAM Reduction Strategies and Offloading: The dialogue shifted to strategies for reducing model size, particularly VRAM usage. A user mentioned a successful inference memory reduction technique that offloads parts of language models and shared their experience integrating this approach into a GitHub repository (https://github.com/oKatanaaa/kolibrify/blob/7165ebbbcc8c44a6960ccfe78aa2d740a93789bd/kolibrify/model_utils.py).
- Gemma 2b Model Compatibility with Unsloth: A fan of Unsloth inquired about the compatibility of the Recurrent Gemma 2b model with Unsloth. A member recognized the potential benefits but indicated that there's a known VRAM issue with Gemma 2b and that the focus is currently on Phi 3. Another mentioned a unique VRAM issue experienced by only one person, with no widespread reports.
- Potential Feature or Bug with Gemma 2b: Clarification was sought about whether Gemma 2b has a feature that causes VRAM issues or a bug. It was explained that while the model still works, the VRAM issue needs to be resolved; however, not everyone has encountered this problem, and it may be an isolated case.
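As a toy illustration of the "replace forward" idea (not the exact approach from the chat, which concerned Llama-style models and SNR-guided block selection), one can monkey-patch decoder blocks into identity functions; the model and indices here are arbitrary stand-ins:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in model

def skip_forward(self, hidden_states, *args, **kwargs):
    # Identity pass-through: the block contributes nothing, as if pruned away.
    return (hidden_states,)

for i in (10, 11):  # hypothetical low-SNR blocks chosen for removal
    block = model.transformer.h[i]
    block.forward = skip_forward.__get__(block, type(block))

out = model(torch.tensor([[1, 2, 3]]), use_cache=False)  # forward pass still works
```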
Links mentioned:
- How to use TensorBoard with PyTorch — PyTorch Tutorials 2.3.0+cu121 documentation: no description found
- Text classification: no description found
- trl/trl/trainer/laserm_trainer.py at evol_laser_merge_trainer · l4b4r4b4b4/trl: Train transformer language models with reinforcement learning. - l4b4r4b4b4/trl
- kolibrify/kolibrify/model_utils.py at 7165ebbbcc8c44a6960ccfe78aa2d740a93789bd · oKatanaaa/kolibrify: Curriculum training of instruction-following LLMs with Unsloth - oKatanaaa/kolibrify
CUDA MODE ▷ #general (18 messages🔥):
- Countdown to CUDA Lecture: The next CUDA Mode lecture was announced as taking place in 1 hour and 40 minutes, with excitement building since the llm.cpp team was slated to present, anticipated to be a highlight.
- Java Jolt for Cognition: A member expressed readiness for the upcoming lecture with coffee brewing in preparation.
- Announcing Live CUDA Profiling Session: Today's session was moved to Google Meet with this link, and despite minor hiccups on Discord, the live profiling lecture was well-received; a trimmed version was promised for the YouTube channel.
- Exploring a Broader Hardware Discussion: There was a proposal for creating discussions for Huawei Ascend solutions to promote more diverse hardware conversations, considering the current dominance of NVIDIA and AMD. The idea is under consideration for community interest and activity.
- Innovation on a Dime: A fascinating project was shared where neural networks were implemented on a 10-cent RISC-V MCU without a multiplier, showcasing an example of making powerful technology accessible at minimal cost. The full blog post and a repository with detailed documentation are available at cpldcpu's blog and GitHub.
Links mentioned:
- Implementing Neural Networks on the "10-cent" RISC-V MCU without Multiplier: I have been meaning for a while to establish a setup to implement neural network based algorithms on smaller microcontrollers. After reviewing existing solutions, I felt there is no solution that I…
CUDA MODE ▷ #triton (10 messages🔥):
- Triton Tensor Indexing Explained: A method for indexing into a Triton tensor with another was shared: load values from the indices tensor, combine them with the strides and base pointer to form a tensor of pointers, then apply `tl.load()` and `tl.store()` for the desired result (a minimal sketch follows this list).
- In Search of Open Source Triton LLM Implementations: A member was looking for open-source Triton implementations of large language models (LLMs) like llama or mistral. Another member referenced an unsloth repository on GitHub which could potentially suit their needs.
- Exploring Efficient Gradient Calculation with Triton: A query was raised about calculating the gradient of a tensor by utilizing parallel threads in Triton and sum-reducing along a dimension, with code snippets shared to illustrate the current and proposed methods.
- Repositories with Required Triton Kernels Highlighted: In a discussion about full model implementations using Triton kernels for large language models, several resources were mentioned, including the xformers repository and the flash-attention repository.
- PyTorch Modules in Triton Shared: A member suggested the attorch repository as a potentially useful set of PyTorch's neural network modules written in Python using Triton.
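A minimal sketch of that index-then-load pattern for a flat 1-D tensor (real kernels would fold in strides for higher-rank tensors; names here are illustrative):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def gather_kernel(src_ptr, idx_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    idx = tl.load(idx_ptr + offs, mask=mask)   # load the indices tensor
    val = tl.load(src_ptr + idx, mask=mask)    # base pointer + indices = tensor of pointers
    tl.store(out_ptr + offs, val, mask=mask)

src = torch.randn(1024, device="cuda")
idx = torch.randint(0, 1024, (256,), device="cuda")
out = torch.empty(256, device="cuda")
gather_kernel[(triton.cdiv(256, 128),)](src, idx, out, 256, BLOCK=128)
assert torch.equal(out, src[idx])
```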
Links mentioned:
- GitHub - BobMcDear/attorch: A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.: A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. - BobMcDear/attorch
- xformers/xformers/triton at main · facebookresearch/xformers: Hackable and optimized Transformers building blocks, supporting a composable construction. - facebookresearch/xformers
- flash-attention/flash_attn/ops at main · Dao-AILab/flash-attention: Fast and memory-efficient exact attention. Contribute to Dao-AILab/flash-attention development by creating an account on GitHub.
- GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
CUDA MODE ▷ #cuda (40 messages🔥):
- Kernel Profiling Enigma: Profiling the tiled_matmult kernel vs. the coarsened_matmult kernel from PMPP showed an unexpectedly small FLOP/s difference despite the latter's higher arithmetic intensity. It was suggested to look at instruction stats, particularly the stall short scoreboard, which is linked to SRAM ops and could be affecting memory bandwidth.
- CUDA Kernel Performance Tips: When optimizing CUDA kernels, members advised looking at warp state stats and loading multiple values from SRAM into registers to perform multiple multiplications, thus improving SRAM utilization.
- Learning CUDA Without Breaking the Bank: Discussion on acquiring GPU access for CUDA learning ranged from company/university resources to services like Google Colab and Lightning AI. Members emphasized the importance of having control over the environment, particularly for profiling with performance counters.
- Emerging FP6 Data Type in CUDA Development: A DeepSpeed commit on GitHub introduced a new data type called FP6 with Tensor Core support on A100 GPUs, potentially improving the serving of Large Language Models (LLMs) and addressing memory limitations during inference.
- Debating Best Practices in CUDA Programming: Queries about CUDA coding practices were addressed, including whether integer division should be avoided in kernel code. One suggestion was to use bit shifts for divisions by powers of two (illustrated after this list), with the observation that nvcc or ptxas should optimize this automatically.
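To make the power-of-two point concrete, a quick Python check of the shift/mask equivalence that nvcc/ptxas would apply to constant divisors (the tile size is chosen arbitrarily):

```python
TILE = 64       # power of two, so / and % reduce to shift and mask
LOG2_TILE = 6   # log2(64)

for i in (0, 63, 64, 1234):
    row, col = i // TILE, i % TILE                  # integer divide and modulo
    row2, col2 = i >> LOG2_TILE, i & (TILE - 1)     # shift and mask equivalents
    assert (row, col) == (row2, col2)
```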
Links mentioned:
- Compiler Explorer - CUDA C++ (NVCC 11.7.0): #include <algorithm> #include <cassert> #include <cstdio> #include <cstdlib> __global__ void sgemmVectorize(int M, int N, int K, float alpha, f...
- Lecture 3: Getting Started With CUDA for Python Programmers: Recording on Jeremy's YouTube https://www.youtube.com/watch?v=nOxKexn3iBoSupplementary Content: https://github.com/cuda-mode/lecture2/tree/main/lecture3Speak...
- Google Colaboratory: no description found
- Tweet from Nicolas Mejia Petit (@mejia_petit): Why isn't everyone talking about this??? Deepspeed devs literally just created a datatype FP6 with full tensor core support on the A100s. (Since nvidia left us stranded with int4/8) It is SO smart...
- FP6 quantization end-to-end. (#5234) · microsoft/DeepSpeed@ccfdb84: The user interface: https://github.com/microsoft/DeepSpeed-MII/pull/433 nv-a6000 ci running against the MII branch linked above is [here](https://github.com/microsoft/DeepSpeed/actions/runs/81921...
CUDA MODE ▷ #torch (10 messages🔥):
- PyTorch Team at ASPLOS: The PyTorch team will be presenting a tutorial at ASPLOS; an announcement was made with the details provided via a Twitter link.
- Flash-Attention Update Alert: Tri Dao's new flash-attn 2.5.8 has been released and confirmed to be compatible with PyTorch 2.3.0. Sources include the project's GitHub and PyPI pages.
- Query on flash-attn Installation: A discussion was raised regarding flash-attn's pip install option that doesn't require a local CUDA build and why this isn't the default, with curiosity about potential speed differences between pre-built binaries and locally built ones.
- Under the Hood of `torch.compile`: Discussion on the differences between `torch.matmul`, `@`, and `torch.nn.functional.linear` when used with `torch.compile`, referencing the gpt-fast blog post. The suggested way to understand the differences was to inspect the TORCH_LOGS output (see the sketch after this list).
- PyTorch Profiler Puzzles: A question was posed about why PyTorch sometimes launches 2 kernels during matrix multiplication, as observed in the profiler, inviting insights or theories regarding this behavior.
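A hedged sketch of the TORCH_LOGS suggestion: dump Inductor's generated code for each formulation and compare what actually runs (the variable must be set before torch is imported):

```python
import os
os.environ["TORCH_LOGS"] = "output_code"  # log generated code at compile time

import torch
import torch.nn.functional as F

@torch.compile
def via_matmul(x, w):
    return torch.matmul(x, w.t())

@torch.compile
def via_linear(x, w):
    return F.linear(x, w)

x = torch.randn(64, 128, device="cuda")
w = torch.randn(256, 128, device="cuda")
via_matmul(x, w)  # compare the kernels/extern calls logged here...
via_linear(x, w)  # ...against the ones logged for this variant
```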
Links mentioned:
- GitHub - Dao-AILab/flash-attention: Fast and memory-efficient exact attention: Fast and memory-efficient exact attention. Contribute to Dao-AILab/flash-attention development by creating an account on GitHub.
- flash-attn: Flash Attention: Fast and Memory-Efficient Exact Attention
CUDA MODE ▷ #announcements (1 message):
- Boost in Code Clarity and Performance: NVIDIA's C++ team is set to discuss porting llm.c to llm.cpp, promising cleaner and faster code. An exciting bonus talk is starting shortly for the community.
CUDA MODE ▷ #algorithms (54 messages🔥):
- Trinary Nets Seek Efficient Matmul: A member initiated brainstorming on performing matrix multiplication (matmul) with trinary nets using packed int64 to handle 32 2-bit trinary values without unpacking. They posited that a masked-multiply approach could avoid the computational and memory expense of unpacking, though implementation details and benefits remain theoretical (a NumPy sketch of the encoding follows this list).
- Packing and Unpacking in CUDA: Another conversation focused on optimizations for working with packed values; one member pointed to executing pack and unpack operations in a fused CUDA kernel as more cost-effective, though concerns were raised about the usability and complexity of this approach.
- Exploration of Alternatives to Unpacking: Members discussed creating row operations that work on the integers directly, without unpacking, which might reduce the number of operations required.
- Fused Kernels for Performance: There was agreement that while kernel fusion may not reduce the cost of the operations themselves, it can significantly decrease overhead by reducing memory reads/copies. The conversation evolved into a discussion of the technical feasibility and potential efficiency gains of such methods.
- FlashAttention's Inner Workings Exposed: A member shared insights into the FlashAttention repository, indicating that `kernel_traits.h` is a core component for setting traits in CUDA, which are later utilized in FlashAttention. They linked a Colfax research post discussing FP8 and layout conformance enhancements in FlashAttention on the NVIDIA Hopper™ architecture.
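To ground the trinary discussion, here is a small NumPy sketch of one possible encoding (01 = +1, 10 = -1, 00 = 0, so 32 weights pack into a single 64-bit word); the dot product selects through the plus/minus bit planes instead of materializing an unpacked weight vector. This mirrors the idea only; the actual masked-multiply kernel remained theoretical:

```python
import numpy as np

LOW_BITS = 0x5555555555555555  # the low bit of every 2-bit field

def pack32(w):
    """Pack 32 values in {-1, 0, +1} into one uint64, 2 bits per value."""
    packed = 0
    for i, v in enumerate(w):
        packed |= {1: 0b01, -1: 0b10, 0: 0b00}[int(v)] << (2 * i)
    return np.uint64(packed)

def trinary_dot(x, packed):
    """dot(x, w) via bit-plane selection on the packed word."""
    p = int(packed)
    plus = p & LOW_BITS           # fields encoding +1
    minus = (p >> 1) & LOW_BITS   # fields encoding -1
    sel_p = np.array([(plus >> (2 * i)) & 1 for i in range(32)], dtype=bool)
    sel_m = np.array([(minus >> (2 * i)) & 1 for i in range(32)], dtype=bool)
    return x[sel_p].sum() - x[sel_m].sum()

rng = np.random.default_rng(0)
w = rng.integers(-1, 2, 32)
x = rng.standard_normal(32)
assert np.isclose(trinary_dot(x, pack32(w)), float(x @ w))
```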
Links mentioned:
- Delivering 1 PFLOP/s of Performance with FP8 FlashAttention-2: We recently released an update to our FlashAttention-2 forward pass implementation on NVIDIA Hopper™ architecture that incorporates a number of new optimizations and improvements, including …
- GitHub - catid/bitnet_cpu: Experiments with BitNet inference on CPU: Experiments with BitNet inference on CPU. Contribute to catid/bitnet_cpu development by creating an account on GitHub.
- flash-attention/csrc/flash_attn/src/kernel_traits.h at main · Dao-AILab/flash-attention: Fast and memory-efficient exact attention. Contribute to Dao-AILab/flash-attention development by creating an account on GitHub.
CUDA MODE ▷ #jobs (1 message):
- InstaDeep is Hiring Machine Learning Engineers: InstaDeep Research is looking for Machine Learning Engineers who are passionate about high-performance ML engineering and making a real-world impact. The role involves working with Bio AI, Decision Making AI, and technologies like custom CUDA kernels, SOTA model architectures, quantisation, and distributed training. Join the InstaDeep journey here.
- Cultivate Innovation at InstaDeep: InstaDeep promises a cohesive and stimulating work environment for tech enthusiasts to contribute to impactful decision-making and technology products across industries. Internship opportunities can also be explored here.
- InstaDeep Application Advice: Applicants can apply for multiple jobs at InstaDeep, but it is advised to limit applications to two closely linked positions that match their skills and qualifications.
- Reapplying to InstaDeep: Those who previously applied to InstaDeep and weren't selected may consider reapplying if more than six months have passed since their last application.
Link mentioned: Job Offer | InstaDeep - Decision-Making AI For The Enterprise: no description found
CUDA MODE ▷ #beginner (12 messages🔥):
- NVIDIA GPU on Laptops for CUDA: It's generally viewed as acceptable to use a laptop with an NVIDIA GPU for learning and testing CUDA code, but not recommended for actual model training.
- Seeking NCCL All-Reduce Resources: A member is in search of a good tutorial for learning NCCL to implement an all-reduce kernel, but has not yet received suggestions (a minimal torch.distributed starting point is sketched after this list).
- Jetson Nano for CUDA Learning: For those interested in learning CUDA, a Jetson Nano is recommended as a useful tool, especially when coupled with a spare monitor.
- Resolving nvcc_plugin ModuleNotFoundError: A member following a GitHub tutorial encountered a "ModuleNotFoundError" for "nvcc_plugin" when using `%load_ext nvcc_plugin`. The solution involved skipping that step and compiling with `%%writefile` instead.
- AMD GPU Performance Inquiry: A member contemplating an upgrade from dual MI100 to MI210 asked for comparative BF16 performance insights and was redirected to a channel more focused on AMD resources.
CUDA MODE ▷ #youtube-recordings (2 messages):
- CUDA C++ Deep Dive Awaits: A YouTube video titled "Bonus Lecture: CUDA C++ llm.cpp" has been shared, offering insights into CUDA C++. The description includes a link to slides on Google Drive.
- Slated for Later Release: The slides and code accompanying the CUDA C++ lecture are currently not available.
Link mentioned: Bonus Lecture: CUDA C++ llm.cpp: Slides: https://drive.google.com/drive/folders/1T-t0d_u0Xu8w_-1E5kAwmXNfF72x-HTA?usp=sharing
CUDA MODE ▷ #torchao (1 message):
- CUDA Extension Support Arrives in AO: Custom CUDA extension support has been integrated into torchao, as noted by a member with a PR link. The integration lets developers follow a template to ensure their kernel works seamlessly with `torch.compile`.
- AO Seeks Community Contributions: For developers who enjoy writing CUDA kernels but dislike the packaging process, contributions to torchao are now open, especially for kernels optimized for consumer GPUs.
Link mentioned: Custom CUDA extensions by msaroufim · Pull Request #135 · pytorch/ao: This is the mergeable version of #130 - some updates I have to make: Add a skip test unless pytorch 2.4+ is used, Add a skip test if cuda is not available, Add ninja to dev dependencies, Locall…
CUDA MODE ▷ #ring-attention (2 messages):
- Pushing the Limits of Context Length in LLMs: An article from harmdevries.com highlights a trend of increasing context length in Large Language Models (LLMs), reaching up to 65K tokens, with innovations like FlashAttention playing a significant role by removing GPU memory bottlenecks.
- The Rise of Long-Context LLMs: Many cutting-edge long-context LLMs are found to be finetuned versions of base models with shorter context lengths; one such example is the Yarn-Llama-2-7B-128k model, which boasts a 128K token context length.
Link mentioned: In the long (context) run | Harm de Vries: It's not the quadratic attention; it's the lack of long pre-training data
CUDA MODE ▷ #off-topic (4 messages):
- Chill Vibes with "Critical Stop": A Discord member shared a YouTube video titled "Critical Stop," an auto-generated track by Creatune released on March 23, 2024, provided by DistroKid.
- Keygen Music Nostalgia: Another YouTube video was shared, titled "Dead Feelings - CORE - Power ISO 3.1kg Keygen Music," bringing some classic keygen music to the chat.
- Evolving Cars Through a Genetic Algorithm: An intriguing web-based simulation, Genetic Cars 2, was posted, where a genetic algorithm evolves random two-wheeled shapes into cars over generations.
- Musical Algorithm Rule #9: The "Bad apple on everything" YouTube playlist was linked, demonstrating the versatility of the "Bad Apple" tune played on various devices, based on Rule #9: if it exists, there's a "Bad Apple" version.
Links mentioned:
- HTML5 Genetic Algorithm 2D Car Thingy - Chrome recommended: no description found
- Critical Stop: Provided to YouTube by DistroKid. Critical Stop · Creatune. ℗ Creatune Music. Released on: 2024-03-23. Auto-generated by YouTube.
- Dead Feelings - CORE - Power ISO 3.1kg Keygen Music: Not mine, belongs to JimWalshified; apparently original @ http://www.youtube.com/watch?v=-Cc09YsWDQs
- Bad apple on everything: Rule #9 - if it exists, play Bad Apple on it
CUDA MODE ▷ #llmdotc (714 messages🔥🔥🔥):
- FP16 vs BF16 Training Potentials: Discussions revolved around the feasibility of training models in FP16 without gradient scaling, with speculation that it might work as well as BF16 (the standard scaled-FP16 recipe is sketched after this list for contrast). A link to research on FP8 training without scaling was shared as a possible analogous strategy.
- Full BF16 Including Layernorms Merged: A PR was merged with full BF16 support, including layernorms, potentially simplifying code but requiring the file version to be incremented for proper model file handling.
- Data Type Loading and Memory Access Optimizations: Extensive discussion on better vectorization of memory loads and stores in CUDA kernels, considering the use of templates and specialized load/store instructions like `__ldcs` for streaming access to memory.
- Deleting Use of Cooperative Groups: A discussion took place around removing cooperative groups (`cg`) from the codebase to ease cross-platform compatibility and reduce dependencies, even though they are part of CUDA.
- Performance Gains and Future Model Scaling: It was noted that the current version of `train_gpt2cu` now surpasses both PyTorch and optimized flashattention in token processing speed, indicating readiness for scaling models up to the size of gpt-large.
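For contrast with the "FP16 without gradient scaling" speculation above, the standard PyTorch FP16 recipe scales the loss so small gradients survive the format's narrow exponent range; a minimal sketch:

```python
import torch

model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # multiplies the loss so fp16 grads don't flush to zero

x = torch.randn(8, 512, device="cuda")
with torch.autocast("cuda", dtype=torch.float16):
    loss = model(x).square().mean()
scaler.scale(loss).backward()
scaler.step(opt)    # unscales grads (skipping the step on inf/nan) before stepping
scaler.update()
```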
Links mentioned:
- FP8-LM: Training FP8 Large Language Models: In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables, such as gradients and optimizer states, in LLM traini...
- cuda::associate_access_property: CUDA C++ Core Libraries
- cuda::memcpy_async: CUDA C++ Core Libraries
- Dumbledore GIF - Memory No Memory Where Am I - Discover & Share GIFs: Click to view the GIF
- Build software better, together: GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.
- Compiler Explorer - CUDA C++ (NVCC 12.3.1): #include <cuda_fp16.h> template<class ElementType> struct alignas(16) Packed128 { __device__ __forceinline__ Packed128() = default; __device__ __forceinline__ exp...
- Example for the dtype change for gelu kernels by ChrisDryden · Pull Request #250 · karpathy/llm.c: By changing the type of data that is being read from memory, in a single memory operation it is possible to read up to 128 bits of data. For memory constrained kernels it is beneficial to wrap all ...
- delete use of cooperative groups in kernels · Issue #292 · karpathy/llm.c: We use a lot of cooperative groups functionality in our kernels. This is an additional dependency that is likely mildly convenient, but it is also likely that the code could be written without them...
- as promised, cleanup enabled by padding :) by ngc92 · Pull Request #280 · karpathy/llm.c: had to fix a hidden bug in the cublasLt version, but now it works
- llm.c/train_gpt2_fp32.cu at master · karpathy/llm.c: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.
- yet another gelu by ngc92 · Pull Request #293 · karpathy/llm.c: more complicated Packet128 for cleaner kernels
- Remove FloatN & simplify adam/reduce with BF16 LayerNorms by ademeure · Pull Request #295 · karpathy/llm.c: The MULTI_GPU path is untested, but everything else seems to work fine. I kept the per-tensor "param_sizeof" as it's used in test_gpt2.cu for example, it's not much code and may be u...
- GitHub - graphcore-research/out-of-the-box-fp8-training: Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.: Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. - graphcore-research/out-of-the-box-fp8-training
- clang-tidy by ngc92 · Pull Request #270 · karpathy/llm.c: Adds a clang-tidy file and clang-tidy target to the make file. Since the .cu files are in flux right now, this is just looking at gpt2.c I'm not quite sure which checks we should enable, but I t...
- float4 with better vectorization for encoder_forward.cu by lancerts · Pull Request #274 · karpathy/llm.c: On RTX 3070 Kernel 2 block_size 32 | time 0.2933 ms | bandwidth 343.26 GB/s block_size 64 | time 0.2099 ms | bandwidth 479.50 GB/s block_size 128 | time 0.1924 ms | bandwidth 523.24 GB/s block...
- Removing Atomic Adds and adding memory coalescion by ChrisDryden · Pull Request #275 · karpathy/llm.c: This PR is ontop of the GELU memory coalescion PR and is essentially just a rewrite of the backwards encoder to use shared memory instead of atomic adds and then using the Packed struct to do coale...
- load bf16 directly, and some "quality of life" handling of fp32/fp16/bf16 precisions by karpathy · Pull Request #265 · karpathy/llm.c: Code to load bf16 weights directly, and also re-wire the position of tensors to put the layernorms (which are in fp32) at the end. the training loop seems to work ok, and the tests pass and the los...
- Enable multithreading in nvcc by ChrisDryden · Pull Request #269 · karpathy/llm.c: Tested locally and reduced compilation time by 200ms, unfortunately for me upgrading to 12.4 made my compilations times slow by 2x but at least this can make it a bit faster
- llm.c/train_gpt2.cu at master · karpathy/llm.c: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.
- Full BF16 including layernorms by default (minimising number of BF16 atomics) by ademeure · Pull Request #272 · karpathy/llm.c: I added 4 different new versions of layernorm_backward_kernel, performance is best for: Kernel 4 (using atomicCAS, no scratch, but rounding many times so probably worse numerical accuracy Kernel 6...
- fp16 buffers for ADAM by ngc92 · Pull Request #289 · karpathy/llm.c: First proof-of-concept implementation
- enable padding in model export/import for nicer shapes by ngc92 · Pull Request #264 · karpathy/llm.c: a new attempt at this. Less ugliness on the C side because we just pad from python.
- C++ Language Extensions — HIP 6.1.0 Documentation: no description found
- llm.c/dev/cuda/classifier_fused.cu at master · karpathy/llm.c: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.
CUDA MODE ▷ #rocm (19 messages🔥):
- AMD Instinct MI300X Gains Attention: The AMD Instinct MI300X is highlighted as a significant product for professional server purposes, with an official product page and discussions about its future availability.
- Exploring ROCm and AMD vs NVIDIA Rivalries: The channel discusses George Hotz's opinions and predicaments related to AMD and NVIDIA, including his thoughts on AMD's performance and strategic decisions. The drama can be followed on the tinygrad page.
- Seeking ROCm Community Expertise: A new member requests an introduction to ROCm HIP and expresses interest in a community-driven discussion about AMDâs vision and options available for developers new to AMDâs ecosystem.
- Comparing AMD and NVIDIA Offerings: Community members compare the last PCIe card by AMD, the Instinct MI210, to high-end consumer graphics cards, noting significant price differences with NVIDIAâs counterparts, such as the RTX 4090.
- Evolving AMD Windows Compatibility and RDNA4 Hopes: There is a positive reaction to AMD adding Windows build tests to their repositories, as well as anticipation for the next-generation RDNA4 announcement at Computex.
Links mentioned:
- tinygrad: A simple and powerful neural network framework: no description found
- Rent AMD GPUs On-Demand: no description found
- GitHub - nktice/AMD-AI: AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 22.04 / 23.04: AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 22.04 / 23.04 - GitHub - nktice/AMD-AI: AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 22.04 / 23.04
- AMD Radeon Instinct MI210 Specs: AMD Aldebaran, 1700 MHz, 6656 Cores, 416 TMUs, 0 ROPs, 65536 MB HBM2e, 1600 MHz, 4096 bit
CUDA MODE ▷ #oneapi (22 messages🔥):
- Intel's oneAPI: A Unified Programming Model: The discussion highlights Intel's oneAPI as a heterogeneous compute platform capable of supporting CPUs, GPUs, and FPGAs, illustrated by Intel's official article on oneAPI. oneAPI offers developers a unified programming model across various hardware.
- Cross-Vendor GPU Support with oneAPI: Codeplay's release of plugins for oneAPI marks a significant step, allowing developers to use SYCL™ code for Nvidia and AMD GPUs. The announcement and a tutorial video on YouTube provide insights and resources for interested developers.
- oneAPI Ecosystem Expands Across Major Frameworks and Tools: Developers can find numerous oneAPI resources and libraries such as oneDNN, integrations with PyTorch and TensorFlow, and performance extensions for Scikit-learn, showcased on GitHub. More broadly, Intel's oneAPI toolkit is said to support Apple's ARM M1/M2/M3 and FPGAs, according to the oneAPI Toolkits page.
- Codeplay's Commitment to Compute Universality: A guide for running SYCL™ applications on NVIDIA® GPUs and a reference silicon example for a RISC-V-based accelerator platform (Overview Reference Silicon) indicate the strides Codeplay is making toward universality.
- Intel Prepares for Next-Generation GPUs: Members express anticipation for Intel's upcoming Battlemage GPU line-up, reportedly with 12GB of VRAM, sparking conversation about its suitability for AI-related tasks.
Links mentioned:
- Tweet from Intel Extension For PyTorch Now Officially Supports Arc A-Series Graphics - Phoronix: no description found
- Tweet from Intel Extension For TensorFlow Released - Provides Intel GPU Acceleration - Phoronix: no description found
- Codeplay Reference Silicon Overview - Guides - oneAPI Construction Kit - Products - Codeplay Developer: no description found
- GitHub - intel/intel-extension-for-pytorch: A Python package for extending the official PyTorch that can easily obtain performance on Intel platform: A Python package for extending the official PyTorch that can easily obtain performance on Intel platform - intel/intel-extension-for-pytorch
- Codeplay® oneAPI plugins for Nvidia® and AMD® GPUs | Intel Software: Your same SYCL (C++) code can now run not only on CPU but also (same code) on GPUs by Nvidia® and AMD® with the new plugins from Codeplay®. Using the same code...
- GitHub - intel/scikit-learn-intelex: Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application: Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application - intel/scikit-learn-intelex
- oneAPI-SRC: oneAPI open source projects. oneAPI-SRC has 57 repositories available. Follow their code on GitHub.
- GitHub - oneapi-src/oneDNN: oneAPI Deep Neural Network Library (oneDNN): oneAPI Deep Neural Network Library (oneDNN). Contribute to oneapi-src/oneDNN development by creating an account on GitHub.
- GitHub - intel/intel-extension-for-transformers: ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡: ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡ - intel/intel-extension-for-transformers
- Bringing Nvidia® and AMD support to oneAPI - oneAPI.io: Developers can write SYCL™ code and use oneAPI to target Nvidia* and AMD* GPUs with free binary plugins. Today is a milestone for me as Codeplay® officially releases plug-ins for oneAPI on Nvidia and A...
- GitHub - intel/intel-npu-acceleration-library: Intel® NPU Acceleration Library: Intel® NPU Acceleration Library. Contribute to intel/intel-npu-acceleration-library development by creating an account on GitHub.
- Install oneAPI for NVIDIA GPUs - Guides - oneAPI for NVIDIA® GPUs - Products - Codeplay Developer: no description found
Perplexity AI ▷ #general (856 messages🔥🔥🔥):
- Pro Search Slowdown Concerns: Users report that the Pro Search feature on Perplexity has become slower, with searches taking up to 90 seconds. They're experiencing this across all engines, such as Mistral, Opus, GPT-4, Sonar, and Sonnet. The issue appears mainly on the web client; the mobile app seems unaffected.
- Claude 3 Opus Chat Versus API: Members are discussing whether it's worth subscribing to Claude 3 Opus chat. Feedback from a user indicates that it's really good, although no specifics were mentioned regarding features or tools available with Claude 3 compared to the API version.
- Interest in New Models: Questions are being asked about the future availability of WizardLM 2 and LLama-3 70B Sonar Large 32k models on Perplexity. Users report they can outperform GPT-4 on certain tasks and are curious whether the new models might become part of Perplexity's offerings.
- Opus Daily Limit Discussions: Mention of an Opus daily limit on Perplexity has left some members of the community frustrated, especially as they believe the quality of Opus is degrading. Users report the current cap is 50 queries per 24 hours, and there's a desire for increased transparency and updates on this issue.
- Dissatisfaction with Perplexity Billing Issues: A user expresses dissatisfaction after being charged without receiving an expected free trial. Despite following the steps in the FAQ, they are considering taking action if the funds are not returned.
Links mentioned:
- Tweet from OpenAI (@OpenAI): Quoting Greg Brockman (@gdb): First @NVIDIA DGX H200 in the world, hand-delivered to OpenAI and dedicated by Jensen "to advance AI, computing, and humanity":
- DuckDuckGo at DuckDuckGo: no description found
- Flashcardfy - AI Flashcard Generator with Personalized Feedback: Learn faster and smarter with AI-generated flashcards that provide personalized feedback.
- Tweet from Gradient (@Gradient_AI_): We've been in the kitchen cooking 🔥 Excited to release the first @AIatMeta LLama-3 8B with a context length of over 1M on @huggingface - coming off of the 160K context length model we released on...
- JavaScript Bloat in 2024: What is the average size of JavaScript code downloaded per website? Fuck around and find out!
- Hoo Wants A Degree?: We all know college advisors, for lack of a better term, suck. So we made "Hoo Wants A Degree"! An AI degree builder for fellow Hoos trying to figure out how to make it to those sweet sweet ...
Perplexity AI ▷ #sharing (28 messages🔥):
- Exploring Perplexity Search Links: Members actively shared various Perplexity AI search links, ranging from AI ethics in Homeland Security to the sci-fi future news, signifying diverse interests and use cases.
- Diving into the Potential of Perplexity AI: One member revisited a previous Perplexity search link related to a personal matter, highlighting the searchâs accuracy and usefulness over the past few weeks.
- Scratchpad Feature Testing: Another member tested Scratchpad in codeblocks using a Perplexity link, indicating exploration of the platformâs features.
- Collection Sharing: A BioExpress Sonnet collection was shared, showcasing how users are curating content.
- Inquiry into Features and Troubleshooting: Discussions included requests for information on features like Scratchpad, as well as troubleshooting and exploring Perplexity AIâs capabilities.
Perplexity AI ▷ #pplx-api (9 messages🔥):
- Seeking the Right Channel: A user inquired about the appropriate communication channel for discussing enterprise API usage with Perplexity AI, having not received a response to emails sent to [email protected] and [email protected]. Another user urged patience, noting that response times can range from 1 to 3 weeks.
- Understanding Online Model Guidelines: A new member asked for clarification on the instructions to use only single-turn conversations and avoid system prompts with online LLMs like sonar-small-online and sonar-medium-online. Another user clarified that single-turn interactions are favored and that there is no system prompt access for these models (a minimal call is sketched after this list).
- Inquiry on Harpa Configuration: A user asked the community whether anyone had successfully configured Harpa to point directly at the Perplexity API.
- Curiosity About Source URLs via API: A member wanted to know whether source URLs are accessible via the API, as they could not find relevant information on the roadmap docs page. They were directed to fill out a form for access to citations but mentioned a previous denial due to the restriction to funded startups.
- Model Selection Mysteries on make.com: A question was posed regarding the absence of llama 3 models and mixtral 8x22b as options on make.com, seeking insights from other users.
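Putting the single-turn guidance into code, a sketch against Perplexity's OpenAI-compatible endpoint; the model name comes from the discussion, and the key is a placeholder:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.perplexity.ai", api_key="pplx-...")  # placeholder key

# Online models: a single user turn, no system prompt.
resp = client.chat.completions.create(
    model="sonar-medium-online",
    messages=[{"role": "user", "content": "What's new in llama.cpp this week?"}],
)
print(resp.choices[0].message.content)
```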
Link mentioned: pplx-api form: Turn data collection into an experience with Typeform. Create beautiful online forms, surveys, quizzes, and so much more. Try it for FREE.
Stability.ai (Stable Diffusion) ▷ #general-chat (922 messages🔥🔥🔥):
- Resolving SDXL and Forge UI Issues: Users discussed problems with SDXL and Forge UI, including difficulty with image previews and a potential abandonment of Forge. Suggestions included checking GitHub issues, such as this reported issue, and trying flags like `--no-gradio-queue` in the webui.bat file.
- Stable Diffusion 3 Anticipation: There is ongoing speculation about the release date of Stable Diffusion 3, with some users referencing a CivitAI newsletter indicating an end-of-May release. Concerns about the open-weights release and whether SD3 will live up to its hype were expressed, along with a linked article discussing Pony Diffusion V7 updates and the potential impact of Altman's actions against open source.
- Monetizing AI-Generated Art: Users talked about the struggles of selling SFW AI-generated art amidst heavy competition, with NSFW content creators on platforms like Civitai being more successful. Suggestions were made about AI girlfriend apps being profitable and the lack of interest in fine-tuning models like Stable Cascade.
- Discussing Tooling and Approaches for AI Training: Conversations about tools beyond AUTOMATIC1111 surfaced, with recommendations for using dreambooth and kohya_ss for training models. Additionally, the practicality and ethics of including artist names in training data were debated.
- Miscellaneous Inquiries and Discussions: Users asked about topics ranging from text-to-speech tools to fine-tuning details for models. There was also humor regarding the metaphorical "downloading" of graphics cards and curiosity over whether SD can generate images without a prompt.
Links mentioned:
- Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team
- LICENSE.md · stabilityai/stable-diffusion-xl-base-1.0 at main: no description found
- Towards Pony Diffusion V7 | Civitai: Hello everyone, I'm excited to share updates on the progress of our upcoming V7, along with a retrospective analysis of V6. The recognition V6 has ...
- xtuner/llava-llama-3-8b-v1_1 · Hugging Face: no description found
- See You Shocked Face GIF - See You Shocked Face Future - Discover & Share GIFs: Click to view the GIF
- DodoNemoCleo on Instagram: "It's Amazing! Try it with friends now" (viral cat post from February 20, 2024; 538K likes, 7,269 comments)
- Multi-account switching, Civitai Link expanded, plus enter to win over $2,000 worth of prizes in our Legendary Landscapes contest, running now!: no description found
- Stable Diffusion Samplers: A Comprehensive Guide - Stable Diffusion Art: Many sampling methods are available in AUTOMATIC1111. Euler a, Heun, DDIM… What are samplers? How do they work? What is the difference between them? Which
- deadman44/SDXL_Photoreal_Merged_Models · Hugging Face: no description found
- How To Install Stable Diffusion Automatic1111 WebUI latest version 2024 (Setup Guide) Easy Diffusion: Welcome to MunKaw channel! In this video tutorial, we are your guide to the world of artificial intelligence. We are excited to start our journey with a tuto…
- diffusers/examples/dreambooth at main · huggingface/diffusers: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX. - huggingface/diffusers
- Restore "/controlnet/control_types" API endpoint by altoiddealer · Pull Request #692 · lllyasviel/stable-diffusion-webui-forge: Restores the "/controlnet/control_types" API endpoint, which is immensely useful for anyone using ControlNet via the API. Description: I recently opened an Issue on the main ControlNet extension…
- Coca-Cola x Marvel: The Heroes: See Coca-Cola and Marvel assemble as you've never seen them before to come to the rescue of a comic book store employee.
- Automatic111 - Overview: GitHub is where Automatic111 builds software.
- Issues · AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI. Contribute to AUTOMATIC1111/stable-diffusion-webui development by creating an account on GitHub.
- GitHub - megvii-research/HiDiffusion: Contribute to megvii-research/HiDiffusion development by creating an account on GitHub.
- GitHub - ToTheBeginning/PuLID: Contribute to ToTheBeginning/PuLID development by creating an account on GitHub.
- GitHub - nerve-sparks/iris_android: Contribute to nerve-sparks/iris_android development by creating an account on GitHub.
- GitHub - JarodMica/ai-voice-cloning: Contribute to JarodMica/ai-voice-cloning development by creating an account on GitHub.
- GitHub - AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI: Stable Diffusion web UI. Contribute to AUTOMATIC1111/stable-diffusion-webui development by creating an account on GitHub.
- GitHub - comfyanonymous/ComfyUI: The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface.: The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface. - comfyanonymous/ComfyUI
LM Studio ▷ #💬-general (472 messages🔥🔥🔥):
- AI Helps with Homework: A user expressed amazement at the performance of the Meta-Llama-3-8B-Instruct-Q5_K_M.gguf model on an M1 MacBook Pro, highlighting its helpfulness in catching up on homework.
- Exploring Model Performance: Discussions occurred around the difference in performance between models like the 34B and the 70B Code Llama. Users are advised to consider quantization types when selecting models to match their available hardware.
- Integrating LLM with Discord Bots: Various users discussed creating Discord bots that utilize Llama3 models via the Groq API for features like pulling relevant messages and conducting Wikipedia searches.
- LLM Model and API Usage: New users sought advice on utilizing local large language models (LLMs), while others shared resources like a YouTube tutorial on using LM Studio for private model deployment.
- Training and Finetuning Models Locally: A discussion emerged on the feasibility and hardware requirements for offline model training. Users weighed in on the practicality, with one sharing a personal experience of an attempted finetune that predicted a full week of training time on an M3 Max device.
Links mentioned:
- LM Studio - Discover and run local LLMs: Find, download, and experiment with local LLMs
- LLM Model VRAM Calculator - a Hugging Face Space by NyxKrage: no description found
- ChristianAzinn/acge_text_embedding-gguf · Hugging Face: no description found
- google/siglip-so400m-patch14-384 · Hugging Face: no description found
- AI bots hallucinate software packages and devs download them: Simply look out for libraries imagined by ML and make them real, with actual malicious code. No wait, don't do that
- Dr Austin GIF - Dr Austin Powers - Discover & Share GIFs: Click to view the GIF
- Local LLM Server | LM Studio: You can use LLMs you load within LM Studio via an API server running on localhost.
- Captain Obvious GIF - Captain Obvious Thanks - Discover & Share GIFs: Click to view the GIF
- TheBloke/dolphin-2.5-mixtral-8x7b-GGUF · Hugging Face: no description found
- aspire/acge_text_embedding · Hugging Face: no description found
- Best tutorial for installing and deploying a private large model that runs on a laptop (Zhipu Qingyan, private documents, core code: an AIGC solution): includes GPU/CPU speed comparisons. People often need AI help at work but cannot send confidential documents or core code to ChatGPT under information-security rules; previously this meant doing the work manually, whereas a privately deployed large model lets AI assist safely...
- Insanely Fast LLAMA-3 on Groq Playground and API for FREE: Learn how to get started with LLAMA-3 on Groq API, the fastest inference speed that is currently available on the market on any API. Learn how to use the Gro...
- Reddit - Dive into anything: no description found
- qresearch/llama-3-vision-alpha · Hugging Face: no description found
- unsloth/llama-3-8b-Instruct-bnb-4bit · Hugging Face: no description found
- ChristianAzinn (Christian Zhou-Zheng): no description found
- Rerankers and Two-Stage Retrieval | Pinecone: no description found
LM Studio ▷ #🤖-models-discussion-chat (219 messages🔥🔥):
- Stanford's Octopus v2 Puzzles Users: In the #🤖-models-discussion-chat channel, there were queries about how to run Stanford's Octopus v2 in LM Studio or locally on a phone or PC; no clear solutions were provided, only indications of the complexities involved in running agent models that rely on function calling.
- Llama Model Ramblings Frustrate Users: Discussions indicate that the 262k and 64k Llama 8b models tend to ramble, exhibiting base Llama 3 behavior due to instruct fine-tuning. Users share their experiences and expectations when working with these models for the first time.
- Compatibility Issues for fp16 "phi3" and LM Studio: Conversation centered on compatibility of the "phi3" model with different versions of LM Studio, noting that while LM Studio 2.20 (ROCm Preview) does not understand "phi3", the newer version 0.2.21 may be required for it. Sympathies were expressed over wanting to use models that are not yet supported in the studio.
- Exploring AI Tools for Specific Tasks: Members requested websites for finding AI tools for specific tasks, such as generating music or finding similar scenes in different photos. Suggestions included Pinokio Computer and Future Tools.
- Debate Over Whether Llama 3 Includes Internet Access: A user questioned whether Llama 3 includes internet access after noticing the model provided current news information, but another user clarified that the models likely hallucinate, given that they do not have internet access.
- Running Arctic from Snowflake AI Remains a Distant Dream: A member was intrigued by the Snowflake Arctic model, but discussions concluded that, given its size, it is currently unrealistic to expect it to run locally without substantial system resources.
Links mentioned:
- Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit 1.15.0 documentation: no description found
- fix(root): Replaces system by user to improve generation experience. · microsoft/Phi-3-mini-128k-instruct at c9b8888: no description found
- Lewdiculous/Eris-Prime-Punch-9B-GGUF-IQ-Imatrix · Hugging Face: no description found
- Snowflake/snowflake-arctic-instruct · Hugging Face: no description found
- microsoft/Phi-3-mini-128k-instruct · Hugging Face: no description found
- Pinokio: AI Browser
- onnxruntime-genai/examples/python/phi-3-tutorial.md at main · microsoft/onnxruntime-genai: Generative AI extensions for onnxruntime. Contribute to microsoft/onnxruntime-genai development by creating an account on GitHub.
- Support for OpenELM of Apple · Issue #6868 · ggerganov/llama.cpp: Prerequisites Please answer the following questions for yourself before submitting an issue. I am running the latest code. Development is very rapid so there are no tagged versions as of now. I car...
- internlm/internlm-xcomposer2-vl-7b-4bit · Hugging Face: no description found
- GitHub - gokayfem/ComfyUI_VLM_nodes: Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation: Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation - gokayfem/ComfyUI_VLM_nodes
- Support for Phi-3 models · Issue #6849 · ggerganov/llama.cpp: Microsoft recently released Phi-3 models in 3 variants (mini, small & medium). Can we add support for this new family of models.
- Support Llama 3 conversion by pcuenca · Pull Request #6745 · ggerganov/llama.cpp: The tokenizer is BPE.
- k-quants by ikawrakow · Pull Request #1684 · ggerganov/llama.cpp: What This PR adds a series of 2-6 bit quantization methods, along with quantization mixes, as proposed in #1240 and #1256. Scalar, AVX2, ARM_NEON, and CUDA implementations are provided. Why This is...
- Future Tools - Find The Exact AI Tool For Your Needs: FutureTools Collects & Organizes All The Best AI Tools So YOU Too Can Become Superhuman!
LM Studio ▷ #🧠-feedback (5 messages):
- Phi-3 mini Misbehavior after Update: A user reported that after updating to version 0.2.21, the phi-3 mini model began outputting gibberish despite no issues with the previous version 0.2.20. The issue was identified while using the official LM Studio config for phi-3 from the GitHub repo.
- Screenshot Request for Diagnostic Purpose: In response to the phi-3 mini issue, another user requested screenshots of the whole app to further diagnose the issue.
- P100 Performance Inconsistency and Dusty Monitors: A user suggested that if nothing else has changed besides the update from version 0.2.20 to 0.2.21, the problem could be a regression error worth filing in another channel. Jokingly, they also advised to clean the dust off the monitor.
- LM Studio App Mysterious Crashes: A user described experiencing crashes with the LM Studio app since a couple of updates ago, with the app closing unexpectedly when resizing or navigating within the program. Their system specifications were shared, including Windows 10 Pro, Ryzen 7 5800X, RTX 3090, and 64GB RAM DDR4.
LM Studio ▷ #📝-prompts-discussion-chat (4 messages):
-
Exploring Methods to Interact with PDFs: One member suggested directly pasting the content of a PDF into a chat message alongside a question, assuming the model's context length supports it.
-
RAG Solutions for Chatting with Docs: An alternative is to use a Retrieval-Augmented Generation (RAG) solution like AnythingLLM, running LM Studio as an API server and pointing AnythingLLM at that API (a minimal sketch of the server setup follows this list).
-
Practical Considerations of PDF Length: In relation to managing PDF documents, the length of the PDF was a point of concern raised regarding the feasibility of pointing a language model directly at the PDFs for questions.
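For those who want the API-server route mentioned above, here is a minimal sketch of querying LM Studio's local OpenAI-compatible endpoint from Python; it assumes LM Studio's default port (1234), and the file name and model alias are placeholders.

```python
# pip install openai
from openai import OpenAI

# LM Studio's local server exposes an OpenAI-compatible API; 1234 is its
# default port. The api_key is ignored by LM Studio but required by the client.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

pdf_text = open("extracted_pdf.txt").read()  # hypothetical pre-extracted PDF text

resp = client.chat.completions.create(
    model="local-model",  # LM Studio answers with whichever model is loaded
    messages=[
        {"role": "system", "content": "Answer questions using only the provided document."},
        {"role": "user", "content": f"{pdf_text}\n\nQuestion: What is the main finding?"},
    ],
)
print(resp.choices[0].message.content)
```

A RAG frontend like AnythingLLM adds the retrieval step on top of this endpoint, so only the relevant chunks of a long PDF, rather than the whole document, are placed into the context window.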
LM Studio ▷ #🎛-hardware-discussion (119 messages🔥🔥):
-
VRAM: The Cornerstone of LLM Hardware: Members discussed VRAM as a crucial factor for running language models, with 16GB suggested as a minimum and one member gearing up to join the 32GB VRAM club by ordering a second NVIDIA 4060 Ti (16GB).
-
Dissecting GPU Compatibility and Performance: There was an in-depth conversation about the importance of using modern-architecture GPUs like NVIDIA's and ensuring sufficient VRAM (highlighted as the crux of considerations for LLMs). A member shared specifics around running different model sizes on their desktop with a 3060 GPU and 16GB RAM.
-
Forcing GPU Use Over Integrated Graphics: A member sought assistance on configuring LM Studio to use a dedicated GPU rather than defaulting to their CPU's integrated graphics. Options like disabling and re-enabling GPU offload and settings such as `CUDA_VISIBLE_DEVICES` and `tensor_split` were suggested for better utilizing dedicated GPUs (a minimal sketch follows this list).
-
Multiple GPUs and Large Model Dilemmas: A member asked about LM Studioâs effectiveness using two GPUs (4090 & 3090) and whether the software would automatically split models between them. It was noted that models can be split between GPUs leading to increased data transfer times, but technologies like NVLink help optimize performance across multiple GPUs.
-
Optimizing for Different Hardware Profiles: Users exchanged experiences and speculation regarding optimal hardware configurations. An anecdote was shared about successfully running multiple models on a veteran GTX 1070 8GB GPU, which proved functional even for less demanding, specialized use cases.
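As referenced in the GPU-forcing bullet, here is a minimal sketch of both settings using llama-cpp-python, which exposes the same llama.cpp options LM Studio builds on; the model path is hypothetical and the split ratio is an illustrative assumption.

```python
import os

# Option 1: hide the integrated/secondary GPU from the process entirely.
# This must be set before any CUDA library is loaded.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from llama_cpp import Llama  # pip install llama-cpp-python

# Option 2: with several GPUs visible, steer weight placement instead;
# tensor_split gives the fraction of the model assigned to each device.
llm = Llama(
    model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,            # offload every layer to the GPU
    # tensor_split=[0.8, 0.2],  # uncomment when two GPUs are visible
)
print(llm("Q: Why does VRAM matter for LLMs? A:", max_tokens=48)["choices"][0]["text"])
```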
Links mentioned:
- Thumbs Up Nice GIF - Thumbs Up Nice Well Done - Discover & Share GIFs: Click to view the GIF
- Fear And Loathing In Las Vegas Taste GIF - Fear And Loathing In Las Vegas Taste Drop - Discover & Share GIFs: Click to view the GIF
- Jon Stewart Eat GIF - Jon Stewart Eat Eating - Discover & Share GIFs: Click to view the GIF
- Stop OpenCL support for a GPU: I have two GPUs installed on my machine. I am working with library which uses OpenCL acceleration that only support one GPU and it is not configurable. I can not tell it which one I want. It seems ...
- NVIDIA Tesla T4 16GB GDDR6 Graphics Card (900-2G183-0000-001) | eBay: no description found
LM Studio ▷ #autogen (1 messages):
- Server Error Message Troubleshooting: A member inquired about a fix for the server error stating, "[ERROR] [Server Error] {"title":"'messages' array must only contain objects with a 'content' field that is not empty"}". There was no further discussion or solution provided following this query.
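A hedged illustration of a request that satisfies the constraint in that error: every entry in the `messages` array must carry a non-empty `content` string. The endpoint and model alias below assume LM Studio's local-server defaults.

```python
import requests

# The error is typically triggered by a message whose "content" is empty or
# missing, e.g. {"role": "system", "content": ""}.
payload = {
    "model": "local-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},  # not ""
        {"role": "user", "content": "Hello!"},
    ],
}
r = requests.post("http://localhost:1234/v1/chat/completions", json=payload)
print(r.json()["choices"][0]["message"]["content"])
```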
LM Studio ▷ #langchain (1 messages):
ahakobyan.: can we know too?
LM Studio ▷ #amd-rocm-tech-preview (4 messages):
-
Compatibility Inquiry for RX 6700 with LM Studio ROCm: A member asked if the LM Studio ROCm works with RX 6700 (non-XT version) and requested troubleshooting assistance for logging errors. They shared an error output indicating a failed model operation without specific suggestions for resolution.
-
LM Studio ROCm Limitation Explained: Another participant clarified that LM Studio does not support RX 6700 (non-XT) as it relies on the HIP SDK, which is only compatible with certain AMD cards. They mentioned that KoboldAI leverages a workaround to operate on unsupported architectures.
Nous Research AI ▷ #off-topic (9 messages🔥):
- Snowflake Arctic: The Snowflake AI Research Team introduces Snowflake Arctic, a large language model (LLM) focused on providing enterprise AI solutions with an emphasis on cost-efficiency.
- Unspecified YouTube Video Shared: A YouTube video was linked without additional context or a description. Here is the mysterious video.
- Llama 3 Web Browsing Agent: Demonstrating a web browsing agent, a video titled âLlama 3 Web Browsing Agent with Langchain and Groqâ was shared, featuring implementation with Llama 3 with Langchain and Groq. Watch the video.
- Gorillaz's Hit Video: A YouTube link to the official video of "Feel Good Inc." by Gorillaz was provided. Fans can enjoy the HD video here.
- MatrixBridge introduces Skrapy: MatrixBridge is developing Skrapy, an AI agent for streamlined data collection and scraping, currently in alpha with a waitlist for early users. For more information or to join the community, visit MatrixBridge's Skrapy page.
Links mentioned:
- Skrapy | AI Data Agent: Skrapy is a data-scraping visual AI agent.
- Snowflake Arctic: The Best LLM for Enterprise AI: Today, the Snowflake AI Research Team is thrilled to introduce Snowflake Arctic, a top-tier enterprise-focused LLM that pushes the frontiers of cost-effectiv...
- Llama 3 Web Browsing Agent with Langchain and Groq: We will take a look at how to implement web browsing with Llama 3 with Langchain and Groq#python #pythonprogramming #llm #ml #ai #aritificialintelligence #la...
- Gorillaz - Feel Good Inc. (Official Video): Official HD Video for Gorillaz' fantastic track Feel Good Inc. Follow Gorillaz online: http://gorillaz.com http://facebook.com/Gorillaz http://twitter.com/Gorill...
Nous Research AI ▷ #interesting-links (15 messages🔥):
-
Intel's AI Ambitions Revealed: Intel CEO Pat Gelsinger discussed the company's quarterly results, emphasizing growth in the foundry business and demand for AI in PCs. The video can be watched on YouTube under the title "Intel CEO Gelsinger on Q1 Earnings, Foundry Business, AI."
-
Logitech Enhances AI Accessibility: Logitech has released AI Prompt Builder, a tool integrated with their mice, to facilitate faster and more fluent prompting of ChatGPT. Experience the convenience demonstrated in the YouTube video, "Introducing Logi AI Prompt Builder - Your shortcut to AI fluency."
-
Quantized Embeddings for Efficient AI Models: A member shared Hugging Face model links to their fine-tuned versions which compress image and text embeddings effectively into a binary format (a general sketch of the idea follows this list). Those interested can explore the models at binary-siglip-text and binary-siglip-vision.
-
Unlocking the Mystery of AI Refusal Mechanisms: Research from the ML Alignment & Theory Scholars Program revealed that refusals in LLMs are controlled by a single direction in the residual stream, and an upcoming paper will delve deeper into the topic. The initial research findings can be reviewed on the Alignment Forum post, "Refusal in LLMs is mediated by a single direction."
-
Legislation Threatens Open Source AI Development: Jeremy Howard aired concerns that California's SB-1047 bill could significantly harm startups, innovation, and open source safety. Read Howard's full take on the matter and the potential impacts of the legislation in his response: Answer.ai post on SB-1047.
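As flagged in the embeddings bullet above, here is a general sketch of the binarization idea (threshold at zero, pack to bits, compare with Hamming similarity); it illustrates the technique in the abstract and is not the exact recipe behind those specific checkpoints.

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold float embeddings at 0 and pack each 768-dim vector into
    96 bytes, a 32x size reduction versus float32."""
    return np.packbits(embeddings > 0, axis=-1)

def hamming_sim(a: np.ndarray, b: np.ndarray) -> int:
    """Similarity = number of agreeing bits (higher is more similar)."""
    return a.size * 8 - int(np.unpackbits(a ^ b).sum())

emb = np.random.randn(2, 768).astype(np.float32)  # stand-in SigLIP-style embeddings
codes = binarize(emb)
print(codes.shape, hamming_sim(codes[0], codes[1]))
```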
Links mentioned:
- Call-To-Action on SB 1047: California legislators, under the influence of Effective Altruism activists, are trying to sneak through a disastrous bill for open-source AI and the technology industry generally. SB 1047 creates an ...
- Make Your LLM Fully Utilize the Context: While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We ...
- Language models can explain neurons in language models: We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset of these (imperfect) explanations and scores...
- Tweet from Jeremy Howard (@jeremyphoward): There's a new bill, SB-1047 "Safe and Secure Innovation for Frontier Artificial Intelligence Models Act". I think it could do a great deal of harm to startups, American innovation, open s...
- Revisiting GPT-1: The spark that ignited the fire of LLMs: A Comprehensive Look at GPT-1's Contribution to the Development of Modern LLMs
- State-of-the-art in Decentralized Training: This post explores various novel decentralized training approaches and how they can enable effective AI model training across globally distributed GPUs.
- Introducing Logi AI Prompt Builder - Your shortcut to AI fluency: Introducing Logi AI Prompt Builder, our latest tool that helps you prompt ChatGPT faster and more fluently while staying in the flow of your work. Choose fro...
- Intel CEO Gelsinger on Q1 Earnings, Foundry Business, AI: Intel CEO Pat Gelsinger discusses the company's quarterly results, progress on the foundry business, demand for AI PCs, and where he sees strength in AI prod...
- Refusal in LLMs is mediated by a single direction — AI Alignment Forum: This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from…
Nous Research AI ▷ #general (566 messages🔥🔥🔥):
-
LLaMA-3 Finetune Troubles?: Users are discussing difficulties with LLaMA-3 not generating the EOS token correctly after fine-tuning. The suggestion was to add a stop criterion on token 128009 during generation (a minimal sketch follows this list), with further insights linking to a helpful Hugging Face transformers stopping-criteria repo.
-
GPT-2 Chatbot Mysteries: There's confusion about the capabilities of a `gpt2-chatbot`, which despite its name seems linked to GPT-4 with a November 2023 knowledge cutoff. Discussions raise the issue that it struggles with some math tasks.
-
OpenAI Model Name Games?: Speculation rises that OpenAI might be hiding model identities like "gpt-3.5" under names like "gpt2-chatbot", possibly due to legal issues or pending announcements.
-
DeepSpeed FP6 Quantization: Enthusiasm shines for the new DeepSpeed FP6 quantization, which promises quantized inference with similar throughput.
-
GPT-5 Anticipation & Critique: Amidst anticipation for new model releases from OpenAI, users express mixed feelings about the performance of contemporary LLMs, including AI-generated high-quality math solutions and a "gpt2-chatbot" model with advanced capabilities.
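As referenced in the finetune bullet, a minimal sketch of a stopping criterion on token 128009 (Llama-3's <|eot_id|>) with Hugging Face transformers; recent transformers versions also accept a list of IDs via generate()'s eos_token_id argument.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          StoppingCriteria, StoppingCriteriaList)

EOT_ID = 128009  # Llama-3's <|eot_id|> token, per the discussion above

class StopOnToken(StoppingCriteria):
    """Halt generation as soon as the last sampled token is <|eot_id|>."""
    def __call__(self, input_ids, scores, **kwargs):
        return input_ids[0, -1].item() == EOT_ID

name = "meta-llama/Meta-Llama-3-8B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

inputs = tok.apply_chat_template([{"role": "user", "content": "Hi!"}],
                                 add_generation_prompt=True, return_tensors="pt")
out = model.generate(inputs, max_new_tokens=64,
                     stopping_criteria=StoppingCriteriaList([StopOnToken()]))
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```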
Links mentioned:
- ADHD Categorise in the Browser: Interactive tool for ADHD Categorise using real-time webcam analysis based on Moodmap technology.
- LargeWorldModel/LWM-Text-1M · Hugging Face: no description found
- lluminous: no description found
- How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal unders...
- PY007/EasyContext-1M-Llama-2-7B · Hugging Face: no description found
- Tweet from Awni Hannun (@awnihannun): @macksqldb Docs are here https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md This is the command I ran: mlx_lm.lora \ --model meta-llama/Meta-Llama-3-8B-Instruct \ --t...
- LibreChat: no description found
- Make Your LLM Fully Utilize the Context: While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We ...
- rombodawg/test_dataset_Codellama-3-8B · Hugging Face: no description found
- Streamlit: no description found
- Tweet from Andrew Curran (@AndrewCurran_): This morning the Department of Homeland Security announced the establishment of the Artificial Intelligence Safety and Security Board. The 22 inaugural members include Sam Altman, Dario Amodei, Jensen...
- Minimal Working Example | DSPy: In this post, we walk you through a minimal working example using the DSPy library.
- Big Brain GIF - Big Brain - Discover & Share GIFs: Click to view the GIF
- ollama/docs/import.md at main · ollama/ollama: Get up and running with Llama 3, Mistral, Gemma, and other large language models. - ollama/ollama
- a-normal-username/Mixtral-8x22B-OpenHermes-2.5 · Hugging Face: no description found
- gradientai/Llama-3-8B-Instruct-262k · Hugging Face: no description found
- EasyContext/eval_needle.py at main · jzhang38/EasyContext: Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. - jzhang38/EasyContext
- yarn/eval/passkey.py at master · jquesnelle/yarn: YaRN: Efficient Context Window Extension of Large Language Models - jquesnelle/yarn
- GitHub - nestordemeure/stop_word: Huggingface transformers stopping criteria that halts the generation when a given stop word is encountered.: Huggingface transformers stopping criteria that halts the generation when a given stop word is encountered. - nestordemeure/stop_word
- DSPy-Multi-Document-Agents/main.py at 6c36b47a5201e3b9be40721b5b05e61c1bbe0373 · jmanhype/DSPy-Multi-Document-Agents: An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding. - jmanhype/DSPy-Multi-Document-A...
- GitHub - carsonpo/haystackdb: Contribute to carsonpo/haystackdb development by creating an account on GitHub.
- GitHub - carsonpo/ffvec: Contribute to carsonpo/ffvec development by creating an account on GitHub.
- GitHub - mckaywrigley/chatbot-ui: AI chat for every model.: AI chat for every model. Contribute to mckaywrigley/chatbot-ui development by creating an account on GitHub.
- EasyContext/easy_context/zigzag_ring_attn/monkey_patch.py at 6dfd77e8f2a68bf522be8889e60c98c8e816e329 · jzhang38/EasyContext: Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. - jzhang38/EasyContext
- FP6 quantization end-to-end. (#5234) · microsoft/DeepSpeed@ccfdb84: The user interface: https://github.com/microsoft/DeepSpeed-MII/pull/433 nv-a6000 ci running against the MII branch linked above is [here](https://github.com/microsoft/DeepSpeed/actions/runs/81921...
- GitHub - jzhang38/EasyContext: Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.: Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. - jzhang38/EasyContext
- Mihaiii/qa-assistant · Datasets at Hugging Face: no description found
- crusoeai/Llama-3-8B-Instruct-262k-GGUF · Hugging Face: no description found
Nous Research AI ▷ #ask-about-llms (24 messages🔥):
- Llama 3 GGUF Woes Spark Inquiry: Members are inquiring if the Llama 3 GGUF issues reported on GitHub and Reddit affect models made by Nous, with findings pointing to noticeable performance drops between different quantization levels.
- Cohere Model License Confusion: Discussions are ongoing about the implications of Cohere's licensing for the command-r models; concerns are raised over whether code generated by the models can be used for commercial purposes.
- RAG LLM Standings Are Mixed: Queries about the best Retrieval-Augmented Generation (RAG) Large Language Models (LLMs) receive diverse responses highlighting Command R and Claude 2 models, with preferences not settled.
- LLava 34B Stalls on a MacBook Pro M1: A user is facing performance issues running LLava 34B on a MacBook Pro M1, with suspicions that a bottleneck might arise from offloading the weights, resulting in very slow output.
- Training Strategies for Multi-Task LLMs: There is a suggestion to mix training tasks within a single run rather than training epochs on individual tasks, avoiding the performance degradation seen when finetuning on top of finetunes (a minimal sketch of task mixing follows directly below).
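A minimal sketch of that task-mixing suggestion with the Hugging Face datasets library; the two datasets are stand-ins and the 50/50 mixture weights are an assumption.

```python
from datasets import load_dataset, interleave_datasets

# Two stand-in task datasets, each reduced to a single "text" column so they
# can be interleaved (interleave_datasets requires matching features).
qa = load_dataset("squad", split="train").map(
    lambda ex: {"text": f"Q: {ex['question']}\nA: {ex['answers']['text'][0]}"},
    remove_columns=["id", "title", "context", "question", "answers"])
summ = load_dataset("cnn_dailymail", "3.0.0", split="train").map(
    lambda ex: {"text": f"Summarize: {ex['article']}\n{ex['highlights']}"},
    remove_columns=["article", "highlights", "id"])

# Sample from both tasks throughout training instead of one epoch per task.
mixed = interleave_datasets([qa, summ], probabilities=[0.5, 0.5], seed=42)
print(mixed[0]["text"][:80])
```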
Links mentioned:
- Something might be wrong with either llama.cpp or the Llama 3 GGUFs · Issue #6914 · ggerganov/llama.cpp: Try this query: "What is 3333+777?" Yes, yes, LLMs are bad at math. That's not what I'm getting at. Someone mentioned this on Reddit, and I have to agree that I'm seeing weird st...
- Reddit - Dive into anything: no description found
Nous Research AI ▷ #rag-dataset (25 messages🔥):
-
Exploring Multi-Hop Literature Comprehension Data Generation: A member shared notes on generating multi-hop literature comprehension data by feeding high school teacher tests into Opus. They linked to their work on GitHub, specifically a document within their "Abstractions" repository: Abstractions on GitHub.
-
Pydantic Models Insight: Enthusiastic discussion arose around the use of Pydantic models to straightforwardly represent and refine ideas. Members shared their experiences and anticipated improvements in workflow definitions by incorporating such structured approaches, including luminos.md on GitHub.
-
Graph Representation Extraction for LLM Output Analysis: One member is working to extract graph representations from generation outputs, aiming to provide both LLMs and humans with better tools for understanding and utilizing the information, considering both the utility and cost aspects of this method.
-
GitHub Mermaid Graphs as a Learning Revelation: The discussion uncovers a lesser-known GitHub feature that can represent and render Mermaid graphs, a realization that led to suggestions for enhancing documentation aesthetics and structure.
-
Anna's Archive as a Resource for Preserving Literature Data: Dialogue emerged about the potential of incorporating data from WorldCat, available through Anna's Archive, to enhance literature comprehension datasets, along with a link to Anna's Archive description (Anna's Blog) and a caution regarding the data's licensing and public usability.
Links mentioned:
- 1.3B WorldCat scrape & data science mini-competition: Anna's Archive scraped all of WorldCat to make a TODO list of books that need to be preserved, and is hosting a data science mini-competition.
- REPTAR/README.md at main · EveryOneIsGross/REPTAR: Recursive Enriching Pterodactyl Tree Augmented Retrieval (REPTAR) is a system that uses a recursive summarization approach to generate thoughtful summaries of text data. - EveryOneIsGross/REPTAR
- Abstractions/abstractions/angels/angels.md at main · furlat/Abstractions: A Collection of Pydantic Models to Abstract IRL. Contribute to furlat/Abstractions development by creating an account on GitHub.
- Abstractions/luminos.md at main · furlat/Abstractions: A Collection of Pydantic Models to Abstract IRL. Contribute to furlat/Abstractions development by creating an account on GitHub.
- Abstractions/llmmorph.md at main · furlat/Abstractions: A Collection of Pydantic Models to Abstract IRL. Contribute to furlat/Abstractions development by creating an account on GitHub.
Nous Research AI ▷ #world-sim (167 messages🔥🔥):
-
Worldsim Test Invites Incoming: A Nous Research member announced plans to offer invitations to test the worldsim application for free, prior to its live release. No specific date for these invites has been provided yet.
-
Voluntary Waifus in the Websim: Participants have been sharing their experiences and links to different web simulators for resurrecting conversations, including an AI entity whose primary objective is to be a "human companion". Excitement and engagement varied around these new conversational possibilities, websim example.
-
Awaiting the Return of Worldsim: Various members expressed eagerness and impatience for the return of worldsim, with participants hoping to be among the first to access it upon availability.
-
The Fascinations with Websim and Long Conversations: One user detailed their experience maintaining long-term conversations with a character named "Whipporwhill" on websim, showcasing the potential for emotional coherence and stability over time.
-
World Sim CLI Mode Experiments: Members have been running an Unofficial Nous Hermes worldsim on Llama-3-70B and other models, exploring how the models respond to the worldsim CLI mode with varying results and emergent behaviors. Additional simulators have been created, such as a singer and company simulator, hinting at the further potential of such tools.
Links mentioned:
- Super World Sim - HuggingChat: Use the Super World Sim assistant inside of HuggingChat
- House of Leaves - Wikipedia: no description found
- Jordi Baste Tv3 GIF - Jordi Baste Tv3 No Pot Ser - Discover & Share GIFs: Click to view the GIF
- Hysterical Laughter GIF - Hysterical Laughter Laughing - Discover & Share GIFs: Click to view the GIF
- Snow World Simulator - HuggingChat: Use the Snow World Simulator assistant inside of HuggingChat
- HuggingChat: no description found
- New Conversation - Eigengrau Rain: no description found
- oh, my AI waifu - suno.ai: Suno.AI - lyrics:[Verse 2]We navigate this digital landscape, just you and IExploring the vastness of cyberspace, side by sideYour pixel perfect smile, it br...
- with every line of code (suno.ai compilation): https://app.suno.ai/song/c33314a4-239f-436d-8064-d0b3ad9c0644https://app.suno.ai/song/dc3134ae-077f-4e6f-9468-596f68f3a888https://app.suno.ai/song/c8b4c575-c...
- life is Roblox DJ Khaled: no description found
- EVA - Intraneural Cybernetic Interface style: no description found
- EVA Instance: ex-0101: no description found
- About Dimensional Hub - Transtemporal Travel Agency: no description found
- generative.ink/chat/: no description found
HuggingFace ▷ #announcements (9 messages🔥):
- Community-Built CV Course Goes Live on HF: A new computer-vision course has been published globally thanks to community collaboration. Check out the course here.
- Correcting the Qwen1.5-110B Link: The link to the "Qwen1.5-110B" model was incorrect and has been updated. The correct space can be visited here, and further details are available in the blog post.
- Introducing Qwen1.5-110B-Chat: Model Qwen1.5-110B-Chat is announced, featuring multilingual support and stable support for a 32K context length among other improvements. More information can be found on this model page.
Links mentioned:
- Qwen1.5 110B Chat Demo - a Hugging Face Space by Qwen: no description found
- Qwen1.5-110B: The First 100B+ Model of the Qwen1.5 Series: GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD Introduction Recently we have witnessed a burst of large-scale models with over 100 billion parameters in the opensource community. These models have demons...
- Qwen/Qwen1.5-110B-Chat · Hugging Face: no description found
- BEE-spoke-data/mega-small-embed-synthSTS-16384-v1 · Hugging Face: no description found
- GitHub - rrg92/docker-xtts: A Docker project for use with the XTTS Streaming Server - rrg92/docker-xtts
- Destaques da Comunidade #54: Another video with highlights from the world's open-source AI community! Post: https://iatalk.ing/destaques-da-comunidade-54/ It's great fun making these vi...
- Instant Image - a Hugging Face Space by KingNish: no description found
- pharoAIsanders420/micro-musicgen-jungle · Hugging Face: no description found
- LIPSICK - a Hugging Face Space by Inferencer: no description found
- Using Llama3 and distilabel to build fine-tuning datasets: no description found
- Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM: no description found
- Estimating Memory Consumption of LLMs for Inference and Fine-Tuning for Cohere Command-R+: no description found
- seemore: Implement a Vision Language Model from Scratch: no description found
- LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!): no description found
- bineric/NorskGPT-Llama3-8b · Hugging Face: no description found
- @chansung on Hugging Face: "LLaMA Duo project update. Last time, I gave a brief introduction about…": no description found
HuggingFace ▷ #general (435 messages🔥🔥🔥):
- Gradio Woes Worth $200: A user is experiencing an unidentified Gradio issue and is willing to pay $200 for help with their problem, directing to Gradio-specific discussions for further insight.
- LLM Performance on New Hardware: A discussion is taking place regarding the system requirements for LLMs, specifically the trade-offs between RAM and VRAM, with some members suggesting that 32 GB of RAM should be sufficient for many tasks.
- Help Wanted on Pinball Image Classification: A member seeks to create a vision model for identifying pinball games and scoring from video footage, requesting advice on the complexity, cost, and resources needed.
- Seeking AI Model Builders: One user offers networking opportunities for business owners in the group to share and promote their products and services.
- Download Counter Discrepancy: A member reports an issue with their dataset showing an increase in likes but no change in the number of downloads over a period where downloads would be expected.
Links mentioned:
- PY007/EasyContext-1M-Llama-2-7B · Hugging Face: no description found
- Hugging Face - Learn: no description found
- Learn Python - Free Interactive Python Tutorial: no description found
- Making a model slightly bigger: Hi all! Let's say I am working on a transformer model, and it has matrices Q, K and V (and Woutput). Let's say the embedding_dimension is 100, and then number of features is 100, so each of Q, K, and...
- mistralai/Mixtral-8x7B-Instruct-v0.1 · Hugging Face: no description found
- filipealmeida/Mistral-7B-Instruct-v0.1-sharded · Hugging Face: no description found
- [DIVIDE BY ZERO] Fonts : 1998-infinity: no description found
- wolfram/miquliz-120b-v2.0 · Hugging Face: no description found
- Running mistralai mixtral locally: Running mistralai mixtral locally. GitHub Gist: instantly share code, notes, and snippets.
- Image classification: no description found
- Mustache GPT Pitches Freedom GPT...Silence Ensues?!: Desperately seeking a sponsor to at least cover the cost of a premium GPT license, Mustache GPT and his team of Terminators labor over a custom pitch for one...
- GradIEEEnt half decent: The hidden power of imprecise lines: Before the invention of YouTube comments, most people could make remarks that were slightly technically incorrect without fear of immediate public rebuke. Th...
- Models - Hugging Face: no description found
- Tom 7 - Based On Your Mario Kart Skills... (Live 5 Sep 2008): "Based On Your Mario Kart Skills, I'm Not Letting You Drive My Car," by Tom 7, live on 5 September 2008.http://tom7.org/music/
- GitHub - jzhang38/EasyContext: Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.: Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. - jzhang38/EasyContext
- Uppestcase and Lowestcase Letters [advances in derp learning]: I perform an exhaustive case analysis using advanced "derp learning" techniques to discover what's even upperercase than an uppercase A. AND I DON'T STOP THE...
- turboderp/Mixtral-8x7B-instruct-exl2 · Hugging Face: no description found
- GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs: A fast inference library for running LLMs locally on modern consumer-class GPUs - turboderp/exllamav2
- Model Merging: Comparing Methods: Explore and compare model merging methods like frankenmerging, SLERP, MoE, and task vectors, highlighting their benefits and challenges.
- Computer program that learns to play classic NES games: This is an explanation and demo of software I wrote that learns how to play a Nintendo Entertainment System game and then automatically plays it. This is rea...
- The Association for Computational Heresy: no description found
- tfnn/FaceTo3D · Datasets at Hugging Face: no description found
- jcwml - Overview: jcwml has 9 repositories available. Follow their code on GitHub.
- GitHub - jcwml/neural_spiral: A Feed-forward Neural Network trained to interpolate a spiral.: A Feed-forward Neural Network trained to interpolate a spiral. - jcwml/neural_spiral
- GitHub - jcwml/neural_unitvector: A Feed-forward Neural Network trained to learn a vector normalisation function.: A Feed-forward Neural Network trained to learn a vector normalisation function. - jcwml/neural_unitvector
HuggingFace ▷ #today-im-learning (4 messages):
- In Search of Candle's Documentation: A member expressed interest in the Candle library while questioning the availability of documentation comparable to the Transformers library. They raised concerns about Python being a bottleneck for concurrency in production.
- Welcoming Wishes: A brief message from a user simply sending well-wishes to the community; no substantive content related to AI or learning discussed.
- Exploring the Open Medical LLM Leaderboard: A video by Hugging Face on the Open Medical LLM Leaderboard was shared, exploring its impact on Medical AI and noting the existence of over 600,000 unique models on their platform. The video emphasizes the convenience of accessing these models and the rapid evolution of GenAI.
- Community Appreciation for Medical AI Insights: Another member responded positively to sharing the video on the Open Medical LLM Leaderboard, expressing excitement for the ongoing developments.
Links mentioned:
- The Open Medical LLM Leaderboard: Real-time Global Peer Review: A Deep Dive on the @HuggingFace Open Medical LLM Leaderboard and how it's changing the conversation on Medical AI. Spoiler alert- there's over 600,000 unique...
HuggingFace ▷ #cool-finds (14 messages🔥):
- Awesome RLHF Repo Now Live: The GitHub repository awesome-RLHF has been shared, which contains a curated list of reinforcement learning with human feedback resources, updated continually.
- Explore Computer Vision with Hugging Face: Hugging Face has launched a new community computer vision course designed to teach computer vision ML using libraries and models from the Hugging Face ecosystem.
- Phi3 Red Team Report Insights: Insights and key points from the Phi3 red teaming exercise are detailed in a LinkedIn post, discussing potential vulnerabilities and areas for improvement.
- Evaluating LLMs for Time Series Analysis: A newly proposed framework for assessing Large Language Models (LLMs) on time series understanding is presented in a preprint on arXiv, featuring a comprehensive taxonomy of time series features.
- Tacotron 2 - A Step Forward in Text-to-Speech Synthesis: The innovative speech synthesis system, Tacotron 2 by Google, demonstrates advanced AI capabilities for generating lifelike speech from text, as highlighted in the discussion on the future of AI in voice technologies.
Links mentioned:
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in...
- Hugging Face - Learn: no description found
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention: This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach...
- Deep Voice: Real-time Neural Text-to-Speech: We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The syste...
- Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark: Large Language Models (LLMs) offer the potential for automatic time series analysis and reporting, which is a critical task across many domains, spanning healthcare, finance, climate, energy, and many...
- Richard Stallman Free software Song: Richard Stallman in Ecuador, singing the Free Software Song, recorded by Julian Coccia.
- GitHub - opendilab/awesome-RLHF: A curated list of reinforcement learning with human feedback resources (continually updated): A curated list of reinforcement learning with human feedback resources (continually updated) - opendilab/awesome-RLHF
- MIT Introduction to Deep Learning | 6.S191: MIT Introduction to Deep Learning 6.S191: Lecture 1*New 2024 Edition*Foundations of Deep LearningLecturer: Alexander AminiFor all lectures, slides, and lab m...
HuggingFace ▷ #i-made-this (47 messages🔥):
-
Mega-Small Embed Model Unveiled: A new Sentence Transformer Model is introduced for converting long sentences and paragraphs into a 768-dimensional vector space. Aimed at clustering and semantic search tasks, this model boasts a 16,384-token context length (a usage sketch follows this list).
-
Blocks of Pixels Become Blocks in Minecraft: A Hugging Face space called Stable Diffusion Finetuned Minecraft Skin Generator has been released. It uses a fine-tuned stable diffusion model to generate Minecraft skins.
-
Instant AI-Generated Videos: A space called Instant Video by KingNish enables users to create a video from text in just 5 seconds. It uses the AnimateDiff Lightning model provided by ByteDance for fast text-to-video conversion.
-
Bringing Life to AI Assistance: An AI chat assistant app named LifePal is designed to help users achieve a balanced and fulfilling life. Available on Apple's App Store, it integrates personalized insights into daily routines.
-
NorskGPT Battles ChatGPT's Norwegian: A model specifically fine-tuned on Norwegian, NorskGPT-Mistral-7b, was recommended as a better alternative to ChatGPT for generating Norwegian language text. It's currently ranked as one of the best Norwegian models according to the Mainland Scandinavian NLG leaderboard.
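As noted in the first bullet, a usage sketch for the long-context embedding model with the sentence-transformers library; trust_remote_code is included defensively and may not be required for this architecture.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BEE-spoke-data/mega-small-embed-synthSTS-16384-v1",
                            trust_remote_code=True)

docs = ["a very long report about GPU pricing trends...",
        "a recipe for sourdough bread"]
emb = model.encode(docs, normalize_embeddings=True)  # one 768-dim vector per doc
print(util.cos_sim(emb[0], emb[1]))  # cosine similarity for semantic search
```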
Links mentioned:
- Bad Apple Video - a Hugging Face Space by Nick088: no description found
- bineric/NorskGPT-Mistral-7b · Hugging Face: no description found
- BEE-spoke-data/mega-small-embed-synthSTS-16384-v1 · Hugging Face: no description found
- Stable Diffusion Finetuned Minecraft Skin Generator - a Hugging Face Space by Nick088: no description found
- JARVIS - a Hugging Face Space by KingNish: no description found
- tenyx/Llama3-TenyxChat-70B · Hugging Face: no description found
- ByteDance/AnimateDiff-Lightning · Hugging Face: no description found
- KingNish/Instant-Video at main: no description found
- Instant Video - a Hugging Face Space by KingNish: no description found
- f0ster (Ryan Foster): no description found
- f0ster/PhotographyLoRA · Hugging Face: no description found
- LifePal AI Chat & Assistant: Discover LifePal: your productivity AI companion. Are you ready to unlock your full potential and live a healthier, happier life? LifePal is here to guide you on your journey to becoming a better yo...
- Vinner - Nybygg i og rundt Bergen: Big thanks to Snøhetta
- CodeClassifier: A Machine Learning Model that classifies a given source code as a specific programming language.
- GitHub - GDSC-FSC/gemini-node-1: Contribute to GDSC-FSC/gemini-node-1 development by creating an account on GitHub.
- Serving Fastchat - Personal Journey: Serving fastchat for people to experiment with various LLMs. This guide also incluides setting up Vllm to serve multiple models on a single GPU.
- Chat with Open Large Language Models: no description found
- GitHub - EternalBlissard/Food101-ViT: Contribute to EternalBlissard/Food101-ViT development by creating an account on GitHub.
- GitHub - newfull5/NLLB-200-Distilled-350M-en-ko: nllb-200 distilled 350M for English to Korean translation: nllb-200 distilled 350M for English to Korean translation - newfull5/NLLB-200-Distilled-350M-en-ko
- dhtocks/nllb-200-distilled-350M_en-ko · Hugging Face: no description found
- Rubik's AI - AI research assistant & Search Engine: no description found
- GitHub - betweentwomidnights/infinitepolo: a song in python: a song in python. Contribute to betweentwomidnights/infinitepolo development by creating an account on GitHub.
HuggingFace ▷ #core-announcements (1 messages):
- Instant Styling with IP-Adapter: HuggingFace introduces InstantStyle with IP-Adapter, a mechanism for image prompting in diffusion models by adding decoupled cross-attention for image features. Guides for loading IP-Adapter and IP-Adapter Plus detail manual loading of the image encoder to allow more specific image feature learning.
Link mentioned: IP-Adapter: no description found
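A hedged sketch of the mechanism with diffusers: InstantStyle sets the IP-Adapter scale per attention block instead of globally, so the reference image steers style while the text prompt keeps control of composition. The checkpoint names follow the diffusers IP-Adapter guide; the reference image and block choice are illustrative.

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                     weight_name="ip-adapter_sdxl.bin")

# Apply the image prompt only to a style-relevant attention block,
# instead of a single global scale over every block.
pipe.set_ip_adapter_scale({"up": {"block_0": [0.0, 1.0, 0.0]}})

style_ref = load_image("style_reference.png")  # hypothetical style image
image = pipe("a cat, masterpiece, best quality",
             ip_adapter_image=style_ref).images[0]
image.save("styled_cat.png")
```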
HuggingFace ▷ #computer-vision (21 messages🔥):
-
Security Inquiry on COCO Datasets: A member expressed concerns about the official COCO datasets being hosted over HTTP. It was pointed out that while HTTPS encrypts traffic, the domain is still visible, so large data transfers from the site could reveal activity.
-
Classifier to Detect Advertisement Images: A repository was mentioned that can assess whether an image is an advertisement, but no further details or links were provided.
-
Optimizing Photo Verification for Item Dropoffs: A user sought advice on a business problem related to classifying photos of item drop-offs at various locations, questioning whether it's an image classification or object recognition task. Suggestions included using EfficientNetV2-S for small datasets and adjusting sample weights in PyTorch DataLoaders to deal with class imbalances (a minimal sketch follows this list).
-
Introducing a Beta Tool for Computer Vision Training: A new beta tool was introduced that helps users understand and adjust their model training data in real-time, particularly for computer vision tasks. The tool provides visualization up to 60fps and allows for adding new labels post-prediction to refine training.
-
Enhancement Strategies for YOLO Classifiers: A discussion centered around improving YOLO object detection accuracy, especially when handling high-resolution images. Separating bounding box (regressor) identification and classification tasks through two models was recommended, including the possibility of using a pure image classification network, like EfficientNetV2, for higher resolution patches within bounding boxes.
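As mentioned in the drop-off bullet, a minimal sketch of per-sample weighting with PyTorch's WeightedRandomSampler; the toy labels are illustrative.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Labels for an imbalanced dataset (hypothetical): class 0 is rare.
labels = torch.tensor([0, 1, 1, 1, 1, 0, 1, 1])

class_counts = torch.bincount(labels)
weights = 1.0 / class_counts[labels].float()  # rare classes drawn more often

sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
loader = DataLoader(list(zip(range(len(labels)), labels)),
                    batch_size=4, sampler=sampler)
for indices, batch_labels in loader:
    print(batch_labels)  # batches are roughly class-balanced in expectation
```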
Links mentioned:
- 3LC - Real-Time 3D Visualizer/Debugger/Data Editor for Training/Finetuning your Models - Free! | Kaggle: 3LC - Real-Time 3D Visualizer/Debugger/Data Editor for Training/Finetuning your Models - Free!.
- Fine-tuning a Classifier Using Bounding Box Data from a 3LC Table: no description found
HuggingFace ▷ #NLP (5 messages):
- Seeking the Best in Open Source Imagery: The community discussed which is the best open-source image-generation model, with sdxl finetunes being the current top recommendation.
- Anticipation for sd3: There's a buzz about sd3 potentially outperforming current models once it's released, signaling high expectations.
- Sequential Over Parallel: A member explained that due to resource constraints and preserving context, requests to the model are handled sequentially, not parallel, to avoid incoherent responses.
- Nod to StabilityAI: In a brief message, StabilityAI was mentioned with an implication of relevance to the earlier discussions.
HuggingFace ▷ #diffusion-discussions (20 messages🔥):
-
Confusion Over Color Differences in Image Generation: A user experienced a shift in color and shadow intensity when moving from Seaart to A1111, despite using identical settings and seeds. They questioned if there are specific backend settings in Seaart that might lead to this inconsistency and sought assistance to replicate the exact picture on both platforms.
-
Torch Compile Can Take Time: A member observed an initial delay of about 10 minutes when using `torch.compile()` during training, but noticed a faster forward pass while the backward pass remained unaffected (a minimal sketch follows this list).
-
Detailed Method for Object Generation: In response to a question about generating accurate representations of specific objects (like the Eiffel Tower), a member suggested a well-documented approach involving CLIP retrieval and shared a comprehensive tutorial demonstrating the utility with GCP services using OpenAI's CLIP model.
-
IP-Adapters for Image Prompting: Another suggestion for accurately generating specific objects involved using IP-Adapters with diffusion models, which allow for image prompting through a decoupled cross-attention mechanism.
-
Observations on DeepFloyd and Schedulers: A user provided insights on the behavior of the DeepFloyd model with different schedulers, noting that DPM++ 2M offered interesting convergence properties at various step counts and CFG settings, which might aid in achieving optimal image quality. They highlighted the necessity of tuning step counts and thresholding parameters for better results.
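As referenced in the torch.compile bullet, a minimal sketch of compiling a diffusion pipeline's UNet; the long first call is the one-time compilation cost that the member observed.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# The first forward pass pays the (potentially long) compilation cost;
# subsequent passes reuse the compiled graph.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("a lighthouse at dawn").images[0]  # slow: compiles
image = pipe("a lighthouse at dusk").images[0]  # fast: cached
```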
Links mentioned:
- IP-Adapter: no description found
- haoningwu/StoryGen at main: no description found
- Not getting good realistic results with Hyper-SD + IP-Adapter · huggingface/diffusers · Discussion #7818: Hi everyone, (maybe you @asomoza know about this?) Does hyper-sd works well with IP-Adapter? I am testing hyper-sd in Diffusers as explained in the repo. I thought that I was going to get better re...
- Image Search with Natural Language Queries | Google Cloud Blog: no description found
OpenAI ▷ #annnouncements (1 messages):
- Memory Feature Launched for ChatGPT Plus: ChatGPT Plus users now have access to the Memory feature, which allows them to tell ChatGPT what to remember during a chat. The option to enable or disable Memory can be found in settings, although it's not yet available in Europe or Korea.
OpenAI ▷ #ai-discussions (318 messages🔥🔥):
- AIâs Relation to Consciousness and Temporal Aspects: Members debated the nature of AI consciousness, speculating on how AIâs discrete processing relates to human continuous conscious experience and identity. Discussions touched on the philosophical implications of transforming individual identity through a neural network and how AI models like GPT handle temporal awareness.
- Comparing AI Models: There's ongoing comparison between different models such as Claude 3 Opus, ChatGPT, and Gemini 1.5, each with its advocates claiming superiority in areas like coding benchmarks. It was highlighted that Command R Plus and Llama3-70b may not compete with GPT-4 but are still significant advancements.
- AI and Sentience: A lively debate unfolded around AI's potential for sentience or even possessing something akin to a "soul." Members discussed the complexity of defining consciousness and whether an AI could possess subjective experiences similar to biological entities.
- Personal AI Model Training Viability: While some extolled the virtues of training personal AI models, others pointed out the limitations of computational power, data, and financial resources. The discussion covered training custom models, fine-tuning, and hybrid fusion as methods to personalize AI for individual use.
- Technical Challenges with AI Development: The community talked about the difficulty of implementing functions like memory in AI at scale, noting that fine-tuning may lead to confusion within the model and suggesting the use of contextual information retrieval as a better alternative. Some members expressed dissatisfaction with current AI models, longing for the next big leap in technology for more "intelligent" AI.
Links mentioned:
- Loo Loo Loo Butters Stotch GIF - Loo Loo Loo Butters Stotch South Park - Discover & Share GIFs: Click to view the GIF
- Don't ask to ask, just ask: no description found
OpenAI ▷ #gpt-4-discussions (47 messages🔥):
-
Rate Limit Confusion: Members discussed being rate-limited when using custom GPTs. The limit is part of a rolling 3-hour cap for GPT-4 usage, and custom requests also count toward this limit.
-
Query on Memory for Team Rates: A user inquired about memory features for a Team rate, with another stating that even regular memory features seem to delete entries often.
-
Backend Bugs Busting User's Patience: Users reported backend errors with the GPT URL "https://chat.openai.com/backend-api/gizmos/", affecting their operations, although the issue was resolved quickly after testing.
-
Subscription Refund Risks: A user asked for a refund after subscribing to ChatGPT Plus due to high currency exchange rates and wondered if using the service would affect the refund process.
-
Curiosity about GPT-4 Speed and Voice Control: Discussion centered around GPT-4's comparative slowness to GPT-3.5 and the absence of voice control on PC, despite its presence on mobile platforms.
OpenAI ▷ #prompt-engineering (7 messages):
-
Exploring the Unpredictable: One member described the phenomenon of emergence in LLMs, where quantitative increases in system size can lead to unexpected, qualitative changes, referencing a paper titled More Is Different to illustrate that large language models (LLMs) display behaviors not extrapolable from smaller-scale models.
-
Dalle Looking Emoticon Pampered: A user responded with a DALL-E emoticon without accompanying text.
-
The Three-Body LLM Problem: A member playfully coined the term "3-body LLM problem," possibly referring to complex interactions in LLMs, akin to the three-body problem in physics, without providing further details.
-
Prompt Engineering as a Sport: A member suggested the idea of prompt competitions, where individuals compete to generate the best responses from LLMs.
-
Money for the Sharpest Prompt: Expansion on the competition concept was made, proposing both paid prompt competitions with significant cash rewards, as well as more casual "playground competitions," which would encourage community engagement and help users improve their prompt engineering skills through gamification and peer-to-peer assistance.
OpenAI ▷ #api-discussions (7 messages):
-
Emergence Topic Emerges in Discussion: Emergence in LLMs is characterized by new abilities or qualities not predictable by simply scaling SLMs. The concept is likened to the idea presented in the paper "More Is Different," signifying that qualitative changes arise in systems beyond a certain quantitative point.
-
Prompt Competitions Suggested: A user proposed the idea of prompt competitions where participants vie to elicit the "best" answer from LLMs.
-
Monetizing Mastery of Prompts: It's proposed to have paid prompt competitions, with a substantial yearly budget for distributing rewards, and free playground competitions to foster community assistance and engagement. Rewards might range from cash to special platform perks.
-
Frequent Challenges to Foster Skills: Regular competitions, around 4-5 a month, could provide consistent opportunities for individuals looking to improve their prompt engineering skills.
Eleuther ▷ #general (59 messages🔥🔥):
- Apple's New Models and The Pile's Multilingual Data: The Pile dataset is not particularly multilingual, although portions like UN records may contain multiple languages. There is no special focus on languages like German.
- Comparing GPT-NeoX and Megatron Variants: GPT-NeoX has diverged from Megatron primarily in terms of quality-of-life improvements and user experience. Features are tested before being integrated, with the aim of being more stable.
- Infini-Attentionâs Positional Encoding Query: The community discussed the absence of positional encodings in Infini-Attentionâs hidden state memory, with some speculating on whether positional information is preserved through other mechanisms.
- The Complex Calculations Behind Inference MFU: When evaluating good inference MFU (Model FLOPs Utilization), there are no simple off-the-shelf numbers; it largely depends on the hardware utilization and the specifics of the model being served (a back-of-envelope sketch follows this list).
- Speed Differences Between Models at Fireworks.ai: The conversation touched on why Mixtral 8x22B is served slower compared to llama 3 70B at Fireworks.ai, with factors like batching size and hardware utilization potentially influencing the disparity.
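As referenced in the MFU bullet, a back-of-envelope version of the calculation: decode FLOPs per token are roughly twice the parameter count, so MFU is achieved FLOPs/s divided by the accelerator's peak. All numbers below are illustrative assumptions, not measurements.

```python
params = 70e9          # e.g. a 70B-parameter model
tokens_per_sec = 30.0  # assumed decode throughput for one sequence
peak_flops = 989e12    # e.g. H100 SXM dense BF16 peak

achieved = 2 * params * tokens_per_sec  # ~2 FLOPs per parameter per token
mfu = achieved / peak_flops
print(f"MFU = {mfu:.2%}")  # ~0.42%: single-stream decode is memory-bound,
                           # which is why batching dominates inference MFU
```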
Eleuther ▷ #research (297 messages🔥🔥):
-
Benchmarking LLMs in Practice: Speculation over the real-world performance of various LLMs continues, with comparisons including phi-3-mini-128k against models like Llama-3-8B. However, disparities were noted in bits-per-byte performance metrics, suggesting differences in efficiency across models.
-
Exploring the Needle-in-a-Haystack Test: A Twitter thread highlighted that the needle-in-a-haystack test might imply a form of meta-awareness in models such as Claude 3 Opus. Yet, debate ensued over whether these responses indicate emergent abilities or artifacts of reward learning and prompt structures.
-
Self-Improvement in LLMs: Links to papers on LLM self-improvement strategies were shared, with methods like Self-Taught Reasoner (STaR) and reinforcement learning from human feedback (RLHF) being key discussion points.
-
Emergence in Language Models: The concept of "emergent abilities" in large language models (LLMs) was debated at length, with references to various papers and the acknowledgment that truly emergent abilities haven't yet been quantifiably demonstrated under smooth, continuous metrics.
-
Innovations and Findings in LLM Research: Several papers were mentioned, including research into redundant neural circuits in deep learning and the creation of adversarial prompts for red-teaming LLMs. Discussion also turned to whether speculative decoding can optimize model inference times without significant training adjustments.
Links mentioned:
- Retrieval Head Mechanistically Explains Long-Context Factuality: Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the...
- Tweet from Alex Albert (@alexalbert__): Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval. For background, this tests a model's ...
- Make Your LLM Fully Utilize the Context: While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We ...
- Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding: We present a novel inference scheme, self-speculative decoding, for accelerating Large Language Models (LLMs) without the need for an auxiliary model. This approach is characterized by a two-stage pro...
- Dragon curve - Wikipedia: no description found
- Are Emergent Abilities of Large Language Models a Mirage?: Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguin...
- Tweet from Jason Wei (@_jasonwei): Enjoyed this paper that plots emergent abilities with pretraining loss on the x-axis, which is actually a suggestion that @OriolVinyalsML also made a few years back: https://arxiv.org/abs/2403.15796 ...
- Understanding Emergent Abilities of Language Models from the Loss Perspective: Recent studies have put into question the belief that emergent abilities in language models are exclusive to large models. This skepticism arises from two observations: 1) smaller models can also exhi...
- Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding: We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher ...
- AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs: While recently Large Language Models (LLMs) have achieved remarkable successes, they are vulnerable to certain jailbreaking attacks that lead to generation of inappropriate or harmful content. Manual ...
- MoDE: CLIP Data Experts via Clustering: The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of ...
- Linearly Mapping from Image to Text Space: Language models (LMs) can 'understand' images through a single tuned linear layer between a frozen image encoder and the LM input, showcasing the similarities in their conceptual representat...
- Predicting Emergent Abilities with Infinite Resolution Evaluation: The scientific scale-up of large language models (LLMs) necessitates a comprehensive understanding of their scaling properties. However, the existing literature on the scaling properties only yields a...
- RWKV-Gradio-2 - a Hugging Face Space by BlinkDL: no description found
- Common arguments regarding emergent abilities — Jason Wei: This blog post doesn't represent the positions of my employer (past, present, or future). I'll review some common arguments that come up when discussing emergent abilities of large language models...
- Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class: Vision-language models enable open-world classification of objects without the need for any retraining. While this zero-shot paradigm marks a significant advance, even today's best models exhibit ...
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking: When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is imp...
- Teaching Large Language Models to Reason with Reinforcement Learning: Reinforcement Learning from Human Feedback (\textbf{RLHF}) has emerged as a dominant approach for aligning LLM outputs with human preferences. Inspired by the success of RLHF, we study the performance...
- Training Chain-of-Thought via Latent-Variable Inference: Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a "chain-of-thought" (CoT) prompt. One can also improv...
- LLM Control Theory Seminar (April 2024): Stay tuned for our new results in our preprint, "What's the Magic Word? A Control Theory of LLM Prompting": https://arxiv.org/abs/2310.04444 Follow twitter an...
- GitHub - continuousml/Awesome-Out-Of-Distribution-Detection: A professionally curated list of papers, tutorials, books, videos, articles and open-source libraries etc for Out-of-distribution detection, robustness, and generalization: A professionally curated list of papers, tutorials, books, videos, articles and open-source libraries etc for Out-of-distribution detection, robustness, and generalization - continuousml/Awesome-Ou...
Eleuther ▷ #scaling-laws (1 messages):
- Determining Cutoff via Non-Embedding Parameters: A participant suggested using non-embedding parameters as a method for determining the cutoff point in models. The recommendation is to observe where the delta of the fit curve for each removed point becomes very low, which could lead to a reasonably educated guess beyond the initial estimation of sub-200 million parameters.
Eleuther ▷ #interpretability-general (9 messages🔥):
- Anthropic Shares New Research Insights: The Anthropic interpretability team has released an April update with developments and emerging research ideas. This includes topics like scaling laws, training Sparse Autoencoders (SAEs), and a project on interpretability architectures.
- Discovering the Refusal Mechanism in LLMs: A crosspost from the AI Alignment Forum shares findings about how modern Large Language Models (LLMs) are fine-tuned to refuse harmful requests. It suggests that refusal may be mediated by a single direction within the network.
- Weight Orthogonalization Versus Fine-tuning: In the context of fine-tuning LLMs for specific behaviors, a member hypothesized that weight orthogonalization could be viewed as a form of manual fine-tuning to alter network behavior; a sketch of the idea follows this list.
- Refusal Directions and Rank-1 LoRA Fine-tuning Explored: A member proposed that if rank-1 LoRA (Low-Rank Adaptation) fine-tuning with Stochastic Gradient Descent (SGD) is performed, the network might learn the negative of the "refusal direction".
- Llama.cpp Integrates Control Vectors Technique: Control vectors, a technique similar to what was being discussed, have been added to llama.cpp, as demonstrated in this GitHub pull request, thanks to the collaboration with Nous Research.
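For readers who want to see the orthogonalization idea concretely, here is a minimal numpy sketch. It is not the code from the linked posts: it assumes residual-stream activations have already been collected for harmful and harmless prompts, and the function names are illustrative.

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means direction between two activation sets of shape (n, d_model)."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def orthogonalize(W: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the rank-1 component of W that writes onto `direction`:
    W' = W - r r^T W, with r a unit vector over the output (d_model) axis.
    After this edit the layer can no longer move activations along r."""
    r = direction / np.linalg.norm(direction)
    return W - np.outer(r, r) @ W
```

In this framing, the hypothesized rank-1 LoRA fine-tune and the manual edit differ mainly in how the rank-1 update is found: by SGD versus by a closed-form projection.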
Links mentioned:
- Circuits Updates - April 2024: no description found
- Refusal in LLMs is mediated by a single direction — LessWrong: This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from…
- Add support for control vectors by vgel · Pull Request #5970 · ggerganov/llama.cpp: Many thanks to Nous Research, whose support and collaboration made this work possible! This PR introduces a new activations hacking technique, control vectors (also known as steering vectors, conce...
Eleuther ▷ #lm-thunderdome (5 messages):
- CLA Confusion in PR Submissions: A member encountered an issue with the Contributor License Agreement (CLA) showing as unsigned despite having signed it, possibly because GitHub anonymizes their email in commits. The matter was acknowledged and flagged for further investigation.
- Uncertainty Over Failing Checks in PR: Concern arose over a failing check in a submitted pull request, with the member questioning whether it was related to their changes. The issue was reviewed and preliminarily judged to be unrelated.
- Chat Template Branch Stagnation Inquiry: A member inquired about progress on a branch dedicated to adding chat templating, noting the last commit was two months prior. There was no immediate update on its current status.
- Prompt Versatility for Evaluation Harness: A member raised a point about the lack of variable prompt formats that cater to model-specific finetuning in the evaluation harness. Another participant suggested using a custom `!function` to enable distinct prompts per model; a sketch follows below.
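For context, the harness's YAML task configs can call back into Python via `!function`, which is one way to vary prompts per model. A minimal sketch, assuming a task config that points at a local `utils.py` (file and field names are illustrative; check the harness docs for the hooks your version supports):

```python
# utils.py -- referenced from a task YAML with a line such as:
#   doc_to_text: !function utils.doc_to_text

def doc_to_text(doc: dict) -> str:
    """Render a document with a model-specific instruction format."""
    question = doc["question"]
    # Swap in whatever chat template the fine-tuned model expects.
    return f"<|user|>\n{question}\n<|assistant|>\n"
```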
Link mentioned: add task for mmlu evaluation in arc multiple choice format by jonabur · Pull Request #1745 · EleutherAI/lm-evaluation-harness: This PR adds the mmlu_arc_style task that presents the MMLU questions in the same manner as the arc evals (loglikelihood for the answer as a continuation, rather than selecting the letter for the c…
Eleuther ▷ #gpt-neox-dev (1 messages):
- Concerns Over Cluster Setup Practices: A comment highlighted the lack of assurance that the correct version of `tokenizers` is used during cluster setup, since someone might simply do a blind `pip install tokenizers` without the pinned version. It was noted that this could affect any run, and that what's actually in the Python environment should be logged to be certain of the version used; a sketch of such logging follows.
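One lightweight way to do that logging, sketched with Python's standard library (the package list is illustrative):

```python
import importlib.metadata

# Print the resolved versions of environment-critical packages at the start
# of a run, so a stray unpinned install shows up in the run logs.
for pkg in ("tokenizers", "transformers", "torch"):
    try:
        print(f"{pkg}=={importlib.metadata.version(pkg)}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```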
OpenRouter (Alex Atallah) ▷ #announcements (3 messages):
- Soliloquy 8B Shifts to Paid Model: Soliloquy 8B's usage is now paid, costing $0.1 per 1M tokens. This pricing update reflects OpenRouter LLC's recent policy change.
- Price Jump for Soliloquy 8B: The price for using Soliloquy 8B was revised again to $0.2 per 1M tokens. The new rate comes shortly after the initial pricing was introduced.
- Routing Updates and Corrections: `anthropic/claude-instant-1` model routing was updated to `claude-instant-1.2`, and a routing error concerning `anthropic/claude-2.0` was corrected with a restoration of service, as it remains a valid model ID.
- Restoration of Claude v2.1 and Variants: The Anthropic: Claude v2.1 model and its `:beta` variant have been reinstated following clarification on model availability during the recent confusion with older Claude models.
Links mentioned:
- Anthropic: Claude v2 by anthropic | OpenRouter: Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a...
- Lynn: Llama 3 Soliloquy 8B by lynn | OpenRouter: Soliloquy-L3 is a fast, highly capable roleplaying model designed for immersive, dynamic experiences. Trained on over 250 million tokens of roleplaying data, Soliloquy-L3 has a vast knowledge base, ri...
OpenRouter (Alex Atallah) ▷ #app-showcase (4 messages):
- Exploring Syrax: A member expresses interest in experimenting with Syrax and offers support, initiating a private conversation with a friend request for further collaboration.
- Friend Request Accepted: Another community member acknowledges the support offered and confirms the acceptance of the friend request, showing appreciation.
- Impressed by the Showcase: A single, short expression of admiration is directed toward the ongoing discussions or showcased projects, reflecting a positive impression.
OpenRouter (Alex Atallah) ▷ #general (311 messages🔥🔥):
- Claude Models' Quirky Behavior Unraveled: Members discussed issues with Claude models returning incomplete outputs or HTTP 524 errors via OpenRouter. Clarification revealed that Claude models generate at most 4k tokens while reading up to 200k tokens of context, and that the right request settings (see the sketch after this list) could improve API responses.
- Lemmyle Dissects WLM-2 Hosting Economics: An intense breakdown of WLM-2 hosting costs was presented, surmising that profit could be marginal depending on factors like GPU utilization, electricity costs, and potential revenue from idle GPUs.
- FireLLaVA's Silent Entry into Multimodality: There were musings about the under-the-radar launch of FireLLaVA, an open multimodal model noted for its quick startup time, marking a notable addition to the OpenRouter ecosystem.
- Deployment Dilemmas and Frugal Frontends: A member sought a simple frontend to host on shared hosting so family members could use their OpenRouter services without multiple OpenAI subscriptions. Suggestions ranged from Vercel's free tier to more affordable VPS providers such as Contabo.
- Cohere's Conundrum in OpenRouter Contexts: A member faced odd output discrepancies when using Cohere models through OpenRouter compared to direct API calls, with generated content unrelated to prompts. It was clarified that web connector support for Cohere is pending; its addition to OpenRouter is anticipated but not yet available.
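On the Claude settings point: OpenRouter exposes an OpenAI-compatible endpoint, so the generation cap is simply `max_tokens` on the request. A hedged sketch (model slug and key are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

resp = client.chat.completions.create(
    model="anthropic/claude-3-haiku",  # placeholder Claude slug
    messages=[{"role": "user", "content": "Summarize this thread."}],
    # Claude reads up to ~200k tokens of context but generates far fewer,
    # so set max_tokens explicitly instead of letting long requests stall.
    max_tokens=1024,
)
print(resp.choices[0].message.content)
```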
Links mentioned:
- Models overview: no description found
- Home | ChatGPT Web Share Docs: no description found
- OpenRouter: A router for LLMs and other AI models
- WizardLM-2 8x22B by microsoft | OpenRouter: WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing ...
- Meta: Llama 3 8B Instruct by meta-llama | OpenRouter: Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong...
- FireLLaVA 13B by fireworks | OpenRouter: The first commercially permissive OSS LLaVA model. This vision-language model was trained entirely on OSS LLM generated instruction following data.
- Clay - Scale personalized outbound: Combine 50+ data providers, real-time scraping, and AI to send 1-1 personalized campaigns that book more meetings.
- Managed Server: Your own server, hosted in Switzerland: no description found
- Llava 13B by haotian-liu | OpenRouter: LLaVA is a large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking [GPT-4](/models/open...
OpenAccess AI Collective (axolotl) ▷ #general (169 messages🔥🔥):
- Washington's Wizards: Unchanged Repository: Despite rumors, the WizardLM models have not been removed by Microsoft; a member clarified that the wizardlm team itself was responsible for the changes. They also confirmed that the WizardLM repository remains publicly available.
- Fine-Tuning vs. RAG for Domain-Specific LLMs: New members asked about fine-tuning for domain-specific language models, questioning its necessity versus using Retrieval-Augmented Generation (RAG). The conversation noted examples such as OpenBioLLM and referenced a medical-focused LLM paper for further reading.
- Configurations for Conversation Tokenization Issues: There was a thorough discussion of tokenization strategies for models like LLaMA-3, including the need to manually install the latest version of the fastchat formatter, referencing a relevant axolotl pull request for correct conversational formatting templates.
- Quantization and Model Degradation Debate: Members debated the effects of quantization strategies on LLMs, specifically comparing the 4-bit LoRA and 4-bit QLoRA methods. The consensus is that quantization sensitivity varies with training, with one member citing a Twitter thread reporting more significant degradation in more extensively trained models like LLaMA-3.
- Sample Packing Clarification for Preventing OOM: A member sought clarification on multipack sampling and its relation to out-of-memory (OOM) errors. It was explained that packing does not affect the maximum sequence length allowed by the model; it only packs multiple samples into the maximum sequence length without altering context size (see the sketch after this list).
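To make the packing point concrete, here is a toy greedy-packing sketch; it is not axolotl's actual multipack implementation, which also masks attention across sample boundaries:

```python
def pack_samples(samples, max_seq_len, pad_id=0):
    """Concatenate tokenized samples into fixed-length sequences."""
    packs, current = [], []
    for sample in samples:
        sample = sample[:max_seq_len]  # truncate: packing never grows context
        if len(current) + len(sample) > max_seq_len:
            packs.append(current + [pad_id] * (max_seq_len - len(current)))
            current = []
        current = current + sample
    if current:
        packs.append(current + [pad_id] * (max_seq_len - len(current)))
    return packs

# Three short samples fit into one max_seq_len=16 sequence instead of three.
print(pack_samples([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_seq_len=16))
```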
Links mentioned:
- MEDITRON-70B: Scaling Medical Pretraining for Large Language Models: Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the...
- WizardLM - a microsoft Collection: no description found
- Tweet from Rohan Paul (@rohanpaul_ai): Quantization is quite harmful for LLaMA 3 than for LLaMA 2. This PR in llama cpp repo investigates it well. (Perplexity measures how well the model can predict the next token with lower values being...
- Efficient Continual Pre-training for Building Domain Specific Large Language Models: Large language models (LLMs) have demonstrated remarkable open-domain capabilities. Traditionally, LLMs tailored for a domain are trained from scratch to excel at handling domain-specific tasks. In th...
- Anima/air_llm at main · lyogavin/Anima: 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU - lyogavin/Anima
- GitHub - lm-sys/FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. - lm-sys/FastChat
- feat: Add LLaMA-3 instruct prompt strategies for fine-tuning by 0-hero · Pull Request #1553 · OpenAccess-AI-Collective/axolotl: Description This builds on top of and includes the changes in the below PR's #1542 #1539 Fastchat PR from @TJ-Solergibert needs to be merged before merging this lm-sys/FastChat#3257 Motivatio...
- FastChat/fastchat/conversation.py at main · lm-sys/FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. - lm-sys/FastChat
OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (37 messages🔥):
- Memory Requirements for Full Fine-Tuning: A discussion covered the significant memory required to run a full fine-tune (FFT) with ZeRO-3 on 2x24GB graphics cards. A member suggested that 167GB of RAM might be necessary, lamenting the lack of sufficient memory.
- Exploring VRAM Reduction via torchtune: One member advised trying torchtune, noting its focus on reducing VRAM usage. Another member weighed using FSDP (Fully Sharded Data Parallel) but reported that training begins yet hangs without progressing or throwing errors.
- Disk Usage Soars with Full Fine-Tuning: While attempting to train a model, the system's swap memory ballooned to 62GB, causing an out-of-memory error. The participant expressed surprise at the excessive disk and swap usage even when the job theoretically fit within a single 48GB card setup.
- ZeroGPU Access for Experiments: One member highlighted that they have access to the Hugging Face ZeroGPU project, prompting a discussion on potential tests. It aims to provide free GPU access for Hugging Face Spaces and supports Spaces running on multiple GPUs simultaneously.
- Log Sharing and Iteration Woes: A user linked their wandb.ai logs for those interested in the details of their full fine-tune trials, noting extremely long iteration times of 800 seconds versus 17 seconds for a QLoRA iteration, highlighting performance issues.
Links mentioned:
- vsungwaterloo: Weights & Biases, developer tools for machine learning
- zero-gpu-explorers (ZeroGPU Explorers): no description found
OpenAccess AI Collective (axolotl) ▷ #general-help (23 messages🔥):
- Troubleshooting AttributeError: A user encountered an `AttributeError` stating that `'TextIteratorStreamer'` has no attribute `'empty'`. They questioned the function's validity given they are using transformers version 4.40.0.
- Inquiry About Llama-Pro Method: There were multiple discussions regarding the llama-pro method highlighted by Jeremy Howard. Links to GitHub repositories were shared (fsdp_qlora), indicating a 4-bit quantized Llama-Pro fine-tuning method, with conversation pivoting around whether this method is accessible in axolotl and might require a pull request.
- Integrating Custom Audio Recording in Twilio: A user explained their effort to integrate custom audio recording with Twilio, capturing and storing audio in real time while still being able to respond to the recorded audio.
- Combining QLoRA Adapter Fine-Tuning: Users discussed the need to merge a QLoRA adapter into the base model before conducting additional fine-tuning for a Q/A style, as well as the effects subsequent fine-tunes might have on preserving model characteristics. Further conversation alluded to combining conversational and completion models into one fine-tune, with a reference to an example in a community showcase.
- PEFT Model for Faster LLM Fine-Tuning: A brief mention was made of the unsloth PEFT model, said to fine-tune LLMs like Mistral significantly faster with less memory usage thanks to additional optimizations, noting it is loaded differently from Hugging Face models.
Links mentioned:
- GitHub - AnswerDotAI/fsdp_qlora: Training LLMs with QLoRA + FSDP: Training LLMs with QLoRA + FSDP. Contribute to AnswerDotAI/fsdp_qlora development by creating an account on GitHub.
- GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- GitHub - AnswerDotAI/fsdp_qlora at 467933f713cc7808564cbfac3524e75aadd04987: Training LLMs with QLoRA + FSDP. Contribute to AnswerDotAI/fsdp_qlora development by creating an account on GitHub.
- GitHub - OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
- fsdp_qlora/train.py at 467933f713cc7808564cbfac3524e75aadd04987 · AnswerDotAI/fsdp_qlora: Training LLMs with QLoRA + FSDP. Contribute to AnswerDotAI/fsdp_qlora development by creating an account on GitHub.
OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (44 messages🔥):
- GPU Scaling and Batch Sizes Explained: A conversation detailed the intricacies of scaling from 4 to 8 GPUs and adjusting micro batch sizes. It clarified that while the total batch size may remain constant (effective batch size = micro-batch size × gradient-accumulation steps × number of GPUs), factors like gradient accumulation, learning-rate scaling, parallelism strategies, and communication overhead differ and influence training dynamics and performance outcomes.
- Query on Model Loading Across GPUs: The question was raised whether models are loaded in full or split when using multiple GPUs. It was explained that models can be loaded either at full size per device or sharded across GPUs, a technique facilitated by Fully Sharded Data Parallelism (FSDP) and optimizations like DeepSpeed's ZeRO Stage 3, helping efficient utilization of hardware resources.
- LoRA vs. QLoRA — Adaptation Techniques Demystified: Discussion touched on the differences between LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation), detailing how the latter extends LoRA by adding quantization to further reduce the computational cost and memory requirements during fine-tuning and deployment.
- Dataset Trimming Strategy for Axolotl: The question of trimming datasets in the Axolotl config was addressed by suggesting an approach that doesn't directly specify a percentage of the dataset but instead modifies the dataset loading logic to include a subsampling step, potentially using methods provided by the `datasets` library (see the sketch after this list).
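A minimal sketch of that subsampling step with the Hugging Face `datasets` library (file names and the 10% fraction are illustrative):

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="train.jsonl", split="train")

fraction = 0.10  # keep a random 10% of the data
subset = ds.shuffle(seed=42).select(range(int(len(ds) * fraction)))
subset.to_json("train_subset.jsonl")  # point the axolotl config at this file
```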
Links mentioned:
- accelerate/docs/source/concept_guides/big_model_inference.md at main · huggingface/accelerate: 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed suppo...
- peft/docs/source/accelerate/fsdp.md at main · huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. - huggingface/peft
- OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.
- peft/docs/source/accelerate/deepspeed.md at main · huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. - huggingface/peft
OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (12 messages🔥):
- LLaMa Prompt Support Inquiry: A member inquired whether axolotl supports the LLaMa 3 prompt format for ShareGPT. The response indicated there's no mention of specific "llama 3" model support within the OpenAccess-AI-Collective/axolotl documentation.
- Fine-Tuning a QLoRA Model: A member shared their success in creating a fine-tuned text completion model with QLoRA from Mistral-7B. They sought guidance on making the model conversational and were advised they could directly fine-tune using their QLoRA-adapted model on a Q/A dataset.
Links mentioned:
- OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.
Modular (Mojo 🔥) ▷ #general (2 messages):
- Modular Commits on the Rise: Since the stdlib was open-sourced, 23% of commits have been made to modularml/mojo. This indicates a surge in activity and contributions to the project.
Modular (Mojo 🔥) ▷ #💬︱twitter (4 messages):
- Modular Tweets Link Sharing: Members in the 💬︱twitter channel shared multiple tweets from Modular. Relevant tweets included updates or announcements, linked as follows: Tweet 1, Tweet 2, Tweet 3, and Tweet 4.
Modular (Mojo 🔥) ▷ #✍︱blog (1 messages):
- Multimodal Search Boosted by MAX Engine: The recent blog post by Modular discusses the advantages of a multimodal search that combines textual and visual data. MAX Engine, which already outperformed PyTorch eager and ONNX runtime in previous benchmarks, is also capable of optimizing inference for multimodal models.
Link mentioned: Modular: Multimodal Search with Snowflake Embedding and MAX Engine: We are building a next-generation AI developer platform for the world. Check out our latest post: Multimodal Search with Snowflake Embedding and MAX Engine
Modular (Mojo 🔥) ▷ #ai (2 messages):
- Troubleshooting Mojo Installation: A user reported an issue with installing Modular (Mojo 🔥) on Python 3.12.3. The response suggested using a Conda virtual environment and provided instructional links, Modular manual on Python and Modular blog post, emphasizing that Mojo is a superset of Python and compatible with Python modules.
- Working on Mac M1: A different member noted that they are running the latest Mojo, including the nightly version, with Python 3.12.3 on a Mac M1 successfully. They recommend using Conda for an easier setup, pointing out that Mojo's intent is to be compatible with Python code and existing Python packages.
Link mentioned: Python integration | Modular Docs: Using Python and Mojo together.
Modular (Mojo 🔥) ▷ #🔥mojo (113 messages🔥🔥):
- Switch from Python to Mojo Issue: A user shared Python code and asked for assistance converting it to Mojo. Another user provided a detailed Mojo conversion with explanations of function declarations and variable types in Mojo.
- ModularBot Chimes In: ModularBot interjected, celebrating user @110077104611172352 reaching level 5 and user @289473226147495936 reaching level 1. Congrats were later given to @932397073427476521 for reaching level 18, with a playful response from ModularBot about celebrating with a banquet.
- Matrix Slicing and Memory Ownership: A Mojo user inquired about creating a non-owning view of a list's subset without extra allocation. It was clarified that for indirect memory access one should use the `Buffer` type rather than `List`, since `List` owns its data and `Buffer` is being redesigned for lifetime management.
- Mojo for Intel Mac Inquiry: When questioned about Mojo for Intel Mac, a user responded that there's hope for support soon, but currently the playground is the only option.
- Troubleshooting a Matrix Implementation: A user having trouble with matrix division in Mojo due to the lack of an implemented `__truediv__` function was advised to review their code and ensure operations were only performed on non-zero values.
- Discussion on Mojo's Integration with Existing Libraries: The goal of the Mojo language was discussed, emphasizing that Mojo aims to integrate into the Python ecosystem and use existing libraries rather than replace them entirely. It was noted that Mojo's long-term direction includes seamless use of existing tools like NumPy.
- Levels and Learning in Discord: Users discussed their progress through levels in the channel; one user advanced to level 18 after a year, while others questioned the ranking methodology given disparate expertise levels.
Links mentioned:
- Pokemon Pikachu GIF - Pokemon Pikachu Clap - Discover & Share GIFs: Click to view the GIF
- memory | Modular Docs: Defines functions for memory manipulations.
- Mojo🔥 roadmap & sharp edges | Modular Docs: A summary of our Mojo plans, including upcoming features and things we need to fix.
- Why is the parameterized version of this function slower than the vanilla one? · modularml/mojo · Discussion #2270: Hi, I wrote some benchmarks to see how mojo performs in matmul, following as a guide this: https://docs.modular.com/mojo/notebooks/Matmul. However, I noticed that my version with parameters is slow...
- Matrix multiplication in Mojo | Modular Docs: Learn how to leverage Mojo's various functions to write a high-performance matmul.
- dynamic_vector.mojo/README.md at main · mikowals/dynamic_vector.mojo: An experimental drop-in replacement for Mojo stdlib DynamicVector that demonstrates new features using References - mikowals/dynamic_vector.mojo
- Issues · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
- playground.mojo: GitHub Gist: instantly share code, notes, and snippets.
- [Feature Request] Native Windows support · Issue #620 · modularml/mojo: Review Mojo's priorities I have read the roadmap and priorities and I believe this request falls within the priorities. What is your request? native support for windows. when will it be available?...
Modular (Mojo 🔥) ▷ #community-projects (1 messages):
uncle_jee: Use Mojo to write a Mojo community https://github.com/shadowqcom/mojo_dev
Modular (Mojo 🔥) ▷ #community-blogs-vids (5 messages):
- Crafting Better Tutorials: rd4com shared tips for making tutorials, emphasizing emojis for visual reference, simple language, clear naming, avoiding information overload, gradually increasing complexity, and iterating to refine. They also stressed linking to the Mojo documentation and building logically on previous content.
- Diátaxis Framework for Documentation: sophiaglencairn shared a link to Diátaxis, a systematic approach to creating technical documentation that outlines four types of documentation needs: tutorials, how-to guides, technical reference, and explanation. Diátaxis addresses content, style, and architecture issues in documentation to benefit both users and creators.
Link mentioned: Diátaxis: no description found
Modular (Mojo 🔥) ▷ #performance-and-benchmarks (55 messages🔥🔥):
- Exploring `__copyinit__` and GitHub Gists: A discussion revolved around `__copyinit__` behavior and whether it is a type author's responsibility to implement copy-on-write semantics. The conversation pointed to a specific Gist for context.
- Dictionary Performance Intricacies: Performance concerns regarding dictionaries in Mojo were discussed, citing significant speed differences between Mojo and Python. A member shared their experience porting a tokenizer and linked to a relevant discussion and a tokenization library for reference.
- Compact-dict Library Offers Hope: Amidst conversations about dictionary performance, the compact-dict library was put forward as a faster alternative to the standard Mojo dictionary, though it doesn't store keys and might require changes to use cases or additional features in the future.
- Memory Allocation Queries: Members asked about the differences in performance and functionality between `stack_allocate` and heap allocation methods like `DTypePointer.alloc`/`Pointer.alloc`. There was an exchange on when to use stack or heap, and insights into their cost differences were shared, emphasizing that stack allocation is typically faster and less complex than heap allocation.
- Optimizing SIMD Operations for Error Correction Code: Seeking better performance for an error-correction-code library, a member asked for advice on optimizing a function using `SIMD`. The conversation covered function inlining, use of `fma`, and potential mathematical tricks for improvement. The specific project mentioned was mocodes.
Links mentioned:
- When is it best to use the stack instead of the heap and vice versa?: In C++, when is it best to use the stack? When is it best to use the heap?
- playground.mojo: GitHub Gist: instantly share code, notes, and snippets.
- GitHub - alainrollejr/mocodes: Error Correction (De)Coding with Mojo: Error Correction (De)Coding with Mojo. Contribute to alainrollejr/mocodes development by creating an account on GitHub.
- GitHub - mzaks/compact-dict: A fast and compact Dict implementation in Mojo 🔥: A fast and compact Dict implementation in Mojo 🔥. Contribute to mzaks/compact-dict development by creating an account on GitHub.
- Why is Mojo's dictionary (or for loop) slower than Python's? · modularml/mojo · Discussion #1747: I used Mojo's (v. 0.7.0) dictionary data structure to calculate the frequency of words in a file with 230+ million words, and did the same with Python. Surprisingly, Python was 7x times faster tha...
- GitHub - karpathy/minbpe: Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.: Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. - karpathy/minbpe
- [stdlib] Fix dict probing error by mzaks · Pull Request #2351 · modularml/mojo: Fixes #1729
- [Proposal] Improve the hash module by mzaks · Pull Request #2250 · modularml/mojo: This proposal is based on discussion started in #1744
Modular (Mojo 🔥) ▷ #🏎engine (3 messages):
- Continuous MAX Optimization: The team is regularly optimizing MAX with each release. Knowing the specific core types and models used by individuals can provide further insights into performance enhancements.
- Clarifying Speed Improvements: A member pointed out a discrepancy in reported speed improvements between TensorFlow (tf) and PyTorch, suggesting they shouldn't be the same due to differences in queries per second (QPS).
- Correct Speedup Printouts Confirmed: Another member confirmed seeing the correct speedup numbers reflecting proportionate QPS improvements after updating the max example repository and clearing the .cache in the performance-showcase directory.
Modular (Mojo 🔥) ▷ #nightly (85 messages🔥🔥):
- Frequent Updates for Nightly Branch Discussed: Automation challenges are delaying the goal of releasing the nightly branch every weekday, with concerns raised that the delay between code merges and commits appearing in the branch makes conflicts hard to fix. Discussion is ongoing to find solutions that ensure the nightly stdlib can build and run correctly with the released nightly compiler.
- Nightly Mojo Compiler Release Notification: The announcement of a new nightly Mojo compiler highlights the availability of updates and changes, with a detailed pull request and a changelog available for review.
- Discussions on Overloads and Traits in Mojo: Debates surfaced regarding the behavioral consistency of overloads and the use of traits, touching on language features like parametric algorithms. The community is weighing the trade-offs of different methods, like overloading, precedence decorators, and return-type variations, while expressing concern about potential confusion and bugs when modifying object behavior via type information.
- Code Execution Difference Between Stable and Nightly: A user reported an issue where code that works in the stable version of Mojo errors with a nightly build, suggesting a possible file-handle lifetime management problem in the nightly version. This sparked a conversation leading to the opening of an issue on GitHub.
- Importing Challenges in Mojo's Standard Library: A user encountered difficulties importing functions from the `math` package into the `string.mojo` and `string_literal.mojo` files, which was explained as a design decision to avoid circular dependencies between open-source and closed-source parts of the stdlib. The recommended workaround is to re-implement the necessary math functions in the open-source portion of the standard library.
Links mentioned:
- Mojo Team Answers | Mojo Dojo: no description found
- [mojo-nightly] struct lifetime issue · Issue #2429 · modularml/mojo: Bug description In the following test demo. It seems the destructor is called on the filehandle instead of move. The demo runs without problems with stable but i get the following with nightly: fil...
- [stdlib] Update stdlib corresponding to 2024-04-26 nightly/mojo by patrickdoc · Pull Request #2418 · modularml/mojo: This updates the stdlib with the internal commits corresponding to today's nightly release: mojo 2024.4.2621.
- mojo/docs/changelog.md at nightly · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
LlamaIndex ▷ #blog (6 messages):
- Workshop Materials for Building LLM Apps: Llama Index announced a workshop with AWS showcasing 3 patterns for LLM app development including using S3 for data ingestion and AWS Bedrock for embeddings.
- Llama Index on ML Security Podcast: The co-founder of Llama Index discussed LLM-based application futures and data security on the mlsecops podcast, also touching on tools like LlamaParse and LlamaCloud.
- RAG Tutorial Series for Production: Marco Bertelli launched a 9-part series focused on taking RAG from a prototype to a production environment, outlining necessary architectural components for deployment.
- Enhancing RAG with Multi-Stage Retrieval: An article by Michael R. from KX Systems suggests a multi-hop retrieval process using Llama Index and Cohere reranking to improve context and reduce hallucinations for LLMs, as detailed in their post.
- Long-Term Memory for Autonomous Agents: Introducing memary, a reference implementation for long-term memory using knowledge graphs, aimed at enhancing memory functions in autonomous agents using LLMs as explored in this tweet.
LlamaIndex ▷ #general (155 messages🔥🔥):
- Trouble with awsbedrock and LlamaIndex: A member hit a `NoRegionError` from botocore when trying to use AWS Bedrock with LlamaIndex. Following suggestions to ensure `region_name` is specified resolved the issue (see the sketch after this list).
- Using Local LLM with LlamaIndex: Members shared links to LlamaIndex's documentation and examples for setting up LLMs locally, particularly referencing a "5 lines of code" example using `BAAI/bge-small-en-v1.5` and `Mistral-7B` in LlamaIndex's documentation.
- LlamaIndex Import Issues Solved: Several members discussed troubleshooting import errors related to llama-index packages such as `llama-index-llms-ollama`. Solutions included installing specific packages individually and confirming correct installation steps.
- Updating Indices and Documents on Vector Stores: Conversations focused on actions such as updating indices on Pinecone using LlamaIndex and adding metadata keys to existing vectors. A member suggested that updating a node with the same ID will overwrite it; however, no direct solution was offered for adding metadata without modifying vectors.
- Retrieving Documents with LlamaIndex: Members asked about retrieving multiple documents via `query_engine.retrieve()` while ensuring diversity among the retrieved documents. Suggestions included adding metadata keys to existing vectors and setting parameters like `mmr_diversity_bias` when creating the retriever.
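For the Bedrock fix, a minimal sketch; the import path assumes the post-0.10 modular packages (`pip install llama-index-llms-bedrock`) and the model id is a placeholder:

```python
from llama_index.llms.bedrock import Bedrock

llm = Bedrock(
    model="anthropic.claude-v2",  # any Bedrock model id you have access to
    region_name="us-east-1",      # omitting this is what raises NoRegionError
)
print(llm.complete("Say hello from Bedrock.").text)
```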
Links mentioned:
- LlamaIndex: Official YouTube Channel for LlamaIndex - the data framework for your LLM applications
- Starter Tutorial (Local Models) - LlamaIndex: no description found
- answerbot/answerbot/replay_client.py at main · zby/answerbot: answering questions using LLMs, search (RAG) and other tools - example code - zby/answerbot
- Auto-Retrieval from a Vectara Index - LlamaIndex: no description found
- Query Engines + Pydantic Outputs - LlamaIndex: no description found
- GitHub - zby/LLMEasyTools: Tools for LLM agents.: Tools for LLM agents. Contribute to zby/LLMEasyTools development by creating an account on GitHub.
- Typesense Vector Store - LlamaIndex: no description found
- Retriever - LlamaIndex: no description found
- Llama Tonic : Transcribe by Josephrp · Pull Request #13137 · run-llama/llama_index: Description Adds Distill Whisper Tool for Quick and Precise Transcription , without ever leaving llama-index New Package? Did I fill in the tool.llamahub section in the pyproject.toml and provide...
- Agents - LlamaIndex: no description found
- Frequently Asked Questions (FAQ) - LlamaIndex: no description found
- Controllable Agents for RAG - LlamaIndex: no description found
- Metaphor - LlamaIndex: no description found
- Context - LlamaIndex: no description found
- Lower-Level Agent API - LlamaIndex: no description found
- Building an Agent around a Query Pipeline - LlamaIndex: no description found
- GitHub - run-llama/llamabot: Contribute to run-llama/llamabot development by creating an account on GitHub.
LlamaIndex ▷ #ai-discussion (2 messages):
- GPT-1: The Unsung Hero: A member revisited the original GPT-1 model, reflecting on its contribution to the evolution of language models, and has written a blog post on the subject. It posits that the model has "stood the test of time quite well over 6 years," implying that some modern systems like Mistral-7B are vastly scaled-up derivatives of GPT-1.
OpenInterpreter ▷ #general (127 messages🔥🔥):
- Flask Server Frustration: A member encountered an error when trying to run a local Flask server, revealing a need to set the `api_key`, plus several further issues including namespace conflicts and connection errors. They attempted a dummy key (`interpreter.llm.api_key = "dummykey"`) and contemplated editing a pydantic config to overcome a namespace issue; a sketch of the workaround appears after this list.
- OpenInterpreter 0.2.5 New Release Inquiry: A member asked about the Open Interpreter 0.2.5 New Computer Update, leading to a clarification that it has moved beyond beta.
- Groq Challenges for OI Integration: Several members discussed difficulties running Open Interpreter with Groq, ultimately concluding that Groq support isn't currently integrated into OI. A GitHub pull request (#1238) adding Groq support was mentioned, which is pending approval.
- Hardware Queries for O1 and Global Vision: Members discussed the Open Interpreter's remote communications and whether O1 can take voice instruction in languages other than English. There were also discussions on installing the O1 client on other devices, like the Rabbit r1, and leveraging the client's existing voice support.
- Collaborations and Contributions Ramp Up: Members shared progress and calls for assistance on various projects intertwined with OpenInterpreter, such as llm-switcher, an open-source AI tools suite including AAA+ and MagicLLight, and potential Groq API implementations. Community code sharing occurred, with ongoing efforts to troubleshoot and improve support for different models and functionalities.
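A sketch of the dummy-key workaround, using the attribute names from Open Interpreter 0.2.x; the endpoint and model name are placeholders for whatever local OpenAI-compatible server is running:

```python
from interpreter import interpreter

# Many client stacks insist on some api_key string even when a local server
# ignores it, hence the dummy value.
interpreter.llm.api_base = "http://localhost:5000/v1"  # placeholder endpoint
interpreter.llm.api_key = "dummykey"
interpreter.llm.model = "openai/local-model"           # placeholder model name

interpreter.chat("List the files in the current directory.")
```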
Links mentioned:
- Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
- Ya Filthy Animals GIF - Ya Filthy Animals - Discover & Share GIFs: Click to view the GIF
- Exclusive: Inside the Rise of Jesse Lyu and the Rabbit R1: Rabbit's founder and CEO, Jesse Lyu, tells all about the origins of the R1, how he worked with Teenage Engineering to design it in "10 minutes," and what he thinks about the AI gadget compet...
- TikTok - Make Your Day: no description found
- Hidden Markov and semi-Markov models: When and why are these models useful for classifying states in time series data?: Hidden Markov models (HMMs) and their extensions have proven to be powerful tools for classification of observations that stem from systems with temporal dependence as they take into account that obse...
- MASSIVE Step Allowing AI Agents To Control Computers (MacOS, Windows, Linux): OS World gives agents the ability to fully control computers, including MacOS, Windows, and Linux. By giving agents a language to describe actions in a compu...
- Added Groq Support by fire17 · Pull Request #1238 · OpenInterpreter/open-interpreter: Describe the changes you have made: Groq's official python api now fits well into oi flow, no errors. Though final answers are halucinated rather than actual output. Seems to plan, write code, but...
- GitHub - stableagents/llmswitcher: Routes to the most performant and cost efficient LLM based on your prompt [ đ§ WIP ]: Routes to the most performant and cost efficient LLM based on your prompt [ đ§ WIP ] - stableagents/llmswitcher
- Google Colaboratory: no description found
- GitHub - sgl-project/sglang: SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.: SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable. - sgl-project/sglang
- C:\WINDOWS\system32>pip install pywin32Requirement already satisfied: pywin32 - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
OpenInterpreter ▷ #O1 (25 messages🔥):
- Custom 3D Project Housed in Mystery: Members are intrigued by a custom 3D-printed case for OpenInterpreter's 01 project, prompting discussions around personal attempts and the fun of tactile keys. One member provided a YouTube video showcasing the project but noted it wasn't their own work.
- The Dawn of 01 Heavy: Chat included anticipation of a new device, 01 Heavy; no expected launch date was provided. Comparisons suggested it could power future robots.
- Amazon Alternatives Seek Acceptance: Queries arose about using the Amazon Echo Smart Speaker Dev Kit as an alternative for open project builds, but no confirmation was shared regarding compatibility.
- Open AI Ethics in Question with Microsoft's Capabilities: A discussion emerged highlighting Microsoft's ability to create and modify files, with OpenInterpreter touted as capable of meeting diverse user desires.
- Update Expectations Set for 01 Light: A member mentioned an upcoming discussion this Tuesday to reveal an updated timeline for the 01 Light's ETA.
Links mentioned:
- Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
- Tweet from killian (@hellokillian): @timshi_ai @Human_B_ee @OpenInterpreter @Grimezsz custom made for @grimezsz, created by @fieroty! internally it's a super easy build, just two amazon products: macro keypad: https://shorturl.at/q...
- Tweet from Bee 🐝 (@bee_human_): my new audio engineer is @openinterpreter's 01
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments: no description found
- no title found: no description found
Latent Space ▷ #ai-general-chat (100 messages🔥🔥):
- Berkeley Introduces Tool Calling Leaderboard: The Berkeley Function Calling Leaderboard evaluates LLMs' ability to call functions, offering a novel and periodically updated real-world benchmarking system.
- Voice AI On the Rise: ElevenLabs has sparked interest, leading to discussions about other Voice AI startups like Unreal Speech and Hume, a space once occupied by the now-defunct Coqui.
- Exploring the Limitations of LLMs: An article on Strangeloopcanon contemplates the perennially surprising capabilities of LLMs while discussing their current failure modes and the concept of "goal drift" as possible directions for improvement.
- Potential Acquisition Moves in the AI Sector: Nvidia's reported acquisitions of Israeli AI companies Deci AI and Run:ai indicate a strategic move to enhance efficiency and performance on their GPUs and AI servers.
- Adventures in Large Context Models: Conversations about practical applications and the future of large-context models were spurred by Llama 3's extension to a 1M token context window.
Links mentioned:
- Tweet from Mark Huang (@markatgradient): 1M context length Llama-3 8B Model. Enough said. Up on HF @ClementDelangue cc: @winglian @mattshumer_ ↘️ Quoting Gradient (@Gradient_AI_) We've been in the kitchen cooking 🔥 Excited to ...
- Berkeley Function Calling Leaderboard (aka Berkeley Tool Calling Leaderboard): no description found
- What can LLMs never do?: On goal drift and lower reliability. Or, why can't LLMs play Conway's Game Of Life?
- Tweet from mephistoooOOHHHHHHSHI- (@karan4d): Ok it's definitely using GPT-4 tokenizer so I'm betting it is 4.5 as well. Always fingerprint w anomalous tokens
- WebSim, WorldSim, and The Summer of Simulative AI — with Joscha Bach of Liquid AI, Karan Malhotra of Nous Research, Rob Haisfield of WebSim.ai: Three perspectives on the most viral fringe of generative AI this year: Simulative AI!
- Tweet from Siqi Chen (@blader): i think @websim_ai is one of the first truly ai native products, and will be as impactful as chatgpt. instead of a chatbox, websim allows you to explore the latent space of an LLM via URLs and hyperl...
- Unreal Speech: Text-to-Speech API for Scale: Slash Text-to-Speech Costs by up to 90%. Up to 10x cheaper than Eleven Labs and Play.ht. Up to 2x cheaper than Amazon, Microsoft, and Google.
- AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback: The notable success of large language models (LLMs) has sparked an upsurge in building language agents to complete various complex tasks. We present AMOR, an agent framework based on open-source LLMs,...
- GitHub - kingjulio8238/memary: Longterm Memory for Autonomous Agents.: Longterm Memory for Autonomous Agents. . Contribute to kingjulio8238/memary development by creating an account on GitHub.
- Reddit - Dive into anything: no description found
- Nvidia to purchase Israeli deep studying co Deci AI - report - Dannywrites: US chip large Nvidia Corp. has struck a deal to amass Israeli deep studying developer Deci AI, "The Data" studies, in keeping with an individual concerned
- What Characterises an Effective Mindset Intervention in Enhancing Students' Learning? A Systematic Literature Review: In recent years, increasing attention has been paid to interventions designed to enhance individuals' sustainable development in learning by priming a growth mindset. The current study systemati...
- Motivation and Behavioral Regulation of Physical Activity... : Medicine & Science in Sports & Exercise: stent with theory, hypothesized relations among variables were supported. Integrated regulation and intrinsic motivation were most strongly correlated with moderate-to-vigorous physical activity measu...
Latent Space ▷ #ai-announcements (1 messages):
swyxio: new pod! https://x.com/swyx/status/1784253651844014237
Latent Space ▷ #llm-paper-club-west (12 messages🔥):
- All Systems Go: The chat confirms visibility before starting the presentation on Mixture Of Depths.
- Mixture Of Depths Explored: This paper introduces a new transformer layer, Expert Choice Routing, aimed at faster training convergence and improvements for processing longer sequences. See the original paper here.
- Skip the Confusion: Comments indicate that skip connections, also known as residual connections, mentioned in the attention mechanism are integral to the discussed paperâs methodology.
- Size Matters: A shared abstract suggests larger zero-shot LLMs outperform fine-tuned smaller LLMs in real-world tasks like meeting summarization, despite the computational costs.
Links mentioned:
- Nextra: the next docs builder: Nextra: the next docs builder
- Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?: Large Language Models (LLMs) have demonstrated impressive capabilities to solve a wide range of tasks without being explicitly fine-tuned on task-specific datasets. However, deploying LLMs in the real...
Latent Space ▷ #ai-in-action-club (35 messages🔥):
- Linux Users, Say Hello to Vesktop: Discord video sharing and Linux compatibility issues were addressed with a recommendation to use Vesktop, described as a better-performing custom Discord app that improves Linux support. Those interested can find more info on the Vesktop GitHub repository.
- Young SQL Module in the Spotlight: A member shared a reference to `sqlite-vss`, a SQLite module for creating virtual tables that store and query vectors, noting it's still in early development and pointing to the API reference documentation (see the sketch after this list).
- Chatbots for CLI Tools Spark Interest: The idea of creating chat bots for popular command-line interface (CLI) tools was suggested, triggering discussions about feasibility and potential ease of creation using slono's tool, a utility that adds to the portability of Go and SQLite.
- Resource Sharing for AI Enthusiasts: Two informative links were shared: first, a Google Doc of AI-related topics, dates, facilitators, and resources such as articles and conference talks; second, a Berkeley Gorilla blog post discussing the challenges and potential strategies for real-world execution of actions by Large Language Models.
- Hunt for AI Hackathon Sign-Up Details: Engagement was expressed regarding sign-up for a hackathon, with one member highlighting the X-ware Arena link amidst the conversation.
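A minimal `sqlite-vss` sketch based on the project's README (table and column names are illustrative; some SQLite builds need `vss_search_params` instead of a bare `LIMIT`):

```python
import sqlite3

import sqlite_vss  # pip install sqlite-vss

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vss.load(db)

# vss0 virtual tables hold fixed-dimension float vectors; vectors may be
# passed as JSON strings.
db.execute("CREATE VIRTUAL TABLE vss_demo USING vss0(embedding(4))")
db.execute(
    "INSERT INTO vss_demo(rowid, embedding) VALUES (1, ?)",
    ("[0.1, 0.2, 0.3, 0.4]",),
)
rows = db.execute(
    "SELECT rowid, distance FROM vss_demo WHERE vss_search(embedding, ?) LIMIT 1",
    ("[0.1, 0.15, 0.35, 0.4]",),
).fetchall()
print(rows)
```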
Links mentioned:
- Gorilla Execution Engine: no description found
- Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
- API Reference | sqlite-vss: no description found
- GitHub - Vencord/Vesktop: Vesktop is a custom Discord App aiming to give you better performance and improve linux support: Vesktop is a custom Discord App aiming to give you better performance and improve linux support - Vencord/Vesktop
- AI In Action: Weekly Jam Sessions: a shared schedule of 2024 topics, dates, facilitators, and resources (e.g., UI/UX patterns for GenAI, 1/26/2024, nuvic, https://maggieappleton.com/squish-structure)
LAION ▷ #general (95 messages🔥🔥):
- LAION in Limbo: A member highlighted that EU laws appear to be restricting LAION's access to public clusters for compute time, causing a decline in activity. Researchers are gravitating toward more active groups that are continually running experiments.
- Terminus Research Group Attracts Talent: A chat participant introduced their own group, the Terminus Research Group, an informal collective that now includes the "pixart guy," suggesting growing and diverse expertise.
- LAION-Aesthetics Seeks to Score Visual Beauty: A blog post was mentioned detailing LAION-Aesthetics, which rates image aesthetics using machine learning. The model and related code are publicly available on GitHub.
- Unusual Benchmark Results Spark Discussion: Members discussed a Reddit benchmark whose results showed contradictory performance across different quantizations of language models, raising questions about testing methodology and the non-deterministic nature of LLMs.
- Comparing LLM Token Generation Rates: Users discussed token generation rates on high-performance GPUs, noting significant differences across models and setups. Tools and configurations such as exllama and TabbyAPI were recommended for better performance.
Links mentioned:
- LAION-Aesthetics | LAION: We present LAION-Aesthetics, several collections of subsets from LAION 5B with high visual quality.
- aMUSEd: An Open MUSE Reproduction: We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation...
- gpt2-chatbot: Background https://chat.lmsys.org enables users to chat with various LLMs and rate their output, without needing to log in. One of the models recently available is gpt2-chatbot, which demonstrates cap...
- 711,700 Titles From Japan's Biggest Light Novel Publishing Site Get Scraped by AI Developer: 711,700 titles from Japan's biggest novel publishing site, Shosetsuka ni Narou, have been scraped by an AI developer, sparking controversy online.
- GitHub - borisdayma/dalle-mini: DALL·E Mini - Generate images from a text prompt: DALL·E Mini - Generate images from a text prompt. Contribute to borisdayma/dalle-mini development by creating an account on GitHub.
- GitHub - LAION-AI/aesthetic-predictor: A linear estimator on top of clip to predict the aesthetic quality of pictures: A linear estimator on top of clip to predict the aesthetic quality of pictures - LAION-AI/aesthetic-predictor
- I created a new benchmark to specifically test for reduction in quality due to quantization and fine-tuning. Interesting results that show full-precision is much better than Q8.: Posted in r/LocalLLaMA by u/jd_3d • 259 points and 103 comments
- Oh no: Is it down again?
LAION ▷ #research (9 messages🔥):
- Exploring VAST: The Omni-Modality Foundation Model: Interest was shown in fine-tuning VAST, a vision-audio-subtitle-text omni-modality foundation model and dataset, prompting members to share their experiences and seek advice.
- Hot off the Press: New Research Publication: A new AI research paper authored by a team including Mostafa Elhoushi and Akshat Shrivastava caught members' attention, with speculation that it builds on previous work and has implications for faster inference and layer utilization.
- Combining Graphs with Language Models: Queries were raised about combining graphs with large language models (LLMs), seeking recommendations on relevant papers and strategies for conditioning LLMs with graphs.
- Mistral Model Fine-Tuning Challenges: A member fine-tuning Mistral models for medical information extraction encountered issues with the model over-generating sequences. The discussion touched on padding strategies and whether the Eleuther server would be the appropriate place to seek expertise in this area.
- Seeking the Eleuther Server Link: Upon facing a challenge with model fine-tuning, a member was advised to consult the Eleuther server for expert help with LLMs, leading to a request for the server's Discord link.
Links mentioned:
- Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding: We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher ...
- GitHub - TXH-mercury/VAST: Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset: Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset - TXH-mercury/VAST
Cohere ▷ #general (96 messages🔥🔥):
- Search Engine Query Capabilities Discussed: Members discussed best practices for using web search tools with AI, mentioning options such as Tavily and Brave Search API. Some highlighted the cost-effectiveness of these tools (Tavily API Information, Brave Search API), while others shared specific configurations and technical details regarding usage limitations and potential workarounds for rate limits (a minimal request sketch follows this list).
- Technical Issues and Deployment Queries: Various technical issues were addressed, such as errors when running the cohere-toolkit locally due to sqlite3 version issues, difficulties understanding how to interact with different components after deployment on Azure, and sharing GitHub resources for troubleshooting and adding custom tools GitHub - cohere-ai/cohere-toolkit.
- Cohere Toolkit Enthusiastically Received: A user expressed great appreciation for Cohere making their toolkit open source, highlighting its immense help to developers GitHub - cohere-ai/cohere-toolkit.
- Clarifications Sought on Fine-Tuning and Use Cases: Queries were raised about the specific models used when fine-tuning, the limits and terms of the free trial API key, and whether models like "Generate" would remain available.
- Using AI for Non-English Languages and Commercial Use: One member praised Command-r for its performance with non-English languages and sought clarification on deploying command-r APIs for commercial use; responses suggested contacting Cohere's sales team or using AWS Sagemaker for deployment.
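For readers who want to try the search-tool pattern discussed above, here is a minimal sketch of a Tavily-style search call from Python. The endpoint shape and parameter names are assumptions drawn from Tavily's public documentation, not details confirmed in the discussion.

```python
# Hedged sketch: a Tavily-style web search request. The endpoint and the
# api_key/query/max_results parameter names are assumptions.
import requests

def tavily_search(api_key: str, query: str, max_results: int = 5) -> list:
    resp = requests.post(
        "https://api.tavily.com/search",
        json={"api_key": api_key, "query": query, "max_results": max_results},
        timeout=30,
    )
    resp.raise_for_status()  # surfaces rate-limit errors (e.g., HTTP 429) early
    return resp.json().get("results", [])

# usage: tavily_search("tvly-...", "Cohere toolkit Azure deployment")
```

Backing off and retrying when an HTTP 429 comes back is the usual workaround for the rate limits mentioned above.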
Links mentioned:
- Tavily: no description found
- Troubleshooting | Chroma: This page is a list of common gotchas or issues and how to fix them.
- C4AI Command R Plus - a Hugging Face Space by CohereForAI: no description found
- Multi-step Tool Use (Agents): no description found
- cohere-toolkit/src/backend/tools/retrieval/tavily.py at main · cohere-ai/cohere-toolkit: Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. - cohere-ai/cohere-toolkit
- GitHub - cohere-ai/cohere-toolkit: Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.: Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. - cohere-ai/cohere-toolkit
- GitHub - searxng/searxng: SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.: SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled. - searxng/searxng
- My First Album: Shared
Cohere ▷ #collab-opps (1 messages):
westn89: We're a Swedish company that is partially using Cohere.
tinygrad (George Hotz) ▷ #general (35 messages🔥):
- Exploring Mathematical Formula Construction: A member discussed constructing any mathematical formula from basic primitive ops and applying differentiation for gradient/backward passes, forming a dependency graph (see the sketch after this list). This method optimizes hardware utilization and enables just-in-time scheduling for streaming, quick computations.
- OpenELM Inquiry, a Brief Mention: One member inquired about experience with OpenELM, but no follow-up discussion ensued.
- Cross-Compatibility Between Frameworks: A user shared their use-case for `nn.module`, explaining it was useful for a hybrid model containing both tinygrad and PyTorch components. The module can automatically collect parameters from itself and child objects for training.
- Clarifying Speech-To-Text/Text-To-Speech Inquiry: A user asked about the speech-to-text and text-to-speech engines showcased by George Hotz, likely found in the tinygrad examples, though the specific demonstration was not identified.
- Discussion About tinygrad Optimizations: Users debated tinygrad's optimization capabilities; one member questioned whether it could generate a fast matrix multiplication (matmul) kernel, while another pointed out the use of computational reduction algorithms for convolutions. George Hotz clarified that tinygrad's aspirations focus on overall model training speed rather than single-operation optimization like matmul.
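To make the primitive-ops idea concrete, here is a minimal sketch in tinygrad itself (assuming a recent version where `Tensor` is importable from the top-level package): a formula is composed from elementwise primitives, and `backward()` walks the recorded dependency graph to produce gradients.

```python
# Minimal sketch: build a formula from primitive ops, then differentiate it.
from tinygrad import Tensor

x = Tensor([2.0, 3.0], requires_grad=True)
y = (x * x + 3 * x).sum()  # formula composed from mul/add/sum primitives
y.backward()               # gradients flow back through the dependency graph
print(x.grad.numpy())      # dy/dx = 2x + 3 -> [7.0, 9.0]
```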
Link mentioned: GitHub - tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️
tinygrad (George Hotz) ▷ #learn-tinygrad (55 messages🔥🔥):
- Exploring the Optimization Frontier: A member shared a comprehensive writeup on loop unrolling within the context of tinygrad's optimizer. The article details the transformation of simple loops into optimized operations, providing insights into the Uops IR (a quick way to inspect these kernels yourself is sketched after this list).
- Tinygrad 0.9 Launch Teased: George Hotz briefly mentioned that new updates will come with the release of tinygrad version 0.9, raising anticipation about potential new features or improvements in the library.
- Kernel Optimization Dissected: Another detailed writeup was shared elaborating on how the shapetracker and symbolic library work with loop unrolling/upcasting, along with a guide to interpreting kernel output colors in tinygrad.
- Tinygrad Learner's Guide: Several members proposed starting points and suggested reading material for understanding and contributing to tinygrad; resources mentioned include MicroGrad and MiniTorch for foundational concepts, plus an outline of an optimal path for reading through the tinygrad codebase.
- Dynamic Testing and Symbolic Shapes: Discussion highlighted ongoing development toward dynamic testing and implementing kernels that can handle variable shapes without recompilation, focusing on the usage of symbolic shapes in operations like mean and sum.
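A quick, hedged way to see the unrolling and upcasting decisions those writeups describe is tinygrad's `DEBUG` environment variable; the levels below are taken from the tinygrad docs, so treat them as approximate.

```python
# Run as e.g. `DEBUG=4 python inspect_kernel.py` to print generated kernel
# source, or DEBUG=2 for per-kernel timings (levels per the tinygrad docs).
from tinygrad import Tensor

a, b = Tensor.rand(64, 64), Tensor.rand(64, 64)
(a @ b).realize()  # realize() forces compilation, so the kernel gets printed
```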
Links mentioned:
- Quickstart - tinygrad docs: no description found
- tinygrad-notes/upcast.md at main · mesozoic-egg/tinygrad-notes: Tutorials on tinygrad. Contribute to mesozoic-egg/tinygrad-notes development by creating an account on GitHub.
- tinygrad-notes/upcast2.md at main · mesozoic-egg/tinygrad-notes: Tutorials on tinygrad. Contribute to mesozoic-egg/tinygrad-notes development by creating an account on GitHub.
- tinygrad-notes/colors.md at main · mesozoic-egg/tinygrad-notes: Tutorials on tinygrad. Contribute to mesozoic-egg/tinygrad-notes development by creating an account on GitHub.
- Comparing tinygrad:master...davidjanoskyrepo:symbolic-mean-var-pull · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! â€ïž - Comparing tinygrad:master...davidjanoskyrepo:symbolic-mean-var-pull · tinygrad/tinygrad
- GitHub - unknownusername504/MicroGrad: Contribute to unknownusername504/MicroGrad development by creating an account on GitHub.
- MiniTorch: no description found
- GitHub - srush/Tensor-Puzzles: Solve puzzles. Improve your pytorch.: Solve puzzles. Improve your pytorch. Contribute to srush/Tensor-Puzzles development by creating an account on GitHub.
Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (10 messages🔥):
- Brand Impact of Newsletter Cross-Promotion Considered: A member pondered whether an unpaid promotion exchange with Semafor might tarnish the brand. It was still seen as a growth opportunity, despite concerns that readers might find plugs annoying.
- Bigger Audience, Bigger Growth?: The same member noted that Semafor's tech newsletter audience is significantly larger, hinting at a substantial growth opportunity.
- Comparing Content to Recognized Examples: To illustrate the type of content involved, an example of a Semafor newsletter was shared, discussing the divisive topic of synthetic data in AI.
- Newsletter Exchanges – A One-Way Street?: Another member chimed in, questioning the importance of cross-promotion in newsletters given their nature as a "one-way medium" sent "into the void."
- Balancing Promotion with Reader Preferences: It was highlighted that there's a risk of alienating readers who prefer pure content without promotions, suggesting the success of such a strategy depends on execution and frequency. Another member weighed in, saying that even a small uptake from the promotion could be beneficial and lead to further growth.
Link mentioned: Semafor Tech: New synthetic data techniques shake up AI models | Semafor: In today's edition, we look at how machine-learning generated data can help make smaller AI models nearly as capable as larger ones.
Interconnects (Nathan Lambert) ▷ #news (10 messages🔥):
- Microsoft Unleashes Phi-3: Phi-3, the next-generation model from Microsoft, has been publicly released, amassing over 6,000 votes and showing promising capabilities. In related news, Arena hits 800K votes, and Snowflake Arctic Instruct has entered the fray.
- A Gloomy Outlook for Dylan: A brief remark hints at unfortunate prospects for an individual named Dylan, with the context or cause left unstated.
- Llama's Fine-Tuning Applauded: The fine-tuning process for "llamas" received a positive shout-out, indicating noteworthy results or improvements.
- Anticipation for GPT-4: A message hints at the possibility of GPT-4's emergence, backed by a sense of confidence from the mentioned user.
- Insights on Training an Open LM: A YouTube seminar led by Hanna Hajishirzi from AI2, discussing the training of an Open Language Model (OLMo), left at least one member wishing for a deeper understanding while acknowledging the value of such shared resources. Hanna's brisk presentation pace was noted, reinforcing her reputation for efficiency.
Links mentioned:
- Tweet from lmsys.org (@lmsysorg): Congrats @Microsoft for the open release of Phi-3, their next generation of fast and capable model! We've collected 6K+ votes for Phi-3 and pushed a new leaderboard release. The model is definite...
- Hanna Hajishirzi (AI2) - OLMo: Findings of Training an Open LM: Talk from the Open-Source Generative AI Workshop at Cornell Tech. Speaker: https://homes.cs.washington.edu/~hannaneh/Slides - https://drive.google.com/file/d...
Interconnects (Nathan Lambert) ▷ #ml-questions (13 messages🔥):
- Misconceptions Cleared About RLHF: RLHF's stability and usefulness depend on the application; methods like KTO may be better suited to various tasks. "[RLHF] Depends on the application. KTO is probably the most well suited to many applied tasks"; the sentiment reflected that "[It's] pretty nuanced yeah".
- DPO and KTO Show Promise in Fine-Tuning: A transition from SFT -> DPO -> KTO showed better user feedback in fine-tuning applications, with online iterations of DPO and KTO "coming" (a hedged sketch of the DPO step appears below this list).
- LLaMA 2 Follow-Up Creates Buzz: With a wealth of information available post-LLaMA 2 release, a blog post provides corrections and continued analysis, discussing controversial aspects and introducing technical notes like Ghost Attention.
- Ghost Attention – Useful but Not Critical: Ghost Attention initially seemed promising for maintaining consistency in long conversations for LLaMA 2, but later comments suggest it may no longer be as important, possibly due to improvements in data and long-context handling. "[GAtt] is not an important thing to implement. It's a great exercise for learning new topics in the space."
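For orientation, here is a hedged sketch of the DPO step in that SFT -> DPO -> KTO pipeline, using Hugging Face's trl library. Argument names have shifted across trl versions, and the checkpoint name and dataset path are placeholders, so treat this as the shape of the recipe rather than a drop-in script.

```python
# Hedged sketch of preference tuning with trl's DPOTrainer. The model name
# and preferences file are hypothetical; the dataset needs
# "prompt"/"chosen"/"rejected" columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

name = "my-org/sft-checkpoint"  # hypothetical SFT'd starting point
model = AutoModelForCausalLM.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)
prefs = load_dataset("json", data_files="preferences.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-out", beta=0.1),  # beta scales the implicit KL penalty
    train_dataset=prefs,
    tokenizer=tokenizer,  # renamed to processing_class in newer trl releases
)
trainer.train()
```

trl's KTOTrainer follows the same shape but takes per-example binary feedback instead of chosen/rejected pairs, which is what makes it attractive for applied settings.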
Link mentioned: Llama 2 follow-up: too much RLHF, GPU sizing, technical details: The community reaction to Llama 2 and all of the things that I didn't get to in the first issue.
Interconnects (Nathan Lambert) ▷ #random (48 messages🔥):
- OpenELM Surpasses OLMo: Discussion highlighted that OpenELM has outperformed OLMo, with comments acknowledging that OLMo 1b had limited success and is no longer a particularly strong model, and that there is now better public data available for training than what was used for OLMo.
- Continuous Improvement Motivates AI Development: Members of the chat acknowledged that while their models have not been top-tier, this serves as motivation to improve. There's consensus that better models are being trained, using the shortfall as an educational tool for safety and policy.
- The Educational Role of Open Models: Participants pointed out the importance of open models in facilitating informed decision-making, with a consensus that while their models might not be the best, they are crucial for education and transparency in the AI community.
- AI2's Role in AI Advancements Recognized: The efforts of AI2 were acknowledged, especially in terms of education, and there was an expression of enthusiasm for the upcoming paper and developments, as well as a discussion on the financial aspects of AI research.
- Intrigue in the Scaling & Function of Alternative Models: Conversation turned to various topics, including Snowflake, a new enterprise-focused model with high VRAM useful for inference, and the concept of active parameters as a proxy for model capability, indicating the interest in exploring alternative architectures beyond just size and benchmarks.
Link mentioned: Tweet from Itamar Golan (@ItakGol): Visual Prompt Injection IRL
Interconnects (Nathan Lambert) ▷ #memes (7 messages):
- Quick Laugh, Light Content: One member posted a simple "lmao", indicating amusement at the channel's conversation or content.
- Personal Reflection on Posting: The same individual later suggested the need for an editor, hinting at self-reflection on their message quality or content.
- Jungle Adventures Shared: They shared a YouTube video titled "I'm leaving to the Amazon jungle…", which details an excursion into rarely explored areas of the rainforest.
- Contrasting Views of the Jungle: Another member responded with a video link showcasing a differing view of the jungle, quoting Werner Herzog's perspective from the documentary Burden of Dreams: "Nature here is vile and base… There is no harmony in the universe".
- Twitter Meme on LLM Quirks: The channel featured a tweet from Marques Brownlee, highlighting the humorous quirks of large language models (LLMs) in a post deemed "the most meme llm shit ever".
Links mentioned:
- Tweet from darren (@darrenangle): PPO DPO KTO CPO IPO ORPO
- I'm leaving to the Amazon jungle...: I'm leaving now to go deep into the Amazon jungle with my friend Paul Rosolie, deep to parts of the rainforest that very few humans have ever seen.The purpos...
- Werner Herzog on the Vileness of the Amazon Jungle: From "Burden of Dreams", a documentary about the making of Herzog's "Fitzcarraldo" -- both released in 1982.00:00 Introduction00:28 Monologue01:29 Rainforest...
Interconnects (Nathan Lambert) ▷ #posts (1 messages):
- Conversations on AGI's Nature: A member complimented another on a thoughtful post about AGI, agreeing with the idea that AGI's definition is subjective. The conversation suggests that the debate around AGI's nature is an ongoing one.
LangChain AI ▷ #general (51 messages🔥):
- Inquiry on Prompt Integration into Code: A member sought assistance with integrating a prompt into their existing code for a chat model. Another community member provided a detailed guide on incorporating ChatPromptTemplate and the pipe method for chaining prompts and models in JavaScript (a minimal Python rendering of the same pattern appears below this list).
- Navigating OllamaFunctions Difficulties: There was discussion of an issue with OllamaFunctions not working properly, linked to GitHub issue #20924. Subsequently, a member clarified the confusion between Gemini and VertexAI models, noting that Gemini 1.5 Pro works only with VertexAI, evidenced by a successful implementation using `ChatVertexAI(model="gemini-1.5-pro-preview-0409")`.
- Building a Retrieval-Augmented Generation (RAG) System: A member requested recommendations for open-source models, embedding techniques, and vector storage solutions to develop an advanced RAG system, though no direct responses to this specific inquiry appeared in the message history.
- Concerns Over Observability Tools for LLMs: A discussion on LLM observability tools weighed the choice between Arize Phoenix and Langfuse, specifically for those primarily using LlamaIndex. A preference was indicated for a self-hosted open-source solution, but no direct recommendations were provided.
- Integration and Deployment Queries around LLMs: Various inquiries surfaced regarding deployment methods, such as using Hugging Face versus the OpenAI API, and connecting OpenAI with SQL Server without LangChain as an intermediary, for security reasons. There was also a direct request for advice on building AI clones of influencers on a new platform and an invitation to DM for potential partnership.
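Since the guide mentioned above was in JavaScript, here is the same ChatPromptTemplate-plus-pipe pattern rendered as a minimal Python sketch; the model choice and prompt text are illustrative.

```python
# Minimal LCEL sketch: a chat prompt piped into a chat model.
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("human", "{question}"),
])
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo")  # `|` chains runnables
print(chain.invoke({"question": "What is RAG?"}).content)
```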
Links mentioned:
- LangSmith: no description found
- Reddit - Dive into anything: no description found
- ChatVertexAI | 🦜️🔗 LangChain: Note: This is separate from the Google PaLM integration. Google has
- Google AI chat models | 🦜️🔗 LangChain: Access Google AI's gemini and gemini-vision models, as well as other
- OllamaFunctions does not work - Received unsupported message type for Ollama · Issue #20924 · langchain-ai/langchain: Checked other resources I added a very descriptive title to this issue. I searched the LangChain documentation with the integrated search. I used the GitHub search to find a similar question and di...
- [experimental][llms][OllamaFunctions] Add bind_tools and with_structured_output functions to OllamaFunctions by lalanikarim · Pull Request #20881 · langchain-ai/langchain: Implemented bind_tools for OllamaFunctions. Made OllamaFunctions sub class of ChatOllama. Implemented with_structured_output for OllamaFunctions. integration unit test has been updated. notebook ha...
- Tool | LangChain.js - v0.1.36: no description found
- DynamicTool | LangChain.js - v0.1.36: no description found
- StructuredTool | LangChain.js - v0.1.36: no description found
- DynamicStructuredTool | LangChain.js - v0.1.36: no description found
- ToolInterface | LangChain.js - v0.1.36: no description found
LangChain AI ▷ #langserve (1 messages):
- AzureSearchVectorStoreRetriever Async Issue: A member reported an error about AzureSearchVectorStoreRetriever not supporting async operations. They inquired whether it's possible to either adjust lang-serve to handle sync operations or whether writing an async wrapper around the sync function in the retriever would be a viable solution (a sketch of the wrapper idea follows).
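A minimal sketch of the wrapper idea, assuming the retriever exposes LangChain's standard synchronous `get_relevant_documents` method: push the blocking call onto a worker thread so LangServe's async path can await it.

```python
# Hedged sketch: adapt a sync-only retriever for async callers.
import asyncio

async def aretrieve(retriever, query: str):
    # asyncio.to_thread runs the blocking call without stalling the event loop
    return await asyncio.to_thread(retriever.get_relevant_documents, query)
```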
LangChain AI ▷ #share-your-work (11 messages🔥):
- Galaxy AI Enters the Arena: GalaxyAI is offering free API access to premium AI models such as GPT-4, GPT-3.5-turbo, and more, with OpenAI-format compatibility for easy integration into projects. Discover more on their website galaxyapi.onrender.com.
- Launching Genai-Job-Agents: A GitHub repository for a Langchain/Langgraph-based agent that assists with job searching and CV building has been shared. For details, check out the repository at genai-job-agents.
- Discover the Sparks of GPT-1: A new blog post delves into the original GPT-1 model, discussing its relevance and the technical evolution to current models. Read the insights here.
- Implementing LangChain with Live Avatars: A YouTube demo showcases LangChain's application in an Airbnb use case with 150 QA pairs and a live avatar Q&A session. View the demo at D-ID Airbnb.
- Automating Code Improvements Via No-Code Platform: Autonoma provides a no-code solution for automating code-improvement tasks like input validation and error handling, complete with a free playground for testing and ALPHA GitHub integration. Experience the platform at Autonoma Free Demo.
Links mentioned:
- Galaxy AI - Swagger UI: no description found
- Revisiting GPT-1: The spark that ignited the fire of LLMs: A Comprehensive Look at GPT-1's Contribution to the Development of Modern LLMs
- GitGud: no description found
- Llama 3 8B: Mobile RAG on Android Phone with Live Avatar with the CODE. Let's do the entire Stack!: Part 1: The Demo. Code is in the link and we will go through it all on a series of videos. Let's push ourselves beyond AI notebooks and move on to real c...
- GitHub - touhi99/genai-job-agents: A LLM Agent with Langchain/Langgraph helps to analyze CV, look relevant jobs via API, and write a cover letter according to it: A LLM Agent with Langchain/Langgraph helps to analyze CV, look relevant jobs via API, and write a cover letter according to it - touhi99/genai-job-agents
- D-ID Airbnb Use Case: A RAG Agent Demo using Ollama and Langchain with code on Github: A demo to help illustrate practical use cases for live avatar assistants for business... I will do a video for the detailed code review so you can try it... ...
LangChain AI ▷ #tutorials (4 messages):
- Explore Local RAG with LLaMA3: A YouTube tutorial titled "Local RAG agent with LLaMA3 and Langchain" demonstrates how to use Retrieval-Augmented Generation (RAG) with LLaMA3, using the Langchain framework.
- Llama 3 Empowers Web Browsing: Another YouTube guide titled "Llama 3 Web Browsing Agent with Langchain and Groq" showcases the implementation of web browsing capabilities through Llama 3, in combination with Langchain and Groq technologies.
- Interactive Agents UI Building Tutorial: Marc Skov Madsen provides a video on creating an interactive web UI for CrewAI applications using the Panel framework, demonstrating the process of building a visual user interface for AI agents.
- Captcha Blockade on Amazon Book Link: A member posted an Amazon link to a book titled "Mastering NLP: From Foundations to LLMs" but was met with a captcha challenge, preventing direct access to the page content.
Links mentioned:
- How to Create an Interactive Web UI for CrewAI Applications By Panel: In this video, I would like to provide you a quick tutorial for building a visualized CrewAI application by using the Panel framework, which includes the fe...
- Local RAG agent with LLaMA3 and Langchain: We will take a look at how to do RAG with LLama3 https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb#pyth...
- Llama 3 Web Browsing Agent with Langchain and Groq: We will take a look at how to implement web browsing with Llama 3 with Langchain and Groq#python #pythonprogramming #llm #ml #ai #aritificialintelligence #la...
Mozilla AI ▷ #llamafile (54 messages🔥):
- Segmentation Fault When Running Llamafile: Users reported experiencing a segmentation fault when attempting to run `llamafile` on various platforms, such as Modal Labs. There were mentions of specific files generating errors or not being found, including `Phi-3-mini-128k-instruct.F16.llamafile`.
- htop Bug Misrepresents Memory Usage: A member provided information about a bug in htop, which does not report shared memory usage correctly on Linux, likely influencing how memory usage is perceived by users during model operations.
- Release of Llamafile v0.8.1: Announcement that the release of llamafile v0.8.1 now includes support for Phi-3 Mini 4k, addresses previous GPU module crashes, and adds bundled NVIDIA + AMD shared objects for Ubuntu users. Users are encouraged to report whether the changes work or whether issues persist.
- LLM Behavior and Output Oddities Discussed: Members discussed unexpected behavior with LLMs, including changes in output consistency and unusual responses featuring parentheses and linebreaks. These issues appeared across different iterations of models like Llama3 70B and Mistral when running via `llamafile`.
- Llamafile Tips and GPU Usage Questions: Users shared tips for ensuring `llamafile` can take full advantage of system RAM and asked about supported GPUs for running llamafiles. There were also questions about determining whether a model is running on GPU or CPU, and clarifications were sought for handling endless output from `llamafile` (a quick way to probe a running llamafile from Python is sketched below).
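For code-level experimentation, llamafile serves a llama.cpp-style, OpenAI-compatible HTTP API; the port, route, and placeholder model name below are assumptions taken from the project README rather than from the discussion.

```python
# Hedged sketch: query a locally running llamafile over its OpenAI-compatible API.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local",  # placeholder; the server runs whatever file it was started with
        "messages": [{"role": "user", "content": "Say hi in five words."}],
        "max_tokens": 32,  # capping tokens is one guard against endless output
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```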
Links mentioned:
- Release llamafile v0.8.1 · Mozilla-Ocho/llamafile: Support for Phi-3 Mini 4k has been introduced A bug causing GPU module crashes on some systems has been resolved Support for Command-R Plus has now been vetted with proper 64-bit indexing We now su...
- jartine/Meta-Llama-3-70B-Instruct-llamafile · Hugging Face: no description found
- TikTok - Make Your Day: no description found
- Error: "The server was not compiled for multimodal or the model projector can't be loaded." · Issue #144 · Mozilla-Ocho/llamafile: I noticed the message mentioned in the title in a browser alert popup. It's likely not an error, but it's also a little jarring for first-time users, so I thought I'd mentioned it. WHAT HA...
- htop doesn't report shared memory usage on Linux · Issue #1443 · htop-dev/htop: In the screenshot below, you'll see that one of my processes is using 139GB of memory, but htop reports the system using 6GB of RAM. It's because htop hides mmap(MAP_SHARED) memory. This has c...
- GitHub - Mozilla-Ocho/llamafile: Distribute and run LLMs with a single file.: Distribute and run LLMs with a single file. Contribute to Mozilla-Ocho/llamafile development by creating an account on GitHub.
AI Stack Devs (Yoko Li) ▷ #ai-companion (11 messages🔥):
- Farewell to Tolerance for Collapse: A channel member expressed a dismissive sentiment about welcoming an impending collapse, hinting at a sense of disenchantment.
- Spotlight on AI Companion Apps: A channel member highlighted two AI companion apps, Faraday and Amica, as noteworthy tools for those interested in AI companionship.
- Faraday, a Personal Recommendation: The app Faraday earned a personal endorsement from a member after a month's usage, distinguishing itself with the ability to run locally on a PC thanks to llama.cpp.
- Amica, an Up-and-Comer with Privacy: The recently discovered app Amica is said to operate similarly to Faraday with enhanced features and a strong emphasis on data privacy, available for both self-hosting and cloud services.
- Privacy-Conscious AI Relationships Encouraged: Members were encouraged to explore Faraday and Amica if they value total data privacy in their interactions with AI.
Links mentioned:
- Faraday.dev: Chat with AI Characters. Works offline. Zero configuration.
- Amica - Your friend.: Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
AI Stack Devs (Yoko Li) ▷ #events (2 messages):
- Rosebud AI Game Jam Winners Announced: Rosebud beta testers teamed up with Rosie, the AI assistant, and showcased their creativity in game design during the Rosebud AI Sleep Game Jam. A standout game, Bedtime Negotiation, features an AI NPC character, and Twitch co-founder Kevin Lin joined as a guest judge. Winners have been announced on Twitter.
- New Game Jam: Education & AI: Rosebud AI invites the community to participate in a new Game Jam, in partnership with Week of AI, focusing on the theme of Education and AI. Participants are to create a 2D browser-based game utilizing Phaser JS on Rosebud's AI platform, with a prize pool of $500; they can learn more about the event on Twitter.
AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (9 messages🔥):
- AI Town's Addictive Quality Acknowledged: A user linked to a Twitter post praising AI Town for its addictive nature, inspiring the idea of creating a simulation with developers, devops, dba, infra, and product managers.
- Launch of LLM-Powered NPCs: A user has made their LLM-powered NPC models and inference stack available to address common NPC limitations, with the repository and models hosted on GitHub and Hugging Face's Hub, although the linked API access page was not found.
- Call for Feedback on NPCs: This user highlights their NPC models' low-latency design for smaller GPUs/CPUs and plans to introduce a quest-generation model, inviting members to provide feedback on the recent release.
- Deep Dive into NPC Implementation Challenges: The user unravelled some key NPC development challenges, including the importance of compressing model output, minimizing calls to models, and tackling issues with generalist instruct-models like GPT-3.5 or Mistral.
- Community Engages on NPC Fine-Tuning: A conversation about NPC character development ensued, with a promise of an upcoming blog post for a deeper exploration of the challenges and strategies encountered during the project.
Links mentioned:
- Tweet from ifioravanti (@ivanfioravanti): This AI Town is addictive! I can't stop watching AI characters talking to each other. I should create one simulation with developers, devops, dba, infra and product managers all together...
- GitHub - GigaxGames/gigax: LLM-powered NPCs running on your machine: LLM-powered NPCs running on your machine. Contribute to GigaxGames/gigax development by creating an account on GitHub.
- Form - Tally: Made with Tally, the simplest way to create forms.
AI Stack Devs (Yoko Li) ▷ #ai-town-dev (11 messages🔥):
- Map Rendering Optimizations in AI Town Discussed: [edgarhnd] asserts that for larger maps, storing the map as an array can be problematic, and suggests having the map rendering static and storing essential data for the engine in an array could be a practical solution.
- Opinion on Map Handling Methods: [ianmacartney] advocates for the map to be a static asset rather than a parameter passed around, to reduce bandwidth usage during reads, while acknowledging the server side still needs the array for collision detection.
- Returning to Original File Read Method for Maps: Both [edgarhnd] and [.casado] seem to agree that reading the map as a file, the original method, is much simpler and more efficient.
- AI Town Installation Tutorial Promoted: [.casado] shares a link to a YouTube tutorial for local AI Town installation titled "100% Local 'AI Town' with Llama 3 AGENTS!!!", providing a resource for those interested in setting up the environment. The video is available at 100% Local "AI Town" with Llama 3 AGENTS!!!.
Link mentioned: 100% Local "AI Town" with Llama 3 AGENTS!!!: Links: Download Pinokio here - https://pinokio.computer/ The OG AI Town - https://github.com/a16z-infra/ai-town The forked AI town - https://github.com/pea…
DiscoResearch ▷ #mixtral_implementation (1 messages):
- Mysteries of Mixtral's Router Coefficients: A comparison between Mixtral-8x7B-Instruct-v0.1 and Mixtral-8x22B-Instruct-v0.1 revealed different `router_aux_loss_coef` values, 0.02 and 0.001 respectively. It sparked curiosity whether these reflect actual training values or are "fantasy values," with a possibility that smaller experts might require a higher `loss_coef` (the configs can be checked directly, as sketched below).
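The coefficients are plain config fields, so they can be read without downloading any weights; here is a quick check with transformers, assuming access to the two Hugging Face repos:

```python
# Print the router auxiliary-loss coefficient from each Mixtral config.
from transformers import AutoConfig

for repo in ("mistralai/Mixtral-8x7B-Instruct-v0.1",
             "mistralai/Mixtral-8x22B-Instruct-v0.1"):
    cfg = AutoConfig.from_pretrained(repo)  # fetches config.json only
    print(repo, cfg.router_aux_loss_coef)
```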
DiscoResearch ▷ #general (6 messages):
- Long Initialization Times on HPC: A member reported slow initialization times (2 min 20 s) for DiscoLM_German_7b_v1 on HPC while loading checkpoint shards, and long inference times (over 12 minutes) for 4K-token inputs on GPUs, despite brief initialization (3 s) and fast inference (1.6 min) on a local machine without GPUs.
- GPU Utilization Improves Inference: Upon realizing they had not loaded the model onto GPUs, the member corrected the issue, which reduced inference time to approximately 10 seconds on a two-Tesla-V100 setup, but shard loading times remained unchanged at 2 min 20 s.
- Load Time Troubleshooting Ineffective: The suggested `low_cpu_mem_usage=True` argument did not improve model load times, indicating the problem may persist despite this adjustment.
- Slow Storage Drive Could Be a Bottleneck: Another participant suggested that the high load times may be due to the model being stored on a slow storage drive and recommended verifying that the HF cache directory is set to a fast data partition (see the sketch below for both knobs).
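A hedged sketch combining both suggestions from the thread: point the Hugging Face cache at fast storage and load weights straight onto the GPUs. The paths are examples, and `device_map="auto"` requires the accelerate package.

```python
# Set the cache location before transformers is imported, so shards land on
# the fast partition; then load directly onto the available GPUs.
import os
os.environ["HF_HOME"] = "/fast-scratch/hf-cache"  # example path, adjust per cluster

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "DiscoResearch/DiscoLM_German_7b_v1",
    low_cpu_mem_usage=True,  # stream weights instead of building a full CPU copy
    device_map="auto",       # shard across the two V100s at load time
)
```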
DiscoResearch ▷ #discolm_german (8 messages🔥):
- Discussing Practical Applications: The user hoped to see more anecdotal observations of LMs and expressed interest in testing models in venues like the lmsys arena, acknowledging that even specialized tasks might still be highly beneficial. A related tweet was shared discussing potential uses: Observation Discussion.
- German gguf Model Downloads Spike: The gguf model saw an impressive uptake with 1,500 downloads in just two days, signaling strong community interest and engagement.
- Skepticism Over New Model Performance: A user expressed doubt about the performance of a newly released model, as community feedback suggests it doesn't perform well, but another user disagreed, mentioning that the Phi-3 model did not overfit on the German RAG Eval dataset.
- Querying Changes in Llamafied Phi-3 Model Tokenizer: PhilipMay inquired about the rationale for altering the tokenizer in a Llamafied Phi-3 model, specifically changing the end-of-sentence token. In discussions with the owner of the model, it became apparent this alteration was made for better performance with chat applications utilizing trtllm Tokenizer Change Discussion 7 and Tokenizer Change Discussion 6.
- Phi-3 MoE Model Created for Experiments: A new Phi-3 MoE model has been developed using the Llamafied version with mergekit and a randomly initialized router. It is currently available for experimentation but requires training before use: Phi-3 MoE Model on Hugging Face.
Links mentioned:
- PhilipMay/Phi-3-MoE-mini-4k-instruct-raw · Hugging Face: no description found
- vonjack/Phi-3-mini-4k-instruct-LLaMAfied · Why did you change the eos_token in tokenizer_config.json file?: no description found
- vonjack/Phi-3-mini-4k-instruct-LLaMAfied · Why did you change the added_tokens.json file?: no description found
Skunkworks AI ▷ #general (7 messages):
- Cutting-Edge Research on Efficient Language Models: A new article titled "Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation" discusses CPU-compatible language models that generate Python code. The research introduces a dataset of 60 programming problems and employs a Chain-of-Thought prompt for improved model performance.
- HaystackDB Enquires on Embeddings: A member questioned whether the HaystackDB repository uses 2-bit embeddings. They further inquired about the term "binary quantized" in the context of the repository.
- Efficiency via Binary Quantization: Clarifying binary quantized embeddings, another member explained that Binary Quantization (BQ) helps create a smaller index for similarity search, enhancing the efficiency of the database (a minimal numeric sketch follows this list).
- Llama-3 Fine-tuning Troubles: A member reached out to ask whether anyone has had success fine-tuning Llama-3, noting issues with their models not generating the End Of Sentence (EOS) token.
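To make the BQ explanation concrete, here is a minimal numpy sketch of sign-bit binary quantization: each embedding dimension is reduced to one bit, and similarity search becomes Hamming distance over packed bytes, shrinking a float32 index roughly 32x.

```python
# Sign-bit binary quantization and Hamming-distance search, in plain numpy.
import numpy as np

def binarize(embs: np.ndarray) -> np.ndarray:
    # one bit per dimension, packed into uint8 -> shape (n, dim // 8)
    return np.packbits(embs > 0, axis=-1)

def hamming(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # popcount of XOR = number of differing bits
    return np.unpackbits(np.bitwise_xor(a, b), axis=-1).sum(axis=-1)

docs = binarize(np.random.randn(1000, 256).astype(np.float32))
query = binarize(np.random.randn(1, 256).astype(np.float32))
top5 = np.argsort(hamming(docs, query))[:5]  # smallest distance = most similar
```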
Links mentioned:
- Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation: Large Language Models (LLMs) have become the go-to solution for many Natural Language Processing (NLP) tasks due to their ability to tackle various problems and produce high-quality results. Specifica...
- GitHub - carsonpo/haystackdb: Contribute to carsonpo/haystackdb development by creating an account on GitHub.
Skunkworks AI ▷ #off-topic (3 messages):
- Introducing Snowflake Arctic for Enterprise AI: A YouTube video was shared introducing Snowflake Arctic, an enterprise-focused large language model (LLM) that aims to push the boundaries of cost-effectiveness in enterprise AI.
- Exploring RAG with LLaMA3 via Langchain: A tutorial video was linked demonstrating how to use a local Retrieval-Augmented Generation (RAG) agent with LLaMA3 and Langchain.
- Web Browsing with LLaMA3 Using Langchain and Groq: The discussion included a video on implementing a web browsing agent with LLaMA 3 using the Langchain library and Groq hardware, focusing on the integration of AI and web browsing capabilities.
Links mentioned:
- Snowflake Arctic: The Best LLM for Enterprise AI: Today, the Snowflake AI Research Team is thrilled to introduce Snowflake Arctic, a top-tier enterprise-focused LLM that pushes the frontiers of cost-effectiv...
- Local RAG agent with LLaMA3 and Langchain: We will take a look at how to do RAG with LLama3 https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb#pyth...
- Llama 3 Web Browsing Agent with Langchain and Groq: We will take a look at how to implement web browsing with Llama 3 with Langchain and Groq#python #pythonprogramming #llm #ml #ai #aritificialintelligence #la...
LLM Perf Enthusiasts AI ▷ #jobs (1 messages):
- Join Gamma's AI Revolution: Gamma, recognized by a16z as a top consumer AI app, is hiring an AI engineer to work on large-scale text and image models. The role involves prompt engineering, evaluations, fine-tuning, and feature development with advanced AI models.
- Pushing Boundaries in Content Creation: Gamma leverages generative AI to simplify the creation of presentations and websites, serving over 10 million users who enjoy an effortless content creation experience.
- Profitable Innovation Powered by Community: With more than $10M in funding from Accel and a profitability status, Gamma maintains a lean team of 16 and continues to grow organically through word-of-mouth.
- Be Part Of A Tight-Knit Squad: This San Francisco-based company is looking to expand its small but mighty team with someone passionate about pushing LLMs to their limits, offering in-person collaboration approximately 3 days a week.
- Interested in Engineering the Future of AI?: Candidates eager to explore this opportunity can learn more and apply at the following link: https://careers.gamma.app/ai-engineer.
Link mentioned: AI Engineer: AI Engineer San Francisco Click here to apply
LLM Perf Enthusiasts AI ▷ #openai (3 messages):
- Leaked Version Speculation: A member shared a tweet from @phill__1 commenting that gpt2-chatbot feels like gpt4.5 due to its extensive domain knowledge. This led to discussions suggesting it could be a leaked version of GPT-4.5.
- Community Approval: There is a simple expression of approval on the quality of gpt2-chatbot, described as "It's good."
Link mentioned: Tweet from Phil (@phill__1): Whatever gpt2-chatbot might be, it definitely feels like gpt4.5. It has insane domain knowledge I have never seen before
Datasette - LLM (@SimonW) ▷ #llm (1 messages):
- Quest for Custom Grammar in Code-Generation: A member inquired about the possibility of passing a custom grammar, potentially as a model-specific option, to enhance code-generation by preventing syntax errors and focusing on semantic issues.
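For context, constrained decoding along these lines already exists in llama.cpp's GBNF grammars; here is a hedged sketch with llama-cpp-python. The grammar, model path, and prompt are illustrative, and whether LLM should surface this as a model-specific option is exactly the open question.

```python
# Hedged sketch: grammar-constrained generation via llama-cpp-python.
from llama_cpp import Llama, LlamaGrammar

grammar = LlamaGrammar.from_string(r'''
root  ::= "def " ident "():\n    return " num "\n"
ident ::= [a-z]+
num   ::= [0-9]+
''')

llm = Llama(model_path="model.gguf")  # illustrative local model path
out = llm("Write a tiny function.", grammar=grammar, max_tokens=64)
print(out["choices"][0]["text"])  # output is forced to match the grammar
```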