AI News for 6/12/2024-6/13/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (414 channels, and 3646 messages) for you. Estimated reading time saved (at 200wpm): 404 minutes. You can now tag @smol_ai for AINews discussions!
Lots of fun image-to-video and canvas-to-math demos flying around today, but not much technical detail, so we turn elsewhere, to Bryan Catanzaro of NVIDIA calling attention to their new paper studying Mamba models:
As Eugene Cheah remarked in the Latent Space Discord, this is the third team (after Jamba and Zamba) that has independently found that mixing Mamba and Transformer blocks does better than either can alone. And the paper does conclude empirically that the optimal amount of Attention is <20%, being FAR from all you need.
{% if medium == "web" %}
Table of Contents
[TOC]
{% else %}
The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!
{% endif %}
AI Twitter Recap
all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.
LLM Capabilities and Evaluation
- Mixture-of-Agents Enhances LLM Performance: @bindureddy noted that Mixture-of-Agents (MoA) uses multiple LLMs in a layered architecture to iteratively enhance generation quality, with the MoA setup using open-source LLMs scoring 65.1% on AlpacaEval 2.0 compared to GPT-4 Omni's 57.5%.
- LiveBench AI Benchmark: @bindureddy and @ylecun announced LiveBench AI, a new LLM benchmark with challenges that can't be memorized. It evaluates LLMs on reasoning, coding, writing, and data analysis, aiming to provide an independent, objective ranking.
- Mamba-2-Hybrid Outperforms Transformer: @ctnzr shared that an 8B-3.5T hybrid SSM model using 7% attention gets better accuracy than an 8B-3.5T Transformer on the same dataset, with MMLU jumping from 50% to 53.6%, while having the same training efficiency and lower inference cost.
- GPT-4 Outperforms at Temperature=1: @corbtt found that GPT-4 is "smarter" at temperature=1 than at temperature=0, even on deterministic tasks, based on their evaluations (see the sketch after this list).
- Qwen 72B Leads Open-Source Models: @bindureddy noted that Qwen 72B is the best performing open-source model on LiveBench AI.
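A minimal way to reproduce that temperature comparison, assuming an OpenAI API key in the environment; in practice you would average over many samples per setting rather than the single call per temperature shown here:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
question = "What is 17 * 24? Answer with just the number."

for temp in (0.0, 1.0):
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=temp,
        messages=[{"role": "user", "content": question}],
    )
    print(f"temperature={temp}: {resp.choices[0].message.content}")
```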
LLM Training and Fine-Tuning
- Memory Tuning for 95%+ Accuracy: @realSharonZhou announced @LaminiAI Memory Tuning, which tunes a large bank of memory-expert adapters on top of a base LLM and routes among them Mixture-of-Experts-style to embed facts with high fidelity. A Fortune 500 customer case study showed 95% accuracy on a SQL agent task, up from 50% with instruction fine-tuning alone.
- Sakana's Evolutionary LLM Optimization: @andrew_n_carr highlighted Sakana AI Lab's work using evolutionary strategies to discover new loss functions for preference optimization, outperforming DPO.
Multimodal and Video Models
- Luma Labs Dream Machine: @karpathy and others noted the impressive text-to-video capabilities of Luma Labs' new Dream Machine model, which can extend images into videos.
- MMWorld Benchmark: @_akhaliq introduced MMWorld, a benchmark for evaluating multimodal language models on multi-discipline, multi-faceted video understanding tasks.
- Table-LLaVa for Multimodal Tables: @omarsar0 shared the Table-LLaVa 7B model for multimodal table understanding, which is competitive with GPT-4V and outperforms existing MLLMs on multiple benchmarks.
Open-Source Models and Datasets
- LLaMA-3 for Image Captioning: @_akhaliq and @arankomatsuzaki highlighted a paper that fine-tunes LLaVA-1.5 to recaption 1.3B images from the DataComp-1B dataset using LLaMA-3, showing benefits for training vision-language models.
- Stable Diffusion 3 Release: @ClementDelangue and others noted the release of Stable Diffusion 3 by Stability AI, which quickly became the #1 trending model on Hugging Face.
- Hugging Face Acquires Argilla: @_philschmid and @osanseviero announced that Argilla, a leading company in dataset creation and open-source contributions, is joining Hugging Face to enhance dataset creation and iteration.
AI Reddit Recap
Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!
Stable Diffusion 3 Medium Release
- Resource-efficient model: In /r/StableDiffusion, Stable Diffusion 3 Medium weights were released, a 2B parameter model that is resource-efficient and capable of running on consumer GPUs.
- Improvements over previous models: SD3 Medium overcomes common artifacts in hands and faces, understands complex prompts, and achieves high quality text rendering.
- New licensing terms: In /r/OpenAI, Stability AI announced new licensing terms for SD3: free for non-commercial use, $20/month Creator License for limited commercial use, and custom pricing for full commercial use.
- Mixed initial feedback: First testers are reporting mixed experiences, with some facing issues replicating results and others giving positive feedback on prompt adherence, detail richness, and lighting/colors.
Issues and Limitations of SD3 Medium
- Struggles with human anatomy: In /r/StableDiffusion, users report that SD3 Medium struggles with human anatomy, especially when generating images of people lying down or in certain poses. This issue is further discussed with nuanced thoughts on the model's limitations.
- Heavy censorship: The model appears to have been heavily censored, resulting in poor performance when generating nude or suggestive content.
- Difficulty with artistic styles: SD3 Medium has difficulty adhering to artistic styles and concepts, often producing photorealistic images instead.
Comparisons with Other Models
- Varying strengths and weaknesses: Comparisons between SD3 Medium, SDXL, and other models like Stable Cascade and PixArt Sigma show varying strengths and weaknesses across different types of images (photorealism, paintings, landscapes, comic art). Additional comparison sets further highlight these differences.
- Outperforms in specific areas: SD3 Medium outperforms other models in certain areas, such as generating images of clouds or text, but falls short in others like human anatomy.
Community Reactions and Speculation
- Disappointment with release: Many users in /r/StableDiffusion express disappointment with the SD3 Medium release, citing issues with anatomy, censorship, and lack of artistic style. Some even call it a "joke".
- Speculation on causes: Some users speculate that the poor performance may be due to bugs in adopting the weights or the model architecture.
- Reliance on fine-tuning: Others suggest that the community will need to rely on fine-tuning and custom datasets to improve SD3's capabilities, as was done with previous models.
Memes and Humor
- Poking fun at shortcomings: Users share memes and humorous images poking fun at SD3's shortcomings, particularly its inability to generate anatomically correct humans. Some even sarcastically claim "huge success" with the model.
AI Discord Recap
A summary of Summaries of Summaries
- Stable Diffusion 3 Faces Scrutiny but Offers Alternatives:
- SD3 Faces Criticism for Model Quality: Users conveyed dissatisfaction with SD3, highlighting anatomical inaccuracies and prompt-adherence issues, though the medium model can be downloaded from Hugging Face.
- Preferred Interfaces & Tools Discussed: ComfyUI emerged as the favored interface, with suggested samplers like uni_pc and ddim_uniform for optimal performance. Alternatives like Juggernaut Reborn and Playground are highlighted for their specific capabilities.
- Boosting AI Performance and Infrastructure Insights:
- LLM Performance Boosted by Higher Model Rank: Shifting from rank 16 to 128 resolved Qwen2-1.5b's gibberish output, aligning it with llama-3 caliber outputs.
- Perplexity AIās Efficient LLM Use: Quick results are achieved by leveraging NVIDIA A100 GPUs, AWS p4d instances, and TensorRT-LLM optimizations.
- Innovations in Fine-Tuning and Quantization:
- Fine-Tuning LLMs with New Models: The discussion covered the legal aspects of using GPT-generated data, referencing OpenAI's business terms. Experiments with ToolkenGPT show creative approaches to synthetic data for fine-tuning.
- CUDA Quantization Project Discussions: Projects like BiLLM showcase rapid quantization of large models, essential for efficient AI deployments.
- Model Management and Deployment Techniques:
- Strategies for Handling Large Embeddings: Queries about 170,000 embedding indexes led to recommendations on using Qdrant or FAISS for faster retrieval. Specific fixes for erroneous queries were shared here.
- Docker and GPU Configuration Troubleshooting: Users dealing with Docker GPU detection on WSL found solutions by consulting the official NVIDIA toolkit guide.
- AI Community Trends and Updates:
- OpenAI's Revenue Milestone and Focus Shift: OpenAI's revenue doubled, reflecting direct sales of ChatGPT and other services, not primarily facilitated by Microsoft (source).
- Partnerships and Conferences Engage Community: Aleph Alpha and Silo AI joined forces to advance European AI (read more), and Qwak's free virtual conference promises deep dives into AI mechanisms and networking opportunities.
PART 1: High level Discord summaries
Stability.ai (Stable Diffusion) Discord
SD3's Rocky Release: Users have expressed dissatisfaction with Stable Diffusion 3 (SD3), citing issues such as anatomical inaccuracies and non-compliance with prompts compared to SDXL and SD1.5. Despite the critiques, the medium model of SD3 is now downloadable on Huggingface, requiring form completion for access.
Preferred Interfaces and Samplers: ComfyUI is currently the go-to interface for running SD3, and users are advising against Euler samplers. The favored samplers for peak performance with SD3 are uni_pc and ddim_uniform.
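ComfyUI's sampler names do not map one-to-one onto other toolkits, but for diffusers users the rough analogue of uni_pc is UniPCMultistepScheduler; a minimal sketch of swapping it in, shown on an SD1.5 pipeline where UniPC support is known to work:

```python
import torch
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Swap the pipeline's default sampler for UniPC (ComfyUI's uni_pc analogue)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe("a lighthouse at dusk", num_inference_steps=20).images[0]
image.save("out.png")
```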
Exploring Alternatives: Participants in the channel have highlighted alternative models and tools like Juggernaut Reborn and Divinie Animemix to achieve more realism or anime style, respectively. Other resources include Playground and StableSwarm for managing and deploying models.
Keep Discussions Relevant: Moderators have had to direct conversations back on topic after detours into global politics and personal anecdotes sidetracked from the technical AI discussions.
Big Models, Bigger Needs: The 10GB model of SD3 was mentioned as a very sought-after option among the community, showing the desire for larger, more powerful models despite the mixed reception of the SD3 release.
Unsloth AI (Daniel Han) Discord
- Boosting Model Performance with Higher Rank: Increasing the model rank from 16 to 128 resolved issues with Qwen2-1.5b producing gibberish during training, aligning the output quality with results from llama-3 training.
- scGPT's Limited Practical Applications: Despite interesting prompting and tokenizer implementation, scGPT, a custom transformer written in PyTorch, is deemed impractical for use outside of an academic setting.
- Embracing Unsloth for Efficient Inference: Implementing Unsloth has significantly reduced memory usage during both training and inference activities, offering a more memory-efficient solution for artificial intelligence models.
- Mixture of Agents (MoA) Disappoints: The MoA approach by Together AI, meant to layer language model agents, has been criticized for being overly complex and seemingly more of a showpiece than a practical tool.
- Advancing Docker Integration for LLMs: AI engineers recommend building command-line interface (CLI) tools to streamline workflows and better integrate notebooks with frameworks like ZenML for substantial outcomes in real-world applications.
HuggingFace Discord
SD3 Revolutionizes Stable Diffusion: Stable Diffusion 3 (SD3) has dropped with a plethora of enhancements - now sporting three formidable text encoders (CLIP L/14, OpenCLIP bigG/14, T5-v1.1-XXL), a Multimodal Diffusion Transformer, and a 16-channel AutoEncoder. Details of SD3's implementation can be found on the Hugging Face blog.
Navigating SD3 Challenges: Users encountered difficulties with SD3 on different platforms, with recommendations such as applying `pipe.enable_model_cpu_offload()` for faster inference and ensuring dependencies like `sentencepiece` are installed. GPU setup tips include using an RTX 4090, employing fp16 precision, and making sure paths are correctly formulated.
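A minimal sketch combining those tips, assuming diffusers >= 0.29 with sentencepiece installed and access granted to the gated SD3 checkpoint:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,  # fp16 precision, as recommended above
)
pipe.enable_model_cpu_offload()  # keep idle submodules on CPU to fit consumer GPUs

image = pipe(
    "a photo of a red panda reading a newspaper",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_sample.png")
```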
Hugging Face Extends Family With Argilla: In an exciting turn of events, Hugging Face welcomes Argilla into its fold, a move celebrated by the community for the potential to advance open-source AI initiatives and new collaborations.
Community and Support in Action: From universities, such as the newly created University of Glasgow organization on Hugging Face, to individual contributions like Google Colab tutorials for LLM, members have been contributing resources and sourcing support for their various AI undertakings.
Enriched Learning Through Shared Resources: Members are actively exchanging knowledge, with highlighted assets including a tutorial for LLM setup on Google Colab, a proposed reading group discussion on the MaPO technique for text-to-image models, and an Academic paper on NLP elucidating PCFGs.
OpenAI Discord
- Path to AI Expertise: Aspiring AI engineers were directed towards resources by Andrej Karpathy and a YouTube series by sentdex on creating deep learning chatbots. The discussions revolved around the necessary knowledge and skillsets for AI engineering careers.
- GPT-4.5 Turbo Speculations Ignite Debate: A debated topic was a leaked mention of GPT-4.5 Turbo, speculated to have a 256k context window and a knowledge cutoff in June 2024. It stirred speculation on its potential continuous pretraining feature.
- Demystifying ChatGPT's Storage Strategy: It was suggested that better memory management within ChatGPT could involve techniques for collective memory summary and cleanup to resolve current limitations.
- Teamwork Makes the Dream Work: Key points about ChatGPT Team accounts were shared, emphasizing the double prompt limit and the financial commitment for multiple team seats when not billed annually.
- Breaking Down Big Data: There was advice on managing substantial text data, like 300MB files, by chunking and trimming them down for practicality. Useful tools and guides were linked, including a forum post with practical tips for large documents and a notebook on handling lengthy texts through embeddings (a minimal chunking sketch follows).
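A minimal sketch of the chunk-and-trim idea for a large text file, using simple fixed-size character chunks with overlap rather than any particular library:

```python
def chunk_file(path, chunk_chars=4000, overlap=200):
    """Yield overlapping character chunks from a large text file."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read()
    step = chunk_chars - overlap
    for start in range(0, len(text), step):
        yield text[start:start + chunk_chars]

# Each chunk can then be summarized or embedded independently
for i, chunk in enumerate(chunk_file("big_corpus.txt")):
    print(i, len(chunk))
```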
LLM Finetuning (Hamel + Dan) Discord
- Fine-Tuning LLMs: Adding New Knowledge: AI enthusiasts discussed fine-tuning large language models (LLMs) like "Nous Hermes" to introduce new knowledge, despite costs. A legal debate ensued concerning the use of GPT-generated data, with users consulting OpenAI's business terms; a separate mention was made of generating synthetic data as referenced in the ToolkenGPT paper.
- Technical Glitches and Advice: In the realm of LLMs, users reported preprocessing errors with models like llama-3-8b and mistral-7b. On the practical side, members traded tips on maintaining SSH connections via `nohup`, with recommendations found on this SuperUser thread.
- Innovative Model Frameworks in the Spotlight: Model frameworks gained attention, with LangChain and LangGraph sparking diverse opinions. The introduction of glaive-function-calling-v1 prompted talk about function execution capabilities in models.
- Deployments and Showcases in Hugging Face Spaces: Several users announced their RAG-based applications, such as the RizzCon-Answering-Machine, built with Gradio and hosted on Hugging Face Spaces, though some noted the need for speed improvements.
- Credits and Resources Quest Continues: Queries arose about missing credits and who to contact for platforms like OpenPipe. Users who haven't received credits shared their usernames (e.g., anopska-552142, as-ankursingh3-1-817d86), and a mention of a second round of credits expected on the 14th was made.
Nous Research AI Discord
- LLM Objective Discovery Without Human Experts: A paper on arXiv details a method for discovering optimization algorithms for large language models (LLMs) driven by the models themselves, which could streamline the optimization of LLM preferences without needing expert human input. The approach employs iterative prompting of an LLM to enhance performance according to specified metrics.
- MoA Surpasses GPT-4 Omni: A Mixture-of-Agents (MoA) architecture, as highlighted in a Hugging Face paper, shows that combining multiple LLMs elevates performance, surpassing GPT-4 Omni with a 65.1% score on AlpacaEval 2.0. For AI enthusiasts wanting to contribute or delve deeper, the MoA model's implementation is available on GitHub (a minimal sketch of the idea follows this list).
- Stable Diffusion 3: A Mixed Bag of Early Impressions: While Stable Diffusion 3 garners both applause and criticism in its initial release, discussions around GPT-4's counterintuitive better performance with higher temperature settings fuel the debate on model configuration. Meanwhile, a community member circulates an uncensored version of the OpenHermes-2.5 dataset, and a paper on eliminating MatMul operations promises remarkable memory savings.
- In Search of the Lost Paper: Engagement is seen around the task of locating a forgotten paper on interleaving pretraining with instructions, suggesting active interest in cutting-edge research sharing within community channels.
- RAG Dataset Development Continues: The dataset schema for RAG is still a work in progress, with further optimization needed for Marker's document conversion tool, where setting min_length could boost processing speeds. Simultaneously, Pandoc and make4ht emerge as possible conversion solutions for varied document types.
- World-sim Project Status Quo: There's no change yet in the closed-source status of the World-sim project, despite discussions and potential for future reconsideration. Additionally, calls for making the world-sim AI bolder and adapting it for mobile platforms reflect the community's forward-looking thoughts.
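A minimal sketch of the layered Mixture-of-Agents idea, not the reference implementation: several proposer models answer independently and an aggregator synthesizes their drafts. The `call(model, prompt)` helper is hypothetical and stands in for whatever completion API you use:

```python
def call(model: str, prompt: str) -> str:
    """Hypothetical helper: send `prompt` to `model` and return its completion."""
    raise NotImplementedError

def mixture_of_agents(prompt, proposers, aggregator):
    # Layer 1: each proposer model answers independently
    drafts = [call(m, prompt) for m in proposers]
    # Layer 2: the aggregator sees all drafts and synthesizes a final answer
    agg_prompt = (
        prompt
        + "\n\nCandidate answers:\n"
        + "\n---\n".join(drafts)
        + "\n\nSynthesize the single best answer from the candidates above."
    )
    return call(aggregator, agg_prompt)
```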
Perplexity AI Discord
- Enthusiasm for Perplexity's Search Update: Members showed great excitement for the recently introduced search feature in Perplexity AI, with immediate interest expressed for an iOS version.
- Musk and OpenAI Legal Battle Closes: Elon Musk has withdrawn his lawsuit against OpenAI, which alleged a shift from its mission-driven origins to a profit orientation, a day before the court hearing. The lawsuit included claims of prioritizing investor interests such as those of Microsoft (CNBC).
- Perplexity AI Speed with Large Language Models: Perplexity.ai is achieving fast results despite using large language models by utilizing NVIDIA A100 GPUs, AWS p4d instances, and software optimizations like TensorRT-LLM (Perplexity API Introductions).
- Custom GPT Woes on Perplexity: Engineers are experiencing connectivity issues with Custom GPTs; problems seem confined to the web version of the platform, as no issues are reported on desktop applications, suggesting potential API or platform-specific complications.
- Email's Environmental Footprint: An average email emits about 4 grams of CO2; the carbon impact can be mitigated by preferential use of file share links over attachments (Mailjet's Guide to Email Carbon Footprint).
CUDA MODE Discord
Compute Intensity Discussion Left Hanging: A member inquired whether the compute intensity calculation should consider only floating-point operations on data from Global Memory. The topic remained open for discussion without a conclusive answer.
Streamlined Triton 3.0 Setup: Two practical installation methods for Triton 3.0 surfaced; one guide details installing from source, while another involves using `make triton` with a specific version from the PyTorch repository.
Optimizing Optimizers in PyTorch: A robust conversation on creating a fast 8-bit optimizer using pure PyTorch and torch.compile, as well as making a drop-in replacement for 32-bit with comparable accuracy was had, drawing inspiration from the bitsandbytes implementation.
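A minimal sketch of the core trick behind such 8-bit optimizer states, blockwise absmax quantization, written as a standalone illustration rather than the actual bitsandbytes internals:

```python
import torch

def quantize_blockwise(t: torch.Tensor, block: int = 256):
    """Quantize a float tensor to int8 with one absmax scale per block."""
    flat = t.flatten()
    pad = (-flat.numel()) % block
    blocks = torch.nn.functional.pad(flat, (0, pad)).view(-1, block)
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    q = torch.round(blocks / scale * 127).to(torch.int8)
    return q, scale

def dequantize_blockwise(q: torch.Tensor, scale: torch.Tensor, numel: int):
    """Recover an approximate float tensor from int8 blocks and per-block scales."""
    return (q.float() / 127 * scale).flatten()[:numel]

# Optimizer state (e.g. Adam's exp_avg) is held as (q, scale) between steps,
# dequantized for the update and requantized after, trading accuracy for ~4x memory.
state = torch.randn(1000)
q, s = quantize_blockwise(state)
restored = dequantize_blockwise(q, s, state.numel())
print((state - restored).abs().max())  # small quantization error
```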
Breakthroughs in Quantization and Training Dynamics: The BiLLM project boasts rapid quantization of large language models, while torchao members debate the trade-offs in speed and accuracy across various numeric representations during matrix multiplication, from INT8 to FP8 and even INT6.
Hardware Showdown and Quantization Innovations: AMDās MI300X showcases higher throughput for LLM inference than NVIDIAās H100, and Bitnet sees progress with refactoring and nightly build strategies, but a lingering build issue remains due to an unrelated mx format test.
LM Studio Discord
Gemini 1.5 JSON Woes: Engineers report that Gemini 1.5 flash struggles with JSON mode, causing intermittent issues with output. Users are invited to share insights or solutions to this challenge.
Tess Takes the Stage: The Tess 2.5 72b q3 and q4 quant models are now live on Hugging Face, offering new tools for experimentation.
AVX2 Instruction Essential: Users facing direct AVX2 errors should verify their CPUās support for AVX2 instructions to ensure compatibility with application requirements.
LM Studio Limitations and Solutions: LM Studio cannot run on headless web servers and does not support safetensor files, but it successfully employs the GGUF format, and Flash Attention can be enabled via alternatives like llama.cpp.
Hardware Market Fluctuations: There's a spike in the price of e-waste P40 GPUs, with current prices over $200, as well as a humorous note on sanctions possibly affecting Russian P40 stocks. A community member shares specs for an efficient home server build: R3700X, 128GB RAM, RTX 4090, and multiple storage options.
Eleuther Discord
- LLAMA3 70B Shows Diverse Talents: LLAMA3 70B displays a wide-ranging output capability, producing 60% AI2 ARC format, 20% wiki text, and 20% code when prompted from an empty document, suggesting tuning for specific formats. In a separate query, there's guidance on finetuning BERT for longer texts with a sliding-window technique (see the sketch after this list), pointing to resources such as a NeurIPS paper and its implementation.
- Samba Dances Over Phi3-mini: Microsoft's Samba model, trained on 3.2 trillion tokens, notably outperforms Phi3-mini in benchmarks while maintaining linear complexity and achieving exceptional long-context retrieval capabilities. A different conversation delves into Samba's passkey retrieval for 256k sequences, discussing Mamba layers and SWA effectiveness.
- Magpie Spreads Its Wings: The newly introduced method Magpie prompts aligned Large Language Models (LLMs) to auto-generate high-quality instruction data, circumventing manual data creation. Along this innovative edge, another discussion highlights the controversial practice of tying embedding and unembedding layers, sharing insights from a LessWrong post.
- Debating Normalization Standards: Within the community, the metrics for evaluating models spurred debate, particularly whether to normalize accuracy by tokens or by bytes for models with identical tokenizers. A related log from a test on Qwen1.5-7B-Chat was shared, discussing solutions for troubleshooting empty responses in `truthfulqa_gen` tasks.
- Open Flamingo Spreads Its Wings: A brief message pointed members to LAION's blog post about Open Flamingo, a likely reference to their multimodal model work.
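A minimal sketch of that sliding-window trick using the Hugging Face tokenizer's built-in stride support; the mean-pooling of window predictions is an illustrative choice, not the cited NeurIPS recipe:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

long_text = "..."  # a document longer than BERT's 512-token limit
enc = tok(
    long_text,
    max_length=512,
    stride=128,                      # 128-token overlap between windows
    truncation=True,
    padding="max_length",
    return_overflowing_tokens=True,  # emit every window, not just the first
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(
        input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]
    ).logits
doc_logits = logits.mean(dim=0)  # pool per-window predictions over the document
```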
LlamaIndex Discord
TiDB AI Experimentation on GitHub: PingCap demonstrates a RAG application using their TiDB database with LlamaIndex's knowledge graph, all available as open-source code with a demo and the source code on GitHub.
Paris AI Infrastructure Meetup Beckons: Engineers can join an AI Infrastructure Meetup at Station F in Paris featuring speakers from LlamaIndex, Gokoyeb, and Neon; details and sign-up are available here.
Vector Database Solutions for Quick Queries: For indexes containing 170,000 embeddings, use of Qdrant or a FAISS index is recommended; discussion includes fixing an `AssertionError` related to FAISS queries and direct node retrieval from a VectorStoreIndex with Chroma.
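At the 170,000-vector scale, even an exact (flat) FAISS index is often fast enough; a minimal sketch, with the dimensionality and random vectors standing in for real embeddings:

```python
import numpy as np
import faiss

dim = 768  # embedding dimensionality (assumed)
vecs = np.random.rand(170_000, dim).astype("float32")  # stand-in embeddings
faiss.normalize_L2(vecs)  # normalize so inner product equals cosine similarity

index = faiss.IndexFlatIP(dim)  # exact inner-product search
index.add(vecs)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 nearest neighbors
print(ids[0], scores[0])
```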
Adjacent Node Retrieval from Qdrant: A user inquiring about fetching adjacent nodes for law texts in a Qdrant vector store is advised to leverage node relationships and the latest API features for directional node retrieval.
Pushing LLM-Index Capabilities with PDF Embedding: An AI Engineer discusses embedding PDFs and documents into Weaviate using LLM-Index, demonstrating interest in expanding the ingestion of complex data types into vector databases.
Cohere Discord
- Command-R Takes the Stage: Coral has been rebranded as Command-R, yet both Command-R and the original Coral remain operational, facilitating model-related tasks.
- To Tune or Not to Tune: In the pursuit of optimal model performance, debate flourished, with some engineers emphasizing prompt engineering over parameter tuning while others exchanged noteworthy configurations.
- Navigating Cohere's Acceptable Use: A collective effort was noted to decode the nuances of the Cohere Acceptable Use Policy, with a focus on delineating private versus commercial usage nuances in the context of personal projects.
- Trials and Tribulations with Trial Keys: The community exchanged frustrations regarding trial keys encountering permission issues and limitations, contrasting these experiences with the smoother sailing reported by production key users.
- Hats Off to Fluent API Support: A quick nod was given in the conversations to the preference for Fluent API and appreciation for its inclusion by Cohere, evidenced by a congratulatory tone for a recent project release featuring Cohere support.
LAION Discord
- Gender Imbalance Troubles in Stable Diffusion: The community discussed that Stable Diffusion has problems generating images of women, clothed or otherwise, due to censorship, suggesting the use of custom checkpoints and img2img techniques with SD1.5 as workarounds. Here's the discussion thread.
- Dream Machine Debuts: Luma AI's Dream Machine, a text-to-video model, has been released and is generating excitement for its potential, though users note its performance is inconsistent with complex prompts. Check out the model here.
- AI Landscape Survey: Comparisons across models such as SD3 Large, SD3 Medium, Pixart Sigma, DALL E 3, and Midjourney were discussed, alongside the reopening of the /r/StableDiffusion subreddit and Reddit's API changes. The community is keeping an eye on these models and issues, and the Reddit post offers a comparison.
- Model Instability Exposed: A study revealed that models like GPT-4o break down dramatically when presented with the Alice in Wonderland scenario involving minor changes to input, highlighting a significant issue in reasoning capabilities. Details can be found in the paper.
- Recaptioning the Web: Enhancements in AI-generated captions for noisy web images are on the horizon, with DataComp-1B aiming to improve model training by better aligning textual descriptions. For further insights, review the overview and the scientific paper.
LangChain AI Discord
- Chasing the Best Speech-to-Text Solution: Engineers discussed speech-to-text solutions, seeking datasets with MP3s and diarization, beyond tools like AWS Transcribe, OpenAI Whisper, and Deepgram Nova-2. The need for robust processing that could handle simple responses without tools and manage streaming responses without losing context was also highlighted.
- LangChain Links Chains and States: In LangChain AI, integration of user and thread IDs in state management was explicitly discussed, and tips to leverage LangGraph for maintaining state succinctly across various interactions were shared. For message similarity checks within LangChain, both string and embedding distance metrics were suggested, with practical use cases (a minimal sketch follows this list).
- Simplifying LLMs for All: A GitHub project called tiny-ai-client was presented to streamline LLM interactions, and a YouTube tutorial showed how to set up local executions of LLMs with Docker and Ollama. Meanwhile, another member shared a GitHub tutorial to set up an LLM on Google Colab utilizing the 15GB Tesla T4 GPU.
- Code Examples and Conversations to Streamline Development: Throughout the discussions, various code examples and issues were referenced to aid in troubleshooting and streamlining LLM development processes, with links like the Chat Bot Feedback Template and methodologies for evaluating off-the-shelf evaluators and maintaining Q&A chat history in LangChain.
- Community Knowledge Sharing: Members actively shared their own works, methods, and problem-solving strategies, creating a community knowledge base that included how-tos and responses to non-trivial LangChain scenarios, affirming the collaborative ethos of the engineering community.
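A minimal sketch of the two similarity checks using LangChain's off-the-shelf evaluators; the embedding evaluator defaults to OpenAI embeddings, so an API key is assumed:

```python
from langchain.evaluation import load_evaluator

pred, ref = "The cat sat on the mat.", "A cat was sitting on the mat."

# Lexical similarity: pure string distance, no model required
string_eval = load_evaluator("string_distance")
print(string_eval.evaluate_strings(prediction=pred, reference=ref))

# Semantic similarity: embedding distance (defaults to OpenAI embeddings)
emb_eval = load_evaluator("embedding_distance")
print(emb_eval.evaluate_strings(prediction=pred, reference=ref))
```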
Modular (Mojo 🔥) Discord
- Windows Woes and Workarounds: Engineers discussed Windows support for Modular (Mojo 🔥), with a predicted release in the Fall and a livestream update anticipated. Meanwhile, some have turned to WSL as a temporary solution for Mojo development.
- Mojo Gets Truthy with Strings: It was noted that non-empty strings in Mojo are considered truthy, potentially causing unexpected results in code logic. Meanwhile, Mojo LSP can be set up in Neovim with configurations available on GitHub.
- Optimizing Matrix Math: Benchmarks revealed that Mojo has superior performance to Python in small, fixed-size matrix multiplications, attributing the speed to the high overhead of Python's numpy for such tasks.
- Loop Logic and Input Handling: The peculiar behavior of `for` loops in Mojo led to a suggestion to use `while` loops for iterations requiring variable reassignment. Additionally, the current lack of `stdin` support in Mojo was confirmed.
- Nightly Updates and Compiler Quips: Mojo compiler release `2024.6.1305` was announced, sparking conversations about update procedures, with advice to use `modular update nightly/max` and consider aliases for simplification. Discussions also addressed compiler limitations and the potential benefits of the ExplicitlyCopyable trait for avoiding implicit copies in the language.
Interconnects (Nathan Lambert) Discord
- OpenAI's Indirect Profit Path: OpenAI's revenue surges without Microsoft's aid, nearly doubling in the last six months primarily due to direct sales of products like ChatGPT rather than relying on Microsoft's channels, contradicting industry expectations. Read more.
- AI Research Goes Full Circle: Sakana AI's DiscoPOP, a state-of-the-art preference optimization algorithm, boasts its origins in AI-driven discovery, suggesting a new era where LLMs can autonomously improve AI research methods. Explore the findings in their paper and contribute via the GitHub repo.
- Hardware Hype and Research Revelations: Anticipation builds around Nvidia's potentially forthcoming Nemotron as teased in a tweet, while groundbreaking advancements are discussed with the release of a paper exploring speech modeling by researchers like Jupinder Parmar and Shrimai Prabhumoye.
- SSMs Still in the Game: The community is split 50/50 on the continuation of state space models (SSMs), with interest leaning toward hybrid SSM/transformer architectures, as attention layers may not be needed at each step.
- Benchmarks Blasted by New Architecture: Samba 3.8B, an architecture merging Mamba and Sliding Window Attention, significantly outclasses models like Phi3-mini on major benchmarks, offering infinite context length with linear complexity. Details of Samba's prowess are found in this paper.
Latent Space Discord
- Haize Labs Takes on AI Guardrails: Haize Labs launched a manifesto on identifying and fixing AI failure modes, demonstrating breaches in leading AI safety systems by successfully jailbreaking their protection mechanisms.
- tldraw Replicates iPad Calculator with Open Source Flair: The team behind tldraw has reconstructed Apple's iPad calculator as an open-source project, showcasing their commitment to sharing innovative work.
- Amazon's Conversational AI Missteps Analyzed: An examination of Amazon's conversational AI progress, or lack thereof, pointed to a culture and operational process that prioritizes products over long-term AI development, according to insights from former employees in an article shared by cakecrusher.
- OpenAI's Surging Fiscal Performance: OpenAI has achieved an annualized revenue run rate of nearly $3.4 billion, igniting dialogue about the implications of such earnings, including sustainability and spending rates.
- Argilla Merges with Hugging Face for Better Datasets: Argilla has merged with Hugging Face, setting the stage for improved collaborations to drive forward improvements in AI dataset and content generation.
OpenInterpreter Discord
- Open Interpreter Empowers LLMs: Open Interpreter is being discussed as a means to transform natural language into direct computer control, offering a bridge to future integrations with tailored LLMs and enhanced sensory models.
- Vision Meets Code in Practical Applications: The community shared experiences and troubleshooting tips on running code alongside vision models using Open Interpreter, particularly focusing on the `llama3-vision.py` profile, and strategies for managing server load during complex tasks.
- Browser Control Scores a Goal: A real-world application saw Open Interpreter successfully navigating a browser to check live sports scores, showcasing the simplicity of user prompts and the implications on server demand.
- DIY Approach to Whisper STT: While seeking a suitable Whisper Speech-To-Text (STT) library, a guild member ended up crafting a unique solution themselves, reflecting the community's problem-solving ethos.
- Tweaking for Peak Performance: Discussions on fine-tuning Open Interpreter, such as altering core.py, highlighted the ongoing efforts to address performance and server load challenges to meet the particular needs of users.
OpenAccess AI Collective (axolotl) Discord
- Apple's 3 Billion Parameter Breakthrough: Apple has unveiled a 3 billion parameter on-device language model at WWDC, achieving the same accuracy as uncompressed models through a strategy that mixes 2-bit and 4-bit configurations, averaging 3.5 bits-per-weight. The approach optimizes memory, power, and performance; more details are available in their research article.
- Dockerized AI Hits GPU Roadblock: An engineer encountered issues when Docker Desktop failed to recognize a GPU on an Ubuntu virtual machine over Windows 11 despite commands like `docker run --gpus all --rm -it winglian/axolotl:main-latest`. Suggested diagnostic steps include checking GPU status with `nvidia-smi` and confirming the installation of the CUDA toolkit.
- CUDA Confusions and WSL 2 Workarounds: The conversation shifted towards whether the CUDA toolkit should be set up on Windows or Ubuntu, with a consensus forming around installation within WSL 2 for Ubuntu. A user has expressed intent to configure CUDA on Ubuntu WSL, armed with the official NVIDIA toolkit installation guide.
OpenRouter (Alex Atallah) Discord
Param Clamping in OpenRouter: Alex Atallah specified that parameters exceeding supported ranges, like Temp > 1, are clamped at 1 for OpenRouter, and parameters like Min P aren't passed through the UI, despite the UI presentation suggesting otherwise.
Mistral 7B's Lag Time Mystery: Users noticed increased response times for Mistral 7B variants, attributing it to context length changes and potential rerouting, supported by data from an API watcher and a model uptime tracker.
Blockchain Developer on the Market: A senior full-stack & blockchain developer is on the lookout for new opportunities, showcasing experience in the field and eagerness to engage.
Vision for Vision Models: A request surfaced for the inclusion of more advanced vision models such as cogvlm2 in OpenRouter to enhance dataset captioning capabilities.
tinygrad (George Hotz) Discord
- Bounty for RDNA3 Assembly in tinygrad: George Hotz sparked interest with a bounty for RDNA3 assembly support in tinygrad, inviting collaborators to work on this enhancement.
- Call for Qualcomm Kernel Driver Development: An opportunity has arisen for developing a "Qualcomm Kernel level GPU driver with HCQ graph support," targeting engineers with expertise in Qualcomm devices and Linux systems.
- tinygradās Mobile Capabilities Confirmed: Confirmation was given that tinygrad is functional within the Termux app, showcasing its adaptability to mobile environments.
- In Discussion: Mimicking Mixed Precision in tinygrad: A discourse on implementing mixed precision through casting between bfloat16 and float32 during matrix multiplication revealed potential speed benefits, especially when aligned with tensor core data types.
- Tensor Indexing and UOp Graph Execution Queries: Efficient tensor indexing techniques are being explored, referencing boolean indexing and UOp graph execution with `MetalDevice` and `MetalCompiler`, with an emphasis on streamlined kernel execution using `compiledRunner`.
Datasette - LLM (@SimonW) Discord
- A Dose of Reality in AI Hype: A blog from dbreunig suggests that the AI industry is predominantly filled with grounded, practical work, comparing the current stage of LLM work to the situation of data science circa 2019. His article hit Hacker News' front page, signaling a high interest in the pragmatic approach to AI beyond the sensationalist outlook.
- CPU Yesteryear, GPU Today: The search for computing resources has shifted from RAM and Spark capacity to GPU cores and VRAM, illustrating the changing technical needs as AI development progresses.
- Token Economics in LLM Deployment: Databricks' client data shows a 9:1 input-to-output token ratio for Large Language Models (LLMs), highlighting that input token cost can be more critical than output, which has economic implications for those operating LLMs (see the arithmetic sketch after this list).
- Spotlight on Practical AI Application: The recognition of dbreunig's observations at a Databricks Summit by Hacker News underscores community interest in discussions about the evolution and realistic implementation of AI technologies.
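A quick arithmetic sketch of why that ratio matters; the per-token prices below are hypothetical placeholders, not Databricks figures:

```python
# Hypothetical prices in $ per 1K tokens; the 9:1 ratio is the reported shape
in_price, out_price = 0.005, 0.015
input_tokens, output_tokens = 9_000, 1_000   # one request at a 9:1 ratio

input_cost = input_tokens / 1000 * in_price     # $0.045
output_cost = output_tokens / 1000 * out_price  # $0.015
print(input_cost, output_cost)  # input dominates despite the 3x cheaper rate
```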
DiscoResearch Discord
- European AI Synergy: A strategic partnership has been formed between Aleph Alpha and Silo AI to push the frontiers of open-source AI and tailor enterprise-grade solutions for Europe. This collaboration leverages Aleph Alpha's advanced tech stack and Silo AI's robust 300+ AI expert team, with an eye to accelerating AI deployment in European industrial sectors. Read about the partnership.
Torchtune Discord
- Participate in the Poll, Folks: A user has requested the community's participation in a poll regarding how they serve their finetuned models, with an undercurrent of gratitude for their engagement.
- Tokenizers Are Getting a Makeover: An RFC for a comprehensive tokenizer overhaul has been proposed, promoting a more feature-rich, composable, and accessible framework for model tokenization, as found in a pull request on GitHub.
MLOps @Chipro Discord
- Free Virtual Conference for AI Buffs: Infer: Summer '24, a free virtual conference happening on June 26, will bring together AI and ML professionals to discuss the latest in the field, including recommender systems and AI application in sports.
- Industry Experts Assemble: Esteemed professionals such as Hudson Buzby, Solutions Architect at Qwak, and Russ Wilcox, Data Scientist at ArtifexAI, will be sharing their insights at the conference, representing companies like Lightricks, LSports, and Lili Banking.
- Live Expert Interactions: Attendees at the conference will have the chance for real-time engagement with industry leaders, providing a platform to exchange practical knowledge and innovative solutions within ML and AI.
- Hands-On Learning Opportunity: Scheduled talks promise insights into the pragmatic aspects of AI systems, such as architecture and user engagement strategies, with a focus on building robust, predictive technologies.
- Networking with AI Professionals: Participants are encouraged to network and learn from top ML and AI professionals by registering for free access to the event. The organizers emphasize the event as a key opportunity to broaden AI understanding and industry connections.
Mozilla AI Discord
- Don't Miss the Future of AI Development: An event discussing how AI software development systems can amplify developers is on the horizon. Further info and RSVP can be found here.
- Catch Up on Top ML Papers: The newest Machine Learning Paper Picks has been curated for your reading pleasure, available here.
- Engage with CambAI Team's Latest Ventures: Join the upcoming event with the CambAI Team to stay ahead in the field. RSVP here to be part of the conversation.
- Claim Your AMA Perks: Attendees of the AMA, remember to claim your 0din role to get updates about swag like T-shirts by following the customized link provided in the announcement.
- Contribute to Curated Conversations with New Tag: The new `member-requested` tag is now live for contributions to a specially curated discussion channel, reflecting community-driven content curation.
- Funding and Support for Innovators: The Builders Program is calling for members seeking support and funding for their AI projects, with more details available through the linked announcement.
YAIG (a16z Infra) Discord
- GitHub Codespaces Roll Call: A survey was launched in #tech-discussion to determine the usage of GitHub Codespaces among teams, using emoji reactions for yes and no as response options.
The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI Stack Devs (Yoko Li) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
{% if medium == "web" %}
Stability.ai (Stable Diffusion) ▷ #general-chat (854 messages🔥🔥🔥):
- Members express disappointment with SD3 quality: Users have criticized the quality of the newly released SD3, often comparing it unfavorably with SDXL and 1.5 versions. Concerns include anatomical inaccuracies, the model not obeying prompts, and reduced quality in photographic styles.
- SD3 medium released on Huggingface: The SD3 medium model is now available for download on Huggingface, although users must fill out a form for access. There have been some issues with git cloning, and a 10GB model option is popular.
- ComfyUI favored for SD3 usage: ComfyUI is currently the preferred interface for running SD3, with users suggesting the best samplers and schedulers for optimal results. Recommendations include avoiding Euler samplers and using uni_pc and ddim_uniform for better performance.
- Alternative model suggestions and tools: Members shared alternatives and tools like Juggernaut reborn for realistic styles and Divinie animemix for anime styles. Other recommended resources for running models include Playground and StableSwarm.
- Discussion off-topic around global and political issues: At times, the channel veered into off-topic discussions about politics, international relations, and personal interactions, distracting from the core focus on AI models and technical help. Mods reminded community members to maintain topical relevance and report inappropriate content.
Links mentioned:
- stabilityai/stable-diffusion-3-medium · Hugging Face: no description found
- Muajaja Risa Malvada GIF - Muajaja The Simpsons - Discover & Share GIFs: Click to view the GIF
- Free AI image generator: Art, Social Media, Marketing | Playground: Playground (official site) is a free-to-use online AI image creator. Use it to create art, social media posts, presentations, posters, videos, logos and more.
- SD3 IS HERE!! ComfyUI Workflow.: SD3 is finally here for ComfyUI! Topaz Labs: https://topazlabs.com/ref/2377/ HOW TO SUPPORT MY CHANNEL - Support me by joining my Patreon: https://www.patreon.co...
- Crycat Crying Cat GIF - Crycat Crying Cat Crying Cat Thumbs Up - Discover & Share GIFs: Click to view the GIF
- Reddit - Dive into anything: no description found
- Installation Guides: Stable Diffusion Knowledge Base (Setups, Basics, Guides and more) - CS1o/Stable-Diffusion-Info
- imgur.com: Discover the magic of the internet at Imgur, a community powered entertainment destination. Lift your spirits with funny jokes, trending memes, entertaining gifs, inspiring stories, viral videos, and ...
- GitHub - RocketGod-git/stable-diffusion-3-gui: GUI for Stable Diffusion 3 written in Python: GUI for Stable Diffusion 3 written in Python. Contribute to RocketGod-git/stable-diffusion-3-gui development by creating an account on GitHub.
- Blender - Open Data: Blender Open Data is a platform to collect, display and query the results of hardware and software performance tests - provided by the public.
- Stability AI releases DeepFloyd IF, a powerful text-to-image model that can smartly integrate text into images ā Stability AI: DeepFloyd IF is a state-of-the-art text-to-image model released on a non-commercial, research-permissible license that allows research labs to examine and experiment with advanced text-to-image genera...
- Stable Assistant Gallery ā Stability AI: Stable Assistant delivers image creation capabilities like never before. Explore our gallery and be amazed by what's possible!
- imgsys.org | an image model arena by fal.ai: A generative AI arena where you can test different prompts and pick the results you like the most. Check-out the model rankings and try it yourself!
- Mobius - v1.0 | Stable Diffusion Checkpoint | Civitai: Mobius: Redefining State-of-the-Art in Debiased Diffusion Models Mobius, a diffusion model that pushes the boundaries of domain-agnostic debiasing ...
Unsloth AI (Daniel Han) ▷ #general (446 messages🔥🔥🔥):
- Rank boosts give better training results: "I can confirm that the issues with Qwen2-1.5b training and it outputting gibberish is to do with too low of a rank. I switched from rank 16 to rank 128, and my outputs matched the quality of my outputs from my llama-3 train." (A minimal config sketch follows this list.)
- scGPT might not be practical outside academia: "That's a custom transformer implemented in torch. Unusable outside academia as is… interesting what they did on the prompting and the tokenizer!"
- Inference support and memory efficiency in Unsloth: A user asked if Unsloth affects inference and received confirmation that Unsloth reduces memory usage significantly during both training and inference.
- Mixture of Agents (MoA) does not impress: Users discussed Together AI's MoA approach to layering LLM agents but found it "kind of just for show" and overly complex.
- Dockerizing for seamless pipelines: "The notebooks are a great 'jump in' but in real-world apps you want to integrate that with other stuff. Making CLI tools for consistent workflows and easier integrations with frameworks like ZenML."
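A minimal sketch of what that rank change looks like in an Unsloth-style LoRA setup; the surrounding values are placeholders rather than the user's exact configuration:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "Qwen/Qwen2-1.5B", max_seq_length=2048, load_in_4bit=True
)
model = FastLanguageModel.get_peft_model(
    model,
    r=128,           # bumped from 16; too low a rank produced gibberish
    lora_alpha=128,  # commonly scaled together with r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```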
Links mentioned:
- Let's reproduce GPT-2 (124M): We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really...
- Together MoA – collective intelligence of open-source models pushing the frontier of LLM capabilities: no description found
- google/recurrentgemma-9b Ā· Hugging Face: no description found
- Let's build the GPT Tokenizer: The Tokenizer is a necessary and pervasive component of Large Language Models (LLMs), where it translates between strings and tokens (text chunks). Tokenizer...
- Smh Shaking GIF - Smh Shaking My - Discover & Share GIFs: Click to view the GIF
- Introducing Apple's On-Device and Server Foundation Models: At the 2024 Worldwide Developers Conference, we introduced Apple Intelligence, a personal intelligence system integrated deeply into…
- scGPT/tutorials/zero-shot at main · bowang-lab/scGPT: Contribute to bowang-lab/scGPT development by creating an account on GitHub.
- GitHub - bowang-lab/scGPT: Contribute to bowang-lab/scGPT development by creating an account on GitHub.
- GitHub - bowang-lab/scGPT at integrate-huggingface-model: Contribute to bowang-lab/scGPT development by creating an account on GitHub.
- ollama/examples at main · ollama/ollama: Get up and running with Llama 3, Mistral, Gemma, and other large language models. - ollama/ollama
- Download the complete genome for an organism: no description found
- Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters: Exploiting activation sparsity is a promising approach to significantly accelerating the inference process of large language models (LLMs) without compromising performance. However, activation sparsit...
- PowerInfer-2: Fast Large Language Model Inference on a Smartphone: This paper introduces PowerInfer-2, a framework designed for high-speed inference of Large Language Models (LLMs) on smartphones, particularly effective for models whose sizes exceed the device's ...
- GitHub - sebdg/unsloth at cli-trainer: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - GitHub - sebdg/unsloth at cli-trainer
- Neuroplatform - FinalSpark: no description found
- Replete-AI/code_bagel_hermes-2.5 · Datasets at Hugging Face: no description found
Unsloth AI (Daniel Han) ▷ #help (190 messages🔥🔥):
- Trainer and TrainingArguments Confusion: A user expressed confusion about the specifics of Trainer and TrainingArguments in the Huggingface documentation, noting that the descriptions do not fully explain how to use these classes.
- Saving as gguf Issues: A user faced a `ValueError` when attempting to save a model as gguf, but found success after specifying `f16` as the quantization method. They shared this solution along with the syntax: `model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")`.
- Untrained Tokens Error: Another user encountered a `ValueError` related to untrained tokens while training a model, suggesting that `embed_tokens` and `lm_head` must be included in the training process. This issue was linked to adding new tokens that require enabling training on certain model parts.
- Dataset Formatting for Multilabel Classification: One member sought advice on finetuning Llama 3 for multilabel classification and whether the same prompt format could be used. While responses acknowledged the need for consistent dataset templates, no specific solution for multilabel classification was provided.
- Citing Unsloth: Users discussed how to cite Unsloth in a paper, with a suggestion to reference it as "Daniel Han and Michael Han. 2024. Unsloth, Unsloth AI." followed by the Unsloth GitHub page.
Links mentioned:
- afrizalha/Kancil-V1-llama3-fp16 · Hugging Face: no description found
- How do you calculate max steps: I am looking to understand the math behind how max steps is calculated when left alone, I've tried to work backwards from making changes to epoch, batch, and micro-batch to see if I could figure out t...
- Home: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- Trainer: no description found
HuggingFace ▷ #announcements (8 messages🔥):
- Stable Diffusion 3 Launches with Enhanced Features: SD3 is a diffusion model that includes three text encoders (CLIP L/14, OpenCLIP bigG/14, and T5-v1.1-XXL), a Multimodal Diffusion Transformer, and a 16 channel AutoEncoder model. Check the complete details and code on the Stable Diffusion 3 Medium space.
- Community Highlights #62: Featuring new tools and projects like a formula 1 prediction model, Simpletuner v0.9.7, and Tiny LLM client. Check out the 1M+ Dalle 3 captioned dataset and MARS v5 for more cool stuff.
- Argilla Joins Hugging Face: Today's a huge day for Argilla, Hugging Face, and the Open Source AI community: Argilla is joining Hugging Face! Emphasizing the community and data-centric, open source AI approach.
- Community Welcomes Argilla: Members congratulated Argilla on joining Hugging Face and expressed excitement for future projects and collaborations. "Looking forward to my first distilabel x arguilla project (of course hosted on hf!)".
Links mentioned:
- stabilityai/stable-diffusion-3-medium-diffusers · Hugging Face: no description found
- stabilityai/stable-diffusion-3-medium · Hugging Face: no description found
- Diffusers welcomes Stable Diffusion 3: no description found
HuggingFace ▷ #general (185 messages🔥🔥):
- Troubleshooting SD3 Issues with Diffusers: Users faced multiple errors while trying to run the Stable Diffusion 3 (SD3) model with the latest diffusers library update. One user shared that adding `pipe.enable_model_cpu_offload()` significantly improved inference time.
- University of Glasgow Organization on HuggingFace: A member announced the creation of the University of Glasgow organization on HuggingFace, inviting faculty, researchers, and students to join using their university email. Link to the organization.
- Error in LLM Session Management: A user sought help for maintaining sessions with their local LLM using the Ollama interface with Llama3. Another member provided a Python script solution for continuous session management using the OpenAI API format (a minimal sketch follows this list).
- Generalized LoRA (GLoRA) Implementation Help: A user working on integrating GLoRA into the PEFT library requested assistance with an error in their forked implementation. They provided a link to the GitHub repository and related research paper.
- Tutorial for LLM on Google Colab: A member shared a tutorial for setting up an LLM on Google Colab to take advantage of the free GPU for both GPU-accelerated and CPU-only inference. They highlighted the tutorial as beneficial for others encountering similar issues.
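A minimal sketch of that session-management pattern, assuming Ollama's OpenAI-compatible endpoint on localhost:11434 and a pulled llama3 model; the script shared in the channel may differ:

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API; the api_key value is ignored
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    resp = client.chat.completions.create(model="llama3", messages=history)
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})  # persist the session
    return answer

print(chat("Remember the number 42."))
print(chat("What number did I ask you to remember?"))
```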
Links mentioned:
- Stable Diffusion 3 - a Hugging Face Space by nroggendorff: no description found
- Ilaria RVC - a Hugging Face Space by TheStinger: no description found
- UniversityofGlasgow (University of Glasgow): no description found
- stabilityai/stable-diffusion-3-medium-diffusers · Hugging Face: no description found
- Video LLaVA - a Hugging Face Space by LanguageBind: no description found
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning: We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimi...
- Helsinki-NLP/opus-mt-ko-en · Hugging Face: no description found
- GitHub - viliamvolosv/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. - viliamvolosv/peft
- GitHub - casualcomputer/llm_google_colab: A tutorial on how to set up a LLM on Google Colab for both GPU-accelerated and CPU-only session.: A tutorial on how to set up a LLM on Google Colab for both GPU-accelerated and CPU-only session. - casualcomputer/llm_google_colab
- GitHub - VedankPurohit/LiveRecall: Welcome to **LiveRecall**, the open-source alternative to Microsoft's Recall. LiveRecall captures snapshots of your screen and allows you to recall them using natural language queries, leveraging semantic search technology. For added security, all images are encrypted.: Welcome to **LiveRecall**, the open-source alternative to Microsoft's Recall. LiveRecall captures snapshots of your screen and allows you to recall them using natural language queries, leverag...
- GitHub - continuedev/deploy-os-code-llm: How to deploy an open-source code LLM for your dev team: How to deploy an open-source code LLM for your dev team - continuedev/deploy-os-code-llm
- Smol Illegally Smol Cat GIF - Smol Illegally smol cat Cute - Discover & Share GIFs: Click to view the GIF
- Google Colab: no description found
HuggingFace ▷ #cool-finds (10 messages🔥):
- Distill's invaluable yet inactive resources: Despite being inactive, Distill offers highly illustrative articles on various ML and DL topics. Highlights include Understanding Convolutions on Graphs and A Gentle Introduction to Graph Neural Networks.
- Enterprise LLMs security query sparks interest: A member inquired if anyone is working on security for Enterprise-level LLMs. This reflects a growing concern in the community regarding AI model security.
- Academic dive into NLP and Graph Models: Various academic resources were shared, including NLP notes on PCFGs and a pioneering approach to neural ODE models on arXiv.
- DreamMachine by Luma AI hailed as Pikalabs successor: DreamMachine by Luma AI was celebrated as a successor to Pikalabs, offering 30 free videos a month with img2vid and text2vid features. Prompting tips were discussed to optimize results, with a suggestion to keep prompts simple and let the system's model assume control.
- Nodus Labs' ACM paper: A member shared a paper from ACM which may be insightful for those interested in further technical details.
Links mentioned:
- Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow: We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions π_0 and π_1, henc...
- Distill ā Latest articles about machine learning: Articles about Machine Learning
- Archives - Bartosz Ciechanowski: no description found
HuggingFace ā· #i-made-this (11 messagesš„):
- Colab Tutorial tackles LLM setup: A member shared a tutorial on setting up an LLM on Google Colab to leverage the free 15GB Tesla T4 GPU for both GPU-accelerated and CPU-only inference sessions. They hope it will assist others struggling with existing solutions and troubleshooting problems.
- Tiny-AI-Client simplifies LLM usage: Another member introduced a tiny, intuitive client for LLMs that supports vision and tool use, aiming to be an alternative to langchain for simpler use cases. They offered to help with bugs and invited others to try it out.
- SimpleTuner release integrates SD3: A new release of SimpleTuner was announced, fully integrating Stable Diffusion 3's unet and LoRA training. The member shared their dedication to the project, working tirelessly to achieve this integration.
- French Deep Learning Notebooks: A GitHub repository containing deep learning notebooks in French was shared, aimed at making deep learning more accessible. The course, inspired by resources from Andrej Karpathy and DeepLearning.ai, is a work in progress.
- Conceptual Captions dataset: A member shared a massive dataset with 22 million high-quality captions for 11 million images from Google's CC12M, created using LLaVaNext.
Links mentioned:
- Pixel Prompt - a Hugging Face Space by Hatman: no description found
- @Draichi on Hugging Face: "Hey Hugging Face Community 🤗 I'm excited to share my latest project that…": no description found
- Release v0.9.7 - stable diffusion 3 Ā· bghira/SimpleTuner: Stable Diffusion 3 To use, set STABLE_DIFFUSION_3=true in your sdxl-env.sh and set your base model to stabilityai/stable-diffusion-3-medium-diffusers. Whatās Changed speed-up for training sampleā¦
- GitHub - casualcomputer/llm_google_colab: A tutorial on how to set up a LLM on Google Colab for both GPU-accelerated and CPU-only session.: A tutorial on how to set up a LLM on Google Colab for both GPU-accelerated and CPU-only session. - casualcomputer/llm_google_colab
- GitHub - SimonThomine/CoursDeepLearning: A collection of notebooks for learning deep learning from scratch (in French) - SimonThomine/CoursDeepLearning
- GitHub - piEsposito/tiny-ai-client: Tiny client for LLMs with vision and tool calling. As simple as it gets.: Tiny client for LLMs with vision and tool calling. As simple as it gets. - piEsposito/tiny-ai-client
- CaptionEmporium/conceptual-captions-cc12m-llavanext Ā· Datasets at Hugging Face: no description found
HuggingFace ā· #reading-group (1 messages):
- Proposal to discuss MaPO: A member suggested inviting one of the authors to present the paper MaPO, highlighting the potential interest within the community. The paper discusses a novel alignment technique for text-to-image diffusion models that avoids the limitations of divergence regularization and is more flexible in handling preference data MaPO.
Link mentioned: MaPO Project Page: no description found
HuggingFace ā· #computer-vision (28 messagesš„):
- Help on Image Retrieval Models: A newcomer sought advice on the best models for image retrieval, mentioning success with Microsoft's Vision API and CLIP. Suggestions included Mobilenets for real-time apps, OpenCLIP for the best scores, and Faiss for versatile search-engine capabilities (a sketch of the CLIP-plus-Faiss route follows below).
- Recommendation of smol-vision: A member shares a GitHub link for smol-vision, which provides recipes for shrinking, optimizing, and customizing cutting edge vision models.
- Channel Redirection: Guidance was given to post in a more relevant channel, <#1019883044724822016>, for deeper insights and recommendations.
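For a concrete starting point, here is a minimal sketch of the CLIP-plus-Faiss route mentioned above; it assumes `open_clip_torch`, `faiss`, and Pillow are installed, and the model tag and image files are illustrative:

```python
# Hedged sketch of CLIP-embedding image retrieval with a Faiss index.
# Assumes open_clip_torch, faiss, and Pillow are installed; the model tag,
# file names, and in-memory flat index are illustrative, not a recommendation.
import faiss
import open_clip
import torch
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
model.eval()

@torch.no_grad()
def embed(paths):
    batch = torch.stack([preprocess(Image.open(p)) for p in paths])
    feats = model.encode_image(batch)
    return (feats / feats.norm(dim=-1, keepdim=True)).numpy()  # cosine via IP

gallery = embed(["cat.jpg", "dog.jpg", "car.jpg"])   # hypothetical image files
index = faiss.IndexFlatIP(gallery.shape[1])          # exact inner-product index
index.add(gallery)

scores, ids = index.search(embed(["query.jpg"]), 2)  # top-2 nearest images
print(ids[0], scores[0])
```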
Links mentioned:
- Smash GIF - Smash - Discover & Share GIFs: Click to view the GIF
- GitHub - merveenoyan/smol-vision: Recipes for shrinking, optimizing, customizing cutting edge vision models. š: Recipes for shrinking, optimizing, customizing cutting edge vision models. š - GitHub - merveenoyan/smol-vision: Recipes for shrinking, optimizing, customizing cutting edge vision models. š
HuggingFace ā· #NLP (1 messages):
- Finetuning Llama3 for MultiLabel Classification: A member inquired about the method for finetuning Llama3 for MultiLabel classification, asking if the same prompt format used in a specific Kaggle notebook should be followed for each row, or if there exists another method similar to BERT.
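One possible BERT-style route is sketched below, under the assumption that a sequence-classification head plus BCE loss is acceptable; the checkpoint, label count, and toy inputs are illustrative:

```python
# Hedged sketch of multi-label classification with a Llama-3 backbone via a
# classification head, mirroring the usual BERT recipe. Transformers switches
# to BCEWithLogitsLoss when problem_type="multi_label_classification".
# Checkpoint, label count, and the toy batch are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # illustrative (gated) checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=5,
    problem_type="multi_label_classification",
)
model.config.pad_token_id = tok.pad_token_id

inputs = tok(["an example document"], return_tensors="pt", padding=True)
labels = torch.tensor([[1.0, 0.0, 1.0, 0.0, 0.0]])  # multi-hot float targets
loss = model(**inputs, labels=labels).loss  # BCEWithLogitsLoss under the hood
```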
HuggingFace ā· #diffusion-discussions (79 messagesš„š„):
- Different results between schedulers with SD3: A user noted they were getting different results between Comfy and Diffusers when testing different schedulers with SD3. They solicited input from others to see if anyone else had similar experiences.
- Stable Diffusion 3 Medium model released: Stable Diffusion 3 Medium, with 2B parameters, has been released and is available on the Hugging Face Hub. The model ships with Diffusers integration and Dreambooth and LoRA training scripts, per the announcement blog post.
- Common issues while running Diffusers scripts: Multiple users ran into errors related to the tokenizer and environment setup when running Diffusers scripts for SD3. Installing `sentencepiece` and ensuring proper paths and dependencies were the suggested fixes (a minimal loading sketch follows below).
- Training SD3 LoRA with single GPU: Users discussed the possibility of training SD3 LoRA on a single high-end GPU like the RTX 4090. Recommendations included adjusting batch sizes, using fp16 precision, and validating that paths are absolute and properly formatted.
- Troubleshooting GPU configurations: Users with issues running scripts on Windows and Linux shared solutions like ensuring correct NVIDIA driver installations and accelerate configurations. It was suggested to download models manually from the Hugging Face model hub and check dependencies.
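For reference, a minimal loading sketch for SD3 Medium with Diffusers; it assumes a diffusers version with SD3 support plus `sentencepiece` installed, and the prompt and sampler settings are illustrative:

```python
# Hedged sketch of running SD3 Medium via Diffusers on a single GPU.
# Assumes a diffusers release with SD3 support and `sentencepiece` installed
# (the tokenizer errors discussed above usually trace back to sentencepiece).
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,  # fp16 keeps the 2B model within consumer VRAM
).to("cuda")

image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=28,  # illustrative settings, not tuned values
    guidance_scale=7.0,
).images[0]
image.save("sd3_sample.png")
```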
Links mentioned:
- Diffusers welcomes Stable Diffusion 3: no description found
- stabilityai/stable-diffusion-3-medium-diffusers at main: no description found
- stabilityai/stable-diffusion-3-medium-diffusers Ā· Hugging Face: no description found
- diffusers/examples/dreambooth/README_sd3.md at main Ā· huggingface/diffusers: š¤ Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX. - huggingface/diffusers
OpenAI ā· #ai-discussions (223 messagesš„š„):
- AI Pathways Explored with Potential Guidance: A user inquired about the pathways to become an AI engineer and it was suggested to look into resources provided by Karpathy, a former cofounder of OpenAI and Director of AI at Tesla. Another user linked the YouTube series by sentdex for creating a chatbot with deep learning.
- Speculation on GPT-4.5 Turbo Model: There was a heated discussion about a temporarily visible page suggesting OpenAIās release of GPT-4.5 Turbo with a 256k context window and a knowledge cutoff of June 2024. Some users believed it could have been a test page or a mistake while others speculated about continuous pretraining capabilities.
- Memory Limits and Management in ChatGPT: Several users discussed issues related to memory limits in ChatGPT. One member recommended checking and collectively summarizing memory to free up space as a workaround.
- Team Account Benefits and Conditions: There were clarifications about the benefits and restrictions of ChatGPT Team accounts, including doubled prompt limits and billing policies. A user shared that the ChatGPT Team plan requires a minimum commitment of two seats and has a higher monthly cost if not billed annually.
- New UI and Behavioral Changes in ChatGPT: Users observed new UI changes and more expressive text responses in the latest ChatGPT updates, sparking interest and speculation about improvements in the modelās behavior.
Link mentioned: DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs: Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these sā¦
OpenAI ā· #gpt-4-discussions (3 messages):
- Clarification on model names: The model known as GPT-4 Turbo is referenced with the identifier `gpt-4-turbo` in the OpenAI API. Another member confirmed that the model based on the GPT-4 architecture is often referred to with identifiers like `gpt-4` or `gpt-4-turbo`.
- Customizing GPT roles generates confusion: A member expressed confusion over the customization of GPT roles with model names like `gpt-4` and `gpt-4-turbo`.
- Temporary messages in GPTs possible: A user inquired whether GPTs can have temporary messages. The same user confirmed that it is possible, expressing excitement with "it is! awesome".
OpenAI ā· #prompt-engineering (3 messages):
- Chunk and Trim Large Text Data: A member suggested that for managing ā300MB of text dataā, you āchunk that up, and cut it down.ā This implies breaking down the data into smaller, manageable parts.
- Practical Tips for Large Documents: A link to a forum post offering practical tips for dealing with large documents was shared. This resource likely provides strategies to handle large text inputs within the constraints of token limits.
- Handling Long Texts with Embeddings: Another link to a notebook on embedding long inputs demonstrates handling texts exceeding a model's maximum context length. The guide uses embeddings from `text-embedding-3-small` and refers to the OpenAI Embeddings Guide for further learning.
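As a rough illustration of the chunk-and-embed pattern these resources describe, here is a sketch assuming the official `openai` and `tiktoken` packages; the chunk size is an illustrative choice:

```python
# Hedged sketch: split long text into token-bounded chunks, embed each chunk.
# Assumes the official `openai` and `tiktoken` packages; the 512-token chunk
# size is illustrative (text-embedding-3-small itself accepts up to 8191).
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
enc = tiktoken.get_encoding("cl100k_base")

def chunk_by_tokens(text, max_tokens=512):
    """Split text into consecutive chunks of at most max_tokens tokens."""
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

def embed_long_text(text):
    """Embed each chunk; callers can average vectors or store them per-chunk."""
    chunks = chunk_by_tokens(text)
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    return [item.embedding for item in resp.data]
```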
Links mentioned:
- Practical Tips for Dealing with Large Documents (>2048 tokens): Dear Mr Plane, Please correspond with the OpenAI team and they will reply with your specific parameters. Kind Regards, Robinson
- Embedding texts that are longer than the model's maximum context length | OpenAI Cookbook: no description found
OpenAI ā· #api-discussions (3 messages):
- Chunk your data for processing: One member advised splitting large text data files, particularly files around 300MB, to manage them better. They noted that smaller chunks are easier to handle.
- Practical tips for large documents: A link to a community forum post was shared, providing guidance on managing documents exceeding 2048 tokens in length.
- OpenAI's embedding models limit: A resource on embedding long inputs was shared, explaining that OpenAI's embedding models have maximum text-length limits measured in tokens. The post includes a notebook that demonstrates handling over-length texts and links to the OpenAI Embeddings Guide.
Links mentioned:
- Practical Tips for Dealing with Large Documents (>2048 tokens): Dear Mr Plane, Please correspond with the OpenAI team and they will reply with your specific parameters. Kind Regards, Robinson
- Embedding texts that are longer than the model's maximum context length | OpenAI Cookbook: no description found
LLM Finetuning (Hamel + Dan) ā· #general (15 messagesš„):
- Finding example data for LLM finetuning: A user inquired about sourcing examples for finetuning when lacking a large dataset. No specific solutions were discussed in the responses.
- Recordings for June sessions: It was clarified that there wouldn't be a single recording for June's sessions; each event has its own recording.
- Memorization experiment reveals interesting results: A user shared results from an experiment on LLM memorization, showing significant improvement with 10x example sets but little increase at higher multiples. Detailed findings and a GitHub repo with datasets and scripts were shared for further exploration.
- Call for LLM experts to join the team: An open position was posted seeking an expert in developing and fine-tuning LLMs to help create a model for Hugging Face's leaderboard, emphasizing the need for innovative evaluation metrics. Interested candidates were invited to submit resumes and cover letters.
- Morale-boosting reminder with a new channel announcement: In response to a humorous complaint about a member enjoying free time, a new channel, <#1250895186616123534>, was announced for showcasing and demoing projects. Users were encouraged to use the General voice channel for live demonstrations.
Links mentioned:
- Tweet from Hamel Husain (@HamelHusain): Its so nice to have free time!
- Fine-Tuning-Memorisation-v2: a results spreadsheet from the memorization experiment, listing each question, expected answer, model response, extracted number, and correctness across example-multiple runs
LLM Finetuning (Hamel + Dan) ā· #š©-modal (4 messages):
- Trouble with text-to-SQL template in finetuning llama3: An issue was raised about finetuning llama3 for text-to-SQL using the `llama-3.yml` config and `sqlqa.subsample.jsonl` dataset. The preprocessing stage did not format the data correctly, producing `<|begin_of_text|>` tags instead of `[INST]` and `[SQL]` markers.
- Extra credit confusion: Several members discussed missing out on extra credits due to the June 11th deadline. One member noted the difficulty of keeping up with so many channels and appreciated how easy it is to start an app with Modal.
- Modal credits gratitude: Multiple members gave thanks for receiving $1000 in credits, with one sharing their email address because they had not received the $500 in extra credits despite running the starter app before the deadline.
LLM Finetuning (Hamel + Dan) ā· #learning-resources (1 messages):
- Seeking resources for LLM input & output validations: A user asked for recommendations on resources to write input & output validations for LLMs. They also requested suggestions for favorite tools in this context.
LLM Finetuning (Hamel + Dan) ā· #hugging-face (2 messages):
- Trouble Uploading SQLite DB to Huggingface Spaces: A user struggled with uploading a 3MB SQLite file to Huggingface Spaces, initially unable to update it due to file size restrictions. The solution they discovered was to use git lfs to manage the file upload.
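For anyone hitting the same wall, the `huggingface_hub` client offers a programmatic alternative to hand-rolled git-lfs; a minimal sketch with illustrative repo id and paths:

```python
# Hedged sketch of pushing a binary file to a Space with huggingface_hub,
# as an alternative to configuring git-lfs manually. The repo id and paths
# are illustrative assumptions.
from huggingface_hub import HfApi

api = HfApi()  # uses the token stored by `huggingface-cli login`
api.upload_file(
    path_or_fileobj="feedback.db",       # local SQLite file
    path_in_repo="data/feedback.db",     # destination inside the Space repo
    repo_id="your-username/your-space",  # hypothetical Space id
    repo_type="space",
)
```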
LLM Finetuning (Hamel + Dan) ā· #langsmith (2 messages):
- LangSmith Free Developer Plan Info: Users on the free Developer plan have a ā$0/seat cost and 5K free traces a month.ā You wonāt be charged for anything until your credits are exhausted.
LLM Finetuning (Hamel + Dan) ā· #berryman_prompt_workshop (1 messages):
hamelh: humanloop, promptfoo are both popular
LLM Finetuning (Hamel + Dan) ā· #clavie_beyond_ragbasics (4 messages):
- Victory in building a working system: A member expressed relief and satisfaction after successfully building a working system, mentioning it took several hours to accomplish.
- Library issues with chunk separation: Another message raised an issue about the fasthtml library not getting the correct answers due to separation in chunks, hinting at possible performance impacts.
- Seeking examples of Colbert embeddings: A member inquired about good examples showing how Colbert embeddings are used for retrieval, suggesting a need for clarity or learning resources.
- Deployed app to Hugging Face Spaces: The deployed app includes a feedback mechanism and is slow but functional. The app took longer to deploy than build, covers 20-21 talks, and stores feedback in a SQLite DB.
Link mentioned: RizzConn Answering Machine - a Hugging Face Space by t0mkaka: no description found
LLM Finetuning (Hamel + Dan) ā· #jason_improving_rag (2 messages):
- Evaluating Category Selection Approaches: A member discussed the challenge of selecting from 500 categories when filtering documents. They had tested the approach with GPT-3.5 and GPT-4 without success and are considering revisiting it with GPT-4o on different product feeds to explore potential improvements.
- ParadeDB Installation Highlight: A message recommended installing ParadeDB extensions, emphasizing their ability to "unify operational and analytical data" and "unlock insights faster and simplify the data stack with an all-in-one search and analytical database." ParadeDB's ACID-compliant transaction control and Postgres compatibility were also mentioned as key benefits.
Link mentioned: ParadeDB - Postgres for Search and Analytics: ParadeDB is a modern Elasticsearch alternative built on Postgres.
LLM Finetuning (Hamel + Dan) ā· #saroufimxu_slaying_ooms (7 messages):
- Member Requests Training Profiling Lecture: A member asked if experts could give a lecture on training profiling, covering tools, setup, and log interpretation. They emphasized the importance of maximizing GPU utilization and reducing training costs, noting a "BIG gap in speed between frameworks."
- Poll for Lecture Demand: A suggestion was made to gauge interest through a poll, and another member received the permissions needed to create it.
- Potential Speaker Shows Interest: The potential speaker preliminarily agreed to give the lecture if there was enough interest, humorously offering a private lesson if community interest was lacking.
LLM Finetuning (Hamel + Dan) ā· #paige_when_finetune (4 messages):
- Gemini models differ in approach: One user sought clarification on whether the gemini.google.com model is a mixture of a RAG/search model while Gemini 1.5 Pro is solely a language model, trying to reconcile differing outputs to the same prompt across these models.
- Detailed description of the Adolescent Well-being Framework: The user shared a comprehensive breakdown of the Adolescent Wellbeing Framework (AWF) from gemini.google.com, emphasizing its five interconnected domains, such as mental and physical health and community and connectedness. The framework, developed by the WHO, aims to assess and promote well-being in adolescents.
- Comparing frameworks for adolescent well-being: A response from aistudio.google.com with Gemini 1.5 Pro provided a broader view, highlighting well-known models like the Five Cs of Positive Youth Development and the Whole School, Whole Community, Whole Child (WSCC) Model. This alternative response presented dimensions such as physical health, mental and emotional health, and economic well-being, noting these are crucial for adolescent development.
LLM Finetuning (Hamel + Dan) ā· #gradio (2 messages):
- RAG-based app launched on Huggingface Spaces: A member announced that they built a RAG-based app with Gradio and uploaded it to Huggingface Spaces. RizzCon-Answering-Machine is presented as "Refreshing," processing data for the first 20 talks.
- Performance needs improvement for RAG-based app: The same member noted that their RAG-based app is presently too slow and needs some profiling for better performance.
Link mentioned: RizzConn Answering Machine - a Hugging Face Space by t0mkaka: no description found
LLM Finetuning (Hamel + Dan) ā· #axolotl (9 messagesš„):
- Llama-3-8b preprocessing error plagues user: A member reported encountering a warning while preprocessing llama-3-8b, stating: ācopying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-opā. This error seems to repeat for each layer, also appearing with the mistral-7b model.
- VSCode disconnects hinder fine-tuning tasks: A member experienced frequent disconnections when running fine-tunes on a DL rig via SSH in VSCode. Another member suggested using `nohup` or switching to manual execution on the physical machine, with additional advice to consider SLURM for better handling.
- Nohup to the rescue for background processes: Members discussed using `nohup` to run commands in the background (e.g., `nohup python train.py > train.log 2>&1 &`), allowing processes to continue even if the SSH session disconnects. A helpful link to a SuperUser thread was shared for detailed instructions.
Link mentioned: Run Bash script in background and exit terminal: Is it possible to launch a command or Bash script exit terminal and NOT interrupt command? My solution was to run cron at a specific time of day, but Iām sure there is something easier.
LLM Finetuning (Hamel + Dan) ā· #zach-accelerate (1 messages):
- Seamlessly switch between FSDP and DeepSpeed: An article titled āA Hugging Face Accelerate Story of Multiple Backends: FSDP and DeepSpeedā was shared. It discusses how to go back and forth from FSDP to DeepSpeed using Hugging Face Accelerate, highlighting the differences between these backends and a precision-related change that was upstreamed. Read more here.
Link mentioned: From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate: no description found
LLM Finetuning (Hamel + Dan) ā· #wing-axolotl (3 messages):
- Axolotl struggles on Modal but hope for success on Jarvis: A user shared that they had difficulty running Axolotl on Modal despite its supposed ease. They plan to try it on Jarvis next.
- Helpful YouTube tutorial for Axolotl on JarvisLabs: Another user provided a YouTube tutorial which they found useful for running Axolotl on JarvisLabs. The video includes links to both JarvisLabs and the Axolotl GitHub repository.
Link mentioned: How to run axolotl on JarvisLabs | Tutorial: Check out axolotl on JarvisLabs : jarvislabs.ai/templates/axolotlCheck out axolotl github : https://github.com/OpenAccess-AI-Collective/axolotl
LLM Finetuning (Hamel + Dan) ā· #simon_cli_llms (1 messages):
- Seeking local model access without Ollama: A user inquired about accessing a local model running on a server via an API call instead of through Ollama, specifically mentioning TGI and vLLM, and referenced blog posts that may discuss this.
- Interest in DeepSeek V2 API support: The user asked if there are plans to support the DeepSeek V2 API, indicating interest in enhanced API functionality.
- Autocomplete in command line with FIM models: The user expressed a desire to use LLM CMD with fill-in-the-middle (FIM) models to get inline autocomplete suggestions similar to Copilot, but within the command line. They suggested it could run once and switch to a CMD FIM mode, or simply handle completions with a prefix.
- Function calling support in LLM: There was a question about whether LLM will support function calling when the model produces a function-call token, indicating interest in advanced features that handle code execution.
LLM Finetuning (Hamel + Dan) ā· #fireworks (4 messages):
- Requests for credits support flood in: Multiple users are seeking assistance with their credits after filling out the form. Users provided their account IDs such as anopska-552142, as-ankursingh3-1-817d86, standonopenstds-ff2f17, and apratim941208-cc11fd but havenāt received their credits yet.
LLM Finetuning (Hamel + Dan) ā· #emmanuel_finetuning_dead (1 messages):
gitmaxd: great question
LLM Finetuning (Hamel + Dan) ā· #europe-tz (3 messages):
- Greetings in the Europe TZ channel: A member greeted the channel with "Salutare" (Romanian for "greetings") and later switched to English, inquiring if anyone was interested in fine-tuning a model with Euro24 news and related data such as player/match stats. Another member responded with a friendly "Hello from Mainz, Germany!"
LLM Finetuning (Hamel + Dan) ā· #predibase (5 messages):
- Office Hours Appreciation: "Thanks everyone for coming to the office hours today! Lots of great questions, really appreciate everyone for taking the time!" This acknowledgment highlights the community's engagement in a productive Q&A session.
- DJL Topic from AWS Blog Post: A member raised questions during office hours about "DJL" from an AWS blog post titled "Efficient and Cost-Effective Multi-Tenant LoRA Serving with Amazon SageMaker." The post discusses the benefits and use cases of generative AI models, mentioning BloombergGPT and technologies like vLLM and an "optimized rolling batch scheduler."
Link mentioned: Efficient and cost-effective multi-tenant LoRA serving with Amazon SageMaker | Amazon Web Services: In this post, we explore a solution that addresses these challenges head-on using LoRA serving with Amazon SageMaker. By using the new performance optimizations of LoRA techniques in SageMaker large mā¦
LLM Finetuning (Hamel + Dan) ā· #openpipe (3 messages):
- Query about openpipe contact and credits: A member asked if anyone knew who to contact for openpipe as they hadnāt received their credits yet.
- Follow-up on email receipt inquiry: Another member asked if the first member had received the email regarding their credits.
- Update on second round of credits: It was mentioned that a second round of credits would be granted on the 14th, which is tomorrow.
LLM Finetuning (Hamel + Dan) ā· #openai (40 messagesš„):
- Fine-Tuning Can Add New Knowledge: A lively discussion centered on fine-tuning models to add new knowledge, with multiple members emphasizing that this process can be costly but effective. Examples like "Nous Hermes" were cited to illustrate successful multi-skill incorporation through fine-tuning.
- GPT-Generated Data and Legal Limitations: Questions about the use of GPT-generated data for training within enterprises sparked debate. The discussion referenced OpenAI's business terms, which restrict using output to develop competing AI models, with a link to the policy.
- BLEU Score's Relevance for Grammar: There was a query about the appropriateness of using BLEU scores to test grammar, with some members expressing doubt about its effectiveness (a small illustration of BLEU's blind spot follows below).
- Generating Synthetic Data for Fine-Tuning: The community was curious about the feasibility and legality of using synthetic data for fine-tuning AI models for commercial use. A link to a paper on ToolkenGPT was shared, discussing approaches that combine tool demonstrations with in-context learning.
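To make the BLEU doubt concrete, a small illustration (assuming the `sacrebleu` package; the sentences are invented): an ungrammatical near-copy of the reference outscores a grammatical paraphrase, which is exactly the wrong ordering for a grammar test.

```python
# Hedged illustration of BLEU's blind spot for grammar: an ungrammatical
# near-copy scores far higher than a grammatical paraphrase. Assumes the
# `sacrebleu` package; the sentences are invented for the example.
import sacrebleu

references = [["The cats sit quietly on the mat."]]
ungrammatical_copy = ["The cats sits quietly on the mat."]    # agreement error
grammatical_paraphrase = ["The cats are sitting calmly on the rug."]

print(sacrebleu.corpus_bleu(ungrammatical_copy, references).score)      # high
print(sacrebleu.corpus_bleu(grammatical_paraphrase, references).score)  # low
```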
Links mentioned:
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings: Augmenting large language models (LLMs) with external tools has emerged as a promising approach to solving complex problems. However, traditional methods, which finetune LLMs with tool demonstration d...
- OpenAI Cookbook: Open-source examples and guides for building with the OpenAI API. Browse a collection of snippets, advanced techniques and walkthroughs. Share your own examples and guides.
- Fine-Tuned Q&A - collect data | OpenAI Cookbook: no description found
LLM Finetuning (Hamel + Dan) ā· #pawel-function-calling (92 messagesš„š„):
- Fireworks AI logo gets love: One member expressed their admiration for the Fireworks AI logo. Another member shared a curiosity about function calls and how they could be used for stream evaluations and hallucination management tools.
- Delta in model weights sparks interest: There was a discussion about the concept of ādeltaā, described as the difference between instruct and base weights in model fine-tuning. One member encouraged others to try model merging, calling it āfunā.
- Axolotl special tokens configuration: A member asked about adding special tokens in Axolotl for function calls, and others confirmed it can be done through configuration. They also discussed using the ātemplate-free formatā to avoid issues with chat prompt templates.
- LangGraph and LangChain debated: Various members shared their experiences with LangChain and LangGraph, with mixed feelings about their usability. Some praised LangChainās extensive ecosystem and support, while others preferred LangGraphās customization and independence from LangChain.
- Glaive Function Calling model introduction: Shared a link to glaive-function-calling-v1, a 2.7B parameter open-source chat model capable of multi-turn conversations and intelligent function execution, based on the Replit-code-v1-3b model.
Links mentioned:
- LangGraph (Python): This video series covers how to use code functionality of LangGraph, as well as common modifications one could want to make.
- Tweet from Git Maxd (@GitMaxd): LangGraph Learning Order: 1. LangGraph YouTube Learning Series: https://www.youtube.com/watch?v=5h-JBkySK34&list=PLfaIDFEXuae16n2TWUkKq5PgJ0w6Pkwtg 2. LangGraph Self-correcting code assistants with ...
- glaiveai/glaive-function-calling-v1 Ā· Hugging Face: no description found
- Join the Fireworks.ai Discord Server!: Check out the Fireworks.ai community on Discord - hang out with 2148 other members and enjoy free voice and text chat.
- Axolotl - Template-free prompt construction: no description found
- mergekit-gui - a Hugging Face Space by arcee-ai: no description found
- Llama-3 Function Calling Demo: no description found
- 'This was the last time. Until a next time. No chance.' - Commando: [Matrix declines an offer to resume his unit] John Matrix: This was the last time. Major General Franklin Kirby: Until a next time. [pause] John Matrix: No chance.
Nous Research AI ā· #interesting-links (5 messages):
- LLM-driven objective discovery pushes the boundaries of preference optimization: A paper on arXiv explores offline preference optimization for LLMs, using LLM-driven objective discovery to automatically find new optimization algorithms. The approach iteratively prompts an LLM based on performance metrics, discovering preference-optimization algorithms without expert human intervention.
- Mixture-of-Agents (MoA) methodology harnesses multiple LLMs' collective expertise: Another paper on Hugging Face proposes a Mixture-of-Agents architecture, where multiple LLM agents in layered configurations collaborate to enhance performance on various benchmarks. The MoA model outperforms GPT-4 Omni, scoring 65.1% on AlpacaEval 2.0 versus GPT-4 Omni's 57.5% (a minimal sketch of the layered setup follows below).
- GitHub repository for MoA methodology: The implementation of the Mixture-of-Agents model can be found on GitHub, where users can contribute to and explore this approach to leveraging multiple LLMs.
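A minimal sketch of the layered MoA idea, assuming an OpenAI-compatible endpoint; the model ids and aggregator prompt are illustrative, not the paper's exact configuration:

```python
# Hedged sketch of Mixture-of-Agents: several "proposer" models answer
# independently, then an aggregator synthesizes their drafts. Assumes an
# OpenAI-compatible endpoint; model ids and prompts are illustrative.
from openai import OpenAI

client = OpenAI(base_url="https://api.together.xyz/v1", api_key="...")
PROPOSERS = ["model-a", "model-b", "model-c"]  # hypothetical model ids
AGGREGATOR = "model-d"

def ask(model, prompt):
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def mixture_of_agents(question):
    drafts = [ask(m, question) for m in PROPOSERS]  # layer 1: proposals
    numbered = "\n\n".join(f"Draft {i+1}:\n{d}" for i, d in enumerate(drafts))
    return ask(  # layer 2: aggregation over the proposers' drafts
        AGGREGATOR,
        "Synthesize the best possible answer from these drafts, correcting "
        f"any errors.\n\nQuestion: {question}\n\n{numbered}",
    )
```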
Links mentioned:
- Discovering Preference Optimization Algorithms with and for Large Language Models: Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. Typically, preference optimization is approached as an offline supervis...
- GitHub - togethercomputer/MoA: Contribute to togethercomputer/MoA development by creating an account on GitHub.
- Paper page - Mixture-of-Agents Enhances Large Language Model Capabilities: no description found
Nous Research AI ā· #general (152 messagesš„š„):
- Stable Diffusion 3 Early Reviews: Stable Diffusion 3 is receiving mixed feedback, with some users calling it impressive and others highlighting initial issues and bugs. One user mentioned that "it uses an LLM as an encoder," suggesting a departure from traditional models.
- Debate over GPT-4 Temperature Settings: A tweet revealed that GPT-4 performs better at temperature=1 than temperature=0, even on deterministic tasks, sparking surprise and debate. Users noted this doesn't hold for fine-tuned Llama 3 models, with varying results across task types.
- Uncensored Model Release: A user shared their uncensored version of the OpenHermes-2.5 dataset, with 2,697 censored lines removed. There was discussion of the impact of removing alignment constraints and its effect on model responses.
- MatMul-Free LLMs: A new paper on eliminating MatMul operations in large language models was shared, boasting significant memory savings and strong performance at billion-parameter scales. The paper claims MatMul-free models can match the performance of state-of-the-art Transformers while reducing memory usage by up to 61%.
- Discussion on AI Model Jailbreaking: Haize Labs announced a method to automatically jailbreak AI models, uncovering safety violations in top AI systems by eliciting harmful content. This sparked discussion of the ethics and consequences of such actions in the AI community.
Links mentioned:
- Scalable MatMul-free Language Modeling: Matrix multiplication (MatMul) typically dominates the overall computational cost of large language models (LLMs). This cost only grows as LLMs scale to larger embedding dimensions and context lengths...
- Stable Diffusion 3 Medium - a Hugging Face Space by stabilityai: no description found
- Testing Neuro's Morality with Slay the Princess (Part 1): Vedal attempts to make Neuro-sama play Slay the Princess.āŗTwitch: http://www.twitch.tv/vedal987āŗTwitter: https://twitter.com/Vedal987Edited by Paradomix
- Tweet from Kyle Corbitt (@corbtt): Crazy fact that everyone deploying LLMs should knowāGPT-4 is "smarter" at temperature=1 than temperature=0, even on deterministic tasks. I honestly didn't believe this myself until I trie...
- Tweet from Haize Labs (@haizelabs): Today is a bad, bad day to be a language model. Today, we announce the Haize Labs manifesto. @haizelabs haizes (automatically red-teams) AI systems to preemptively discover and eliminate any failure...
- Tweet from Jeremy Howard (@jeremyphoward): New paper just dropped, showing how to greatly increase math scores on LLMs by combining monte-carlo tree search (MCTS) with a language model. Nice! But... what if instead, we simply tell the LLM to ...
- Tweet from Max Woolf (@minimaxir): Taking a look at people testing out Stable Diffusion 3 and tbh this goes hard.
- Reddit - Dive into anything: no description found
- Reddit - Dive into anything: no description found
- Tweet from Kyle Corbitt (@corbtt): @eugeneyan @aidan_mclau ok well now I'm seeing wildly different results with my fine-tuned llama 3 models (that match my prior intuitions -- higher temp outperforms on creative tasks, lower temp o...
- Replete-AI/OpenHermes-2.5-Uncensored Ā· Datasets at Hugging Face: no description found
- Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters: Exploiting activation sparsity is a promising approach to significantly accelerating the inference process of large language models (LLMs) without compromising performance. However, activation sparsit...
Nous Research AI ā· #ask-about-llms (2 messages):
- Lost paper on interleaving pretraining and instructions: A member asked for help remembering a recent paper that interleaved pretraining documents and extracted instructions during finetuning. Another member expressed interest in being notified if the paper is found.
Nous Research AI ā· #rag-dataset (15 messagesš„):
- No final RAG dataset schema yet: Queries about the final dataset schema for RAG revealed that it isn't yet settled. "Doesn't seem like it," and discussions on finalizing the datagen pipeline confirm ongoing development.
- Embedding text for better verification: Suggestions were made to link or show the embedded texts to avoid hallucinations. One member noted, "you could do a ctrl+f on the embeddings."
- Marker used for PDF to markdown conversion: It's confirmed that the raw text will be markdown, as PDFs are being converted via Marker. The tool is praised for its accuracy but noted to be slow for inference-time conversion.
- Pandoc and make4ht for other document conversions: Other methods discussed for converting various document types include Pandoc, and make4ht for LaTeX files. These tools were suggested as alternatives for different formats.
- Speed optimization recommendations: Speed improvements for Marker were suggested, such as setting min_length to avoid unwanted OCR. A member claimed about 1 page per second per worker on an A10 GPU, highlighting the potential for parallel processing.
Link mentioned: GitHub - VikParuchuri/marker: Convert PDF to markdown quickly with high accuracy: Convert PDF to markdown quickly with high accuracy - VikParuchuri/marker
Nous Research AI ā· #world-sim (6 messages):
- Open-source status remains closed for now: When asked if the project is open-source, it was confirmed, āitās not currently open-source.ā There are no immediate plans to change this status, although itās ātalked aboutā and might be reconsidered as development continues.
- Request for world-sim AI enhancements: A user suggested making the world-sim AI edgier and also proposed developing a free mobile port for the application. They expressed belief that a mobile version would be ācool.ā
Perplexity AI ā· #general (136 messagesš„š„):
- Love for the New Search Feature: A user expressed excitement about the new search feature, saying, "Loving this new search," and immediately requested an iOS version: "Need it in iOS now pls."
- Concerns on Perplexity's Website Functionality: One member asked if others were experiencing view issues on Perplexity's website, where the chat jumps halfway up the page after each answer.
- Enterprise Support Woes: A user on an enterprise plan expressed frustration over support issues, mentioning that they were "dumped in the ticket system and never got a response" despite emailing two weeks ago. Suggestions included contacting Alex or checking the support page.
- Discussions on API Limitations and Solutions: Members discussed the limitations of the Perplexity API, with one suggesting breaking down complex requests and another recommending smaller models for some tasks. A member also mentioned an alternative API product in development, claiming it might offer better scalability and features.
- Frustrations with Rabbit Inc.: In a lengthy discussion, a user shared their negative experience with Rabbit Inc., criticizing its support and privacy policies and linking to a Reddit post detailing the issue. Another member noted that Perplexity does not require such proof to participate in discussions.
Links mentioned:
- Reddit - Dive into anything: no description found
- Perplexity: Perplexity is a free AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.
Perplexity AI ā· #sharing (8 messagesš„):
- Elon Musk drops lawsuit against OpenAI: Elon Musk has withdrawn his lawsuit against OpenAI and its executives just a day before the court hearing. The lawsuit claimed that OpenAI shifted from its mission to a profit-driven entity, prioritizing Microsoft, its largest investor, over humanitarian goals source source.
- First wooden satellite set to launch in 2024: Researchers from Kyoto University and Sumitomo Forestry unveiled LignoSat, the world's first wooden satellite, expected to launch in September 2024. The project aims to evaluate wood as a sustainable material for space and reduce environmental impact source source.
- Email carbon footprint and environmental impact: The average carbon footprint of an email is around 4 grams of CO2, increasing significantly when emails carry large attachments. Using file-share links instead of attachments can reduce CO2 emissions source source.
- Origin of the idiom "rock bottom": The phrase "hit rock bottom" refers to reaching the lowest possible level and has been used metaphorically since the mid-19th century. It originally described miners hitting the solid rock layer beneath the soil source source.
- Perplexity.ai's fast results with LLMs: Perplexity.ai leverages state-of-the-art hardware like AWS p4d instances with NVIDIA A100 GPUs and software optimizations such as NVIDIA's TensorRT-LLM to achieve fast results despite using large language models. Integration with AWS and Kubernetes enables elastic scaling, reducing downtime and network overhead source.
Links mentioned:
- YouTube: no description found
- where does the idiom "rock bottom" come from?: The idiom "hit rock bottom" originated from the literal meaning of reaching the solid bedrock or bottom layer of rock when digging or mining. It has been used...
- World's First Wooden Satellite: In a groundbreaking development, researchers from Kyoto University and Sumitomo Forestry have unveiled LignoSat, the world's first wooden satellite, set to...
- Musk Drops Lawsuit Against OpenAI: Elon Musk, co-founder of OpenAI, has withdrawn his lawsuit against the company and its executives, Sam Altman and Greg Brockman, just one day before a...
- Email's Carbon Footprint: Studies show that the average carbon footprint of an email is around 4 grams of CO2 equivalent, with the footprint increasing significantly if the email...
- how does perplexity.ai fetch fast results despite using LLMs which are slow: Perplexity.ai achieves fast results despite using large language models (LLMs) through a combination of state-of-the-art software and hardware. The key...
- Perplexity: Perplexity is a free AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.
Perplexity AI ā· #pplx-api (14 messagesš„):
- Connection Issues with Custom GPTs: A member reported issues with Custom GPTs saying āit couldnāt connectā when trying to perform function calls on chat.openai.com. Another member suggested checking if the PPLX API key is added in the Auth panel when creating the GPT.
- Desktop App vs. Web Issues: It was noted that Custom GPTs were not functioning on the web version of chat.openai.com, but they worked normally on the desktop app. This implies there might be a platform-specific issue affecting API calls.
- Still No Resolution: Despite confirming the correct use of the API key and trying recommended solutions, a member continued to face the error with no response when using Custom GPTs. Another member indicated that their setup under the same conditions worked fine, suggesting the problem might be isolated.
CUDA MODE ā· #general (3 messages):
- Question on Compute Intensity Metrics: A member asked whether, in practice, only floating point operations done on data accessed from Global Memory are considered for compute intensity (operations per byte accessed), as opposed to operations on data that exists solely in registers. They followed up with a self-referential note, humorously admitting they had typed the query a week prior but hadnāt sent it.
CUDA MODE ā· #triton (2 messages):
- Install Triton 3.0 with ease: A user shared a hassle-free guide on installing Triton 3.0 from source. This involves uninstalling the current Triton version, cloning the repository, installing dependencies, and running the setup script.
- Alternative Triton installation from PyTorch: Another member suggested a method involving cloning the PyTorch repository and running `make triton`. The specific pinned version can be found in the triton_version.txt file.
Links mentioned:
- Installing Triton 3.0.0: As of June 13 2024, to get Triton 3.0 you have to install it from source, like so:
- GitHub - pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration: Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch
- pytorch/.ci/docker/triton_version.txt at main Ā· pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch
CUDA MODE ā· #torch (13 messagesš„):
- Curiosity about fast 8-bit optimizer with PyTorch: A member asked if it's possible to create a fast 8-bit optimizer using only pure PyTorch and `torch.compile`. They highlighted the importance of creating lookup tables dynamically and performing calculations in FP32, as in the bitsandbytes implementation (a sketch of that quantize/dequantize core follows below).
- Drop-in replacement concerns: Another member discussed the potential challenges of making a "drop-in replacement" for 8-bit optimizers, given possible accuracy deviations from 32-bit calculations. In practice, however, they observed no accuracy drop when using 8-bit AdamW from bitsandbytes.
- Pure PyTorch + Triton version of 8-bit optimizers: A member working on such an implementation proposed a pure PyTorch + Triton version of the bitsandbytes 8-bit optimizers, with the goal of avoiding custom CUDA kernels.
- Long-term roadmap for bitsandbytes: Another contributor acknowledged the possibility of integrating bitsandbytes with `torch.compile` for better compatibility. They pointed out the crucial aspects of the implementation, like quantization maps and ensuring operations are performed in FP32.
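As a rough picture of the quantize/dequantize core such an optimizer needs, here is a pure-PyTorch sketch; the blockwise-absmax linear codebook stands in for bitsandbytes' dynamic-tree quantization map, and the real optimizer update would run in FP32 on the dequantized state:

```python
# Hedged sketch of the state quantize/dequantize core of an 8-bit optimizer
# in pure PyTorch. A blockwise-absmax linear codebook stands in for the
# dynamic-tree map used by bitsandbytes; update math would run in FP32.
import torch

CODEBOOK = torch.linspace(-1.0, 1.0, 256)  # stand-in for a dynamic map

def quantize_blockwise(x, block=256):
    # assumes x.numel() is divisible by `block`, for brevity
    xb = x.reshape(-1, block)
    scale = xb.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
    normed = xb / scale                      # values in [-1, 1]
    idx = torch.bucketize(normed, CODEBOOK)  # codebook bin per element
    return idx.clamp(max=255).to(torch.uint8), scale

def dequantize_blockwise(idx, scale, shape):
    return (CODEBOOK[idx.long()] * scale).reshape(shape).float()  # FP32 state

state = torch.randn(4096)
idx, scale = quantize_blockwise(state)
restored = dequantize_blockwise(idx, scale, state.shape)
print((state - restored).abs().max())  # error <= one bin width * block scale
```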
Link mentioned: 8-Bit Approximations for Parallelism in Deep Learning: The creation of practical deep learning data-products often requires parallelization across processors and computers to make deep learning feasible on large data sets, but bottlenecks in communicationā¦
CUDA MODE ā· #algorithms (2 messages):
- 1-bit LLM promises rapid quantization: A link to the BiLLM GitHub project was shared, which claims to quantize 7B LLMs in 0.5 hours with a single GPU. One member noted, āseems like no fused kernel so no speedup (yet).ā
- Upward trend in benchmark accuracy observed: A member observed that benchmark accuracy appears to trend upwards. They seemed intrigued by this pattern, but no further details were discussed.
Link mentioned: GitHub - Aaronhuang-778/BiLLM: (ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs: (ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs - Aaronhuang-778/BiLLM
CUDA MODE ā· #cool-links (1 messages):
_shivasinghbagri: https://powerinfer.ai/v2/
CUDA MODE ā· #torchao (31 messagesš„):
- Struggles with Speed-Up on 4090: Despite setting influential flags and trying multiple approaches like `torch.matmul` and `torch.nn.functional.linear`, a member couldn't achieve a good speed-up on an RTX 4090 GPU for INT8 and FP16 mixed matrix multiplications. They experimented with various configurations and observed slower performance.
- FP8 and INT8 Matrix Multiplication: Discussion revealed that FP8 x FP8 matmuls were slower than INT8 due to slow casting. INT8 quantization appears to provide better speed and accuracy and is supported even on older GPUs like Pascal (a weight-only INT8 sketch follows below).
- INT6 Quantization Shows Promise: INT6 with a group size of 64 showed minimal accuracy degradation in tests with Llama3-8B-instruct models. Members noted that INT6 and INT4 quantizations have promising performance but require careful handling to maintain accuracy.
- Exploring Mixed Precision with FP16/FP8: Mixed-precision FP16 x FP8 operation was discussed, highlighting the potential benefits and existing support in Microsoft's BitBLAS. Implementation and integration challenges were mentioned, particularly the need to compile TVM.
- Microsoft BitBLAS Implementation: BitBLAS supports mixed-precision matrix multiplications such as FP16 x FP8E4M3, but it relies heavily on TVM's Python interface. This requirement complicates integration with other libraries, making it less accessible for some projects.
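For contrast with the fused paths discussed above, here is a minimal weight-only INT8 sketch in eager PyTorch; it shows the per-channel scheme's accuracy behavior, but real speedups need fused INT8 kernels rather than this dequantize round-trip:

```python
# Hedged sketch of per-channel INT8 weight-only quantization for a linear
# layer. The eager dequantize-then-matmul round-trip preserves accuracy but
# gives no speedup; that requires fused INT8 kernels, per the discussion.
import torch

dev = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if dev == "cuda" else torch.float32

def quantize_int8_per_channel(w):
    """w: [out_features, in_features] weight -> int8 weight + per-row scale."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def int8_linear(x, q, scale):
    w = q.to(x.dtype) * scale  # dequantize back to the activation dtype
    return x @ w.t()

w = torch.randn(4096, 4096, device=dev, dtype=dtype)
q, scale = quantize_int8_per_channel(w)
x = torch.randn(8, 4096, device=dev, dtype=dtype)
print((x @ w.t() - int8_linear(x, q, scale)).abs().max())  # quantization error
```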
Links mentioned:
- pytorch/test/test_matmul_cuda.py at f9c9c06c4478fbfbbf6986f21410ecac37d3f63e Ā· pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch
- torch_compile_mixed_mm_test.py: GitHub Gist: instantly share code, notes, and snippets.
- ao/torchao/quantization/subclass.py at 6f44d259fabe1195669654e22f7f97fc028f89af Ā· pytorch/ao: Native PyTorch library for quantization and sparsity - pytorch/ao
CUDA MODE ā· #llmdotc (54 messagesš„):
- Keeping norm kernel calls in check: A member mentioned that they pushed a change to control the number of additional norm kernel calls and suggested a similar strategy could benefit the adam kernel.
- Exploring C++20 possibilities: Discussion about achieving similar functionality using `std::span<const float *, N>`. A member pointed out it might require changes at the call site or use of a constructor already present.
- Race condition fixes for loss infinities: The loss-infinities issue was identified as a race condition. The fix, combined with removing the atomic add, provides full determinism and a slight performance improvement when profiled on 8xH100s.
- Random data loading improvements: A PR introduces randomness in data loading to ensure better shuffling (a sketch of the per-rank scheme follows below). Observations were made about how document boundaries and data batching could impact training performance, with a histogram of document lengths shared for insight.
- Discussing batch and document similarity: Debate on the impact of semantic similarity in document batching during training. Some members suggest it aids in-context learning, while there is skepticism about its effect on generalization.
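A toy sketch of the per-process randomness the PR describes: each rank derives its own seed so shard order (and thus batch composition) differs across processes and epochs; the seed formula and shard count are illustrative, not llm.c's actual scheme:

```python
# Hedged sketch of per-rank shard shuffling as described in the dataloader PR:
# each process seeds its own RNG so shard order differs across ranks and
# epochs. The seed formula and shard count are illustrative.
import random

def shard_schedule(rank, num_shards, epoch, base_seed=1337):
    rng = random.Random(base_seed + rank + 100_000 * epoch)  # unique per rank/epoch
    order = list(range(num_shards))
    rng.shuffle(order)
    return order

for rank in range(2):
    print(rank, shard_schedule(rank, num_shards=8, epoch=0))
```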
Links mentioned:
- Dataloader - introducing randomness by gordicaleksa Ā· Pull Request #573 Ā· karpathy/llm.c: On the way to fully random train data shuffling... This PR does the following: Each process has a different unique random seed Each process train data loader independently chooses its starting sha...
- Add master weights to resume state by gordicaleksa Ā· Pull Request #522 Ā· karpathy/llm.c: We're currently not saving master weights as part of the state -> we lose some precision because otherwise when we resume we'll have to reconstruct the master weights by upcasting from lowe...
CUDA MODE ā· #rocm (1 messages):
- AMDās MI300X outperforms NVIDIAās H100: A recent TensorWave blog post shares exciting benchmarks showing AMDās new MI300X accelerator achieving 33% higher throughput for LLM inference compared to NVIDIAās H100 SXM. This initial success is based on MK1ās inference software running vLLM on Mixtral 8x7B, indicating that despite a less mature software ecosystem, AMDās hardware is a formidable competitor.
Link mentioned: AMDās MI300X Outperforms NVIDIAās H100 for LLM Inference: Discover if AMDās MI300X accelerator can outperform NVIDIAās H100 in real-world AI workloads. Early results are in!
CUDA MODE ā· #bitnet (13 messagesš„):
- Unresolved MX Format Test Causes Build Failure: The build process is failing due to an unrelated mx format test. Attempts to retry the test were unsuccessful, leaving the issue unresolved.
- CoffeeVampir3 Shares Testing File: When asked about a specific file for testing Bitnet, a member provided a link to bitnet_trained_to_ao_test.py.
- Release Branch Deadline and Nightly Builds: It was mentioned that any work to be included in the 0.3 release branch must be merged by next Tuesday. Alternatively, there is the option to continue experimenting with nightly builds.
- Refactoring Uint2 for Bitnet: Refactoring of Uint2 is in progress and is expected to be completed by tomorrow. This work is unblocking further developments now that bitpacking for Bitnet is operational.
LM Studio ā· #š¬-general (71 messagesš„š„):
- Discussing LM Studio on Web Servers: A member asked, ācan LM Studio run on a web server?ā Another member clarified, āIf itās a remote headless server, it wonāt work.ā
- Error in Loading Qwen-2 Model: A user encountered an error with the Qwen-2 model on LMChat. The solution was to use the ChatML preset and enable Flash Attention, as advised by another user.
- Model Recommendations for Translation: Members discussed the best LLM models for translation. Qwen2 and Aya 23 were suggested, though they were noted as not perfect.
- Concerns About LM Studio Developers: A user expressed concerns over the anonymity of LM Studioās developers. It was clarified that the lead dev and founder can be found on GitHub.
- Enabling Flash Attention through CLI: A user sought to enable Flash Attention from the command line interface. It was noted that the LMS CLI lacks this feature, but llama.cpp as an alternative was suggested.
Link mentioned: GitHub - andrewyng/translation-agent: Contribute to andrewyng/translation-agent development by creating an account on GitHub.
LM Studio ā· #š¤-models-discussion-chat (8 messagesš„):
- Gemini 1.5 flash struggles in JSON mode: "Guys, a bit OT here but… has anyone tried gemini 1.5 flash? json mode fails badly and randomly, producing empty strings." A member is experiencing issues with Gemini 1.5 Flash in JSON mode and is seeking feedback from others.
- Tess 2.5 72b quant models released on HF: "Tess 2.5 72b q3 and q4 quant gguf are live on HF." These new quantized models are now available on Hugging Face.
- VRAM limitations affect model choices: "But with 8GB of VRAM you're not going to have a lot of options. For small models, perhaps Llama is a good place to start." A member advises another on VRAM limitations and suggests smaller models like Llama.
- Model recommendation for grammar and syntax issues: A member is seeking recommendations for a model that can accurately review content for grammar and syntax problems, especially one that recognizes dialogue.
- LM Studio supports GGUF, not safetensor files: "hi! does lmstudio support multipart safetensor files?" "Safetensor no, only GGUF." A member confirms that LM Studio only supports the GGUF format, not safetensor files.
LM Studio ā· #š§ -feedback (2 messages):
- Direct AVX2 Error Solution Offered: A member pointed out that the error message encountered is due to lacking AVX2 instructions. This directs users to check their CPUās AVX2 support.
LM Studio ā· #š-hardware-discussion (9 messagesš„):
- P40 Prices Spike Up: The electronic scrap P40 used to be around $75 on flea markets in China, but now it costs over $200. This surge contrasts with a brand-new 4060Ti 16G, which costs around $480.
- Sanctions Affect Russian P40 Stock: A member noted that sanctions might be reducing the stock of P40s in Russia. They humorously added it shows how sanctions are working in a weird way.
- Home Server Build Insights: A user inquired about server specs for a home setup. Another shared their configuration: ālinux workstation w/ R3700X, 128GB RAM, RTX 4090, 2x 1TB SSD, and some 3.5ā SATA HDDsā.
Eleuther ā· #general (16 messagesš„):
- LLAMA3 70B generates diverse outputs: A user reported that when prompting LLAMA3 70B from an empty document (`<|begin_of_text|>`), the model generates "60% evals in AI2's arc format, 20% wiki text, and 20% code". This suggests LLAMA3 may have undergone final eval tuning for specific formats.
- Help requested for debugging VLLM: A member asked for help debugging VLLM, to which another replied that they are not a pro but have explored the repository.
- Finetuning BERT for long texts: A user sought advice on finetuning a BERT model for 6000-word inputs using a sliding-window approach (a sketch follows below). They were directed to specific resources, including a NeurIPS paper and its implementation.
- Mixture of Millions of Memory Experts as RAG alternative: A user mentioned a research paper on adopting a mixture of millions of memory experts for factual memorization and reducing hallucinations, as an alternative to RAG. Another member felt this approach might be redundant, as it has likely been tried before.
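A sketch of that sliding-window setup using the tokenizer's overflow support; mean-pooling the per-window logits is one illustrative aggregation choice among several:

```python
# Hedged sketch of sliding-window classification for ~6000-word inputs with a
# BERT-style model: overlapping 512-token windows via the tokenizer's overflow
# support, with per-window logits mean-pooled (one of several pooling choices).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

def classify_long(text):
    enc = tok(
        text,
        max_length=512,
        stride=128,                      # 128-token overlap between windows
        truncation=True,
        return_overflowing_tokens=True,  # emit every window, not just the first
        padding=True,
        return_tensors="pt",
    )
    enc.pop("overflow_to_sample_mapping")  # bookkeeping the model won't accept
    with torch.no_grad():
        logits = model(**enc).logits     # [num_windows, num_labels]
    return logits.mean(dim=0)            # pool window-level predictions
```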
Links mentioned:
- Open Engineer: A free educational platform offering AI learning through hands-on projects, practical tools, and foundational theory.
- Lamini-Memory-Tuning/research-paper.pdf at main Ā· lamini-ai/Lamini-Memory-Tuning: Banishing LLM Hallucinations Requires Rethinking Generalization - lamini-ai/Lamini-Memory-Tuning
Eleuther ā· #research (52 messagesš„):
- Samba-3.8B achieves perfect retrieval: Microsoft's Samba model, trained on 3.2 trillion tokens, significantly outperforms Phi3-mini on major benchmarks. It maintains linear complexity with respect to sequence length while achieving long-context retrieval abilities.
- Challenges in passkey retrieval: Members discussed Samba's ability to handle passkey retrieval at 256k sequence lengths. The conversation touched on the efficiency of Mamba layers and SWA in handling long-term and local information respectively, with RoPE potentially influencing results.
- Search for a neural network paper: One user tried to recall a paper on neural network behavior with OOD data, mentioning concepts like "mean reversion." Another user quickly found the likely paper and shared the link along with additional resources.
- Self-synthesized high-quality instruction data: A new method named Magpie was introduced, which synthesizes instruction data by prompting an aligned LLM with just the start-of-user-input template (a sketch of the trick follows below). This self-synthesis aims to generate large-scale alignment data without manual data creation.
- Discussion about tying embedding and unembedding layers: A member asked about the drawbacks of tying embedding and unembedding layers. Another shared a LessWrong post explaining that modern LLMs have moved away from this practice, although empirical data was lacking.
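The Magpie trick is simple enough to sketch: feed an aligned chat model only the pre-query part of its template and let it sample the user instruction itself. The sketch below assumes Llama-3-Instruct's template format and illustrative sampling settings:

```python
# Hedged sketch of Magpie-style self-synthesis: give an aligned chat model
# only the pre-query part of its template so it samples a user instruction,
# then answer that instruction to get an (instruction, response) pair.
# Template strings follow Llama-3-Instruct; settings are illustrative.
from transformers import pipeline

generate = pipeline(
    "text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct"
)

PRE_QUERY = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"

# Step 1: the model completes the "user turn" -> a synthetic instruction.
out = generate(PRE_QUERY, max_new_tokens=128, do_sample=True, temperature=1.0)
instruction = out[0]["generated_text"][len(PRE_QUERY):].split("<|eot_id|>")[0]

# Step 2: answer the synthesized instruction to complete the pair.
prompt = (
    PRE_QUERY + instruction
    + "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)
response = generate(prompt, max_new_tokens=256)[0]["generated_text"]
print(instruction)
```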
Links mentioned:
- What If We Recaption Billions of Web Images with LLaMA-3?: Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training acros...
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing: High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinde...
- An Empirical Study of Mamba-based Language Models: Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large inference-time memory requir...
- Image and Video Tokenization with Binary Spherical Quantization: We propose a new transformer-based image and video tokenizer with Binary Spherical Quantization (BSQ). BSQ projects the high-dimensional visual embedding to a lower-dimensional hypersphere and then ap...
- Parallelized Computation and Backpropagation Under Angle-Parametrized Orthogonal Matrices: We present a methodology for parallel acceleration of learning in the presence of matrix orthogonality and unitarity constraints of interest in several branches of machine learning. We show how an app...
- Deep Neural Networks Tend To Extrapolate Predictably: Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs. Our work reassesses this assumption for neural...
- Calibrated Language Models Must Hallucinate: Recent language models generate false but plausible-sounding text with surprising frequency. Such "hallucinations" are an obstacle to the usability of language-based AI systems and can harm pe...
- Tweet from Sakana AI (@SakanaAILabs): Can LLMs invent better ways to train LLMs? At Sakana AI, we're pioneering AI-driven methods to automate AI research and discovery. We're excited to release DiscoPOP: a new SOTA preference optimizatio...
- CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models: Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG has notable drawbacks. For instance, DDIM with CFG lacks invertibili...
- LLM Basics: Embedding Spaces - Transformer Token Vectors Are Not Points in Space — LessWrong: This post is written as an explanation of a misconception I had with transformer embedding when I was getting started. Thanks to Stephen Fowler for t…
Eleuther ▷ #lm-thunderdome (10 messages🔥):
- Debate on acc_norm by tokens vs bytes for same tokenizer: A member questioned the appropriateness of using acc_norm by tokens instead of bytes when comparing models with the same tokenizer. Another member noted that "A lot of people do it" and recommended referencing the GPT-3 paper for more details (a toy illustration of the two scoring rules follows below).
- Generate_until issues with Qwen1.5-7B-Chat: One member reported that 670 out of 749 questions returned empty responses while running the task truthfulqa_gen on Qwen1.5-7B-Chat at dtype="float" and included a pastebin log for reference. Another member suggested that the omission of `--apply_chat_template` could be the cause, citing prompt formatting issues, and recommended using `--fewshot_as_multiturn` as a possible fix.
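To ground the tokens-vs-bytes debate, here is a toy sketch of the two scoring rules with byte-length normalization; this illustrates the convention, not the harness's actual implementation:

```python
def score_choices(choices):
    """choices: list of (summed_logprob, continuation_text) pairs."""
    # acc: pick the continuation with the highest raw log-likelihood.
    acc_pick = max(range(len(choices)), key=lambda i: choices[i][0])
    # acc_norm (byte variant): divide by the continuation's byte length,
    # removing the built-in bias toward shorter answers.
    norm_pick = max(
        range(len(choices)),
        key=lambda i: choices[i][0] / len(choices[i][1].encode("utf-8")),
    )
    return acc_pick, norm_pick

# A long correct answer can lose on raw likelihood but win after
# normalization: -6.0/22 bytes beats -4.0/2 bytes.
print(score_choices([(-4.0, "no"), (-6.0, "not enough information")]))  # (0, 1)
```

Normalizing by token count instead is tokenizer-dependent, so bytes are the portable choice across models; with a shared tokenizer the two normalizations mostly agree, which is presumably why "a lot of people do it".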
Link mentioned: log_samples truthfulqa_gen - Pastebin.ai: no description found
Eleuther ▷ #multimodal-general (1 messages):
yash05880: icydk https://laion.ai/blog/open-flamingo/
LlamaIndex ▷ #blog (3 messages):
- PingCap showcases RAG app using TiDB and LlamaIndex: Our friends at PingCap have crafted a high-quality RAG application centered on their TiDB database utilizing LlamaIndex's knowledge graph feature. It's open source, and you can try it out here or check the code on GitHub.
- Talk Alert: AI Infrastructure Meetup in Paris: Join Pierre-Loic Douclet from LlamaIndex, along with speakers from Gokoyeb and Neon, at an AI Infrastructure Meetup at Station F in Paris on Wednesday, June 19. More details and registration here.
- Mixture-of-Agents (MoA) enhances LLM capabilities: A study by TogetherCompute shows that a Mixture-of-Agents (MoA) setup using exclusively open-source LLMs can significantly enhance task capabilities. Learn more about the potential of MoA here.
Links mentioned:
- AI Infrastructure · Luma: 📣 Calling all AI developers and infrastructure enthusiasts! We are excited to announce an AI Infrastructure Meetup on Wednesday, June 19! The Linux…
- TiDB.AI: no description found
LlamaIndex ▷ #general (61 messages🔥🔥):
- Handling Large Embedding Indexes: A member struggling with query times for an index containing 170,000 embeddings is advised to use vector databases like Qdrant or FAISS Index to speed up retrieval times (a FAISS sketch follows this list). They shared detailed FAISS index and query code, seeking help with an AssertionError during querying.
- Nodes Labeled as "Unknown": Users are discussing why some nodes in the property graph display as "Unknown." Suggestions include possible issues with the implicit extractor and the source/parent relationship, and a recommended fix using `pip install -U llama-index-graph-stores-neo4j`.
- Extracting Nodes from VectorStores: A user trying to extract nodes from a VectorStoreIndex built with Chroma encounters empty dictionaries and learns to use `vector_store._collection.get()` to retrieve nodes. They seek further help on performing this task directly from a VectorStoreIndex.
- Filtering Node Retrieval Results: Users discuss different methods to filter results from `index.as_retriever().retrieve()`, focusing on metadata filters, similarity postprocessors, and the use of `get_nodes()` for specific vector stores like Qdrant and Chroma.
- Qdrant Database Node Retrieval: A user working with a Qdrant database containing law text entries asks about retrieving adjacent nodes (previous and next articles) during a query. The suggestion involves using node relationships and the new API method in the latest Qdrant vector store.
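On the first point, here is a minimal sketch of offloading that many embeddings to FAISS; the dimension, data, and k are illustrative assumptions, not the member's actual setup:

```python
# Exact nearest-neighbor search over ~170k embeddings with FAISS.
import numpy as np
import faiss

dim = 1536
vectors = np.random.rand(170_000, dim).astype("float32")  # your embeddings

index = faiss.IndexFlatL2(dim)  # exact L2; consider IndexIVFFlat at larger scale
index.add(vectors)              # FAISS asserts on non-float32 input or a wrong
                                # dimension, a common source of AssertionError

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)  # top-5 neighbors per query row
print(ids[0], distances[0])
```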
Links mentioned:
- Tools - LlamaIndex: no description found
- llama_index/llama-index-core/llama_index/core/evaluation/context_relevancy.py at f5263896121721de1051ce58338a1e0ea6950ca7 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
- Node Postprocessor - LlamaIndex: no description found
- Node Postprocessor Modules - LlamaIndex: no description found
LlamaIndex ▷ #ai-discussion (1 messages):
- Embedding PDFs to Weaviate with LLM-Index: A member asked for suggestions on embedding PDFs and documents to a Weaviate vector database using LLM-Index. The message indicates an ongoing project but does not include further details or responses.
Cohere ▷ #general (56 messages🔥🔥):
- Coral rebranded to Command-R: A user inquired about Coral's status, and it was clarified that it's now called Command-R, but both the original Command-R and Coral are still functioning.
- Model usage parameters and prompt controls: Discussions highlighted differing practices in model parameter adjustments, with some users favoring prompt engineering over parameter tuning for better control, while others shared specific configurations they found effective (a hedged example follows this list).
- Clarification on acceptable use policy: Users shared and questioned the Cohere Acceptable Use Policy, seeking clarity on private versus commercial use, with personal projects being permissible for showcasing to employers.
- Internal server errors with the API: Users experiencing `ApiError: status_code: 500` errors discussed potential causes, including errors within specific prompts versus request issues, with others suggesting checking request details.
- Trial key limitations and access issues: Users reported issues with trial keys stating insufficient permissions and hitting usage limits, while others confirmed similar experiences but noted success with production keys.
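For the parameter-tuning point, a hedged sketch of dialing generation settings on Command-R via the Cohere Python SDK; the values and prompt are illustrative, not the configurations shared in the channel:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

response = co.chat(
    model="command-r",
    message="Summarize the acceptable use policy in two sentences.",
    temperature=0.3,  # lower = more deterministic output
    p=0.9,            # nucleus sampling cutoff
)
print(response.text)
```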
Links mentioned:
- Terms Of Use: Access and use Cohere's Natural Language Processing and Large Language Models with confidence, knowing that you are fully informed of our Terms of Use.
- C4AI Acceptable Use Policy: no description found
Cohere ▷ #project-sharing (3 messages):
- Glarb-Glarb and Fluent API preferences: One member humorously mentioned "Glarb-Glarb" and expressed their liking for the Fluent API.
- Congratulations on release and thanks for Cohere support: Another member congratulated someone on their release and expressed gratitude for adding Cohere support, ending with a salute emoji.
LAION ▷ #general (41 messages🔥):
- Stable Diffusion struggles to generate images of women: A discussion highlighted that StabilityAI's Stable Diffusion model struggles to generate naked or even clothed images of women due to heavy censorship. Users suggested waiting for custom checkpoints and using second-pass img2img with SD1.5 to achieve desired results. Discussion link.
- Celebration for Luma AI's new text-to-video model: Users congratulated a member for Luma AI's release of Dream Machine, a text-to-video model. Despite some initial server issues, the model shows promise, though users noted its performance varies with complex prompts. Try Dream Machine.
- Comparison of different AI models: There's a shared link comparing SD3 Large, SD3 Medium, Pixart Sigma, DALL E 3, and Midjourney. This post discusses the reopening of the /r/StableDiffusion subreddit and ongoing issues with Reddit's API changes. Comparison link.
- Interest in video tokenization models: A user inquired about the best open-source models for converting video into sequences of tokens, similar to VAE. Another user admitted they could only help with basic video editing tasks like splitting videos and removing watermarks. Tweet link.
- Luma Video's potential: Users remarked that Luma's Dream Machine video generation has potential, although it is inconsistent. There are ongoing discussions about its capabilities and room for improvement.
Links mentioned:
- stabilityai/stable-diffusion-3-medium · The model does not create an image of naked women: no description found
- Tweet from Blaine Brown (@blizaine): Dream Machine from @LumaLabsAI really brings memes to life! A thread 🧵
- Tweet from dome | Outlier (@dome_271): LET'S GOOOOOOOOOOO SO HAPPY TO SHARE OUR FIRST TEXT-TO-VIDEO MODEL! https://lumalabs.ai/dream-machine Quoting Luma AI (@LumaLabsAI) Introducing Dream Machine - a next generation video model ...
- Reddit - Dive into anything: no description found
- Tweet from Blaine Brown (@blizaine): no description found
LAION ▷ #research (14 messages🔥):
- AIW Study Exposes Model Instability: Discussing variations in AIW problems, the study revealed dramatic breakdowns in models like GPT-4o when faced with minor changes, such as altering numerical values. The analysis emphasizes that "GPT-4o (and all other models) is NOT robust to AIW variations," highlighting core deficiencies in reasoning capabilities (source).
- DataComp-1B's Improved Captions: A link shared from Haoqin Tu's personal site details work on enhancing textual descriptions for noisy web-crawled image-text pairs, improving model training through alignment (source) (paper).
- Masked Diffusion Models Simplified: A shared paper presents a simple and general framework for masked diffusion models, showing superior performance over prior models at GPT-2 scale when evaluated by perplexity (source).
- Recaptioning and Dataset Updates: Multiple mentions indicate active work in recaptioning datasets such as CC12M and GBC10M, with links provided to complementary resources on Hugging Face (GBC10M, CC12M LLaVaNext).
Links mentioned:
- What If We Recaption Billions of Web Images with LLaMA-3?: Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training acros...
- Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models: Large Language Models (LLMs) are often described as being instances of foundation models - that is, models that transfer strongly across various tasks and conditions in few-shot or zero-shot manner, w...
- What If We Recaption Billions of Web Images with LLaMA-3?: no description found
- Simplified and Generalized Masked Diffusion for Discrete Data: Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnec...
LangChain AI ▷ #general (49 messages🔥):
- Seeking speech-to-text datasets and solutions: A member asked for recommendations on speech-to-text datasets with MP3 files and original transcripts with diarization. They also mentioned having used AWS Transcribe, OpenAI Whisper, and Deepgram Nova-2 and are looking for the best available solution in this space.
- Handling empty tool calls in chains: A user is dealing with an issue in which a chain throws an error when the "tool_calls" list is empty. They are seeking advice on managing scenarios where user input does not require tool usage, ensuring the chain can handle simple responses without errors.
- Evaluating LLM message similarity: Another user inquired about how LangChain handles the similarity measure between LLM messages and expected script messages. The response explained that LangChain can use string distance metrics or embedding distance metrics to evaluate this similarity, with practical code examples provided (a sketch follows this list).
- Design patterns for state management in LangGraph: There was an extended discussion on managing state in LangGraph, with particular interest in integrating user ID and thread ID into the state. The conversation centered around whether state should include the entire conversation history or only recent interactions, offering detailed examples and best practices.
- Handling human intervention and streaming responses in LangChain: Members discussed methods for managing human intervention in LangGraph workflows and resuming conversations seamlessly. There were also questions about streaming responses while retaining chat history, with code examples provided to demonstrate maintaining context across multiple chat turns.
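For the similarity question, a minimal sketch of LangChain's off-the-shelf evaluators; note the embedding evaluator defaults to OpenAI embeddings, so an API key (and the `rapidfuzz` package for string distance) is assumed:

```python
from langchain.evaluation import load_evaluator

# String-distance metric (edit-distance style) between two messages.
string_eval = load_evaluator("string_distance")
print(string_eval.evaluate_strings(
    prediction="The meeting is at 3pm.",
    reference="The meeting starts at 3pm.",
))

# Embedding-distance metric: semantic similarity via an embedding model.
embedding_eval = load_evaluator("embedding_distance")
print(embedding_eval.evaluate_strings(
    prediction="The meeting is at 3pm.",
    reference="The meeting starts at 3pm.",
))
```

Both return a score dict where a lower distance means closer agreement, which is what makes them usable for comparing LLM output against expected script messages.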
Links mentioned:
- Chat Bot Feedback Template | 🦜🔗 LangChain: This template shows how to evaluate your chat bot without explicit user feedback. It defines a simple chat bot in chain.py and custom evaluator that scores bot response effectiveness based on the subs...
- Use LangChain off-the-shelf evaluators (Python only) | 🦜🛠️ LangSmith: Before diving into this content, it might be helpful to read the following:
- Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
- How to migrate from legacy LangChain agents to LangGraph | 🦜🔗 Langchain: Here we focus on how to move from legacy LangChain agents to LangGraph
- How to add chat history | 🦜🔗 LangChain: In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of "memory" of past questions and answers, and some logi...
LangChain AI ▷ #share-your-work (2 messages):
- Tiny AI Client simplifies LLM use: A member shared a project on GitHub called tiny-ai-client, describing it as a "tiny LLM client for simple use cases," supporting tools and vision for OAI, Anthropic, and Gemini models. They expressed hope that it would be helpful to others in the community.
- Run LLM locally with Docker and Ollama: Another member created a YouTube video demonstrating how to run LLMs locally using Docker and Ollama. They invited feedback on their video, titled "Exécuter des LLMs en local avec Docker et Ollama" ("Running LLMs locally with Docker and Ollama").
Link mentioned: Exécuter des LLMs en local avec Docker et Ollama: no description found
LangChain AI ▷ #tutorials (1 messages):
- New tutorial on setting up LLM on Google Colab: A member shared a comprehensive tutorial on GitHub for setting up an LLM on Google Colab, taking advantage of the free 15GB Tesla T4 Colab GPU. The tutorial includes instructions for both GPU-accelerated and CPU-only inference here.
Link mentioned: GitHub - casualcomputer/llm_google_colab: A tutorial on how to set up a LLM on Google Colab for both GPU-accelerated and CPU-only session.: A tutorial on how to set up a LLM on Google Colab for both GPU-accelerated and CPU-only session. - casualcomputer/llm_google_colab
Modular (Mojo 🔥) ▷ #general (7 messages):
- Windows support hinted for Fall release: A member inquired about when Windows support would be available. Another responded, guessing an end of summer or Fall release and referenced an upcoming YouTube livestream for updates.
- WSL as a workaround for Windows users: A member reported successfully using WSL (Windows Subsystem for Linux) as a temporary workaround for Mojo development on Windows. However, the asker preferred a native Windows build environment to handle PE format.
Link mentioned: Modular Community Livestream - New in MAX 24.4: MAX 24.4 is now available! Join us on our upcoming livestream as we discuss what's new in MAX Engine and Mojo🔥 - MAX on macOS, MAX Engine Quantization API, …
Modular (Mojo 🔥) ▷ #🔥mojo (31 messages🔥):
- Non-empty Strings in Mojo are Truthy: A user discovered that any non-empty string in Mojo evaluates as truthy, leading to some unexpected behavior in their code. This was clarified with explanations that constants like `String.ASCII_LOWERCASE` are non-empty strings, thus always evaluating to True.
- Mojo LSP Configuration in Neovim: There was a query about setting up Mojo LSP in Neovim. It was answered with a GitHub link confirming that it comes pre-installed with Neovim.
- Matrix Multiplication Performance in Mojo vs. Python: A user discussed performance benchmarks comparing Mojo and Python for small, fixed-sized matrix multiplications. They found that Mojo performed significantly faster due to Python's overhead in using libraries like numpy for small calculations.
- Iterators and Loop Behavior in Mojo: Some users discussed the reassignment behavior of loop variables in Mojo, particularly the use of `for` loops and modifying `i` inside the loop. It was suggested to use `while` loops if persistent modifications are needed.
- Stdin and Stdout in Mojo: There was a question about the lack of `stdin` support in Mojo, which was confirmed as not currently available.
Link mentioned: nvim-lspconfig/doc/server_configurations.md at master · neovim/nvim-lspconfig: Quickstart configs for Nvim LSP. Contribute to neovim/nvim-lspconfig development by creating an account on GitHub.
Modular (Mojo 🔥) ▷ #nightly (13 messages🔥):
- Conditional non-conformance discussed: A member suggested the need for conditional non-conformance in order to "get rid of implicit copies that can be made by dereferencing." Another member mentioned this might be handled by the ExplicitlyCopyable trait and CollectionElementNew.
- Compiler limitation troubleshooting: When a member asked if a compiler error was a bug, another clarified it was likely a compiler limitation. A suggested workaround was to use `fn (Optional[T]) capturing -> Bool` while assuming the argument would never be none.
- New nightly Mojo compiler release announced: The latest nightly Mojo compiler release, version `2024.6.1305`, was announced with changelog and raw diff links. A member humorously remarked on remembering to update with `modular update nightly/max` instead of `modular update nightly/mojo`.
- Alias for updates suggested: In response to the update command confusion, a member suggested setting up an alias to simplify the process. This was humorously acknowledged as a good example of "Actual Intelligence."
Interconnects (Nathan Lambert) ▷ #news (33 messages🔥):
- OpenAI's revenue skyrockets without Microsoft's help: Revenue at OpenAI has nearly doubled in the last six months, and it primarily comes from direct sales of ChatGPT and other OpenAI products rather than from Microsoft's sales. Source.
- Sakana AI introduces DiscoPOP: Sakana AI has launched DiscoPOP, a new state-of-the-art preference optimization algorithm discovered and written by an LLM, showcasing AI-driven methods to automate AI research. Check the details and explore the paper and GitHub repo.
- Mira Murati comments on OpenAI's models: Mira Murati mentioned that the AI models available in OpenAI's labs are not significantly more advanced than the ones publicly available. Source.
- Rumors about Reka's acquisition: Reka was rumored to be acquired by Snowflake for a billion dollars, but shortly after, Reka announced a long-term partnership with Shutterstock. This speculation highlights the dynamic nature of acquisitions in the AI industry.
- Apple and OpenAI's surprising partnership: Apple's partnership with OpenAI focuses more on promoting OpenAI's brand and technology across its devices rather than generating substantial initial revenue. Detailed article.
Links mentioned:
- Bloomberg - Are you a robot?: no description found
- Tweet from Amir Efrati (@amir): News: Revenue at OpenAI ~doubled~ in the last 6 months. And while many ppl thought a lot of revenue was coming from Microsoft selling OpenAI tech and giving the startup a cut⦠au contraire. This r...
- Tweet from Sakana AI (@SakanaAILabs): Can LLMs invent better ways to train LLMs? At Sakana AI, we're pioneering AI-driven methods to automate AI research and discovery. We're excited to release DiscoPOP: a new SOTA preference optimizatio...
- Tweet from Tsarathustra (@tsarnick): Mira Murati says the AI models that OpenAI have in their labs are not much more advanced than those which are publicly available
Interconnects (Nathan Lambert) ▷ #ml-drama (4 messages):
- Unusual Paper Submission Discussed: A member shared a tweet from Cosmin Negruseri and linked to an arXiv paper 2404.07221 noting an unusual aspect. Another member humorously responded, expressing surprise with "Hahahhahaha what."
Link mentioned: Tweet from Cosmin Negruseri (@cosminnegruseri): haven't seen this in a paper before
Interconnects (Nathan Lambert) ▷ #random (6 messages):
- Nvidia Nemotron hype grows: A member shared a tweet suggesting the potential release of Nvidia Nemotron. This tweet initiated excitement among the community about the new model.
- Nemotron-4 technical report shared: A member posted a link to the Nemotron-4 15B technical report on arXiv, authored by several researchers including Jupinder Parmar and Shrimai Prabhumoye.
- Insights from insider connection: Another member commented, "their head of alignment is a paid sub and friend but not in the discord I think." This sparked curiosity about potential inside information related to the project discussed earlier.
Links mentioned:
- Tweet from SebastianBoo (@SebastianB929): Seems like Nvidia Nemotron Quoting Xeophon (@TheXeophon) june-chatbot
- Nemotron-4 15B Technical Report: We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multil...
Interconnects (Nathan Lambert) ▷ #nlp (7 messages):
- Samba 3.8B smashes benchmarks: A tweet by Liliang Ren introduces Samba 3.8B, a combination of Mamba and Sliding Window Attention architecture that significantly outperforms Phi3-mini on major benchmarks like MMLU, GSM8K, and HumanEval. The architecture boasts an infinite context length with linear complexity (paper here).
- SSMs still hold ground: Members express relief and optimism that SSMs (state-space models) are still being actively developed and discussed. One member humorously remarks that they are 50/50 on whether they will continue, noting the odds are decent.
- Hybrids may be the future: There is a consensus that trends are moving towards hybrid SSM/transformer architectures, citing the logic that attention isn't required for every layer.
Link mentioned: Tweet from Liliang Ren (@liliang_ren): Introducing Samba 3.8B, a simple Mamba+Sliding Window Attention architecture that outperforms Phi3-mini on major benchmarks (e.g., MMLU, GSM8K and HumanEval) by a large margin. 😮 And it has an infinit…
Latent Space ▷ #ai-general-chat (43 messages🔥):
- Haize Labs unveils AI jailbreak detection: A new startup, Haize Labs, has announced a manifesto focusing on preemptively discovering and eliminating failure modes in AI systems. They showcased their ability to jailbreak the safety guardrails of industry-leading AI models, exposing serious safety violations.
- Open Repro of iPad Calculator by tldraw: tldraw shared an open reproduction of Apple's iPad calculator, sparking interest and admiration. The tldraw team, though small, continually impresses with their innovative thought experiments and rapid demonstrations.
- Article on Amazon's AI struggles: An article shared by cakecrusher discusses how Amazon's culture and operations led to their lag in conversational AI development. Former employees mentioned issues like byzantine build/deploy systems and a product-first mentality hindering long-term development.
- OpenAI's Revenue Soars: OpenAI has reached an annualized revenue of $3.4 billion, as shared by @deedydas. Discussions ensued about the profitability and potential burn rate associated with such impressive revenue figures.
- Argilla joins Hugging Face: Argilla announced their integration with Hugging Face, promising enhanced synergies for generating valuable datasets and content. The partnership aims to amplify both teams' efforts in building great products and fostering innovation in the AI space.
Links mentioned:
- Scaling interpretability: Science and engineering are inseparable. Our researchers reflect on the close relationship between scientific and engineering progress, and discuss the techn...
- Tweet from tldraw (@tldraw): i have an idea
- Tweet from Haize Labs (@haizelabs): Today is a bad, bad day to be a language model. Today, we announce the Haize Labs manifesto. @haizelabs haizes (automatically red-teams) AI systems to preemptively discover and eliminate any failure...
- Tweet from Alvaro Bartolome (@alvarobartt): As you may already know, @argilla_io is now joining @huggingface 🤗 On a professional level, it's incredible to see two great teams sharing the same passion. I firmly believe in Argilla's mi...
- Tweet from tldraw (@tldraw): can your calculator do this?
- Tweet from tldraw (@tldraw): Be maths
- Tweet from Sharon Zhou (@realSharonZhou): Excited to announce @LaminiAI Memory Tuning, a new research breakthrough! 95%+ accuracy, cutting hallucinations by 10x; turns any open LLM into a 1M-way adapter MoE (paper & Lamini-1 model weights...
- Tweet from Deedy (@deedydas): OpenAI is at $3.4B annualized revenue. Wow.
- Tweet from Emily (@the_aiju): apparently you can nerdsnipe an LLM by asking it if there are primes whose digits sum to 9 (the correct answer is "no") ^^ every single one i've tried it on goes crazy trying every single prime it can...
- The engineering challenges of scaling interpretability: Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
- Tweet from Liliang Ren (@liliang_ren): Introducing Samba 3.8B, a simple Mamba+Sliding Window Attention architecture that outperforms Phi3-mini on major benchmarks (e.g., MMLU, GSM8K and HumanEval) by a large margin. 😮 And it has an infinit...
- GitHub - microsoft/Samba: Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling": Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" - microsoft/Samba
OpenInterpreter ▷ #general (40 messages🔥):
- Clarifying Open Interpreter's Capabilities: A discussion clarified that Open Interpreter gives agentic power to Large Language Models (LLMs). One member explained, "OI converts natural language into computer control" and considered potential future integration with machine-specific LLMs and sensory input models.
- Running Vision Models: Members helped each other run simultaneous code and vision models with Open Interpreter, such as the `llama3-vision.py` profile. They discussed methods and issues like downloading models and getting them to perform tasks like screengrabs.
- Browser Automation Example: A user shared their success having Open Interpreter browse the web to get the current Mavericks score using a simple command (a minimal sketch follows this list). They highlighted the straightforward nature of the prompt and the potential server load causing delays.
- Whisper STT Library Query: A member inquired about a good/easy Whisper Speech-To-Text (STT) library, only to later mention they created their own solution.
- Performance and Customization: There were discussions about performance issues, server load, and potential customizations, particularly in modifying core.py to achieve desired results.
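For the browser example, a hedged sketch of driving Open Interpreter from Python; the model name and task string are illustrative assumptions (the discussion used profile files and a plain prompt):

```python
from interpreter import interpreter

interpreter.llm.model = "gpt-4o"  # assumed model id; any litellm-style id works
interpreter.auto_run = False      # keep confirmation prompts for safety

# Natural language in, computer control out: OI plans and executes the steps.
interpreter.chat("Open a browser and check the current Mavericks score.")
```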
OpenAccess AI Collective (axolotl) ▷ #general (5 messages):
- Apple's model param size revealed in WWDC: Curious about the model param size used in Apple's recent announcement? It's a 3 billion parameter on-device language model. Details found here.
- Apple vs. model win clarified: There was a query about whether the term "win" referred to Apple's victory or a model's victory. A participant clarified: "I believe it's Apple's model won against the selected model".
- Model optimization technique explained: Apple's on-device inference uses low-bit palettization and a mixed 2-bit and 4-bit configuration strategy. This approach averages 3.5 bits-per-weight, achieving the same accuracy as uncompressed models while optimizing memory, power, and performance (a worked example of the average follows below).
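To see how the mixed configuration reaches that average: 3.5 sits between 2 and 4, so the split must lean toward 4-bit; a hypothetical mix with 25% of weights at 2-bit and 75% at 4-bit gives 0.25 × 2 + 0.75 × 4 = 3.5 bits-per-weight. Apple has not disclosed the actual proportions, so these numbers are purely illustrative.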
Link mentioned: Introducing Apple's On-Device and Server Foundation Models: At the 2024 Worldwide Developers Conference, we introduced Apple Intelligence, a personal intelligence system integrated deeply into…
OpenAccess AI Collective (axolotl) ▷ #general-help (16 messages🔥):
- Docker Desktop struggles to find GPU: A member tried running axolotl via Docker on a virtual machine (Ubuntu) on Windows 11 but received an error stating "No GPU found." They ran `docker run --gpus all --rm -it winglian/axolotl:main-latest` and `accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml` with no success.
- Suggest using `nvidia-smi`: Another member suggested running `nvidia-smi` to check GPU status. They also inquired if the CUDA toolkit was installed on the host system (a quick verification recipe follows this list).
- Where to install CUDA toolkit: The troubleshooting discussion led to questions about whether to install the CUDA toolkit on Windows or Ubuntu.
- Nvidia toolkit installation guide shared: A link to the NVIDIA installation guide was shared to aid in setting up the toolkit.
- WSL 2 and GPU configuration: The user clarified they are using WSL 2 and will try configuring CUDA in Ubuntu WSL. They expressed gratitude and mentioned attempting the process again.
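A quick way to isolate where such a failure sits, assuming the NVIDIA Container Toolkit is installed in the WSL 2 Ubuntu environment: run `nvidia-smi` on the host to confirm the driver sees the GPU, then run it inside a bare CUDA container, e.g. `docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi` (the image tag is illustrative). If the first works but the second fails, the container toolkit or Docker's GPU runtime is the problem rather than the axolotl image.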
Link mentioned: Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit 1.15.0 documentation: no description found
OpenRouter (Alex Atallah) ▷ #general (16 messages🔥):
- OpenRouter UI clamps unsupported parameters: When a user asked about supporting Temp > 1 and Min P for Command R in OpenRouter, Alex Atallah clarified that while the UI supports it, parameters like temp will be clamped down to temp=1, and Min P will not be passed (a hedged request sketch follows this list).
- High response times for Mistral 7B models: A user observed high response times for all Mistral 7B variants and linked it to changes in context length and possible rerouting of models. The discussion also pointed out a context length adjustment and continuous disruptions indicated by a model uptime tracker.
- Employment offer: A user introduced themselves as a senior full stack & blockchain developer, stating they have enough experience and are seeking job opportunities.
- Request for vision models: Another user inquired about plans to add more vision models, like cogvlm2, for better dataset captioning capabilities.
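On the clamping point, a hedged sketch of calling OpenRouter through its OpenAI-compatible endpoint; the base URL is the documented one, while the model slug and sampling values are illustrative:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

completion = client.chat.completions.create(
    model="cohere/command-r",  # assumed model slug
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=1.0,  # per the discussion, values above 1 get clamped to 1
    extra_body={"min_p": 0.05},  # provider-specific; may be dropped silently
)
print(completion.choices[0].message.content)
```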
Links mentioned:
- OpenRouter: LLM router and marketplace
- OpenRouter API Watcher: Explore OpenRouter's model list and recorded changes. Updates every hour.
- Mistral: Mistral 7B Instruct (nitro) ā Uptime and Availability: Uptime statistics for Mistral: Mistral 7B Instruct (nitro) across providers - A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. Note: this is...
tinygrad (George Hotz) ▷ #general (3 messages):
- RDNA3 Assembly Bounty Hype: George Hotz mentioned a GitHub PR for RDNA3 assembly support, inviting contributions to the bounty. This update seeks to enhance tinygrad by adding RDNA3 assembly support.
- Qualcomm Kernel Level Bounty Suggested: Hotz also recommended a bounty for a "Qualcomm Kernel level GPU driver with HCQ graph support" as a solid starting point for those possessing a Qualcomm smartphone and low-level Linux knowledge. This suggests an opportunity to contribute to the GPU driver development.
- Tinygrad works smoothly in Termux: Hotz confirmed that tinygrad operates well within Termux. This implies that tinygrad is versatile and can be used in various environments including mobile.
Link mentioned: RDNA3 assembly support by geohot · Pull Request #3637 · tinygrad/tinygrad: no description found
tinygrad (George Hotz) ▷ #learn-tinygrad (10 messages🔥):
- Mixed precision in TinyGrad using bfloat16 and float32: A member asked if mixed precision could be mimicked by casting during matmul operations. Another member confirmed it's possible and likely faster ("especially if it fits tensor core dtype pattern"); a cast-around-matmul sketch follows this list.
- Indexing a tensor with computed index X: A member sought advice on efficient ways to access tensor indices computed from kernel operations and mentioned a potential inefficiency involving multiple kernels. They also referenced boolean indexing patterns discussed in PR#3707.
- Using UOp graph in TinyGrad: A member shared code for compiling and running a UOp graph using `MetalDevice` and `MetalCompiler`, but needed guidance on executing the compiled kernel. Another member suggested looking into `compiledRunner` for further information.
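On the first point, a minimal sketch of the cast-around-matmul idea, backend permitting; the tinygrad API shown is the mid-2024 surface, and the sizes are illustrative:

```python
from tinygrad import Tensor, dtypes

a = Tensor.randn(512, 512)  # float32 by default
b = Tensor.randn(512, 512)

# Do the matmul in bfloat16, then store the result back in float32.
out = (a.cast(dtypes.bfloat16) @ b.cast(dtypes.bfloat16)).cast(dtypes.float32)
print(out.numpy().dtype)  # float32
```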
Datasette - LLM (@SimonW) ▷ #ai (6 messages):
- Sober AI reigns at Databricks Summit: A blog post from dbreunig emphasizes that most AI work is practical and grounded, contrary to hype from companies like OpenAI and Google. He notes that best practices are just now settling, likening modern LLM work to data science in 2019.
- GPU vs. Spark capacity fights: Current AI development battles for GPU cores and VRAM swaps, contrasting with the previous struggles over Spark capacity and RAM swaps in data engineering. This highlights the evolving nature of technical constraints in the field.
- High LLM input-to-output ratio: At Databricks, clients have a 9:1 ratio of LLM input to output, suggesting that input token pricing is more significant than output token pricing. This ratio underscores the economic aspect of operating LLMs effectively (a worked example follows below).
- Frontpage success: dbreunig's insights from the Databricks Summit hit the front page of Hacker News, indicating a broad interest in the discussed themes of sober AI and practical application challenges in the AI domain today.
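To put numbers on the 9:1 ratio: with hypothetical pricing of $5 per million input tokens and $15 per million output tokens, a workload of 9M input and 1M output tokens costs 9 × $5 + 1 × $15 = $60, of which $45 (75%) is input. Even at a 3× higher output price, input dominates the bill at this ratio; the prices here are illustrative, not Databricks figures.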
Link mentioned: Sober AI is the Norm: Sober AI is the quiet default, despite all the hype you hear about human-replacements and AGI. Data scientists and engineers are quietly transforming business intelligence through practical applicatio…
DiscoResearch ▷ #general (3 messages):
- Aleph Alpha and Silo AI partner up for European AI: Aleph Alpha and Silo AI announced a strategic partnership to advance open-source AI and enterprise-grade solutions across Europe. Their collaboration aims to enhance AI deployment in industrial firms by leveraging Aleph Alpha's tech stack and Silo AI's expertise with a 300+ strong AI team. Aleph Alpha and Silo AI Partnership
Link mentioned: Aleph Alpha and Silo AI enter a strategic partnership to advance open source AI and enterprise-grade solutions in Europe - ALEPH ALPHA - AI for Enterprises and Governments: To foster the adoption and fully leverage the potential of generative AI across European industrial firms, Europe's largest AI lab Silo AI and European AI champion Aleph Alpha are announcing a long-te…
Torchtune ▷ #general (1 messages):
- Survey on Torchtune user model serving: A user prompted the community to participate in a poll regarding how they serve their finetuned models. They expressed appreciation for any help, stating, "Thanks for your help!".
Torchtune ▷ #dev (1 messages):
- Proposed tokenizer revamp receives enthusiastic suggestion: A developer shared an RFC for a wide-scale tokenizer revamp, detailing important reasons for the change. They emphasized benefits like easier addition of features, better composability, and reduced onboarding time for new or updated model tokenizers.
Link mentioned: Build software better, together: GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.
MLOps @Chipro ▷ #events (1 messages):
- Join Infer: Summer '24 for AI insights: Qwak is hosting a free virtual conference on June 26, aimed at AI and ML enthusiasts. The event features live expert interactions and practical insights from leaders in the field.
- Insightful Talks and Practical Knowledge: The highlights include discussions on building recommender systems and exploring AI in sports. You'll learn techniques for implementing predictive solutions and robust systems focusing on architecture and user engagement.
- Esteemed Speaker Lineup: Experts from Lightricks, LSports, and Lili Banking will share their real-world experiences and knowledge. Notable speakers include Hudson Buzby, Solutions Architect at Qwak, and Russ Wilcox, Data Scientist and AI Consultant at ArtifexAI.
- Register Now for Free Access: Don't miss this opportunity to expand your AI acumen and network with top industry professionals. Register Here for Free and join the event on June 26, 2024.
Link mentioned: Infer Summer '24 by Qwak | The Engineering Behind AI and ML: Infer Summer '24 by Qwak brings AI leaders to share how the world's leading companies use ML and AI in production. Join live on Jun 26, 2024, 11:00 AM EDT
Mozilla AI ▷ #announcements (1 messages):
- Upcoming AI Software Development Systems Event: An announcement about an upcoming event, "AI software development systems that lead to a future where developers are amplified, not automated", has been made. Details and RSVP link are here.
- Machine Learning Paper Picks #2 Released: The latest edition of Machine Learning Paper Picks has been released. Check it out here for a curated selection of papers.
- New CambAI Team Event: A new event with the CambAI Team has been posted. More information and RSVP link can be found here.
- AMA Attendees Role Assignment: AMA attendees are reminded to pick up the 0din role via the id:customize link. This role is necessary to be notified about T-shirt distributions.
- Member-Requested Tags for Curated Channel: A new `member-requested` tag has been added for users to contribute to the curated channel. Special thanks were given to the member who initiated this.
- Builders Program for Funding and Support: Interested members are encouraged to check out the Builders Program for funding and support for their projects. Details here.
YAIG (a16z Infra) ▷ #tech-discussion (1 messages):
- Survey on GitHub Codespaces usage: A member initiated a survey asking if teams use GitHub Codespaces, prompting a response of either ✅ for yes or ❌ for no. This message likely aims to gauge the adoption and utilization of the feature within teams.
{% else %}
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!
If you enjoyed AInews, please share with a friend! Thanks in advance!
{% endif %}