In the agent literature it is common to find that multiple agents outperform single agents (if you conveniently ignore inference cost). Cohere has now found the same for LLMs-as-Judges:
Table of Contents
[TOC]
AI Reddit Recap
Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!
OpenAI News
- Memory feature now available to all ChatGPT Plus users: OpenAI announced on Twitter that the memory feature is now rolled out to all ChatGPT Plus subscribers.
- OpenAI partners with Financial Times for AI in news: OpenAI has signed a deal to license content from the Financial Times to train its AI models. An image was shared announcing the partnership to develop AI experiences for news.
- Concerns over OpenAI's profitability with paid training data: In /r/OpenAI, a post questioned OpenAI's profitability as they start paying to license training data, speculating local open-source models may undercut their business.
- Possible reduction in GPT-4 usage limits: A user in /r/OpenAI noticed GPT-4's usage has been reduced from 40 messages per 3 hours to around 20 questions per hour.
- Issues with ChatGPT after memory update: In /r/OpenAI, a user found ChatGPT struggled with data cleansing and analysis tasks after the memory update, producing errors and incomplete outputs.
OpenAI API Projects and Discussions
- Tutorial on building an AI voice assistant with OpenAI: A blog post was shared in /r/OpenAI on building an AI voice assistant using OpenAI's API along with web speech APIs.
- AI-powered side projects discussion: In /r/OpenAI, a post asked others to share their AI-powered side projects. The poster made a requirements analysis tool with GPT-4 and an interactive German tutor with GPT-3.5.
- Interface agents powered by LLMs: A /r/OpenAI post discussed "interface agents" - AI that can interact with and control user interfaces like browsers and apps. It covered key components, tools, challenges and use cases.
- Difficulty resizing elements in GPT-4 generated images: In /r/OpenAI, a user asked for advice on instructing GPT-4 to shrink an element in a generated image, as the model struggles to consistently resize things.
Stable Diffusion Models and Extensions
- Seeking realistic SDXL models comparable to PonyXL: In /r/StableDiffusion, a user asked about realistic SDXL models on par with PonyXL's quality and prompt alignment for photographic styles.
- Hi-diffusion extension for ComfyUI: A /r/StableDiffusion user found Hi-diffusion works well for generating detailed 2K images in ComfyUI with SD1.5 models, outperforming Kohya deep shrink. An extension is available but needs improvements.
- Virtuoso Nodes v1.1 adds Photoshop features to ComfyUI: Version 1.1 of Virtuoso Nodes for ComfyUI was released, adding 8 new nodes that replicate key Photoshop functions like blend modes, selective color, color balance, etc.
- Styles to simplify Pony XL prompts in Fooocus: A /r/StableDiffusion user created styles for Fooocus to handle the quality tags in Pony XL prompts, allowing cleaner and shorter prompts focused on content.
- Anime-style shading LoRA released: An anime-style shading LoRA was announced, recommended for use with Anystyle and other ControlNets. A Hugging Face link to the LoRA file was provided.
Stable Diffusion Help and Discussion
- Avoiding explicit content in generated images: In /r/StableDiffusion, a user getting phallic elements in 80% of their generated images asked for negative prompt advice to generate "regular porn" instead.
- Creating short video clips with AI images and animated text: A /r/StableDiffusion post asked about APIs to generate AI images with animated text overlays to create short video clips.
- Newer Nvidia GPUs may be slower for AI despite gaming gains: A warning was posted that newer Nvidia GPUs like the 4070 laptop version use narrower memory buses than older models, making them slower for AI workloads.
- Proposal for community image tagging project: A /r/StableDiffusion post suggested a community effort to comprehensively tag images to create a dataset of consistently captioned images for training better models.
- Using VAEs for image compression: Experiments shared in /r/StableDiffusion show using VAE latents for image compression is competitive with JPEG in some cases. Saving generated images as latents is lossless and much smaller than PNGs (see the sketch after this list).
- Generating a full body from a headshot: In /r/StableDiffusion, a user asked if it's possible to generate a full body from a headshot image without altering the face much using SD Forge.
- Textual inversion of Audrey Hepburn: A /r/StableDiffusion user made a textual inversion of Audrey Hepburn that produces similar but varied faces, sharing example images and a Civitai link.
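To make the VAE-compression idea above concrete, here is a minimal sketch using the diffusers library; the model id and 512x512 sizing are illustrative assumptions, not details from the thread.

```python
# Minimal sketch of storing images as VAE latents instead of PNGs, per the
# /r/StableDiffusion experiments above. Assumes the SD 1.5 VAE from diffusers;
# the model id and 512x512 sizing are illustrative choices, not from the thread.
import numpy as np
import torch
from diffusers import AutoencoderKL
from PIL import Image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

def image_to_latents(path: str) -> np.ndarray:
    img = Image.open(path).convert("RGB").resize((512, 512))
    x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0  # scale to [-1, 1]
    x = x.permute(2, 0, 1).unsqueeze(0)                        # HWC -> NCHW
    with torch.no_grad():
        latents = vae.encode(x).latent_dist.mean               # shape 1x4x64x64
    return latents.numpy().astype(np.float16)

# A 4x64x64 fp16 latent is ~32 KB, far smaller than a typical PNG; for an image
# that was itself decoded from these latents, the round trip loses nothing.
np.save("image_latents.npy", image_to_latents("generated.png"))
```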
AI Twitter Recap
all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.
LLMs and AI Models
- Llama 3 Performance: @abacaj noted that llama-3 models with zero-training can get 32k context with exceptional quality, surpassing significantly larger models. @rohanpaul_ai mentioned Llama 3 captures extremely nuanced data relationships, utilizing even the minutest decimals in BF16 precision, making it more sensitive to quantization degradation compared to Llama 2.
- Llama 3 Benchmarks: @abacaj reported llama-3 70B takes 3rd place on a benchmark, replacing Haiku. @abacaj shared a completion from the model on a code snippet benchmark that requires the model to find a function based on a description.
- Llama 3 Variants: @mervenoyann noted new LLaVA-like models based on LLaMA 3 & Phi-3 that pass the baklava benchmark. @AIatMeta mentioned Meditron, an LLM suite for low-resource medical settings built by @ICepfl & @YaleMed researchers, which outperforms most open models in its parameter class on benchmarks like MedQA & MedMCQA using Llama 3.
- GPT-2 Chatbot: There was speculation about the identity of the gpt2-chatbot model, with @sama noting he has a soft spot for gpt2. Some theories suggested it could be a preview of GPT-4.5/5 or a derivative model, but most agreed it was unlikely to be the latest OAI model.
- Phi-3 and Other Models: @danielhanchen released a Phi-3 notebook that finetunes 2x faster and uses 50% less VRAM than HF+FA2. @rohanpaul_ai shared a paper suggesting transformers learn in-context by performing gradient descent on a loss function constructed from the in-context data within their forward pass.
Prompt Engineering and Evaluation
- Prompt Engineering Techniques: @cwolferesearch categorized recent prompt engineering research into reasoning, tool usage, context window, and better writing. Techniques include zero-shot CoT prompting, selecting exemplars based on complexity, refining rationales, decomposing tasks, using APIs, optimizing context windows, and iterative prompting.
- LLMs as Juries: @cohere released a paper exploring replacing a single LLM judge with multiple LLM juries for evaluation. The "PoLL" method, with a diverse set of LLMs, outperformed single judges across datasets while being 7-8x cheaper than GPT-4 (see the sketch after this list).
- Evaluating LLMs: @_lewtun asked about research on which prompts produce an LLM-judge most correlated with human preferences for pairwise rankings, beyond the work by @lmsysorg. @_philschmid summarized the PoLL (Panel of LLM) method proposed by @cohere for LLM evaluation as an alternative to a single large model judge.
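For the shape of the PoLL idea above, here is a hedged sketch: several small judges vote and the majority wins. The judge names and the `ask_model` helper are placeholders, not the models or API from the paper.

```python
# Sketch of a panel-of-LLM-judges (PoLL-style) evaluation: query several small
# judge models and aggregate by majority vote. `ask_model` and the judge names
# are placeholders; the paper's exact models and prompts are not reproduced here.
from collections import Counter

JUDGES = ["judge-model-a", "judge-model-b", "judge-model-c"]

def ask_model(model: str, prompt: str) -> str:
    raise NotImplementedError("call your inference API of choice here")

def panel_verdict(question: str, answer: str) -> str:
    prompt = (
        f"Question: {question}\nCandidate answer: {answer}\n"
        "Reply with exactly CORRECT or INCORRECT."
    )
    votes = [ask_model(j, prompt).strip().upper() for j in JUDGES]
    verdict, _ = Counter(votes).most_common(1)[0]  # majority vote across judges
    return verdict
```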
Applications and Use Cases
- Financial Calculations: @llama_index shared a full-stack tutorial for building a financial assistant that can calculate percentage evolution, CAGR, and P/E ratios over unstructured financial reports using LlamaParse, RAG, Opus, and math formulas in @llama_index.
- SQL Query Generation: @virattt used @cohere Command R+ to extract ticker and year metadata from financial queries in ~1s, then used the metadata to filter a vector DB, fed results to GPT-4, and answered the user query with ~3s total latency (sketched after this list).
- Multi-Agent RAG: @LangChainAI announced a YouTube workshop on exploring "multi-agent" applications that combine independent agents to solve complex problems using planning, reflection, tool use, and their LangGraph library.
- Robotics and Embodied AI: @DrJimFan advocated for robotics as the next frontier after LLMs, sharing MIT AI Lab's 1971 proposal emphasizing robotics and reflecting on the current state. @_akhaliq shared a paper on Ag2Manip, which improves imitation learning for manipulation tasks using agent-agnostic visual and action representations.
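As a rough sketch of the pipeline shape in the SQL Query Generation item above (all helper objects are hypothetical placeholders, not a specific library's API):

```python
# Rough sketch of the described pipeline: a fast model pulls ticker/year metadata
# out of the query, that metadata filters a vector store, and a stronger model
# answers over the filtered chunks. vector_db and llm are hypothetical objects.
import re

def extract_metadata(query: str) -> dict:
    # In the original, Command R+ did this extraction in ~1s; a regex stub
    # keeps the sketch self-contained.
    ticker = re.search(r"\b[A-Z]{2,5}\b", query)
    year = re.search(r"\b(?:19|20)\d{2}\b", query)
    return {"ticker": ticker.group() if ticker else None,
            "year": year.group() if year else None}

def answer(query: str, vector_db, llm) -> str:
    meta = extract_metadata(query)
    chunks = vector_db.search(query, filter=meta)   # metadata-filtered retrieval
    context = "\n\n".join(chunk.text for chunk in chunks)
    return llm.complete(f"Context:\n{context}\n\nQuestion: {query}")
```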
Frameworks, Tools and Platforms
- LangChain Tutorials: @LangChainAI shared a 4-hour course on understanding how LangChain works with various technologies to build 6 projects. @llama_index provided a reference architecture for advanced RAG using LlamaParse, AWS Bedrock, and @llama_index.
- Diffusers Library: @RisingSayak explained how the Diffusers library supports custom pipelines and components, allowing flexibility in building diffusion models while maintaining the benefits of the `DiffusionPipeline` class.
- Amazon Bedrock: @cohere announced their Command R model series is now available on Amazon Bedrock for enterprise workloads. @llama_index showed how to use LlamaParse for advanced parsing in the AWS/Bedrock ecosystem and build RAG with the Bedrock Knowledge Base.
- DeepSpeed Support: @StasBekman noted a PR merged into `main@accelerate` that makes FSDP converge at the same speed as DeepSpeed when loading fp16 models, by automatically upcasting trainable params to fp32.
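As a rough illustration of what that PR reportedly does (not the actual accelerate code), upcasting trainable parameters from fp16/bf16 to fp32 gives the optimizer full-precision master weights:

```python
# Illustration only: upcast trainable params of a half-precision model to fp32 so
# optimizer updates are applied in full precision. Not the actual accelerate PR.
import torch

def upcast_trainable_params(model: torch.nn.Module) -> None:
    for param in model.parameters():
        if param.requires_grad and param.dtype in (torch.float16, torch.bfloat16):
            param.data = param.data.float()  # fp32 master copy of the weights
```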
Memes, Humor and Other
- ASCII Art: Several tweets poked fun at the ASCII art capabilities of LLMs, with @ylecun noting how AI hype has become indistinguishable from satire. @teortaxesTex shared a prompt to draw a Katamari Damacy level map using emojis that strains the instruction following of "GPT2".
- Anthropic Slack: @alexalbert__ shared his 10 favorite things from Anthropic's internal Slack channel where employees post cool Claude interactions and memes since its launch.
- Rabbit Disappointment: Several users expressed disappointment with the Rabbit AI device, noting its limited functionality compared to expectations. @agihippo questioned what the Rabbit r1 can do that a phone can't.
AI Discord Recap
A summary of Summaries of Summaries
1) Fine-Tuning and Optimizing Large Language Models
- Challenges in Fine-Tuning LLaMA-3: Engineers faced issues such as the model not generating EOS tokens and embedding-layer compatibility problems across bit formats. One member achieved success by using LLaMA-3-specific prompt strategies for fine-tuning (see the sketch after this list).
- LLaMA-3 Sensitive to Quantization: Discussions highlighted that LLaMA-3 experiences more degradation from quantization than LLaMA-2, likely because training on 15T tokens captured unusually nuanced relationships.
- Perplexity Fine-Tuning Challenges: Fine-tuning LLaMA-3 for perplexity may not surpass the base model's performance, with the tokenizer suspected as a potential cause.
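A common cause-and-fix for the missing-EOS symptom mentioned above (a general-practice sketch, not the specific fix from these threads): make sure every training example ends with EOS, and keep the pad token distinct from it.

```python
# General-practice sketch for the "model never emits EOS" symptom when
# fine-tuning Llama-3-style models; not the exact fix from the threads above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Keep pad distinct from EOS: if pad == EOS, loss masking on padding can also
# mask the EOS token, so the model never learns to stop. (Remember to resize
# the model's embeddings after adding a token.)
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "<|pad|>"})

def format_example(prompt: str, completion: str) -> str:
    return prompt + completion + tokenizer.eos_token  # train the model to stop
```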
2) Extending Context Lengths and Capabilities
- Llama-3 Hits New Context Length Highs: The release of Llama-3 8B Gradient Instruct 1048k extends the context length from 8k to 1048k tokens, showcasing state-of-the-art long-context handling.
- Llama 3 Gains Vision with SigLIP: A breakthrough integrates vision capabilities into Llama 3 using SigLIP, enabling direct use within Transformers despite quantization limitations.
- Extending Context to 256k with PoSE: The context length of Llama 3 8B has been expanded from 8k to 256k tokens using PoSE, though inference challenges remain for "needle in a haystack" scenarios.
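A conceptual sketch of the PoSE trick referenced above, as we read it: train on short sequences but randomly skip ahead in the position ids so the model sees positions from the full target window. The actual method differs in detail; this only shows the shape of the idea.

```python
# Conceptual sketch of PoSE-style position-id skipping: an 8k training sequence
# is split into chunks, and later chunks get random positional offsets so RoPE
# sees positions from the full 256k target window. Illustrative, not the paper's code.
import torch

def pose_position_ids(seq_len: int, target_len: int, n_chunks: int = 2) -> torch.Tensor:
    positions = torch.arange(seq_len)
    chunk = seq_len // n_chunks
    max_skip = (target_len - seq_len) // max(n_chunks - 1, 1)
    for i in range(1, n_chunks):
        skip = int(torch.randint(0, max_skip + 1, (1,)))
        positions[i * chunk:] += skip   # later chunks accumulate earlier skips
    return positions                    # pass as position_ids during training

ids = pose_position_ids(seq_len=8192, target_len=262144)
```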
3) Benchmarking and Evaluating LLMs
- Llama 3 Outperforms GPT-4 in German NLG: On the ScandEval German NLG benchmark, Llama 3 surpassed the performance of GPT-4, indicating strong language-generation capabilities.
- Mysterious GPT2-Chatbot Sparks Speculation: A GPT2-chatbot with GPT-4-level capabilities surfaced, leading to debates on whether it could be an early glimpse of GPT-4.5 or a finetuned version of the original GPT-2.
- Questioning Leaderboard Utility for Code Generation: A blog post challenges the effectiveness of AI leaderboards for code generation, citing the high operational cost of top-ranked agents like the LLM debugger (LDB).
4) Revolutionizing Gaming with LLM-Powered NPCs
- LLM-Powered NPCs and Inference Stack: The release of LLM-powered NPC models aims to enhance action spaces and simplify API calls, including a single-LLM-call feature and open weights on Hugging Face.
- Overcoming LLM Challenges for Gameplay: Developers faced issues like NPCs breaking the fourth wall, missing details in large prompts, and optimizing for runtime speeds, suggesting solutions like output compression, minimizing model calls, and leveraging smaller models.
- Insights into Fine-Tuning LLMs for NPCs: Developers plan to share their struggles and triumphs in fine-tuning LLMs for dynamic NPC behavior through an upcoming blog post, pointing towards new strategies for gaming applications.
5) Misc
- CUDA Optimization Techniques: CUDA developers discussed various optimization strategies, including using `Packed128` custom structs for memory access patterns, replacing integer division with bit shifts (Compiler Explorer link), and comparing the performance of CUTLASS vs CuBLAS for matrix multiplications. The Effort Engine algorithm was introduced, enabling adjustable computational effort during LLM inference to achieve speeds comparable to standard matrix multiplications on Apple Silicon (kolinko.github.io/effort, GitHub).
- LLaMA-3 Context Length Extension and Fine-Tuning: The LLaMA-3 8B model's context length was extended to 256k tokens using PoSE (huggingface.co/winglian/llama-3-8b-256k-PoSE), sparking discussions on its retrieval performance and compute requirements. Fine-tuning LLaMA-3 presented challenges like quantization degradation, EOS token generation, and embedding-layer compatibility across bit formats. A potential breakthrough was shared in a GitHub pull request demonstrating successful fine-tuning with model-specific prompt strategies.
- Civitai Monetization Backlash: Stable Diffusion community members expressed discontent with Civitai's monetization strategies, particularly the Buzz donation system, which was labeled a "rip-off" by some like Tower13Studios (The Angola Effect). Discussions also highlighted the potential profitability of NSFW AI-generated art commissions compared to the saturated SFW market.
- Perplexity AI Performance Issues: Users reported significant slowdowns and poor performance across various Perplexity AI models during Japan's Golden Week, with specific issues in Japanese searches resulting in meaningless outputs. Frustrations arose over expired Pro subscription coupons and the removal of the 7-day free trial. Technical troubles included email-link delays affecting login and inconsistencies in the iOS voice feature depending on app version.
- Decentralized AI Training Initiatives: Prime Intellect proposed a decentralized training approach using H100 GPU clusters to enable open-source AI to compete with proprietary models (blog post). The initiative aims to address computing-infrastructure limitations by leveraging globally distributed GPU resources.
PART 1: High level Discord summaries
CUDA MODE Discord
- Triton Troubles: Engineers discussed limitations with Triton blocks, identifying an issue where blocks of 4096 elements are feasible, yet blocks of 8192 are not, hinting at discrepancies with expected CUDA limits.
- CUDA Cognitions and Collaborations: Various CUDA topics were mulled over, including CUTLASS vs. CuBLAS performance, CUDA checkpointing, and the replacement of integer division with bit shifts. A link to the Compiler Explorer was shared to help with experiments.
- PyTorch Peculiarities Pursued: Members examined the behavior of PyTorch's `linear` function and matrix-multiplication kernel launches, with observations about double kernel launches and the false expectation of performance differences due to transposition.
- LLM Inference Optimization with Effort Engine: Discussion revolved around the Effort Engine algorithm, which enables adjustable computational effort during LLM inference, purportedly yielding speeds comparable to standard matrix multiplications on Apple Silicon at lower efforts. The implementation and details are provided on kolinko.github.io/effort and GitHub.
- InstaDeep's Machine Learning Manhunt: InstaDeep is on the hunt for Machine Learning Engineers with expertise in high-performance ML engineering, custom CUDA kernels, and distributed training. Candidates can scout for opportunities at InstaDeep Careers.
- Llama-3 Levitates to Longer Contexts: The release of Llama-3 8B Gradient Instruct 1048k set a new benchmark for context-length capabilities in LLMs.
- ROCm Rallies for Flash Attention 2: Conversations in the ROCM channel centered on adapting NVIDIA's Flash Attention 2 for ROCm, with a focus on compatibility with ROCm 6.x versions and a link to the relevant repository ROCm/flash-attention on GitHub.
- CUDA Conclave Converges on "Packed128" Innovations: The llmdotc channel was a hotspot, with discussions focused on optimizing `Packed128` data structures and BF16 mixed-precision strategies, while also touching on the nuanced use of NVTX contexts and the utility of different benchmarking toolsets like Modal.
Unsloth AI (Daniel Han) Discord
- Fusing Checkpoints to Avoid Overfitting: A member sought guidance on checkpoint merging to avoid overfitting and was directed to the Unsloth finetuning checkpoint wiki. Techniques such as warmup steps and resuming from checkpoints were recommended for nuanced training regimens (see the sketch after this list).
- Quantization Quandary in WSL2: Users reported `RuntimeError: Unsloth: Quantization failed` when converting models to F16 within WSL2. Despite attempts at rebuilding `llama.cpp` and re-quantizing, the error persisted.
- Phi-3: A Model of Interest: The upcoming release of Phi-3 stirred interest, with engineers debating whether to adopt the 3.8b version or wait for the heftier 7b or 14b variants.
- OOM Countermeasures and Performance Data Confusion: Tips for handling Out of Memory errors on Google Colab by cache clearing were exchanged. Meanwhile, confusion surfaced over reported performance measures for quantized LLama 2 and LLama 3, hinting at possible data misplacement between Bits Per Word (BPW) and Perplexity (PPL).
- Extended Possibilities: Llama 3 8B reached new potential with a context-length increase to 256k tokens, achieved with PoSE and showcased at winglian/llama-3-8b-256k-PoSE. Community applause went to Winglian, though some voiced skepticism about non-official context-extended model behavior.
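A minimal illustration of the warmup-and-resume advice above, using the Hugging Face Trainer API (model and dataset setup omitted):

```python
# Minimal sketch of warmup steps plus resumable checkpoints with the Hugging Face
# Trainer API, per the advice above; model/dataset setup is omitted.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    warmup_steps=100,        # ramp the learning rate up gently at the start
    save_steps=500,          # write periodic checkpoints
    save_total_limit=3,      # keep only the most recent ones
)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds)
# trainer.train(resume_from_checkpoint=True)  # resume from the latest checkpoint
```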
LM Studio Discord
- Groq's Gift to Discord Bots: A user shared a YouTube video highlighting the free Groq API enabling access to the LLAMA-3 model's impressive 300 tokens-per-second speed, optimally suited for small-server Discord bots due to its no-cost setup.
- Spec Smackdown: Users recommended posting system specs in specific channels when troubleshooting LM Studio on Ubuntu GPUs, debated the compatibility of GPUs with inference tasks, and discussed the potentially incorrect VRAM capacity display in LM Studio causing concerns about GPU offloading efficiency.
- Model Mania: The community buzzed about alternative methods for downloading the GGUF model from sources other than Huggingface, the time and resource demands of creating iQuants and imatrices, and shared reward offers for optimizing the Goliath 120B Longlora model to create its iQuant version.
- Model Mayhem on Modest Machines: Users grappled with issues like the Phi-3 model's leaking prompts, local training queries for Hugging Face-based models, and unexpected noises from hard drives during token generation by the Llama 3 model. Some determined that more dated hardware could just about manage a 7b Q4 model but nothing heftier.
- ROCm Ruminations: Enthusiasts dissected ROCm versions, mulling over the benefits of beta 0.2.20 for AMD functionality, addressed confusion about compatibility (especially the RX 6600's support with the current HIP SDK), and discussed discrepancies in ROCm's functionality on different operating systems like Ubuntu versus Windows.
Stability.ai (Stable Diffusion) Discord
Buzz Off, Civitai: AI creators in the guild are upset with Civitai's monetization strategies, particularly the Buzz donation system, which was labeled a "rip-off" by some members, such as Tower13Studios. The discontent revolves around value not being fairly returned to creators (The Angola Effect).
Finding The AI Art Goldmine: A vibrant discussion unfolded on the economics of AI-generated art, with consensus pointing towards NSFW commissions, including furry and vtuber content, as a more profitable avenue compared to the more crowded SFW market.
Race for Real-Time Rendering: Members actively shared Python scripting techniques for accelerating Stable Diffusion (SDXL) models, eyeing uses in dynamic realms like Discord bots, aiming to enhance image generation speed for real-time applications.
Anticipation Builds for Collider: The community is keenly awaiting Stable Diffusion's next iteration, dubbed "Collider," with speculation about release dates and potential advancements fueling eager anticipation among users.
Tech Troubleshooting Talk: Guild members exchanged insights and solutions on a spectrum of technical challenges, from creating LoRAs and IPAdapters to running AI models on low-spec hardware, demonstrating a collective effort to push the boundaries of model implementation and optimization.
Perplexity AI Discord
- Japanese Golden Week Glitches: During Japan's Golden Week, users observed a noticeable performance drop in tools like Opus, Sonar Large 32K, and GPT-4 Turbo, with specific issues in Japanese searches resulting in outputs that users deemed meaningless garbage. To address the problem, vigilant monitoring and optimization of these models was suggested.
- Frustration over Pro Subscription and Trial Perils: Pro subscription users reported coupons expiring on their due date, with offers linked to the Nothing Phone 2(a) aborted prematurely due to fraudulent activity. Moreover, the 7-day free trial's removal from the site prompted disappointment, emphasizing its value as a user-conversion tool.
- Tech Turbulence with Perplexity AI: The community grappled with email-link delays causing login difficulties, particularly for non-Gmail services. Additionally, variations in the iOS voice feature were found to depend on the app version being used, reflecting inconsistencies in user experience.
- API Avenues Explored: Engineers queried the pplx-api channel regarding source-URL access through the API, following its mention in roadmap documentation, and debated whether using Claude 3 would entail adherence to Anthropic's political-usage restrictions under Perplexity's terms.
- Miscellaneous Inquiries and Insights Surface: A post in the #sharing channel spotlighted Lenny's Newsletter on product growth and building concepts, while queries about WhatsApp's autoreply feature and Vimeo's API were thrown in. These discussions, particularly on the API, highlight engineers' focus on integrating various functionalities into their systems and processes.
Nous Research AI Discord
Bold Decentralization Move: Prime Intellect's initiative for decentralized AI training, leveraging H100 GPU clusters, promises to push boundaries by globalizing distributed training. The open-source approach may address current computing-infrastructure bottlenecks, as discussed in their decentralized training blog.
Retrieval Revolution with LLama-3: The extension of LLama-3 8B's context length to over 1048k tokens sparks discussion on whether its retrieval performance lives up to the hype. Skeptics remain, emphasizing the ongoing necessity of improvements and training, supported by an ArXiv paper on IN2 training.
PDF Challenges Tackled: To address PDF-parsing challenges within AI models, particularly for tables, the community discussed workarounds and tools like OpenAI's file search for better multimodal handling of roughly 10k files.
World Sims Showcase AI's Role-Playing Prowess: Engagements with AI-driven world simulations highlight the capacities of llama 3 70b and Claude 3, from historical figures to business and singing-career simulators. OpenAI's chat on HuggingChat and links to niche simulations like Snow Singer Simulator reflect the diversity and depth achievable.
Leveraging Datasets for Multilingual Dense Retrieval: A noted Wikipedia RAG dataset on HuggingFace marks the rise of fostering AI's language-retrieval capabilities. The included Halal and Kosher data points toward a trend of creating diverse and inclusive AI resources.
Modular (Mojo 🔥) Discord
- Mojo's Memory Safety and Concurrency Debated: Despite buzz around Mojo's potential, it was clarified that features like Golang-like concurrency and Rust-like memory safety are not currently implemented, since borrow checking is disabled. However, the use of actor-model concurrency is being explored, which may enhance Mojo's runtime efficiency.
- Installation Tactics for Mojo on Varied Systems: Users face challenges installing Mojo with Python 3.12.3, particularly on Mac M1, for which using a Conda environment is recommended. While native Windows support is pending, WSL on Windows is a current workaround, with cross-compilation capabilities hinted at through LLVM involvement.
- Community Contributions to Mojo Ecosystem: Several community-driven projects are enhancing the Mojo ecosystem, from a Mojo-based forum on GitHub to an atof-simd project with a 20% performance gain for long strings. Enthusiasm for collaboration and knowledge-sharing is evident as members share projects and call for joint efforts to tackle challenges such as the 1brc.
- Nightly Compilations Trigger Discussions on SIMD and Source Location: A new nightly release of the Mojo compiler spurred conversation about the conversion of SIMD to EqualityComparable and the need for explicit `reduce_and` or `reduce_or` in place of implicit conversion to `Bool`. The move of `__source_location()` to `__call_location()` incited exchanges on proper usage within the language.
- Performance and Benchmarking Take the Spotlight: From optimizing SIMD-based error-correction code to sharing substantial speed gains in the 1brc project, performance topics spurred discussions on LLVM/MLIR optimizations. There were calls to form a "team-mojo" for communal challenge tackling, underscoring a shared interest in benchmarking Mojo against other languages.
HuggingFace Discord
Snowflake's MoE Model Breaks Through: Snowflake introduces a monumental 480B-parameter Dense + Hybrid MoE model with a 4K context window, entirely under the Apache 2.0 license, sparking excitement about its performance on sophisticated tasks.
Gradio Share Server on the Fritz: Gradio acknowledges issues with their Share Server, impacting Colab integrations, which is under active resolution with updates available on their status page.
CVPR 2024 Sparks Competitive Spirit: CVPR 2024 announced competitive events like SnakeCLEF, FungiCLEF, and PlantCLEF, boasting over $120k in rewards and happening June 17-21, 2024.
MIT Deep Learning Course Goes Live: MIT updates its Introduction to Deep Learning course for 2024, with comprehensive lecture videos on YouTube.
NLP Woes in Chatbot Land: Within the NLP community, effort mounts to finetune a chatbot using the Rasa framework, despite struggles with intent recognition and categorization, and plans to augment performance with a custom NER model and company-specific intents.
OpenRouter (Alex Atallah) Discord
- Alex Atallah Signposts Collaboration with Syrax: Alex Atallah has initiated experiments with Syrax and extended support by proposing a group chat for collaborative efforts, marking the start of a partnership acknowledged with enthusiasm by Mart02.
- Frontend for the Rest of Us: The community explored solutions for deploying multi-user frontends on shared hosting without advanced technical requirements. LibreChat was suggested as a viable platform, with Vercel's free tier mentioned as a means to address hosting and cost obstacles.
- LLMs Throwdown: A robust debate unfolded over several large language models, including Llama-3 8B, Dolphin 2.9, and Mixtral-8x22B, touching on aspects like context-window size and censorship concerns related to conversational styles and datasets.
- Training Unhinged AIs: An intriguing experiment involved training a model on a toxic dataset to foster a more "unhinged" persona. Discussions dug into model limitations with long contexts, with agreement that although models like Llama 3 8B accept extensive contexts, performance likely dips past a threshold.
- Cost-Effective Experimentation on OpenRouter: Conversations centered on finding efficient yet affordable models on OpenRouter. Noteworthy was the mix of surprise and approval for the human-like output of models like GPT-3.5 that deliver a solid blend of affordability and performance.
LlamaIndex Discord
AWS Architecture Goes Academic: LlamaIndex revealed an advanced AWS-based architecture for building sophisticated RAG systems, aimed at parsing and reasoning. Details are accessible in their code repository.
Documentation Bot Triumphs in Hackathon: Hackathon victors, Team CLAB, developed an impressive documentation bot leveraging LlamaIndex and Nomic embeddings; check out the hackathon wrap-up in this blog post.
Financial Assistants Get a Boost: Constructing financial assistants that interpret unstructured data and perform complex computations has been greatly improved. The methodology is thoroughly explored in a recent post.
Turbocharging RAG with Semantic Caching: Collaboration with @Redisinc demonstrated significant performance gains for RAG applications using semantic caching to speed up queries (a sketch follows below). The collaboration details can be found here.
GPT-1: The Trailblazer Remembered: A reflective glance at GPT-1 and its contributions to LLM development was shared, discussing features like positional embeddings which paved the way for modern models like Mistral-7B. The nostalgia-laden blog post revisits GPT-1's architecture and impact.
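The gist of the semantic-caching item above, as a self-contained sketch (the embedder is a stub; in the real integration Redis stores the vectors):

```python
# Self-contained sketch of semantic caching for RAG: if a new query is close
# enough to a previously answered one, return the cached answer and skip the
# pipeline. The embedder is a stub; Redis holds the vectors in the real setup.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # stub embedding
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

cache: list[tuple[np.ndarray, str]] = []  # (query embedding, cached answer)

def lookup(query: str, threshold: float = 0.9):
    q = embed(query)
    for emb, answer in cache:
        if float(q @ emb) >= threshold:  # cosine similarity of unit vectors
            return answer                # cache hit: skip the expensive RAG call
    return None

def store(query: str, answer: str) -> None:
    cache.append((embed(query), answer))
```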
Eleuther Discord
Plug Into New Community Projects: Members are seeking opportunities to contribute to community AI projects that provide computational resources, addressing the issue for those lacking personal GPU infrastructure.
Unlock the Mysteries of AI Memory: Intricacies of memory processes in AI were covered, with a particular focus on "clear-ing", orthogonal keys, and the delta rule in compressive memory. There's interest in discussing whether infini-attention has been overhyped, despite its theoretical promise.
Comparing Apples to Supercomputers: There's an active debate regarding performance discrepancies between models like Mixtral 8x22B and Llama 3 70B, where Llama's reduced number of layers, despite having more parameters, may be hurting its speed and batching efficiency.
LLMs: Peering Inside the Black Box: The community is contemplating the "black box" nature of Large Language Models, discussing emergent abilities and data leakage. A connection was made between emergent abilities and pretraining loss, challenging the focus on compute as a performance indicator.
Bit Depth Bewilderment: A user reported issues when encoding with 8-bit on models like llama3-70b and llama3-8b, experiencing significant degradation in output quality, suggesting a cross-model encoding challenge that needs addressing.
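For anyone trying to reproduce those 8-bit reports, the usual loading path with transformers and bitsandbytes looks like this; whether quality degrades is model- and task-dependent:

```python
# The standard way to load a model in 8-bit via transformers + bitsandbytes,
# useful for reproducing the degradation reports above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    torch_dtype=torch.float16,
)
```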
LAION Discord
- GDPR Complaint Challenges AI Birthdays: An EU privacy advocate has filed a GDPR complaint after an AI model incorrectly estimated his birthday, triggering discussion of the potential implications for AI operations in Europe.
- Mysterious GPT-5 Speculations: Amid rumors of a new GPT-5 model release, the community debates inconsistent test outcomes and the absence of official communication or leaderboard recognition, questioning the framework's evasiveness in generating hallucinations.
- Llama3 70B's Slow Performance Spotlight: AI engineers are troubleshooting the Llama3 70B model's unexpectedly sluggish generation rate of 13 tokens per second on a dual-3090 rig, delving into possible hardware and configuration enhancements.
- Exllama Library Outraces Rivals: Users endorse Exllama for its fast performance on language-model tasks and suggest the TabbyAPI repository for simpler integrations, naming it a superior choice compared to other libraries.
- Research Breakthrough with OpenCLIP: The successful application of OpenCLIP to cardiac ultrasound analysis has been published, highlighting the rigorous revision process and a move towards novel, non-zero-shot techniques, with the study available here; meanwhile, r/StableDiffusion is back online, and a relevant CLIP training repository is discussed in the context of Reddit's recent API changes at this Reddit discussion.
OpenAI Discord
Memory Lane with Upscaled ChatGPT Plus: ChatGPT Plus now allows users to command the AI to remember specific contexts, which can be toggled on and off in settings; the rollout has not reached Europe or Korea yet. Plus, both Free and Plus users gain enhanced data control, including a "Temporary Chat" option that discards conversations immediately after they end.
AI Gosh-darn Curiosity and Camera Tricks: Discussions swung from defining AI curiosity and sentience with maze challenges to the merits of DragGAN altering photos with new angles. Meanwhile, the Llama-3 8B model emerged, flaunting its long-context skills, and is accessible at Hugging Face, but the community still wrestled with the accessibility of advanced AI technologies and the dream of inter-model collaboration.
GPT-4: Bigger and Maybe Slower?: The community dove into the attributes of GPT-4, noting its significantly larger size than the 3.5 version and raising concerns about whether its scale may affect processing speed. Meanwhile, the possibility of mass-deleting archived chats was also a topic of concern.
Prompt Engineering's Competitive Edge: Prompt engineering drew attention, with suggestions for competitions to hone skills and "meta prompting" via GPT Builder to refine AI output. The group agreed that positive prompting trumps listing prohibitions, and wrestled with optimizing regional Spanish nuances in AI text generation.
Cross-Channel Theme of Prompting Excellence: Both AI discussions and API channels tackled prompt engineering, with meta-prompting techniques at the spotlight, indicating a shift toward more efficient prompting strategies that might decrease the need for competitions. Navigating the complexities of multilingual outputs also emerged as a shared challenge, emphasizing adaptation rather than prohibition.
OpenAccess AI Collective (axolotl) Discord
LLaMA 3 Struggles with Quantization: LLaMA 3 is observed to suffer significant performance degradation from quantization, more so than its predecessor, which might be due to its expansive training on 15T tokens capturing very nuanced data relations. A critique within the community called a study on quantization sensitivity "worthless," suggesting the issue may relate more to model-training approach than to size; the critique referenced a study on arXiv.
Riding the Zero Train: The Guild discussed Huggingface's ZeroGPU, a beta feature offering free access to multi-GPU resources like Nvidia A100, with some members expressing regret at missing early access. A member has shared access and is open to suggestions for testing on the platform.
Finetuning Finesse: Members were advised against fine-tuning `meta-llama/Meta-Llama-3-70B-Instruct`; it was suggested they start with smaller models like 8B to sharpen their fine-tuning skills. The Guild clarified how to convert a fine-tuning dataset from OpenAI to ShareGPT format, providing guidance with Python code for the dataset transformation (a minimal version is sketched below).
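```python
# Minimal OpenAI-to-ShareGPT conversion, assuming OpenAI-style
# {"messages": [{"role", "content"}]} records and the common ShareGPT layout
# {"conversations": [{"from", "value"}]} with "human"/"gpt"/"system" roles.
# This is a sketch of the idea, not the exact code shared in the guild.
import json

ROLE_MAP = {"user": "human", "assistant": "gpt", "system": "system"}

def openai_to_sharegpt(record: dict) -> dict:
    return {"conversations": [
        {"from": ROLE_MAP[m["role"]], "value": m["content"]}
        for m in record["messages"]
    ]}

with open("openai.jsonl") as fin, open("sharegpt.jsonl", "w") as fout:
    for line in fin:
        fout.write(json.dumps(openai_to_sharegpt(json.loads(line))) + "\n")
```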
Tutorial Spreads Its Wings: A helpful tutorial was shared on fine-tuning Axolotl using dstack, showing the community's knack for collaboratively improving practices. Appreciation was conveyed by members, noting the tutorial's ease of use.
Axolotl Adaptations: Discussing the fine-tuning of command-r within Axolotl and related format adaptations, a member shared an untested pull request relating to this topic, while noting it was premature to merge. In addition, there's uncertainty about support for the phi-3 format and the implementation status of the sample-packing feature, indicating a need for further clarification or development.
Latent Space Discord
- Memary: An Autonomous Agent's Long-term Memory: The Memary project on GitHub has introduced a new approach to long-term memory in autonomous agents, using document-similarity searches over traditional knowledge graphs.
- The GPT-2 Chatbot Enigma: Intense debate has emerged around a GPT2-chatbot that showcases surprisingly advanced capabilities, leading to speculation that it might be a finetuned version of OpenAI's GPT-2.
- Can Decentralized Training Compete with Big Tech?: Prime Intellect's blog post discusses decentralized training as a plausible avenue for open-source artificial intelligence to compete with the proprietary models developed by large corporations with extensive GPU resources.
- Redefining LLMs with Modular Context and Memory: Discussions suggest a paradigm shift towards designing autonomous agents with modularized shared context and memory for reasoning and planning, stepping away from reliance on standalone large language models (LLMs).
- Educational Resources for Aspiring AI Enthusiasts: For those seeking AI fundamentals, community members recommended resources including neural-network tutorials such as the one on YouTube and courses like Learn Prompting, providing a glimpse into AI engineering and prompt-engineering basics.
OpenInterpreter Discord
OS Start-up with a Vision: A user faced challenges attempting to launch OS mode with a local vision model for Moondream and received gibberish output, but the discussion did not yield a solution or direct advice.
Integration Achievements: An exciting integration of OpenInterpreter outputs into MagicLLight was mentioned, with anticipation for a future code release and pull request including a `stream_out` function hook and `external_input`.
Hardware Hiccup Help: Queries about running OpenInterpreter on budget hardware like a Raspberry Pi Zero were brought up alongside requests for assistance with debugging startup issues. Community members offered to help with troubleshooting once more details were provided.
Push Button Programming: An individual fixed an external push-button issue on pin 25 and shared a code snippet (a hypothetical reconstruction appears below), with community confirmation that the fix was effective.
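The original snippet was not reproduced in this summary; as a purely hypothetical reconstruction, a MicroPython read of a pull-up button on GPIO 25 might look like this:

```python
# Hypothetical MicroPython sketch of a push button on GPIO 25 (pull-up wiring);
# the actual snippet shared in the channel was not included in this summary.
from machine import Pin
import time

button = Pin(25, Pin.IN, Pin.PULL_UP)  # pressed -> pin reads low

while True:
    if button.value() == 0:
        print("button pressed")
        time.sleep(0.2)  # crude debounce
```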
Volume Up on Tech Talk: There were mixed opinions on whether tech YouTubers have a grasp on AI technologies while advising on options for increasing speaker volume, including using M5Unified or an external amplifier.
tinygrad (George Hotz) Discord
- Peek into Tinygrad's Inner Workings: The tinygrad GitHub repository was recommended to someone curious about tinygrad, an educational project for enthusiasts of PyTorch and micrograd. Another community member inquired about graph visualization, leading to the suggestion to use the `GRAPH=1` environment variable to generate diagrams for addressing backward-pass issues #3572.
- The Discovery of Learning Resources: The community explored learning AI with TinyGrad through resources like MicroGrad and MiniTorch, with MiniTorch singled out as particularly useful for understanding deep learning systems. The "tinygrad Quick Start Guide" was highlighted as a starting point for beginners.
- Taking the Symbolic Route: Implementing a symbolic mean operation in TinyGrad brought up discussions about LazyBuffer's interaction with data types and the practicality of variable caching for operations like `sum` and `mean`. A pull request demonstrated symbolic code execution, while further GitHub compare views tracked the development of symbolic mean with variables at tinygrad symbolic-mean-var-pull and GitHub changes by gh.
- Bounty Hunting for Mean Solutions: The community sought guidance on bounty challenges related to "Mean of symbolic shape" and "Symbolic arrange". Discussion centered on implementation nuances and practical approaches to these problems in the TinyGrad environment.
- Cluster of Curiosities: A spontaneous question about how a member discovered the Discord server triggered a chain of speculations, with the respondent admitting they did not recall, adding a touch of mystery to the channel discourse.
Cohere Discord
- Single-Site Restrictions in Command-R: API Command R+'s `web_search` tool only allows one website at a time; the workaround discussed involves separate API calls for each site (see the sketch after this list).
- Feature Request Frenzy: Engineers are eager for Command-R improvements with an emphasis on Connectors, including multi-website searches and extra parameter control; to get familiar with current capabilities, refer to the Cohere Chat Documentation.
- Multi-Step Connector Capabilities Currently Limited: It was confirmed that multi-step tool use with connectors isn't yet possible within Command-R.
- Generate Option Gone Missing: Queries arose regarding the disappearance of "Generate" for fine-tuning models from the dashboard, leaving its future presence in question.
- Strategic Embedding Sought: Discussion revolved around cost-effective strategies for keeping data fresh for embeddings, with a focus on reindexing only modified segments.
- Nordic Networking Noted: Members highlighted operations within Sweden using Cohere and existing connections through the company Omegapoint, spanning both Sweden and Norway.
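A sketch of the single-site workaround from the first bullet, based on Cohere's documented web-search connector options (one call per site, results merged client-side):

```python
# Sketch of the per-site workaround: Cohere's web-search connector accepts a
# "site" option, so a multi-site search is approximated with one call per site.
# Based on Cohere's documented connector options; merge results client-side.
import cohere

co = cohere.Client("YOUR_API_KEY")
sites = ["docs.cohere.com", "arxiv.org"]

responses = [
    co.chat(
        message="What is retrieval-augmented generation?",
        connectors=[{"id": "web-search", "options": {"site": site}}],
    )
    for site in sites
]
```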
LangChain AI Discord
- Gemini Experience Wanted & Observability Tools Sought: Users in the general channel are seeking expertise in Gemini 1.0 or 1.5 models and discussing available tools for Large Language Model (LLM) observability, with interest in self-hosted, open-source options compatible with LlamaIndex. Meanwhile, there's a push for enhanced SQL security when connecting to OpenAI models and a technical discussion on integrating autoawq with LangGraph for high-speed AI-agent inference using exllamav2 kernels.
- Asynchronous Adventures and Google Drive Gyrations: In the langserve channel, a user is challenged by the lack of async support in AzureSearchVectorStoreRetriever and is considering whether to push for an async feature or craft an async wrapper themselves. Separately, discussion turned to the nuances of using Google Drive libraries and the importance of setting the drive key as an environment variable.
- Showcase Extravaganza & Plugin Revelation: In share-your-work, there's an insight-filled trip back to GPT-1's role in initiating current LLM advances and several LangChain use cases, including a "D-ID Airbnb Use Case" and a "Pizza Bot", both featured on YouTube. The VectorDB plugin for LM Studio also made an appearance, aiming to bolster ChromaDB vector databases in server mode, while QuickVid was launched to deliver YouTube video summaries and fact checks.
- RAG Agents Go Multilingual & Private: The tutorials channel is sharing resources for French speakers interested in building RAG assistants with LangChain, Mistral Large, and Llamaindex. Another guide demonstrates enhancing llama3's performance by incorporating personal knowledge bases to create agentic RAGs, revealing potential for more localized and data-rich AI capabilities.
Alignment Lab AI Discord
Alert: Illicit Spam Floods Channels: Numerous messages across different channels promoted explicit material involving "18+ Teen Girls and OnlyFans leaks," accompanied by a Discord invite link. All messages were similar in nature, using emojis and `@everyone` to garner attention, and are flagrant violations of Discord's community guidelines.
Prompt Moderation Action Required: The repeated posts are indicative of a coordinated spam attack necessitating immediate moderation intervention. Each message invariably linked to an external Discord server, potentially baiting users into exploitative environments.
Engineer Vigilance Advocacy: Members are encouraged to report such posts to maintain professional decorum. The content breaches both legal and ethical boundaries and does not align with the guildâs purpose or standards.
Discord Server Safety at Risk: The proliferation of these messages highlights a concern for server security and member safety. The spam suggests a compromise of server integrity, underscoring the need for robust anti-spam measures.
Community Urged to Disregard Suspicious Links: Engineers and members are urged to avoid engaging with or clicking on unsolicited links. Such practices help safeguard personal information and the communityâs credibility while adhering to legal and ethical codes.
AI Stack Devs (Yoko Li) Discord
- Game Devs Gear Up for Gamification: Rosebud AI's Game Jam invites creators to fashion 2D browser-based games using Phaser JS with a $500 prize pool, and an AIxGames Meetup is slated for Thursday in SF to bring together AI and gaming professionals (RSVP here).
- NPC Revolution with LLMs: A developer has introduced LLM-powered NPC models and an inference stack, available from GigaxGames on GitHub, promising a single-LLM-call feature and open weights on Hugging Face's Hub, despite a hiccup with a broken API access link.
- Grappling with Gaming NPC Realities: Developers are experimenting with output compression, minimized model calls, and smaller models to improve NPC runtime performance, while grappling with NPCs that break the fourth wall; the Claude 3 model shows promise in empathetic interactions for better gaming experiences.
- Blog Teased on LLMs for NPCs: An upcoming blog post will chronicle the struggles and triumphs of finetuning LLMs for dynamic NPC behavior, pointing towards new strategies that could be shared within the community.
- Navigating Windows Woes with Convex: The Convex local setup does not play nice with Windows, causing users to hit sticking points, though potential solutions like WSL or Docker have been floated, and a Windows-compatible Convex is reportedly on the horizon.
Skunkworks AI Discord
Binary Quest in HaystackDB: Curiosity piqued about the potential use of 2-bit embeddings in HaystackDB, while Binary Quantized (BQ) indexing becomes a spotlight topic due to its promise of leaner and faster similarity searches.
The Rough Lane of Fine-Tuning LLaMA-3: Engineers face a bumpy road with LLaMA-3 fine-tuning, battling issues from the model neglecting EOS token generation to embedding layer compatibility across bit formats.
Perplexed by Perplexity: The community debates fine-tuning LLaMA-3 for perplexity, suggesting that performance may not surpass the base model, possibly due to tokenizer-related complications.
Shining a Light on LLaMA-3 Improvement: A beacon of hope shines as one user successfully fine-tunes LLaMA-3 with model-specific prompt strategies, sparking interest with a GitHub pull request for the collective's scrutiny.
Off-Topic Oddities Go Unsummarized: A solitary link in #off-topic stands alone, contributing no technical discussion to the collective knowledge pool.
Mozilla AI Discord
- Mozilla's AI Talent Search: Mozilla AI is actively recruiting for various roles, with job opportunities available for those interested in contributing to its initiatives. Those looking to join the team can find more information and apply via the provided link.
- LM-buddy: Eval Tool for Language Models: The release of LM-buddy, an open-source evaluation tool for language models, stands to improve the assessment of LLMs. Contributors and users are encouraged to engage with the project through the given link.
- Prometheus Benchmarks LLMs in Judicial Roles: The Prometheus project has demonstrated the potential for local Large Language Models (LLMs) to act as arbiters, a novel concept sparking discussion. Interested parties can join the conversation via the link.
- In-Depth Code Analysis Request for LLaMA: An engineer noted that token generation in llama.cpp/llamafile is a bottleneck, with matrix-vector multiplications consuming 95% of inference time for LLaMA2. This has led to speculation on whether loop unrolling contributes to the 30% better performance of llama.cpp over alternative implementations.
- LLaMA Tales of Confusion and Compatibility: The Discord discussed amusing mix-ups and pseudonymous confusion with LLaMA parameters. Additionally, challenges were shared regarding integration with Plush-for-comfyUI and LLaMA3's compatibility issues on M1 MacBook Air, with priority testing for the M1 promised once current LLaMA3 issues are addressed.
Interconnects (Nathan Lambert) Discord
- OLMo Deep Dive Shared by AI Maverick: A detailed talk on "OLMo: Findings of Training an Open LM" by Hanna Hajishirzi was posted, featuring her work at the Open-Source Generative AI Workshop. Her rapid pace in presenting substantive content on OLMo, Dolma, Tulu, and more was noted as possibly challenging for students to digest, reflecting her expertise and the extensive research behind these projects.
- RL in LM-Based Systems Exposed: Key takeaways from John Schulman's discussion of reinforcement learning for language-model-based systems were captured in a GitHub Gist, giving engineers a compressed synthesis of his approach and findings.
- AI Leaderboard Limitations Pointed Out: A blog post by Sayash Kapoor and Benedikt Stroebl challenges the effectiveness of AI leaderboards for code generation, highlighting the LLM debugger's (LDB) high operational cost despite its top rankings and calling into question the utility of such benchmarks in the face of significant expenses.
- SnailBot: A mention of an update related to SnailBot was made but lacked further information or context for a substantive summary.
LLM Perf Enthusiasts AI Discord
- Gamma Seeking AI Wizard: Gamma is hiring an AI engineer to drive innovation in AI-driven presentation and website design, with a focus on prompt engineering, metrics, and model fine-tuning; details are at Gamma Careers. Despite the need for an in-person presence in San Francisco, the role is open to those with strong Large Language Model (LLM) skills even if they lack extensive engineering experience.
- AI-Powered Enterprise on Growth Fast-track: Flaunting over 10 million users and $10M+ in funding, Gamma is looking for an AI engineer to help sustain its growth while enjoying a hybrid work culture within its profitable and compact 16-member team.
- The Case of GPT-4.5 Speculations: A tweet by @phill__1 hinted at gpt2-chatbot possessing "insane domain knowledge," leading to speculation that it might represent the capabilities of a GPT-4.5 version (phill__1's observation).
- Chatbot Causes Community Commotion: The engineer community is abuzz with the idea that the gpt2-chatbot could be an unintentional glimpse of GPT-4.5's prowess, with one member succinctly endorsing it as "good".
Datasette - LLM (@SimonW) Discord
- Snazzy Syntax-Nixing for Code-Gen: A user discussed incorporating a custom grammar within a language model to prioritize identifying semantic rather than syntactic errors during code generation.
- Data-fied Dropdowns for Datasette: Suggestions were exchanged on improving Datasette's UX, including a front-page design featuring dropdown menus that let users generate summary tables based on selected parameters, such as country.
- UX Magic with Direct Data Delivers: Members proposed enhanced UX solutions for Datasette, including dynamically updating URLs or building homepage queries adjusted by user selection to streamline access to relevant data.
DiscoResearch Discord
- Loading Anomalies Enigma: A conversation highlighted that a process loads in 3 seconds on a local machine but faces delays when run through job submission, implying that the issue may not be related to storage but perhaps environment-specific overheads.
- Llama Trumps GPT-4 in Language Benchmark: Llama 3 outperformed GPT-4 in the ScandEval benchmark for German NLG, as shown on ScandEval's leaderboard.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
CUDA MODE ▷ #triton (1 message):
- Clarifying Triton Block Size Limits: A member inquired about the maximum size of a Triton block, noting that while they can create blocks with 4096 elements, they cannot do the same with 8192, suggesting a discrepancy with the expected CUDA limits.
CUDA MODE ▷ #cuda (8 messages🔥):
- Seeking Flash Attention Code: A user inquired about how to download lecture12 of flash attention code presented by Thomas Viehmann; no resolution to the query was provided in the chat.
- Understanding CUDA Reductions: A member worked out their confusion regarding row-wise versus column-wise reductions in CUDA, realizing the performance difference is due to the (non)coalesced memory accesses and clarified their own question.
- Integer Division in Kernel Code: An optimization discussion took place regarding replacing integer division with bit shifts; it was suggested that nvcc or ptxas may optimize division when divisors are powers of 2, and a Compiler Explorer link was provided for further experimentation (see the snippet after the links below).
- CUDA Checkpointing Resource Shared: An external GitHub resource for CUDA checkpoint and restore utility, NVIDIA/cuda-checkpoint, was shared without further discussion.
- Comparing CUTLASS and CuBLAS Performance: A member benchmarked matrix-multiplication performance comparing CuBLAS and CUTLASS, reporting that CUTLASS outperforms CuBLAS in a standalone profiler, but the gains disappear when integrated into Python, as shared in Thonking AI's post about matrix multiplications.
Links mentioned:
- Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short]: Great minds discuss flops per watt.
- GitHub - NVIDIA/cuda-checkpoint: CUDA checkpoint and restore utility: CUDA checkpoint and restore utility. Contribute to NVIDIA/cuda-checkpoint development by creating an account on GitHub.
- Compiler Explorer - CUDA C++ (NVCC 11.7.0): #include <algorithm> #include <cassert> #include <cstdio> #include <cstdlib> __global__ void sgemmVectorize(int M, int N, int K, float alpha, f...
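The identity behind that integer-division discussion, checked in a few lines (compilers apply it automatically when the divisor is a compile-time power of two):

```python
# For non-negative integers, division and modulo by a power of two reduce to a
# shift and a mask, which is what nvcc/ptxas can emit for constant divisors.
for i in (0, 1, 63, 64, 65, 4096):
    assert i // 64 == i >> 6
    assert i % 64 == i & 63
```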
CUDA MODE ▷ #torch (4 messages):
- Curiosity About Double Kernel Launches: A member inquired as to why, during matrix multiplication in PyTorch, the profiler sometimes indicates two kernel launches.
- Clarification on PyTorch `linear` Function: Another member clarified that `linear` in PyTorch does include a transpose operation by default (applied to the weight matrix), which need not lead to a performance difference.
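A quick check of that clarification: `F.linear` computes `x @ weight.T`, so the transpose is on the weight and is folded into the matmul rather than materialized:

```python
# F.linear(x, w) == x @ w.T: the transpose applies to the weight and is handled
# inside the matmul kernel, so no extra transposed copy is created.
import torch
import torch.nn.functional as F

x = torch.randn(8, 16)
w = torch.randn(32, 16)  # (out_features, in_features)
assert torch.allclose(F.linear(x, w), x @ w.T, atol=1e-6)
```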
CUDA MODE ▷ #algorithms (2 messages):
- Introducing Effort Engine for LLMs: The Effort Engine algorithm was shared, with the capability of dynamically adjusting computational effort during LLM inference (see the sketch below). At 50% effort it reaches speeds comparable to standard matrix multiplications on Apple Silicon, and at 25% effort it is twice as fast with minimal quality loss, per the details on kolinko.github.io/effort.
- Effort Engine's Approach to Model Inference: The technique selectively loads important weights, potentially enhancing speed without substantial quality degradation. It is implemented for Mistral and should be compatible with other models after some conversion and precomputation, with the implementation available on GitHub.
- FP16-Only Implementation and Room for Improvement: The Effort Engine is currently available for FP16 only, and while the multiplications are fast, improvements are needed elsewhere, such as in softmax and attention-summation operations.
- Potential Limitations of Effort Engine Explored: A member highlighted that while the approach is innovative, it might share limitations with activation-sparsity methods, especially in batched computation with batch size greater than one, due to misaligned activation magnitudes.
Link mentioned: Effort Engine: A possibly new algorithm for LLM Inference. Adjust smoothly - and in real time - how many calculations you'd like to do during inference.
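A conceptual sketch of the adjustable-effort idea described above: do only the largest contributions to each matrix-vector product and skip the rest. The real Effort implementation is considerably more involved; this shows only the shape of the idea.

```python
# Conceptual sketch of effort-style inference: keep only the top fraction of
# input magnitudes in a matrix-vector product. Illustrative, not the real code.
import numpy as np

def effort_matvec(W: np.ndarray, x: np.ndarray, effort: float) -> np.ndarray:
    k = max(1, int(effort * x.size))        # fraction of multiplications to do
    top = np.argsort(np.abs(x))[-k:]        # indices of largest |x| entries
    return W[:, top] @ x[top]               # skip the small contributions

W, x = np.random.randn(256, 1024), np.random.randn(1024)
y_half_effort = effort_matvec(W, x, effort=0.5)  # ~50% of the multiplications
```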
CUDA MODE ▷ #jobs (1 message):
- InstaDeep is Hiring ML Engineers: InstaDeep Research is actively seeking Machine Learning Engineers who are passionate about high-performance ML engineering and its real-world applications. Candidates who excel at building custom CUDA kernels, state-of-the-art model architectures, quantisation, and distributed training are encouraged to reach out.
- Join a Collaborative Innovator: InstaDeep offers a stimulating, collaborative environment to work on real-life decision-making and technology products, and encourages applications from talented individuals eager to make a transformative impact. The company emphasizes innovation and real-world applications in Bio AI and Decision-Making AI.
- Seeking Interns and Multi-Applicants: Individuals interested in internships or in more than one role at InstaDeep can explore internship opportunities and apply to multiple positions provided they have the relevant skills, though applying to more than two is discouraged to avoid application rejection.
- Reapplication Guidelines Suggested: Those who applied previously and were not selected are advised to wait before reapplying, particularly if they applied within the last six months, allowing time for changes in the applicant's profile or in company needs.
Link mentioned: Job Offer | InstaDeep - Decision-Making AI For The Enterprise: no description found
CUDA MODE ▷ #youtube-recordings (2 messages):
- No Updates on Progress: A member confirmed that there have been no new developments to report currently.
- Profiling Techniques on Video: A YouTube video titled "Lecture 16: On Hands Profiling" was shared in the chat, providing a resource for learning about profiling techniques, although no specific description was provided.
Link mentioned: Lecture 16: On Hands Profiling: no description found
CUDA MODE ▷ #ring-attention (1 message):
- Llama-3 Hits New Context Length Highs: Gradient has released Llama-3 8B Gradient Instruct 1048k that extends the context length from 8k to over 1048k. The achievement demonstrates that state-of-the-art language models can adapt to long contexts with minimal training adjustments.
Link mentioned: gradientai/Llama-3-8B-Instruct-Gradient-1048k · Hugging Face: no description found
CUDA MODE ▷ #off-topic (1 message):
- CUTLASS: A Dance of Integers: A member observed that CUTLASS, despite being a linear algebra library, spends most of its code on integer operations and index manipulation before handing off to the core linear algebra routines. This helps explain how it can ship as a header-only library without complex linking.
CUDA MODE ▷ #llmdotc (721 messages🔥🔥🔥):
- CUDA Programming Discussions & Packed128 Types: There was a detailed debate about using the `Packed128` custom struct to optimize memory access patterns for both reads and writes. Special attention was given to the proper construction and use of `Packed128`, and to whether explicit typecasting with floatX and BF16 should be used inside kernels.
- Mixed-Precision Strategy Concerns: There's concern about the impact of using BF16 throughout the entire model and whether stochastic rounding might affect training convergence. There are plans to compare loss metrics between llm.c's BF16 approach and standard PyTorch mixed-precision implementations (a minimal autocast baseline is sketched after this list).
- Profiling & Debugging: A member added NVTX contexts for better profiling with NSight Compute, enabling more accurate GPU timings. Another member observed that the AdamW kernel may need optimization around FP32 atomics and scratch-storage usage.
- Tooling & Infrastructure for Benchmarking: Members discussed the potential utility of external platforms like Modal for running benchmarks on standardized specs, specifically the benefits and limitations of Modal with regard to profiling tools like nvprof and nsys.
- PR Reviews Prepared for Merge & CI Suggestions: The channel had several PRs prepared for merging, mostly pertaining to the f128 and Packed128 optimizations for various kernels. Also highlighted were the need to keep branch documentation updated, -Wall compilation, and a CI check to ensure the Python and C implementations deliver similar results.
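For the PyTorch side of that comparison, the standard mixed-precision baseline keeps FP32 master weights and autocasts the forward pass to BF16. A minimal sketch (illustrative only, not llm.c's training loop):

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()          # FP32 master weights
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).square().mean()           # forward runs in BF16

loss.backward()                               # gradients land in FP32
opt.step()                                    # optimizer updates FP32 weights
```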
Links mentioned:
- Nvidia's H100: Funny L2, and Tons of Bandwidth: GPUs started out as devices meant purely for graphics rendering, but their highly parallel nature made them attractive for certain compute tasks too. As the GPU compute scene grew over the past cou…
- cuda::associate_access_property: CUDA C++ Core Libraries
- FP8-LM: Training FP8 Large Language Models: In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables, such as gradients and optimizer states, in LLM traini...
- cuda::memcpy_async: CUDA C++ Core Libraries
- Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short]: Great minds discuss flops per watt.
- Log in: no description found
- Compiler Explorer - CUDA C++ (NVCC 12.2.1): #include <cuda/barrier> #include <cuda/std/utility> // cuda::std::move #include <cooperative_groups.h> #include <cooperative_groups/reduce.h> t...
- llm.c/dev/cuda/layernorm_backward.cu at master · karpathy/llm.c: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.
- llm.c/train_gpt2.cu at master · karpathy/llm.c: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.
- WikiText 103 evaluation · Issue #246 · karpathy/llm.c: I've seen some repos use WikiText-103 as the dataset they use to eval GPT-like models, e.g.: https://github.com/tysam-code/hlb-gpt/tree/main Add prepro script to download and preprocess and tokeni...
- llm.c/train_gpt2.cu at 9464f4272ef646ab9ce0667264f8816a5b4875f1 · karpathy/llm.c: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.
- Compiler Explorer - CUDA C++ (NVCC 12.3.1): #include <cuda_fp16.h> template<class ElementType> struct alignas(16) Packed128 { __device__ __forceinline__ Packed128() = default; __device__ __forceinline__ exp...
- Add script to run benchmarks on Modal by leloykun · Pull Request #311 · karpathy/llm.c: This PR adds a script to run the benchmarks on the Modal platform. This is useful for folks who do not have access to expensive GPUs locally. To run the benchmark for the attention forward pass on ...
- GitHub - graphcore-research/out-of-the-box-fp8-training: Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.: Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. - graphcore-research/out-of-the-box-fp8-training
- GitHub - NVIDIA/cudnn-frontend: cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it: cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it - NVIDIA/cudnn-frontend
- round 1 of some changes. we will now always write in fp32, even if dt… · karpathy/llm.c@3fb7252: …ype is set to float16 or bfloat16. next up, we actually want to write in lower precision, when the dtype is set so
- fixed potential error and generalized gelu forward by ngc92 · Pull Request #313 · karpathy/llm.c: This adds a helper function for safe casting from size_t to ints (may want to have that in utils.h too). that macro is then used to convert the size_t valued block_size * x128::size back to a regu...
- Feature/packed128 by karpathy · Pull Request #298 · karpathy/llm.c: no description found
- Updated adamw to use packed data types by ChrisDryden · Pull Request #303 · karpathy/llm.c: Before Runtime total average iteration time: 38.547570 ms After Runtime: total average iteration time: 37.901735 ms Kernel development file specs: Barely noticeable with the current test suite: Bef...
- Add NSight Compute ranges, use CUDA events for timings by PeterZhizhin · Pull Request #273 · karpathy/llm.c: CUDA events allow for more accurate timings (as measured by a GPU) nvtxRangePush/nvtxRangePop Adds simple stack traces to NSight Systems: Sample run command: nsys profile mpirun --allow-run-as-roo...
- yet another gelu by ngc92 · Pull Request #293 · karpathy/llm.c: more complicated Packet128 for cleaner kernels
- Full BF16 including layernorms by default (minimising number of BF16 atomics) by ademeure · Pull Request #272 · karpathy/llm.c: I added 4 different new versions of layernorm_backward_kernel, performance is best for: Kernel 4 (using atomicCAS, no scratch, but rounding many times so probably worse numerical accuracy Kernel 6...
- Removing Atomic Adds and adding memory coalescion by ChrisDryden · Pull Request #275 · karpathy/llm.c: This PR is ontop of the GELU memory coalescion PR and is essentially just a rewrite of the backwards encoder to use shared memory instead of atomic adds and then using the Packed struct to do coale...
- Packing for Gelu backwards by JaneIllario · Pull Request #306 · karpathy/llm.c: Update gelu backwards kernel to do packing into 128 bits, and create gelu brackward cuda file Previous kernel: block_size 32 | time 0.1498 ms | bandwidth 503.99 GB/s block_size 64 | time 0.0760...
- karpath - Overview: GitHub is where karpath builds software.
- Remove FloatN & simplify adam/reduce with BF16 LayerNorms by ademeure · Pull Request #295 · karpathy/llm.c: The MULTI_GPU path is untested, but everything else seems to work fine. I kept the per-tensor "param_sizeof" as it's used in test_gpt2.cu for example, it's not much code and may be u...
- Speedup `attention_forward_kernel2` by implementing Flash Attention 2 kernel by leloykun · Pull Request #60 · karpathy/llm.c: This speeds up the attention_forward_kernel2 kernel by replacing the implementation with a minimal Flash Attention 2 kernel as can be found in https://github.com/leloykun/flash-hyperbolic-attention...
- flash-hyperbolic-attention-minimal/flash_attention_2.cu at main · leloykun/flash-hyperbolic-attention-minimal: Flash Hyperbolic Attention in ~[...] lines of CUDA - leloykun/flash-hyperbolic-attention-minimal
- Flashattention by kilianhae · Pull Request #285 · karpathy/llm.c: Faster Flash Attention Implementation Added attention_forward6 to src/attention_forward: A fast flash attention forward pass to src/attention_forward written without any dependencies. We are assumi...
- Added packing for gelu forwards kernel by ChrisDryden · Pull Request #301 · karpathy/llm.c: This PR implements packing for the Gelu forwards kernel using the example provided. The kernel dev file was also updated to show the impact of changing the data types for floatX. Before changes: to...
- Update residual_forward to use packed input by JaneIllario · Pull Request #299 · karpathy/llm.c: Update residual_forward to use 128 bit packed input, with floatX Previous Kernel: block_size 32 | time 0.1498 ms | bandwidth 503.99 GB/s block_size 64 | time 0.0760 ms | bandwidth 993.32 GB/s b...
CUDA MODE ▷ #rocm (8 messages🔥):
- Inquiry on Flash Attention 2 for ROCm 6.x: A member inquired whether anyone has been building Flash Attention 2 for ROCm 6.x, noting they have successfully done so for ROCm 5.6 and Torch 2.2 but are interested in a newer stack.
- Building Woes for Torch Nightly: Members discussed the difficulties in building for current versions like Torch 2.3, with one expressing a desire to use Torch nightly but facing issues.
- Official Fork Lagging Behind: There's mention of the official fork of Flash Attention for AMD hardware being outdated, still at version 2.0 of Flash Attention, without recent developments ported over.
- Backward Pass Update Confirmation: When queried about the backward pass addition to AMD Flash Attention, a member confirmed that it had indeed been added.
- Flash Attention GitHub Repository: A link to the ROCm/flash-attention repository on GitHub was shared, which serves as a resource for fast and memory-efficient exact attention.
Link mentioned: GitHub - ROCm/flash-attention: Fast and memory-efficient exact attention: Fast and memory-efficient exact attention. Contribute to ROCm/flash-attention development by creating an account on GitHub.
Unsloth AI (Daniel Han) ▷ #general (487 messages🔥🔥🔥):
- Conversion Issues with llama3 on WSL2: A user reported errors during model conversion to F16 in WSL2, stating `RuntimeError: Unsloth: Quantization failed`. Even after rebuilding `llama.cpp` and redoing the quantization, the problem persisted.
- Model Checkpoint Merging Queries: One member asked how to merge a specific checkpoint to avoid overfitting from the latest epoch. Another member pointed to the Unsloth wiki for more info on checkpointing, and further conversation suggested options like warmup steps and resuming from a checkpoint in training functions.
- Anticipation for Phi-3: Members discussed the potential release of Phi-3, with anticipation for trying out the 3.8b version. The conversation spanned from speculation about release timelines to consideration of whether to wait for larger versions like 7b or 14b.
- Training Tips and Troubleshooting: Various users discussed their experiences and strategies with training models like Gemma, LLaMA-3, and Mistral. Tips included the importance of saving checkpoints and adjusting training parameters like max steps and batch sizes.
- Updates on Unsloth Tools: There was a notable emphasis on updating Unsloth installations with newer versions, discussing updates in repositories, and speculations about multi-GPU support on the platform in development.
Links mentioned:
- Tweet from RomboDawg (@dudeman6790): Currently training Llama-3-8b-instruct on the full 230,000+ lines of coding data in the OpenCodeInterpreter data set. I wonder how much we can increase that .622 on humaneval 🤔🤔 Everyone pray my jun...
- Google Colab: no description found
- unsloth/Phi-3-mini-4k-instruct-bnb-4bit · Hugging Face: no description found
- Tweet from RomboDawg (@dudeman6790): Here is a full colab notebook if you dont want to copy the code by hand. Again thanks to @Teknium1 for the suggestion https://colab.research.google.com/drive/1bX4BsjLcdNJnoAf7lGXmWOgaY8yekg8p?usp=shar...
- DiscoResearch/DiscoLM_German_7b_v1 · Hugging Face: no description found
- Here We Go Joker GIF - Here We Go Joker Heath Ledger - Discover & Share GIFs: Click to view the GIF
- Weird Minion GIF - Weird Minion - Discover & Share GIFs: Click to view the GIF
- Wheel Of Fortune Wheel GIF - Wheel Of Fortune Wheel Wof - Discover & Share GIFs: Click to view the GIF
- gradientai/Llama-3-8B-Instruct-Gradient-1048k · Hugging Face: no description found
- Load: no description found
- mlabonne/orpo-dpo-mix-40k · Datasets at Hugging Face: no description found
- crusoeai/Llama-3-8B-Instruct-Gradient-1048k at main: no description found
- Home: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- botbot-ai/CabraLlama3-8b at main: no description found
- arthrod/cicerocabra at main: no description found
- [FIXED] NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs · Issue #400 · unslothai/unsloth: I'm a beginner to try unsloth. I run the free notebook Llama 3 (8B), and then got the following error: I also encountered the following error during the first installing step: ERROR: pip's dep...
- GitHub - M-Chimiste/unsloth_finetuning: Contribute to M-Chimiste/unsloth_finetuning development by creating an account on GitHub.
- GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- schedulefree optimizers by winglian · Pull Request #30079 · huggingface/transformers: What does this PR do? integrates meta's https://github.com/facebookresearch/schedule_free for adamw & sgd https://twitter.com/aaron_defazio/status/1776320004465582331 Before submitting This ...
- Type error when importing datasets on Kaggle · Issue #6753 · huggingface/datasets: Describe the bug When trying to run import datasets print(datasets.__version__) It generates the following error TypeError: expected string or bytes-like object It looks like It cannot find the val...
- GitHub - ggerganov/llama.cpp: LLM inference in C/C++: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
- GitHub - facebookresearch/xformers: Hackable and optimized Transformers building blocks, supporting a composable construction.: Hackable and optimized Transformers building blocks, supporting a composable construction. - facebookresearch/xformers
- unsloth (Unsloth AI): no description found
- llama : improve BPE pre-processing + LLaMA 3 and Deepseek support by ggerganov · Pull Request #6920 · ggerganov/llama.cpp: Continuing the work in #6252 by @dragnil1 This PR adds support for BPE pre-tokenization to llama.cpp Summary The state so far has been that for all BPE-based models, llama.cpp applied a default pre...
Unsloth AI (Daniel Han) ▷ #random (48 messages🔥):
- Handling Out of Memory in Colab: A member gave a tip on combating Out of Memory (OOM) errors in Google Colab by running a Python snippet that clears the cache and collects garbage using the `torch` and `gc` modules (a version of the snippet appears after this list). Other members appreciated this hack and plan to adopt it for future use.
- Confusion Over the Performance Data of LLama Models: There was a discussion about the perplexity differences when quantizing LLama models, specifically LLama 2 and LLama 3. It appears there may have been a miscommunication regarding the actual data, as members pointed out possible swaps or errors in the Bits Per Word (BPW) and Perplexity (PPL) columns.
- Phi-3 Now Supported: An update was shared that Phi-3 is now supported, and members expressed excitement about using it in their projects. A link to a Colab notebook was supposed to be shared but was evidently not provided.
- Phi-3 Integration Issues: Members discussed issues when trying to use the Phi-3 model in an Unsloth notebook, with error messages popping up about needing a custom script. The discussion focused on troubleshooting the problem and ensuring that the proper notebooks are used.
- Llama 3 License Questions: A member raised a question about the Llama 3 license conditions, wondering whether all models derived from it must carry certain prefixes and display credits according to the license. Concerns were also voiced about potential license violations by Huggingface models.
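The snippet itself wasn't reproduced in the summary; a best-effort reconstruction of the usual pattern (not necessarily the member's exact code):

```python
import gc
import torch

# drop references to large objects first (e.g. `del model`), then:
gc.collect()                  # reclaim unreferenced Python objects
torch.cuda.empty_cache()      # release cached CUDA blocks back to the driver
torch.cuda.ipc_collect()      # clean up CUDA IPC memory from dead processes
```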
Link mentioned: Out of memory - Wikipedia: no description found
Unsloth AI (Daniel Han) ▷ #help (230 messages🔥🔥):
- Clarification on Loss During Fine-tuning: A member asked whether the loss displayed during fine-tuning with Unsloth was a test loss or a train loss. The advice given was to pass a validation dataset to the trainer, specifically using the `SFTTrainer` with a `train_dataset` and an `eval_dataset` for validation (a minimal setup is sketched after this list).
- Early Stopping Not Available in SFTTrainer: It was pointed out that the `SFTTrainer` does not support early stopping based on validation loss. The user was informed that the more general `Trainer` class might offer this feature.
- UnslothAI Issues with GGUF Conversion and Xformers: Multiple users reported issues with GGUF conversion, notably for the Phi-3 model, where a vocab-size mismatch occurred. Moreover, recent updates to xformers broke compatibility by requiring PyTorch 2.3; a member offered a temporary fix of pinning the version to `xformers<0.0.26`.
- llama3 Trained Models Rambling On: A member expressed concern that their fine-tuned Llama-3 model wouldn't stop talking when inferencing with Ollama, suspecting an issue with `EOS_TOKEN`. Another user suggested the problem may be that Ollama isn't recognizing the correct `EOS_TOKEN` set during training.
- Using Multiple GPUs with Unsloth Produces Warning: A user asked how to use multiple GPUs with Unsloth, sharing an error about multiple CUDA devices being detected while only a single device is allowed. The related message shows the system overriding `CUDA_VISIBLE_DEVICES` to the first device.
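Putting the first tip into code, a minimal sketch of passing an eval split to TRL's `SFTTrainer` (the tiny model, dataset, and hyperparameters here are placeholders for illustration, not from the discussion; argument names reflect the TRL/transformers APIs of the time):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

name = "sshleifer/tiny-gpt2"                 # placeholder model for the demo
model = AutoModelForCausalLM.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.pad_token = tokenizer.eos_token    # GPT-2 tokenizers lack a pad token

split = load_dataset("imdb", split="train[:1%]").train_test_split(test_size=0.1)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=split["train"],
    eval_dataset=split["test"],              # enables eval loss next to train loss
    dataset_text_field="text",
    max_seq_length=512,
    args=TrainingArguments(
        output_dir="outputs",
        evaluation_strategy="steps",         # run evaluation every eval_steps
        eval_steps=50,
        per_device_train_batch_size=2,
        max_steps=100,
    ),
)
trainer.train()
```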
Links mentioned:
- Google Colab: no description found
- Load: no description found
- Models: no description found
- Home: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- llama3-instruct models not stopping at stop token · Issue #3759 · ollama/ollama: What is the issue? I'm using llama3:70b through the OpenAI-compatible endpoint. When generating, I am getting outputs like this: Please provide the output of the above command. Let's proceed f...
- [Usage]: Llama 3 8B Instruct Inference · Issue #4180 · vllm-project/vllm: Your current environment Using the latest version of vLLM on 2 L4 GPUs. How would you like to use vllm I was trying to utilize vLLM to deploy meta-llama/Meta-Llama-3-8B-Instruct model and use OpenA...
Unsloth AI (Daniel Han) ▷ #showcase (7 messages):
- Massive Context Extension for Llama 3 8B: The context length for Llama 3 8B has been significantly expanded from 8k to 256k using PoSE, as showcased on Hugging Face. Although untested in "needle in a haystack" scenarios due to inferencing challenges, the model was enhanced with 75M tokens of continued pretraining data.
- Community Applauds Winglian: Members of the chat lauded Winglian for his contributions to the community, particularly in relation to the development of Llama 3 8B 256K.
- From 128k to 256k: One member expressed amazement at the progression from a 128k context to a 256k context model.
- Open Source Power: Skepticism about non-official releases was mentioned due to observed odd behaviors in context-extended models, but thereâs still an emphasis on the potential of open source contributions.
Link mentioned: winglian/llama-3-8b-256k-PoSE · Hugging Face: no description found
Unsloth AI (Daniel Han) ▷ #suggestions (25 messages🔥):
- Unsloth and Recurrent Gemma 2b Integration Inquiry: A community member expressed interest in integrating Recurrent Gemma with Unsloth for improved performance. However, the Unsloth team acknowledged an existing bug with the base Gemma 2b model and said current work is focused on Phi 3, implying integration may not be immediate.
- Gemma 2b VRAM Consumption Issue: It was reported that Gemma 2b sometimes exceeds VRAM limits, but it is unclear whether this is a widespread issue or an isolated incident. The Unsloth team is aware and suggests they need to address it.
- Gemma 2b Still Operational Despite VRAM Overhead: Although there is a VRAM consumption concern, the Gemma 2b model is still functional. Only one user has reported the issue, pointing to the possibility that it might not be a common problem.
- Reference to Gemma 2b VRAM Issue Provided: The Unsloth team directed users to a Discord message link for reference on the VRAM issue, although the link was not properly included in the provided messages.
LM Studio ▷ #💬-general (135 messages🔥🔥):
- LM Studio on Ubuntu GPU Inquiry: Members sought advice on running LM Studio on an Ubuntu GPU, with suggestions to post detailed system specs in the appropriate channels. Concerns about the compatibility of certain GPUs with inference tasks were also raised.
- Groq API for Llama3: A member shared a YouTube link about a free API from Groq that provides access to the LLAMA-3 model, reportedly offering 300 tokens per second; it was commended as suitable for a small Discord server bot thanks to its speed and cost (free).
- LM Studio Local Training Queries: Users new to LLMs asked about training a local model based on existing Hugging Face models, with discussions indicating that this is hardware-intensive and time-consuming. One member claimed finetuning a phi-3 4k model on a tiny dataset took almost 8 hours.
- GPU Offload Confusion: Questions arose around utilizing GPUs for performance gains in LM Studio, with one member stating that their Intel Arc A770 wasn't useful for GPU offloading, and others discussing how disabling "GPU Offload" resolved errors.
- Saving KV Cache to Disk with LM Studio: Members are interested in whether LM Studio allows saving Key-Value (KV) caches to disk for later reuse, similar to the capability in llama.cpp, to avoid reprocessing large data inputs for queries; no definitive solution was provided (a possible approach via llama.cpp's Python bindings is sketched below).
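No LM Studio answer surfaced in the chat, but for reference, llama.cpp's Python bindings expose state snapshots that approximate this. A hedged sketch (API as of llama-cpp-python at the time; worth double-checking, and the model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=4096)

# run the expensive shared prefix once
llm.eval(llm.tokenize(b"<long document to be queried repeatedly>"))
state = llm.save_state()   # snapshot includes the KV cache

# ... later, restore instead of re-processing the prefix
llm.load_state(state)
```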
Links mentioned:
- Mods Discord Mod GIF - Mods Discord Mod Moderator - Discover & Share GIFs: Click to view the GIF
- Insanely Fast LLAMA-3 on Groq Playground and API for FREE: Learn how to get started with LLAMA-3 on Groq API, the fastest inference speed that is currently available on the market on any API. Learn how to use the Gro...
- ggml : add Flash Attention by ggerganov · Pull Request #5021 · ggerganov/llama.cpp: ref #3365 Setting up what's needed for Flash Attention support in ggml and llama.cpp The proposed operator performs: // new res = ggml_flash_attn(ctx, q, k, v, kq_mask, kq_scale); // fused scale ...
- llama : improve BPE pre-processing + LLaMA 3 and Deepseek support by ggerganov · Pull Request #6920 · ggerganov/llama.cpp: Continuing the work in #6252 by @dragnil1 This PR adds support for BPE pre-tokenization to llama.cpp Summary The state so far has been that for all BPE-based models, llama.cpp applied a default pre...
LM Studio ▷ #🤖-models-discussion-chat (149 messages🔥🔥):
- In Search of Alternate Model Downloads: Users discussed alternative sources for downloading the GGUF model due to issues with Huggingface. One suggested workaround involves making `imatrices`, which takes a very long time and is compute-heavy.
- Intricacies of iQuants and iMatrices: There was a discussion on the process of creating iQuants for models. An understanding emerged that iQuant creation can be laborious, with imatrices indicating the importance of weights in a model and aiding in more effective compression.
- Collaborative Effort for Model Optimizations: A user offered a reward of Humblebundle Steam games for assistance in making iQuant versions of the Goliath 120B Longlora model and anticipated sharing the output publicly.
- Phi 3 Issues Surfacing: Multiple users reported and discussed issues with the Phi-3 model, including leaking prompts and deviating outputs, with updated versions (a new 4k instruct) being mentioned for download.
- Seeking Uncensored Models: An interaction touched on the availability and suitability of certain uncensored models for lower-spec hardware, with Everything 7b q4 and wizard-vicuna-uncensored being suggested for an 8GB RAM setup.
Links mentioned:
- Snowflake/snowflake-arctic-instruct · Hugging Face: no description found
- vonjack/Hermes-2-Pro-BakLLaVA-Mistral-7B · Hugging Face: no description found
- AI-Engine/BakLLaVA1-MistralLLaVA-7B-GGUF · Hugging Face: no description found
- fix(root): Replaces system by user to improve generation experience. · microsoft/Phi-3-mini-128k-instruct at c9b8888: no description found
- crusoeai/Llama-3-8B-Instruct-Gradient-1048k at main: no description found
- Reddit - Dive into anything: no description found
- AUTOMATIC1111 - Overview: AUTOMATIC1111 has 41 repositories available. Follow their code on GitHub.
- Neuro Challenges Vedal: Neuro won't stop spamming chat when Vedal challenges her. ►Twitch: http://www.twitch.tv/vedal987 ►Twitter: https://twitter.com/Vedal987 #neurosama #vtuber #vedal
LM Studio ▷ #🧠-feedback (31 messages🔥):
- Mysterious Minimization and Section-Change Crashes: A user experiences random application crashes when it goes from minimized to full screen or when changing sections within the program. The user runs Windows 10 Pro on a high-end PC with a Ryzen 7 5800X, an RTX 3090, and 64GB of DDR4 RAM.
- Suspect Linux Systems with Low RAM: Multiple Linux users report having only several KB of free RAM, which is unusually low for systems reported to have 64GB or more. This persistent issue raises suspicion and speculation among community members.
- Unusual HDD Activity with Llama:
  - One user noticed their HDD making distinct "chattering" noises with each token generated while running Llama 3 with partial GPU offload, despite having 96GB of RAM and the model being stored on the HDD.
  - The user discussed potential causes for the excessive HDD usage during model inferencing; possibilities include excessive RAM usage causing swapping to a pagefile, or log-writing processes.
- GPUs Not to Blame: Community members discussed whether the noise could be GPU coil whine under heavy LLM load and shared experiences and links for identifying hard-drive sounds, confirming the noises are not due to the cooling system.
- Continuation of Troubleshooting: The conversation about the strange HDD behavior during model operation continued, covering aspects such as GPU offload, context size, and the specifics of the Lexi-Llama-3-8B model. Users were reminded to keep bug reports and help requests within the designated channels.
Links mentioned:
- Orenguteng/Llama-3-8B-Lexi-Uncensored-GGUF · Hugging Face: no description found
- Hard Drive Sounds: This is a comparison of all the sounds of the HDDs in my hard drive collection. The drives are played in chronological from oldest to newest.
LM Studio ▷ #🎛-hardware-discussion (74 messages🔥🔥):
- XP on Aggregate GPUs: Discussions point out that Llama 70B with Q4 quantization can fit on two RTX 3090 GPUs, but adding more GPUs beyond that may cause slowdowns due to PCIe bus limitations. It's mentioned that the optimum price-performance for running and fine-tuning most models is two RTX 3090s.
- Older GPUs Can Still Play: A member successfully tested dolphin-Llama3-8b and Llava-Phi3 on a GTX 1070, indicating that older, less powerful GPUs can run smaller models for specific applications like roleplaying for a droid project.
- Energy Efficiency and Running Costs: One user calculated the cost of generating 1M tokens on their laptop and compared it to GPT-3.5 Turbo, finding that running the model locally is more expensive and slower than using the API service.
- Exploring Model Performance and Accuracy: Users discussed the accuracy and efficiency of newer LLMs like Llama3 compared to more established services like GPT-4, with some expressing doubts about the accuracy and information quality of quantized or smaller, more compressed versions of the models.
- Finding the Right Local Model: Users are advised to experiment with various models to find the best fit for their hardware, with suggestions ranging from CMDR+ (which may be too large for certain GPUs) to Llama3 and Wizard V2, which might offer decent performance on more average setups.
LM Studio ▷ #🧪-beta-releases-chat (5 messages):
- Hardware Headaches: A user installed Ubuntu on their hardware and attempted to run a Linux beta release, but found that their LLM was not accepted. They queried whether the issue could be due to their hardware specifications.
- Specs Not Up to Spec: Another member responded, suggesting that the userâs hardware, which included an i5-4570 and 16GB RAM, might not be sufficient to run most models and could probably only handle a 7b Q4 model effectively.
- Graceful Exit Planned: The user appreciated the prompt feedback and indicated plans to uninstall the software, mentioning that an upgrade to better hardware was not within their means.
- Tokenizer Trouble Ticket: A request was made for the latest commit of llama.cpp to address an issue with the llama tokenizer, which is pending an update.
Link mentioned: Dell Treasure Box (Black) Desktop i5-4570, 16GB, 512GB SSD, DVD, Win10: Dell RGB Treasure Box OptiPlex SFF (Refurbished) Consumer Desktop Intel Core i5-4570 (up to 3.6GHz), 16GB, 512GB SSD, DVD, Windows 10 Professional (EN/FR) (Black)
LM Studio ▷ #autogen (4 messages):
- Seeking Troubleshooting for Model Loading Issue: A member expressed urgency in resolving a model loading issue but did not provide further details on the nature of the problem.
- Discord Etiquette Reminder: Another member advised against spamming questions across unrelated channels, suggesting to keep queries in the designated support channel (<#1111440136287297637>).
LM Studio ▷ #langchain (1 message):
ahakobyan.: can we know too?
LM Studio ▷ #amd-rocm-tech-preview (19 messages🔥):
- ROCm Version Queries: Users explored differences between version 0.2.20 and 0.2.21 concerning GPU offloading, with one questioning if there is any advantage to installing the 0.2.20 beta for better AMD functionality or if the newer version already includes requisite support.
- VRAM Discrepancies Noticed: A user reported LM Studio showing incorrect VRAM capacity for their 7900xtx, suggesting it might be including the shared memory from Smart Access Memory (SAM) / resizable BAR, leading to inaccurate GPU offload estimates.
- Understanding GPU and IGPU Configurations: In the discussion, a user mentioned having an IGPU in the system, while using a 7800x3d with less than the VRAM displayed by LM Studio, indicating a possible misrepresentation of available graphics memory.
- ROCm Compatibility Confusions: Multiple users conversed about whether certain AMD GPUs (specifically RX 6600) are supported by ROCm or not, with clarifications provided that while some older versions might have worked using OpenCL, the RX6600 is not supported by the HIP SDK which LM Studio utilizes.
- Development Environment Specifications: There was uncertainty about the nature of ROCm's compatibility with Windows, with a user asserting successful use of ROCm on Ubuntu for image generation models, suggesting discrepancies in ROCm's support across different operating systems.
Stability.ai (Stable Diffusion) ▷ #general-chat (400 messages🔥🔥):
- Civitai and monetization woes: Members voiced concerns over clubs and potential paywalls in AI model development, with a particular backlash against Civitai's monetization moves, such as Buzz donations that don't monetarily benefit creators, described as a "rip-off" by Tower13Studios.
- In the quest for AI-fueled success: Discussions revealed skepticism towards making money through SFW (Safe For Work) AI art due to oversaturation. NSFW (Not Safe For Work) artworks, especially furry and vtuber commissions, were repeatedly mentioned as the more lucrative side of AI-generated content.
- AI image generation pace picks up: Rapid generation of images using SDXL models and Python scripting was a hot topic, with members sharing code and seeking advice on pushing the speed limits for real-time applications, like Discord bots (a minimal diffusers example appears after this list).
- Saddle up for Collider: Stable Diffusion's new release drew eager inquiries and speculation around the release date and potential improvements over previous versions, with users sharing their anticipation and hopes for the model.
- Technical queries and troubleshooting abound: Users sought advice on various technical aspects from model training, such as creating LoRAs and IPAdapters, to overcoming bottlenecks encountered while running AI models on less capable hardware, with solutions occasionally offered by fellow members.
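For context on the scripted-generation discussion, a minimal diffusers example (a generic sketch, not the members' shared code):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# fewer denoising steps trades quality for latency, which matters for
# near-real-time uses like Discord bots
image = pipe("a lighthouse at dusk, photo", num_inference_steps=20).images[0]
image.save("out.png")
```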
Links mentioned:
- DreamStudio: no description found
- Dj Khaled Tayomaki GIF - Dj Khaled Tayomaki Sakigifs - Discover & Share GIFs: Click to view the GIF
- Mythos - v1.0 | Stable Diffusion Checkpoint | Civitai: V1 it is somehow 3.55GB big.... i think i managed to do a stable fp8 prune???? i literally have no idea how it is 3.55GB... V2 is a normal 6GB mode...
- Towards Pony Diffusion V7 | Civitai: Hello everyone, I'm excited to share updates on the progress of our upcoming V7, along with a retrospective analysis of V6. The recognition V6 has ...
- Melxts2008 Emoji GIF - Melxts2008 Emoji Smile - Discover & Share GIFs: Click to view the GIF
- ComfyUI/tests/distributed/test_embedded_client.py at 0862863bc00165b9ba0607595f304f93ca995887 · hiddenswitch/ComfyUI: A powerful and modular stable diffusion GUI with a graph/nodes interface. - hiddenswitch/ComfyUI
- Warpcast: no description found
- Warpcast: no description found
- diffusers/examples/dreambooth at main · huggingface/diffusers: đ€ Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX. - huggingface/diffusers
- Reddit - Dive into anything: no description found
- The Angola Effect | Horrifying death traps in the cradle of evolution: Horror fan? Go follow and listen to RUN, FOOL! - our newest show from Ballen Studios. New episodes every Tuesday - https://smarturl.it/RunFoolTime St...
- ComfyUI/script_examples/basic_api_example.py at master · hiddenswitch/ComfyUI: A powerful and modular stable diffusion GUI with a graph/nodes interface. - hiddenswitch/ComfyUI
Perplexity AI ▷ #general (322 messages🔥🔥):
- Perplexity Performance Plummets: Users reported significant slowdowns and poor performance across various models; Japanese searches were translated into English, producing meaningless results. Models like Opus, Sonar Large 32K, and GPT-4 Turbo have become sluggish, making the platform hard to use and hindering tasks during the Japanese Golden Week.
- Pro Subscription Confusion: Users faced issues with Pro subscription coupons showing as expired on their due date, and the offers associated with the Nothing Phone 2(a) were suspended early due to fraud. Contacting customer support via [email protected] is advised for resolution.
- Rewind on Free Trial: The 7-day free trial was reportedly removed from the website due to abuse, disappointing users who saw it as an effective way to introduce newcomers to Perplexity Pro.
- Log-in Loop: Users experienced difficulty logging in due to delayed email links, especially with email providers ranked "lower" than services like Gmail, affecting Pro account access.
- Voice Feature Variance: A discrepancy was noted in the voice feature on iOS: some users only had the previously existing feature, while others had access to the more recent version showcased in published videos. This was found to depend on the app version being used.
Links mentioned:
- Tweet from Gradient (@Gradient_AI_): We've been in the kitchen cooking 🔥 Excited to release the first @AIatMeta LLama-3 8B with a context length of over 1M on @huggingface - coming off of the 160K context length model we released on...
- Flashcardfy - AI Flashcard Generator with Personalized Feedback: Learn faster and smarter with AI-generated flashcards that provide personalized feedback.
- Reka Playground: Explore the latest multimodal language models built by Reka
Perplexity AI ▷ #sharing (13 messages🔥):
- Delving Into WhatsApp's Autoreply Feature: A message shares a Perplexity AI search result exploring auto-reply functionality in WhatsApp.
- Uncovering the Essence of "Topic 3": A link directs users to a Perplexity AI search regarding Topic 3, but does not provide further context or description.
- Research Info on Surroind: The message contains a Perplexity AI link presumably related to research info on "Surroind"; details are not specified.
- Insights on an Unspecified Topic From Lenny's Newsletter: The user shared a newsletter link with insights from Lenny's Newsletter, highlighting how Lenny tackles questions about product building, driving growth, and accelerating careers.
- Inquiry about Vimeo API: A user posted a Perplexity AI search link pertaining to the Vimeo API, specifics of the inquiry are not given.
Note: Some messages contained Perplexity AI search result links with no context provided; thus, the content or nature of the discussions on these topics could not be summarized.
Link mentioned: How Perplexity builds product: Johnny Ho, co-founder and head of product, explains how he organizes his teams like slime mold, uses AI to build their AI company, and much more
Perplexity AI ▷ #pplx-api (7 messages):
- Seeking Source URL Access via API: A user inquired about the availability of source URLs in the API, noting that the feature was previously listed in the roadmap documentation. Access is granted through an application process via the linked form.
- Access to Citations Still Limited: One member shared their disappointment at being declined access to the source-URL feature; access was restricted to funded startups at the time of their request.
- Inquiry on make.com Model Availability: A user questioned why the Llama 3 models and Mixtral 8x22b are not listed as options in make.com's integration services.
- Request for API Citations Format: A member asked whether it's possible to get citations (such as [1]) via API requests, particularly wanting RAG-like knowledge over the web.
- Perplexity vs. Anthropic Usage Policies Clarification: A user sought to understand whether using Claude 3 under Perplexity's terms would still require adherence to Anthropic's political-usage restrictions.
Link mentioned: pplx-api form: Turn data collection into an experience with Typeform. Create beautiful online forms, surveys, quizzes, and so much more. Try it for FREE.
Nous Research AI ▷ #ctx-length-research (1 message):
kainan_e: Banned (was a spambot)
Nous Research AI ▷ #off-topic (3 messages):
- The Promise vs. The Reality: A member lampooned an overhyped message about "pioneering the future", which turned out to be just another waitlist announcement.
- The Hunt for MLOps Bounties: A question was raised about where to find the best MLOps bounties, suggesting the need for an AI-focused platform similar to Fiverr.
- A Quest for a Programmerâs Marketplace: In response to the query about MLOps bounties, another member questioned the existence of a dedicated marketplace even for standard programming bounties.
Nous Research AI ▷ #interesting-links (6 messages):
- Decentralizing AI Training: Prime Intellect proposes an open-source alternative to closed-source labs deploying H100 GPU clusters. Their platform aims to overcome the limits of traditional computing infrastructure by enabling distributed training across globally distributed clusters, as detailed in their blog post on decentralized training.
- Improving LLMs with IN2 Training: A new training regimen called information-intensive (IN2) training addresses large language models' "lost-in-the-middle" problem by providing explicit supervision on long contexts. Details and a link to the study are available in an arXiv paper.
- Back to the Origins with GPT-1: A blog post reflects on the original GPT-1 model, identifying its lasting relevance and similarities to contemporary models. It discusses how the older model set the stage for the latest in LLM development, as explained on amgadhasan's Substack.
- Understanding LLMs Through Synergistic Analysis: A recommended YouTube video provides insights into the stability, inflection, and coherence analysis of language models. Synapse's analysis can be viewed here.
- Agent Long-Term Memory Project on GitHub: The memary repository suggests intriguing possibilities for long-term memory in autonomous agents, using neo4j for memory storage. The implementation and its performance can be explored on GitHub.
- GPT-2 Chatbot Goes Offline: In a sudden turn of events, the gpt2-chatbot was reported as offline despite being active just half an hour earlier, as tweeted by @itsandrewgao and found by @shaunralston. The situation was highlighted on Twitter.
Links mentioned:
- Tweet from Andrew Gao (@itsandrewgao): gpt2-chatbot was just turned OFFLINE I was just using it half an hour ago! @shaunralston for the find #gpt2 @openai
- Make Your LLM Fully Utilize the Context: While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We ...
- GitHub - kingjulio8238/memary: Longterm Memory for Autonomous Agents.: Longterm Memory for Autonomous Agents. . Contribute to kingjulio8238/memary development by creating an account on GitHub.
- Revisiting GPT-1: The spark that ignited the fire of LLMs: A Comprehensive Look at GPT-1's Contribution to the Development of Modern LLMs
- State-of-the-art in Decentralized Training: This post explores various novel decentralized training approaches and how they can enable effective AI model training across globally distributed GPUs.
Nous Research AI ▷ #general (231 messages🔥🔥):
- PDF Handling via OpenAI API Question: A member inquired about PDF uploads through the API, specifically looking for multimodal functionality. It was clarified that one can use OpenAI's file-search tool in the API, which handles around 10k individual files (a sketch of this flow appears after this list).
- PDF Parsing Challenges and Solutions: There was a discussion of concerns about accurately parsing PDF tables for AI models. One suggested workaround involved uploading text and images from PDFs separately, due to limitations of the assistants platform.
- Model Integration Experimentation: A member shared their attempt at combining Hermes 2 Pro and BakLLaVA-1 to create a simple multimodal, GPT-4-style model with LLaMA weights, which required no finetuning, just a merging of weights related to mistral-7b-v0.1.
- GPT2-Chatbot Mystery Engages the Community: There's been a lot of buzz around a mysterious model dubbed "gpt2-chatbot"; speculation ranges from it being an early version of GPT-4.5 to an advanced model with a knowledge cutoff in November 2023. Despite attempts to discern its capabilities, the model was removed before more detailed testing could occur.
- Llama 3 Gains Vision with SigLIP: A breakthrough was discussed in which a member added vision capabilities to Llama 3 using SigLIP, making it usable directly in Transformers despite the absence of bitsandbytes quantization support.
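For the PDF-upload question, a hedged sketch of the file-search flow in the OpenAI Assistants API as it existed at the time (names and limits are worth verifying against current docs; the file path is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# upload a PDF into a vector store that the file_search tool can query
store = client.beta.vector_stores.create(name="pdf-docs")
client.beta.vector_stores.files.upload_and_poll(
    vector_store_id=store.id, file=open("paper.pdf", "rb")
)

assistant = client.beta.assistants.create(
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [store.id]}},
)
```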
Links mentioned:
- Tweet from Andrew Gao (@itsandrewgao): gpt2-chatbot was just turned OFFLINE I was just using it half an hour ago! @shaunralston for the find #gpt2 @openai
- vonjack/Hermes-2-Pro-BakLLaVA-Mistral-7B · Hugging Face: no description found
- AudioPaLM: no description found
- Tweet from lmsys.org (@lmsysorg): Thanks for the incredible enthusiasm from our community! We really didn't see this coming. Just a couple of things to clear up: - In line with our policy, we've worked with several model de...
- Tweet from Q (@qtnx_): llama-3-vision-alpha now works using @huggingface transformers
- Hugging Face â The AI community building the future.: no description found
- llava_instruct_150k.json · liuhaotian/LLaVA-Instruct-150K at main: no description found
- Tweet from Yann LeCun (@ylecun): One might think that, by now, people would realize that retrieving the solution to a common puzzle does not require any reasoning ability. Quoting Colin Fraser | @colin-fraser.net on bsky (@colin_...
- a-normal-username/Mixtral-8x22B-OpenHermes-2.5 · Hugging Face: no description found
- qresearch/llama-3-vision-alpha-hf · Hugging Face: no description found
- LLaVA/docs/Finetune_Custom_Data.md at main · haotian-liu/LLaVA: [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. - haotian-liu/LLaVA
- GitHub - nestordemeure/stop_word: Huggingface transformers stopping criteria that halts the generation when a given stop word is encountered.: Huggingface transformers stopping criteria that halts the generation when a given stop word is encountered. - nestordemeure/stop_word
- GitHub - tincans-ai/gazelle: Joint speech-language model - respond directly to audio!: Joint speech-language model - respond directly to audio! - tincans-ai/gazelle
- "I want Llama3 to perform 10x with my private knowledge" - Local Agentic RAG w/ llama3: Advanced RAG 101 - build agentic RAG with llama3Get free HubSpot report of how AI is redefining startup GTM strategy: https://clickhubspot.com/4hxđ Links- F...
- llama : improve BPE pre-processing + LLaMA 3 and Deepseek support by ggerganov · Pull Request #6920 · ggerganov/llama.cpp: Continuing the work in #6252 by @dragnil1 This PR adds support for BPE pre-tokenization to llama.cpp Summary The state so far has been that for all BPE-based models, llama.cpp applied a default pre...
Nous Research AI ▷ #ask-about-llms (19 messages🔥):
- Consensus on Mixing Tasks for LLM Training: One member suggested mixing tasks is preferable during LLM training to avoid the degradation associated with finetunes over finetunes. Another member added that a specific finetune on top of a general one can sometimes benefit very specialized tasks.
- Skeptical of LLama-3 8B Gradient Instruct's Claims: The discussion included a link to the model, which extends LLama-3 8B context length to >1040K, with a member expressing skepticism about its retrieval-performance claims and suggesting that further training might be needed, per a linked ArXiv paper.
- Curiosity Over Compute Requirements: A discussion about the impressive context length extension of the LLama-3 8B Gradient Instruct led to a query about the computational resources needed, with a reply stating it required 512 L40s. Another member remarked that many applications would not require the full 1M token context window but would benefit from improved retrieval performance.
- GitHub Pull Request Fixes Llama: An update was shared including a link to a GitHub pull request that addressed an issue with LLaMA models support in llama.cpp, indicating improved BPE pre-processing and support for LLaMa 3.
- Question Regarding Tokenization and Quantization: A conversation about the tokenizer issue in LLaMA models and whether the GGUFs need to be requantized resulted in uncertainty, with a member indicating that the pull request description was not clear on the solution.
Links mentioned:
- gradientai/Llama-3-8B-Instruct-Gradient-1048k · Hugging Face: no description found
- llama : improve BPE pre-processing + LLaMA 3 and Deepseek support by ggerganov · Pull Request #6920 · ggerganov/llama.cpp: Continuing the work in #6252 by @dragnil1 This PR adds support for BPE pre-tokenization to llama.cpp Summary The state so far has been that for all BPE-based models, llama.cpp applied a default pre...
Nous Research AI ▷ #rag-dataset (6 messages):
- Expanding Language Retrieval Horizons: A user highlighted a Wikipedia RAG dataset for use in multilingual dense retrieval, linked to a paper on leveraging LLMs to synthesize training data across many languages.
- Dietary Data Inclusion: The mentioned dataset incorporates information with a focus on Halal & Kosher, suggesting an attempt to provide diverse and inclusive data.
- Behind the Scenes with Model Selection: A member expressed interest in checking which models were used in the context of the aforementioned dataset discussion without further elaboration.
- Development Detours: A member conveyed being engaged in coding activities, though no details were provided about the nature of the work.
- Integrating Pydantic into Cynde: A member shared excitement about the new Pydantic Logfire and is considering integrating it with the AI tool Cynde. It offers an easier way to understand an application and efficiently tracks Pydantic model validations.
Links mentioned:
- Pydantic Logfire | Uncomplicated observability: Logfire is a new type of observability platform built on the same belief as Pydantic â that the most powerful tools can be easy to use.
- 🦢SWIM-IR Dataset - a nthakur Collection: no description found
Nous Research AI ▷ #world-sim (35 messages🔥):
- World Sim Takes Role-Playing to the Next Level: Users report that the worldsim prompt running on Llama 3 70B, although stiff, is engaging. Issues were noted when web-search functionality is enabled, leading to breakdowns in communication.
- Bonding with AI? More likely than you think!: The Nous Research World Sim, operating with Claude 3, garners praise for its dialogue and adaptability. One user describes a persuasive interaction so nuanced it mirrors human-like communication.
- Experimental Worlds Await: A user discusses experimenting with 70B and 8B models in both the original WorldSim and custom simulations, encountering intriguing emergent behaviors from historical figures in various scenarios.
- Diverse Simulations Unleashed: The chat features links to new AI-driven simulators, including a business simulator and a singer simulator, showcasing the flexibility of this technology in replicating complex systems and personal careers.
- Expectations Rise for World Sim Access: A collaborative atmosphere prevails as users eagerly anticipate the chance to test or re-engage with World Sim. There's discussion of possible open testing by the weekend, though nothing is guaranteed.
Links mentioned:
- HuggingChat: no description found
- Super World Sim - HuggingChat: Use the Super World Sim assistant inside of HuggingChat
- Snow Singer Simulator - HuggingChat: Use the Snow Singer Simulator assistant inside of HuggingChat
- CompSim - HuggingChat: Use the CompSim assistant inside of HuggingChat
- Snow World Simulator - HuggingChat: Use the Snow World Simulator assistant inside of HuggingChat
Modular (Mojo 🔥) ▷ #general (28 messages🔥):
- Debunking Mojo's Concurrency and Ownership Features: A member clarified that Mojo doesn't currently have Golang-like concurrency or Rust-like memory safety, as borrow checking is disabled in these early stages. It was suggested to check the GitHub repo for feature requests and the roadmap.
- Native Windows Support for Mojo Not Available: Discussion about Mojo's compatibility with Windows highlighted that native support isn't out yet, but building within WSL on Windows is an option. There was speculation about future cross-compilation capabilities, with LLVM involved.
- Exploring Mojo's Future in Replacing Programming Languages: A member speculated that Mojo might eventually replace languages like Rust and Go, given its promising early-stage development.
- Actor Model Concurrency Discussed for Mojo: Agreement is emerging on the potential future use of actor-model-style concurrency in Mojo, which could offer a granular, opt-in approach to a runtime without massive overhead.
- Compiler Quirks with Mojo Playground Exposed: Users shared experiences with the Mojo Playground, noting confusion and errors around unrecognized declarations like `ui64` and support for bit-width integers. The example showed an error message when trying to use an unknown declaration in the code.
Links mentioned:
- Input data schema | Modular Docs: The following YAML schema allows you to specify the input shapes required by
- Proposal For An Actor System Based On Mojo by reid-spencer · Pull Request #1445 · modularml/mojo: This is currently a work in progress. There are no code changes, just a proposal written in the proposals section. This was pre-approved by Chris Lattner in a conversation in June 2023. I will kee...
- 2023 LLVM Dev Mtg - Mojo 🔥: A system programming language for heterogenous computing: 2023 LLVM Developers' Meeting https://llvm.org/devmtg/2023-10 ------ Mojo 🔥: A system programming language for heterogenous computing. Speaker: Abdul Dakkak, Chr...
Modular (Mojo 🔥) ▷ #💬︱twitter (4 messages):
- Modular Tweets the Links: Several tweets have been shared from Modular's Twitter account. The content of the tweets has not been discussed in the chat. Links to tweets: Tweet 1, Tweet 2, Tweet 3, Tweet 4.
Modular (Mojo 🔥) ▷ #ai (2 messages):
- Installation Troubles with Mojo and Python 3.12.3: A user reported difficulties installing Mojo with Python 3.12.3, to which another user suggested using a Conda virtual environment to run the latest Mojo and Mojo nightly versions on a Mac M1.
- Mojo as a Superset of Python: The aim for Mojo is to become a superset of Python, meaning it should be compatible with existing Python programs and the Python package ecosystem; however, it's stressed that Mojo is in early development with many Python features not yet implemented.
- Bridging Mojo and Python: Users can import Python modules, call functions, and interact with Python objects from Mojo code since Mojo uses the standard Python interpreter, CPython, enabling the use of existing Python code without changes.
- Using Conda for Mojo Setup: It is recommended to set up Mojo with Python using Conda environments to avoid path and library conflicts that are common when multiple Python interpreters are installed on the same system.
Link mentioned: Python integration | Modular Docs: Using Python and Mojo together.
Modular (Mojo 🔥) ▷ #🔥mojo (153 messages🔥🔥):
- Mojo Stirs Up Esolang Creativity: A member has been inspired to create a parser in Mojo for an esoteric language (esolang) they devised, similar to BrainF*** but with an improved syntax. They faced an issue with `None` not implementing the `__is__` method, sparking a discussion on the correct use of `None` and optional types in Mojo.
- Mojo Syntax Strikes a Personal Chord: A member conducted an experiment to combine preferred features from all programming languages they've interacted with and found that the result closely resembled Mojo's syntax. This showcases Mojo's appeal to users with its intuitive design choices.
- Enthusiasm for New Mojo Developments: After a hiatus, a member returned to the Mojo community and expressed positive surprise at the new features and the fact that Mojo has gone open source. This contributes to the growing interest and participation in the Mojo project.
- Interest in Measurement Macros for Mojo: Drawing inspiration from Julia's `@time` macro, a member expressed interest in seeing similar functionality in Mojo that would allow measuring time and resource allocations for code execution. Another member hinted at the possibility of such features being added as built-in decorators.
- Questions on Windows Compatibility: Queries about Mojo's timeline for Windows availability suggest that community members are eager for cross-platform support. Previous expectations set in October were for "soon," leaving some members anticipating an update on the progress.
Links mentioned:
- Matrix multiplication in Mojo | Modular Docs: Learn how to leverage Mojo's various functions to write a high-performance matmul.
- Build software better, together: GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.
- Mojo Team Answers | Mojo Dojo: no description found
- 99 Bottles of Beer/EsoLang: no description found
- GitHub - karpathy/minbpe: Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.: Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. - karpathy/minbpe
- Let's build the GPT Tokenizer: The Tokenizer is a necessary and pervasive component of Large Language Models (LLMs), where it translates between strings and tokens (text chunks). Tokenizer...
- C++ as an Optimizing Assembler - a Performance Talk - Levo DeLellis - CppNorth 2023: https://www.cppnorth.ca. C++ as an Optimizing Assembler - a Performance Talk - Levo DeLellis - CppNorth 2023. Are you tired of abstractions, templates and co...
- Issues · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
- [Feature Request] Native Windows support · Issue #620 · modularml/mojo: Review Mojo's priorities I have read the roadmap and priorities and I believe this request falls within the priorities. What is your request? native support for windows. when will it be available?...
Modular (Mojo 🔥) ▷ #community-projects (4 messages):
- Mojo Dev Community Springs to Life: A Mojo-based community project called 用Mojo写一个Mojo社区 ("build a Mojo community with Mojo") has been shared on GitHub. The project can be viewed at shadowqcom/mojo_dev.
- atol-simd Picks Up Speed: The atol-simd project reports a 20% performance increase over stdlib atol for strings of 15-16 characters, though for shorter strings, stdlib remains slightly faster. Benchmarks are included in the repository.
- Collaboration Invitation Extended: A community member expressed interest in contributing to the atol-simd project, inviting opportunities for collaboration.
- SIMD Projects Share Vectorization Patterns: In the conversation about SIMD libraries, another project, mojo-fast-base64, is mentioned, highlighting a common pattern of fallback to scalar processing for inputs unsuitable for vectorization.
Links mentioned:
- GitHub - shadowqcom/mojo_dev: 用Mojo写一个Mojo社区！("build a Mojo community with Mojo!"): Contribute to shadowqcom/mojo_dev development by creating an account on GitHub.
- GitHub - mzaks/mojo-fast-base64: Contribute to mzaks/mojo-fast-base64 development by creating an account on GitHub.
Modular (Mojo 🔥) ▷ #performance-and-benchmarks (40 messages🔥):
- Optimization Quest on Error Correction Coding: An ongoing discussion centered around performance improvements for a SIMD-based function in the mocodes GitHub repository. Members exchanged ideas about the potential for LLVM/MLIR optimization techniques and the surprising amount of assembly generated by a seemingly simple function.
- Benchmarking the Almighty Mojo: A member shared advances in their 1brc (One Billion Row Challenge) project, achieving impressive iteration speeds and offering their code repository for collaboration. The conversation touched on the benefits of using nightly builds versus stable releases in performance testing.
- Bug Hunting in Nightly Builds: A member raised an issue where `FileHandle.read_bytes()` was causing memory problems, later recognized as a known issue reported on GitHub.
- Team Mojo Assemble!: The idea of forming a "team-mojo" to tackle the 1brc challenge was proposed, aiming to make it both a showcase and a tutorial for the community. This paralleled a suggestion to address benchmarks comparing Mojo to other languages, an effort that had not been fully explored yet.
Links mentioned:
- BlazeSeq/blazeseq/iostream.mojo at main · MoSafi2/BlazeSeq: Contribute to MoSafi2/BlazeSeq development by creating an account on GitHub.
- The Mojo is 68,000 times faster than Python type blogs are awesome, but can awesome comparisons be made with other languages too? · modularml/mojo · Discussion #843: Mojo being 35,000 times faster than Python, 68,000 times faster than Python… it's impressive, amazing, and cool, but to non-Python people and anti-Python who haven't yet paid attention to Mojo yet ...
- GitHub - alainrollejr/mocodes: Error Correction (De)Coding with Mojo: Error Correction (De)Coding with Mojo. Contribute to alainrollejr/mocodes development by creating an account on GitHub.
- GitHub - MoSafi2/1brc-mojo at dev: One Billion Row Challenge (1brc) in Mojo language. Contribute to MoSafi2/1brc-mojo development by creating an account on GitHub.
- [stdlib] Do not copy elements when using `FileHandle.read_bytes()` · Issue #2051 · modularml/mojo: I was doing a one-billion row challenge with Mojo and tried reading 1 billion rows (around 13GB file) using read_bytes() and quickly ran out of memory. It does not happen with read(). alias input_f...
- GitHub - VMois/1brc-mojo: One Billion Row Challenge (1brc) in Mojo language: One Billion Row Challenge (1brc) in Mojo language. Contribute to VMois/1brc-mojo development by creating an account on GitHub.
- GitHub - VMois/mojo-atol-simd: Converting string to integer in Mojo using SIMD (supports up to 16 chars as of now): Converting string to integer in Mojo using SIMD (supports up to 16 chars as of now) - VMois/mojo-atol-simd
Modular (Mojo 🔥) ▷ #engine (2 messages):
- Repo Update Yields Accurate Speed Results: After pulling the latest update from the repository, a member observed accurate reporting of speed improvements. However, they also noted that their CPU does not reach maximum frequency during benchmarks, and that MAX performs better at lower CPU clock speeds when compared to PyTorch and TensorFlow.
- A Level Up for ModularBot: ModularBot celebrated achieving level 1, marking a milestone in its operational use within the Discord environment.
Modular (Mojo 🔥) ▷ #nightly (51 messages🔥):
- EqualityComparable SIMD Discussions: A pull request was discussed regarding a change that makes `SIMD` conform to `EqualityComparable` without altering original behavior. However, it may cause issues with existing code where `SIMD` with size greater than 1 is implicitly converted to `Bool`.
- Explicit over Implicit in SIMD-to-Scalar: The discussion on `SIMD` highlighted the need for explicit use of `reduce_and` or `reduce_or` when converting from `SIMD` to `Scalar`. It was argued that `SIMD.__bool__()` causes bugs and confusion due to its current implementation.
- Mojo Compiler Nightly Release Alert: A new nightly Mojo compiler release was announced, encouraging users to update with `modular update nightly/mojo`. The changes can be reviewed via the diff on GitHub and the changelog.
- Debating SIMD and Boolean Conversions: There was a debate about the appropriate behavior of `bool(SIMD[type, size])`, and whether it should return `SIMD[bool, size]` or maintain a scalar boolean representation. Some believe it's important to maintain the ability to use `bool` as a logical interface, potentially impacting operations like `if` and ternary expressions.
- Source Location Function Moved in Nightly Release: Discussion about `__source_location()` revealed it might have been replaced with `__call_location()` in the nightly release. After some back and forth, example usage was shared to clarify how to import and utilize the function in the new compiler version.
Links mentioned:
- context:global __source_… - Sourcegraph: no description found
- mojo/stdlib/src/testing/testing.mojo at nightly · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
- [stdlib] SIMD conformance to EqualityComparable by helehex · Pull Request #2412 · modularml/mojo: This allows SIMD to conform to EqualityComparable, without losing any of the original behavior. It uses the 4th overload resolution rule to give the new methods lower precedence, while still confor...
- [stdlib] Update stdlib corresponding to 2024-04-29 nightly/mojo by JoeLoser · Pull Request #2449 · modularml/mojo: This updates the stdlib with the internal commits corresponding to today's nightly release: mojo 2024.4.2923.
- mojo/docs/changelog.md at nightly · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
HuggingFace ▷ #announcements (2 messages):
- CVPR 2024 Announces Competitions with Big Prizes: Three new competitions were announced for the CVPR 2024 conference on HF competitions: SnakeCLEF, FungiCLEF, and PlantCLEF, with over 120k USD in total prizes. The events will run from June 17-21, 2024.
- 100th Edition of Hugging News: Celebrating the 100th issue of Hugging News, featuring the release of Transformers v4.40.0, Gradio 4.28.0, Datasets v2.19.0, Optimum v1.19.0, and multiple community interaction updates including the ability to mention people on HuggingFace. Notable highlights include Phi-3 running in the browser and Common Voice 17 available on the Hub.
- Run AutoTrain UI on Kaggle: In a shared notebook, users are shown how they can run AutoTrain UI on Kaggle Notebooks backend, further enhancing accessibility for machine learning projects. The guide is available for copy and use at this Kaggle notebook.
- Snowflake Launches Massive MoE Model: Snowflake has released a new 408B parameter Dense + Hybrid MoE model, boasting a 4K context window and fully Apache 2.0 licensed, generating buzz for its impressive performance on complex tasks.
- Community Growth and Product Announcements: The announcements highlight the formation of a new community for journalists on the HuggingFace Hub, and the integration of community-driven content like how to use custom pipelines in Diffusers and a call for participation in an ML paper reading group.
Links mentioned:
- Tweet from Fleetwood (@fleetwood___): 🚨 Phi-3 running in the browser 🚨 Hits about 20 tok/s. Literally 3 lines of JS. Still some kinks to iron out, coming to Ratchet 0.4.0 soon.
- Tweet from abhishek (@abhi1thakur): Can I run AutoTrain UI on Kaggle? Yes, you can!!! Check out my latest notebook, copy it, fill in your tokens and enjoy AutoTrain UI running on Kaggle Notebooks backend. Link to notebook: https://www...
- Tweet from Vaibhav (VB) Srivastav (@reach_vb): Let's go!! Common Voice 17 - now on the Hub! 🔥 With 31,000 hours of audio (& transcriptions) across 124 languages. *sound on 🎶* 847 hours of data were added in CV 17, along with 493 hours of ...
- Tweet from Brigitte 🤗 (@BrigitteTousi): Calling all journalists! With @fdaudens, we're excited to announce a new community on the @huggingface Hub: Journalists on Hugging Face. 📰🤗 https://huggingface.co/JournalistsonHF 1/
- Tweet from Vaibhav (VB) Srivastav (@reach_vb): Snowflake dropped a 408B Dense + Hybrid MoE 🔥 > 17B active parameters > 128 experts > trained on 3.5T tokens > uses top-2 gating > fully apache 2.0 licensed (along with data recipe to...
- Tweet from Sayak Paul (@RisingSayak): Custom pipelines and components in Diffusers 📸 Wanted to use customized pipelines and other components (schedulers, unets, text encoders, etc.) in Diffusers? Found it inflexible? This 🧶 is for y...
- Tweet from lunarflu (@lunarflu1): You can now mention people on @huggingface !
HuggingFace ▷ #general (208 messages🔥🔥):
- Seeking LLM Observability Tools: A member requested advice on LLM observability tools, particularly interested in something compatible with LlamaIndex and favoring a self-hosted open-source option.
- API Interaction Assistance with huggingchat: An individual sought help for communicating with Hugging Face Chat via API calls, expressing a need for guidance.
- Offering Bounty for Gradio Expertise: A member expressed frustration over Gradio issues, offering a $200 bounty for quality assistance, with subsequent guidance to seek help in a Gradio-specific channel.
- Pinball AI Vision Model Discussion: A detailed conversation unfolded around developing an AI model to identify pinball games and scores, with discussions on complexity, tools, the necessity of image classification, and the feasibility of reusing existing models like llava for part of the solution.
- Computer Configuration for LLMs: A user looked for resources on DDR5 and CPUs performances specific to LLMs, considering a high-spec setup for their new computer. Other members chimed in with recommendations and personal experiences related to hardware choices for AI work.
- Zero GPU Explorer's Membership Queries and Jokes: Chats indicated confusion over the Zero GPU Explorers membership and subscription status, along with members humorously attempting to "rizz up" the Hugging Face developers using AI-related pick-up lines.
Links mentioned:
- Hugging Face: Here at Hugging Face, we're on a journey to advance and democratize ML for everyone. Along the way, we contribute to the development of technology for the better.
- Tweet from Noa Roggendorff (@noaroggendorff): iykyk
- zero-gpu-explorers/README · The invited application has been waiting. How long does it take to be approved?: no description found
- amazon/chronos-t5-small · Hugging Face: no description found
- gradientai/Llama-3-8B-Instruct-Gradient-1048k · Hugging Face: no description found
- Image classification: no description found
- zero-gpu-explorers/README · Update README.md: no description found
- "I want Llama3 to perform 10x with my private knowledge" - Local Agentic RAG w/ llama3: Advanced RAG 101 - build agentic RAG with llama3Get free HubSpot report of how AI is redefining startup GTM strategy: https://clickhubspot.com/4hxđ Links- F...
- GitHub - amazon-science/chronos-forecasting: Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting: Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting - amazon-science/chronos-forecasting
- Personal Copilot: Train Your Own Coding Assistant: no description found
- LLM-Workshop/personal_copilot/training/train.py at main · pacman100/LLM-Workshop: LLM Workshop by Sourab Mangrulkar. Contribute to pacman100/LLM-Workshop development by creating an account on GitHub.
- Supervised Fine-tuning Trainer: no description found
HuggingFace ▷ #today-im-learning (2 messages):
- Enthusiasm for Learning: A member expressed excitement about sharing and receiving information in the channel, signaling a positive and collaborative learning environment.
- Seeking Finetuning Guidance: A query was raised about the best practices for creating an instruction dataset for finetuning Large Language Models (LLMs), indicating an interest in tailored dataset preparation for model enhancement.
HuggingFace ▷ #cool-finds (9 messages🔥):
- Deep Dive Into Deep Learning: The MIT Introduction to Deep Learning course, now updated for 2024, provides a foundational understanding of deep learning concepts. The lecture video is available on YouTube for anyone interested in the field.
- Evaluation of Text-to-Image Models: There's an upcoming talk on text-to-image model evaluation, where the speaker will discuss text-to-image alignment and model robustness.
- Stallman Sings of Freedom: A YouTube video features Richard Stallman singing the "Free Software Song" during an event in Ecuador. This peculiar moment can be found here.
- Community Computer Vision Course Launch: Hugging Face has launched a community-driven course on computer vision accessible to everyone, covering how to join the learner community, make submissions, and obtain certification. Start learning with their welcome page.
- AI Safety Benchmarks Gain Focus: A LinkedIn post announces the LLM Safety LeaderBoard, a new platform measuring AI safety, security, and responsible AI practices. Find out more about the leaderboard here.
- Discover 5 AI Tools through GenAI: A Medium piece titled "GenAI Adventures: 5 Interesting AI Tools Everyone Should Try" presents a curated list of AI tools. Readers can explore these tools on Medium.
- Constructing Intuitive RAG Applications: An article guides the creation of webloader RAG applications using Groq, Langchain, and Datastax, featuring powerful capabilities. Interested readers can delve into these integrations on Medium.
- Simplifying Database Queries with Machine Learning: An innovative approach is being developed to allow querying of a "people database" with minimal SQL knowledge using RAG and Gemini. More details on the project can be found at Datai Alliance.
Links mentioned:
- Welcome to the Community Computer Vision Course - Hugging Face Community Computer Vision Course: no description found
- blog: no description found
- Richard Stallman Free software Song: Richard Stallman in Ecuador, singing the little free software tune, recorded by Julian Coccia.
- MIT Introduction to Deep Learning | 6.S191: MIT Introduction to Deep Learning 6.S191: Lecture 1 (New 2024 Edition). Foundations of Deep Learning. Lecturer: Alexander Amini. For all lectures, slides, and lab m...
HuggingFace ▷ #i-made-this (13 messages🔥):
- Model Release Dilemma: A post mentioned a dilemma involving the selection of one among five models to release, including an invitation for input or preference regarding which model should be launched next, and provided a LinkedIn post link for more context.
- Greetings from LifePal: A new AI-powered app named LifePal was introduced, which serves as a personalized guide to a well-balanced life and claims seamless integration with Apple Vision Pro. It's described as a life co-pilot, and its perceivable benefits and features were showcased along with the Apple Store link.
- ChatGPT's Norwegian Needs Work: A member highlighted the subpar performance of ChatGPT's Norwegian translations, which necessitated reprocessing through a Retriever-Augmented Generator (RAG) with local slang, complemented by a mention of an alternative, NorskGPT-Mistral, designed for Norwegian language understanding and generation.
- Seeking Beta Testers for an Advanced Research Assistant and Search Engine: An offer was made to recruit beta testers for an advanced research assistant and search engine tool, providing 2 months free of premium service with various models including GPT-4 Turbo, Mistral Large and more. Interested parties were directed to Rubik's AI with a promo code for the free premium offer.
- Innovative Inpainting SDXL on Hugging Face: A unique take on the inpainting tool named SDXL, allowing iterative inpainting on top of previous generations with version history, was shared. Feedback and example sharing were encouraged, and the inpainting tool can be found on Hugging Face.
Links mentioned:
- Inpainting SDXL Sketch Pad - a Hugging Face Space by tonyassi: no description found
- bineric/NorskGPT-Mistral-7b · Hugging Face: no description found
- LifePal AI Chat & Assistant: Discover LifePal: your productivity AI companion. Are you ready to unlock your full potential and live a healthier, happier life? LifePal is here to guide you on your journey to becoming a better yo...
- GitHub - Lama-West/PnPR-GCN_ACM_SAC_24: Contribute to Lama-West/PnPR-GCN_ACM_SAC_24 development by creating an account on GitHub.
- Vinner - Nybygg i og rundt Bergen (Winner - new builds in and around Bergen): Big thanks to Snøhetta
- GitHub - GDSC-FSC/gemini-node-1: Contribute to GDSC-FSC/gemini-node-1 development by creating an account on GitHub.
- Rubik's AI - AI research assistant & Search Engine: no description found
HuggingFace ▷ #reading-group (12 messages🔥):
- Graphs and LLMs Reading Preparation: A member announces plans to review papers on large language models (LLMs) and their interaction with graphs, focusing on complex relationship representation and discussing the potential for a presentation the following Saturday.
- Additional Paper Surveys for Saturday's Session: The same member additionally considers reviewing two survey papers, one about LLMs applied to graphs, and another on foundation models, suggesting these topics may also be included but noting the need to avoid spreading too thin for future reading groups.
- Exploring Distillation of Score-Based Models: A chat participant inquires about resources on distilling score-based models, specifically models that reduce the number of generation steps required compared to classical SDE solver models.
- Guidance on Distillation Papers and Communities: A response is offered guiding the previous inquiry to the Laion and Eleuther servers where experts on model distillation congregate and suggesting leading researcher Gothos, with a mention of relevant papers in the fields of rectified flow and LCM Lora.
- Paper Reading Event Creation: An event is tentatively scheduled in the group, allowing for discussions on time adjustment, encouraging member participation in the upcoming reading and presentation on LLMs and graph interaction.
Links mentioned:
- Graph Machine Learning in the Era of Large Language Models (LLMs): Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural N...
- Join the Hugging Face Discord Server!: We're working to democratize good machine learning 🤗 Verify to link your Hub and Discord accounts! | 77552 members
- Large Language Models on Graphs: A Comprehensive Survey: Large language models (LLMs), such as GPT4 and LLaMA, are creating significant advancements in natural language processing, due to their strong text encoding/decoding ability and newly found emergent ...
- Towards Graph Foundation Models: A Survey and Beyond: Foundation models have emerged as critical components in a variety of artificial intelligence applications, and showcase significant success in natural language processing and several other domains. M...
HuggingFace ▷ #computer-vision (15 messages🔥):
- Balancing Accuracy and Efficiency: A member discussed the trade-off between computational efficiency and model accuracy when processing bounding boxes at original resolution. Another member suggested image preprocessing techniques like blurring to optimize VRAM usage.
- Exploration of Image Segmentation Models: In seeking guidance for advancing in image segmentation, OneFormer, MaskFormer, and Segformer were mentioned as part of the sequence of models a member has worked with.
- Buddying Up for CNN Studies: A member expressed interest in finding a study partner for learning and working on Convolutional Neural Networks (CNNs).
- Historical Contour Algorithms Meet Modern Preprocessing: Discussing YOLO architectures, a member recommended reviewing pre-YOLO/CNN image segmentation and contour-finding algorithms, and mentioned that preprocessing and downsampling can still yield good results (see the sketch after this list). Links to OpenCV documentation on morphological operations and image processing were shared: Morphological Operations, Table of Contents for Image Processing.
- PyTorch vs TensorFlow for CNN Projects: Conversations touched upon whether to learn PyTorch or stick with TensorFlow, highlighting PyTorch's momentum in the community and academia, and TensorFlow's robust DevOps support from Google. The flexibility to create projects involving object detection and image segmentation using TensorFlow was reaffirmed.
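For the preprocessing ideas above (downsampling, blurring, and morphological cleanup before contour finding), a minimal OpenCV sketch; file path, scale factor, and kernel sizes are illustrative assumptions:

```python
import cv2
import numpy as np

img = cv2.imread("frame.png")                                 # hypothetical input image
small = cv2.resize(img, None, fx=0.5, fy=0.5,                 # downsample to cut memory/compute
                   interpolation=cv2.INTER_AREA)
blurred = cv2.GaussianBlur(small, (5, 5), 0)                  # suppress high-frequency noise

gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255,
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu picks the threshold
kernel = np.ones((3, 3), np.uint8)
opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)       # remove small specks

contours, _ = cv2.findContours(opened, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
print(f"found {len(contours)} candidate regions")
```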
Links mentioned:
- OpenCV: Image Processing in OpenCV: no description found
- OpenCV: Morphological Transformations: no description found
HuggingFace ▷ #NLP (3 messages):
- Seeking NLU/NLP Guidance: A new member is working on a chatbot using the Rasa framework but is facing issues with intent recognition, where a generic sales inquiry is miscategorized as a company-specific sales intent.
- Intent on Enhancing Intent Recognition: They are considering creating a custom NER model to identify specific keywords as intents (sales, purchases, etc.) and using company names from their database as NER-company entities to improve their chatbot's performance.
HuggingFace ▷ #diffusion-discussions (4 messages):
- Realism Challenge with Hyper-SD and IP-Adapter: A user shared an issue with not getting realistic results when using Hyper-SD with the IP-Adapter. They provided a discussion link to the GitHub where the issue was elaborated.
- Surprised by Inconsistent Results Across Models: A person was perplexed after switching from Seaart to A1111, only to find that the color and shadow quality of the images changed despite the same settings and seed being used. They inquired about any backend differences and whether it was possible to achieve uniform results on both models.
- DeepFloydâs Unpredictable Behavior: According to a user, DeepFloyd exhibits odd patterns when tweaking step count, sampler, and CFG. They compared it to the Ambigram research model and provided insights into the performance of different settings, particularly the DPM++ 2M scheduler.
Link mentioned: Not getting good realistic results with Hyper-SD + IP-Adapter · huggingface/diffusers · Discussion #7818: Hi everyone, (maybe you @asomoza know about this?) Does hyper-sd works well with IP-Adapter? I am testing hyper-sd in Diffusers as explained in the repo. I thought that I was going to get better re…
HuggingFace ▷ #gradio-announcements (1 messages):
- Gradio Share Server Troubles: Gradio has experienced issues with the Share Server that might affect sharing and usage on Colab. They're actively investigating and resolving the problem; updates are available at their status page.
- Check Gradio's Health Anytime: For an overview of Gradio's operational status over the past 90 days, including the last 24 hours, week, and month, refer to their calendar view.
- Clear Skies for the Past Week: There have been no status updates in the last 7 days, indicating no new incidents. Historical status updates can be checked on the status update history page.
Link mentioned: Gradio Status: no description found
OpenRouter (Alex Atallah) ▷ #app-showcase (3 messages):
- OpenRouter Exploring Syrax: Alex Atallah indicated the start of experimenting with Syrax and offered support to the team, proposing to organize a group chat.
- Collaboration Accepted with Enthusiasm: Mart02 acknowledged and appreciated the outreach from Alex, signaling the beginning of a collaborative effort by accepting the friend request.
OpenRouter (Alex Atallah) ▷ #general (240 messages🔥🔥):
- Frontend Quest for Non-Technical Deployments: A member inquired about a multi-user frontend that could be deployed on shared hosting without the need for Docker or Node.js. LibreChat was recommended as the most suitable option, but another member mentioned hosting challenges and cost concerns, leading to a suggestion of Vercel's free tier hosting as a potential solution.
- Comparisons and Anticipation for LLMs: There was a vigorous discussion about various large language models, including Llama-3 8B, Dolphin 2.9, and Mixtral-8x22B. Users shared insights on model capabilities, such as context window size and the likelihood of models being censored based on their conversation styles and datasets.
- Model Training Adventures: A user shared their journey trying to train a model to become more "unhinged" by using their own toxic dataset. Comparisons were made between the behavior of different models, along with a discussion of whether LLMs can handle large contexts effectively; the consensus was that while models like Llama 3 8B can manage long contexts, their performance may degrade beyond a certain point.
- Affordable Model Experiments and Discoveries: Members discussed options for cost-effective yet efficient models available on the OpenRouter platform. Mixtral-8x7B-Instruct was highlighted as a reasonable balance between price and performance, with one user expressing surprise at the improved output quality of GPT-3.5, resembling more human-like writing.
- OR Functionality in Fixing Message Order: There was a query regarding Claude 3's handling of the order of assistant/user messages. It was confirmed that OpenRouter automatically corrects ordering to ensure the models work correctly, and users are encouraged to report any ordering issues they might encounter.
Links mentioned:
- Home | ChatGPT Web Share Docs: no description found
- Google Colab: no description found
- Google Colab: no description found
- jondurbin/cinematika-7b-v0.1 · Hugging Face: no description found
- lmsys/lmsys-chat-1m · Datasets at Hugging Face: no description found
- TheBloke/psyonic-cetacean-20B-AWQ · Hugging Face: no description found
- maywell/Llama-3-8B-Instruct-1M · Hugging Face: no description found
- Tweet from Eric Hartford (@erhartford): dolphin-2.9-llama3-8b-256k is released. It is dolphin-2.9-llama3-8b with @winglian's awesome 256k context adapter applied. I will get the model card done today.
- gradientai/Llama-3-8B-Instruct-Gradient-1048k · Hugging Face: no description found
- cognitivecomputations/dolphin-2.9-mixtral-8x22b · Hugging Face: no description found
- gpt2-chatbot: This page is a work in progress. Its conclusions are likely to change as more information is collected. News as of 2024-04-30: gpt2-chatbot is extremely likely to run on a server operated by, or assoc...
- Clay - Scale personalized outbound: Combine 50+ data providers, real-time scraping, and AI to send 1-1 personalized campaigns that book more meetings.
- jondurbin/cinematika-v0.1 · Datasets at Hugging Face: no description found
- Cinematika 7B (alpha) by openrouter | OpenRouter: This model is under development. Check the [OpenRouter Discord](https://discord.gg/fVyRaUDgxW) for updates.
- Managed Server: Dein eigener Server, zuhause in der Schweiz (your own server, at home in Switzerland): no description found
LlamaIndex ▷ #blog (4 messages):
- Advanced RAG Reference Architecture Revealed: The LlamaIndex team presents a reference architecture for building advanced RAG (Retrieval-Augmented Generation) systems within the AWS ecosystem. This resource provides guidance on advanced parsing and agentic reasoning, and it's available through the shared code repository.
- Hackathon Winners Develop Documentation Bot: Team CLAB, winners of a recent hackathon, crafted a full-stack documentation bot that integrates LlamaIndex for parsing and orchestrating, along with Nomic embeddings. More details on the project and the hackathon can be found in the linked blog post.
- Creating Financial Assistants with Agentic RAG: A new development allows for building financial assistants capable of handling complex calculations, such as percentage evolution and CAGR, directly over unstructured financial reports. A recent post explains how this can be done without requiring human data transformation steps.
- Building Efficient RAG with Semantic Caching: In collaboration with @Redisinc, @tchutch94 and @seldo showcase how to build high-performance RAG applications that incorporate semantic caching to expedite frequently made queries (a conceptual sketch follows below). This innovation is aimed at enhancing quality, efficiency, and cost-effectiveness, as discussed in the collaboration piece.
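A conceptual sketch of semantic caching (not the Redis implementation from the linked piece): embed each query, and if a previous query is close enough in cosine similarity, reuse its cached answer instead of re-running retrieval and generation. The `embed_fn` is any text-to-vector function you supply:

```python
import numpy as np

class SemanticCache:
    def __init__(self, embed_fn, threshold=0.9):
        self.embed_fn = embed_fn      # any text -> np.ndarray embedding function
        self.threshold = threshold    # cosine similarity needed for a cache hit
        self.entries = []             # list of (embedding, answer) pairs

    def get(self, query):
        q = self.embed_fn(query)
        for emb, answer in self.entries:
            sim = float(np.dot(q, emb) /
                        (np.linalg.norm(q) * np.linalg.norm(emb)))
            if sim >= self.threshold:
                return answer         # cache hit: skip the expensive RAG call
        return None                   # cache miss: caller runs the full pipeline

    def put(self, query, answer):
        self.entries.append((self.embed_fn(query), answer))
```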
LlamaIndex ▷ #general (159 messages🔥🔥):
- Anticipation for Assistant Agent V2: Members are inquiring about an update or release of LlamaIndex OpenAI Assistant Agent V2 to take advantage of features in the new OpenAI Assistant V2. Currently, there is no specific update or pull request for this version.
- Updating Pinecone Indices Query: Instructions for updating an index part in Pinecone are not well-documented. While members suggested using methods like `pinecone_index.update`, no direct examples with `SimpleDirectoryReader` were provided in the LlamaIndex knowledge base.
- Tool Preference for LLM Observability: There's a discussion on the best LLM observability tools, comparing Arize Phoenix and Langfuse. A member suggested that both tools provide detailed insights, but no clear preference was indicated.
- LlamaIndex YouTube Resources: Users sought recordings of the LlamaIndex Webinar, and one member suggested checking the LlamaIndex YouTube channel, as well as other platforms like X Spaces and LinkedIn, for the latest webinars.
- Async Calls with AzureOpenAI: A member posed a question regarding async calls with AzureOpenAI in LlamaIndex and received instructions for using the `acomplete`, `astream_complete`, `achat`, and `astream_chat` async methods (see the sketch after this list). The benefits of async methods, such as speed improvements from parallel execution and non-blocking tasks, were highlighted.
Links mentioned:
- Summary and Resources: Discover the magic of the internet at Imgur, a community powered entertainment destination. Lift your spirits with funny jokes, trending memes, entertaining gifs, inspiring stories, viral videos, and ...
- LlamaIndex: Official YouTube Channel for LlamaIndex - the data framework for your LLM applications
- Typesense Vector Store - LlamaIndex: no description found
- "I want Llama3 to perform 10x with my private knowledge" - Local Agentic RAG w/ llama3: Advanced RAG 101 - build agentic RAG with llama3Get free HubSpot report of how AI is redefining startup GTM strategy: https://clickhubspot.com/4hxđ Links- F...
- Frequently Asked Questions (FAQ) - LlamaIndex: no description found
- answerbot/answerbot/replay_client.py at main · zby/answerbot: answering questions using LLMs, search (RAG) and other tools - example code - zby/answerbot
- Function Calling Program for Structured Extraction - LlamaIndex: no description found
- Retriever - LlamaIndex: no description found
- GitHub - zby/LLMEasyTools: Tools for LLM agents.: Tools for LLM agents. Contribute to zby/LLMEasyTools development by creating an account on GitHub.
- OpenAI - LlamaIndex: no description found
- Metaphor - LlamaIndex: no description found
- Auto-Retrieval from a Vectara Index - LlamaIndex: no description found
- GitHub - run-llama/llamabot: Contribute to run-llama/llamabot development by creating an account on GitHub.
- Context - LlamaIndex: no description found
- Query Pipeline with Async/Parallel Execution - LlamaIndex: no description found
- Parallelizing Ingestion Pipeline - LlamaIndex: no description found
LlamaIndex ▷ #ai-discussion (1 messages):
- A Look Back at GPT-1: A member shared a blog post exploring the original GPT-1 model from OpenAI, highlighting its enduring influence on current LLMs like Mistral-7B. The post dives into GPT-1's architecture, including positional embeddings and Conv1D usage, and shows a screenshot of Alec Radford's tweet about this groundbreaking NLP technique.
Link mentioned: Revisiting GPT-1: The spark that ignited the fire of LLMs: A Comprehensive Look at GPT-1's Contribution to the Development of Modern LLMs
Eleuther ▷ #general (25 messages🔥):
- Searching for Community Projects Seeking Volunteers: A member inquired about resources for finding community projects in need of volunteers, particularly those that offer a compute budget, given the member's lack of personal GPU resources.
- Understanding Orthogonal Keys in AI: A nuanced explanation was provided for a process termed "clear-ing" in the context of AI keys and states, using the example of orthogonal keys and how they behave in equations to explain memory updating in models.
- Intricacies of Infini-Attention and Compressive Memory: A dialogue took place around the concept of infini-attention and its perceived overhype, with a reference to a delta rule in compressive memory from 2021 and skepticism about how little it has been tested thus far. The discussion included a request for, and provision of, a relevant research paper.
- Performance Comparison Puzzles the Community: Members discussed the reasons behind the slower performance of Mixtral 8x22B compared to Llama 3 70B on fireworks.ai, touching on aspects like batching, utilization, and speeds in relation to MoEs, and Mixtral having more parameters but fewer layers.
- Invitation to Stanford CS25 Transformers Social Event: An announcement was made for a Stanford CS25 Transformers social event at EVGR Pub & Beer Garden, giving event details, a call for RSVPs, and information about a related talk on campus. The Discord community was invited to attend the in-person talk about Transformers or join via Zoom, with links provided to the RSVP form and event details.
Links mentioned:
- Effort Engine: A possibly new algorithm for LLM Inference. Adjust smoothly - and in real time - how many calculations you'd like to do during inference.
- Linear Transformers Are Secretly Fast Weight Programmers: We show the formal equivalence of linearised self-attention mechanisms and fast weight controllers from the early '90s, where a "slow" neural net learns by gradient descent to program the "f...
- Join our Cloud HD Video Meeting: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom ...
- Discord | Your Place to Talk and Hang Out: Discord is the easiest way to talk over voice, video, and text. Talk, chat, hang out, and stay close with your friends and communities.
- Reddit - Dive into anything: no description found
Eleuther ▷ #research (105 messages🔥🔥):
- Long Context Challenge Addressed: The Information-intensive (IN2) training proposal aims to improve LLMs' use of lengthy contexts. It involves a synthetic dataset requiring models to integrate information from various segments in long texts to overcome the "lost-in-the-middle" issue (a toy sketch of the construction follows this list).
- Emergent Abilities Linked to Pretraining Loss: A Twitter post discusses findings that emergent abilities in models can be correlated with pretraining loss. Unlike compute, pretraining loss can better reflect model performance by accounting for dataset quality and architectural factors.
- Dissecting Model Biases: A discussion highlighted the difficulty of tracing specific biases, like a number preference, back to changes in model weights. As biases may arise during continual training, members noted the potential need for tools to analyze these shifts for verification.
- Debating LLMs as Black Boxes: Conversations revolved around whether LLMs should be considered black boxes, given our limited understanding of their internal mechanisms. It was argued that, while we understand some aspects of LLMs, their reasoning cannot be trusted, as explanations are post-hoc and may not reflect true internal processes.
- Data Leakage Detection in LLMs: A message links to a paper introducing a detection pipeline to identify potential data leakage in LLM benchmarks, highlighting issues with training and test set misuse (PDF). The findings aim to foster fair comparisons and healthier development in the AI field.
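A toy sketch of the IN2-style construction mentioned in the first item (purely illustrative, not the paper's pipeline): the fact needed to answer a question is placed at a random position inside a long assembled context, so a model trained on such samples must attend to mid-context information:

```python
import random

def make_in2_sample(fact, question, answer, filler_paragraphs, n_fillers=50):
    # sample filler text and hide the key fact at a random position
    context = random.sample(filler_paragraphs,
                            k=min(n_fillers, len(filler_paragraphs)))
    insert_at = random.randrange(len(context) + 1)   # fact may land mid-context
    context.insert(insert_at, fact)
    prompt = "\n\n".join(context) + f"\n\nQuestion: {question}"
    return {"prompt": prompt, "answer": answer, "fact_position": insert_at}
```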
Links mentioned:
- VideoGigaGAN: no description found
- NExT: Teaching Large Language Models to Reason about Code Execution: A fundamental skill among human developers is the ability to understand and reason about program execution. As an example, a programmer can mentally simulate code execution in natural language to debu...
- Make Your LLM Fully Utilize the Context: While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We ...
- Tweet from Jason Wei (@_jasonwei): Enjoyed this paper that plots emergent abilities with pretraining loss on the x-axis, which is actually a suggestion that @OriolVinyalsML also made a few years back: https://arxiv.org/abs/2403.15796 ...
- Benchmarking Benchmark Leakage in Large Language Models: Amid the expanding use of pre-training data, the phenomenon of benchmark dataset leakage has become increasingly prominent, exacerbated by opaque training processes and the often undisclosed inclusion...
- Faster Convergence for Transformer Fine-tuning with Line Search Methods: Recent works have shown that line search methods greatly increase performance of traditional stochastic gradient descent methods on a variety of datasets and architectures [1], [2]. In this work we su...
- VideoGigaGAN: Towards Detail-rich Video Super-Resolution: Video super-resolution (VSR) approaches have shown impressive temporal consistency in upsampled videos. However, these approaches tend to generate blurrier results than their image counterparts as the...
- Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class: Vision-language models enable open-world classification of objects without the need for any retraining. While this zero-shot paradigm marks a significant advance, even today's best models exhibit ...
- Sequential predictive learning is a unifying theory for hippocampal representation and replay: The mammalian hippocampus contains a cognitive map that represents an animal's position in the environment and generates offline "replay" for the purposes of recall, planning, and forming lo...
Eleuther ▷ #lm-thunderdome (3 messages):
- Custom Function for Distinct Prompts: A member discussed the possibility of passing distinct prompts based on the model in a single task, suggesting the use of a custom `!function` for implementation.
- BitsAndBytes Oddity with 8bit: One user observed that BitsAndBytes 4-bit encoding worked well with llama3-70b, but switching to 8-bit encoding yielded poor results, describing the output as "absolute garbage" (a loading sketch follows this list).
- 8bit Encoding Issue with llama3-8b: The same member noted a similar issue when using 8-bit encoding on llama3-8b, indicating consistent problems with 8-bit across different models.
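A sketch of how the two quantization modes are typically selected with transformers + bitsandbytes, for reproducing that comparison; the 4-bit settings shown are common defaults, not the user's exact config:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
bnb_8bit = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_4bit,   # swap in bnb_8bit to reproduce the comparison
    device_map="auto",
)
```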
LAION ▷ #general (113 messages🔥🔥):
- AI Birthday Bungle Sparks GDPR War: An EU privacy activist has filed a GDPR complaint against AI models after a model incorrectly guessed his birthday. He argues this error could potentially lead to the banning of AI models in the EU.
- New GPT Surprise Rumors Circulate: Discussion revolves around an alleged stealth release of a GPT-5 model, with speculation based on performance and refusal to hallucinate in tests, although confusion abounds due to no official leaderboard inclusion and contradictory test responses.
- Performance Queries for Llama3 70B: Concerns were raised about the seemingly low token generation speed of 13 tokens per second on a dual 3090 setup for a Llama3 70B model, leading to discussions on potential hardware optimizations and model configuration tweaks.
- Exllama: The Underrated Speedster: Users discuss the performance superiority of exllama over other libraries for LLM tasks, recommending the use of TabbyAPI repo for easier setups.
- Debates Over LMSYS's Leaderboard Transparency: Members express doubts about the objectivity of LMSYS's leaderboard, raising concerns about potential conflicts of interest between scientific evaluation and commercial enterprises, as well as calling for more transparency and the ability to filter by open weights.
Links mentioned:
- LMSYS Chatbot Arena: Live and Community-Driven LLM Evaluation | LMSYS Org: no description found
- ChatGPT's hallucinations draw EU privacy complaint: Activist demands regulators launch probe over ChatGPT's wild guess on his date of birth.
- lmsys/lmsys-chat-1m · Datasets at Hugging Face: no description found
LAION ▷ #research (12 messages🔥):
- OpenCLIP Fine-Tuned for Cardiac Ultrasound: A member shared the publication of their research on fine-tuning OpenCLIP for cardiac ultrasound, available here. Despite numerous challenges and an extensive revision process, they expressed relief at its completion.
- Echoes of Exhaustion: The member also conveyed their readiness to move beyond the demanding project, humorously noting the scuffed zero-shot techniques used (a generic zero-shot sketch follows this list) and their lack of familiarity with the multimodal AI world at the project's onset.
- Stable Diffusion Community Reopens: A link to a GitHub repository for training CLIP separately from U-Net was shared alongside news of /r/StableDiffusion reopening after protesting Reddit's open API changes. Additional details and a discussion forum can be found at this Reddit post.
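For context, zero-shot classification with the open_clip package generally follows this pattern; the checkpoint and class prompts below are illustrative, not the paper's fine-tuned echocardiography model:

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("echo_frame.png")).unsqueeze(0)  # hypothetical frame
texts = tokenizer(["severely reduced ejection fraction",
                   "normal ejection fraction"])

with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(texts)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)   # cosine-normalize features
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)

print(probs)   # probability per text prompt
```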
Links mentioned:
- Visionâlanguage foundation model for echocardiogram interpretation - Nature Medicine: A visionâlanguage foundation model, trained on a dataset of more than 1 million echocardiogram videoâtext pairs, is able to assess various cardiac structural and functional parameters desp...
- Reddit - Dive into anything: no description found
OpenAI ▷ #announcements (2 messages):
- ChatGPT Plus Integrates Memory Feature: Memory is now available to all ChatGPT Plus users, allowing them to tell ChatGPT what to remember by starting a new chat. This feature can be enabled or disabled in settings and has yet to roll out in Europe or Korea.
- Enhanced Data Control for Users: ChatGPT Free and Plus users can now access their chat history even if they have opted out of contributing data for model improvement. Additionally, a new Temporary Chat feature allows for conversations that won't be saved in the user's chat history.
OpenAI ▷ #ai-discussions (81 messages🔥🔥):
- Exploring AI Curiosity and Sentience: A user detailed their curiosity test involving ChatGPT handling a zip file with a maze. Some discussion followed on how to measure AI's potential for curiosity and its relation to sentience, but consensus on these concepts remains elusive.
- DragGAN Sparks Interest: A member discovered DragGAN, a tool that manipulates photos to change angles and poses, fueling a discussion about AI's ability to recreate images from new perspectives without full models.
- Llama-3 8B Extends Context Capability: An interesting reveal occurred with Llama-3 8B Instruct Gradient-1048k, showing how state-of-the-art language models can operate on long-context information; the model is available at Hugging Face.
- Debating the Accessibility of Advanced AI Tools: Discussions surfaced about OpenAI's policy on free access to new features like DALL-E, with some users questioning why more advanced tools aren't also free and pondering the potential for OpenAI to provide a student discount.
- Potential Collaboration Between LLMs: One user inquired about the possibility of having two language models like ChatGPT and Claude Opus collaborate on writing a paper, provoking suggestions about using third-party services to manage multi-model interactions.
Links mentioned:
- gradientai/Llama-3-8B-Instruct-Gradient-1048k · Hugging Face: no description found
- Don't ask to ask, just ask: no description found
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold: no description found
OpenAI ▷ #gpt-4-discussions (11 messages🔥):
- Size Matters in Model Performance: A comparison was highlighted between GPT-4 and its predecessor, with GPT-4 identified as "much larger than 3.5".
- Speed Expectations Challenged for GPT-4: One member questioned the expectation that GPT-4 would be faster, considering its larger size compared to the previous models.
- Request for AI Security Project Assistance: A member named abhibetter asked for help regarding AI application in a security project but didn't provide details about the specific issues or questions they have.
- Exploring GPT-2 Performance: Member namenot223_69478 inquired whether anyone has experimented with GPT-2 on chatlmsys, with another member guiding them to a different channel for an in-depth discussion.
- Dealing with Bulk Deletion of Chat Archives: silensu is seeking advice on how to handle the accidental archiving of numerous chats, questioning the possibility of mass deletion.
OpenAI ▷ #prompt-engineering (15 messages🔥):
- Million Dollar Prompt Competitions Proposed: A member suggested organizing prompt engineering competitions with significant cash prizes to stimulate learning and the sharing of best practices within the community. They envision both paid and free "playground" competitions, creating a gamified environment that rewards positive collaboration and practical achievements in prompt crafting.
- Meta Prompting Paves the Way: In the discussion about improving prompt crafting, it was noted that "meta prompting" is an effective method, as employed by GPT Builder, where the AI adjusts context and conversation based on user instructions to optimize results (a minimal sketch follows this list).
- Challenges of Negative Prompting in AI: Users discussed the inefficacy of negative prompting when instructing AI, explaining that highlighting prohibited words can lead to inconsistency and less effective results compared to positive examples and instructions.
- Navigating Localized Language for AI Tasks: A user grappled with adapting AI-generated text for regional language variants, in particular Argentinian Spanish, where certain words have different connotations. Options like reframing the project and providing specific substitutions for regional words were discussed to better tailor outputs despite a large list of prohibited words.
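A minimal sketch of the meta-prompting idea from the second item, using the openai v1 client to have the model rewrite a draft prompt; the model name and system instruction are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

draft = "Don't use formal words. Write product copy for a coffee grinder."
resp = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system",
         "content": "You improve prompts. Replace negative instructions "
                    "with positive ones and add one concrete example."},
        {"role": "user", "content": draft},
    ],
)
print(resp.choices[0].message.content)   # the improved prompt, ready to reuse
```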
OpenAI ▷ #api-discussions (15 messages🔥):
- Prompt Engineering with Competitions: A member proposed having prompt competitions to improve prompt engineering skills. Competitions would range from no-code challenges, where the AI processes data to extract information, to interactive tasks like navigating text-based games, and would include community discussions and knowledge sharing.
- Meta-Prompting over Competitions: One participant suggested using meta-prompting, a method where the AI assists in crafting better prompts, which could potentially replace the need for competitions. This indicates a trend toward users attempting to streamline the prompting process via GPT Builder.
- GPT Builder and Meta Prompting in Action: Discussion highlighted that GPT Builder operates on meta prompting, with the AI making context and conversation adjustments based on user requests, hinting at documentation for optimized prompting tactics.
- Positive Prompting Favored Over Negative: In addressing a problem with unwanted language generation, it's advised to use positive instructions and examples in prompts rather than specifying prohibited words. Suggestions included creating prompts that reinforce preferred terms and explaining usage within particular dialects.
- Navigating Multilingual Nuances: Confronting the multilingual challenge, a user expressed difficulties in constructing prompts for variants of Spanish, where words may have different connotations across regions. Strategies to refine AI language output include rephrasing the project or explicitly pairing prohibited words with their desired alternatives.
OpenAccess AI Collective (axolotl) ▷ #general (25 messages🔥):
- LLaMA 3 Sensitive to Quantization: A discussion highlighted that LLaMA 3 experiences more degradation from quantization than LLaMA 2, likely due to its training on a record 15T tokens, which allowed it to capture extremely nuanced data relationships (a perplexity-measurement sketch follows this list).
- LLaMA 3 Tokenization Troubles: There was an issue mentioned with llama-3 not generating a beginning-of-sentence (BOS) token, but it was resolved by adding the BOS into the chat template manually.
- Critique of Quantization Sensitivity Study: The community discussed a study on quantization sensitivity, suggesting that it is linked to model training methods rather than just the size of the model, with a member describing a related arXiv paper as "worthless."
- Llama-3 Extends Context Length: The Llama-3 8B Gradient Instruct 1048k model was mentioned, which extends the model's context length significantly and was developed by Gradient with compute sponsorship from Crusoe Energy, detailed on huggingface.co.
- BOS Requires Template Tweaks: Encountering issues with the LLaMA-3 model's BOS token generation, it was noted that altering the tokenizer alone wasn't enough and that the BOS needs to be included in the chat template to appear.
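A rough sketch of the kind of perplexity check behind that quantization discussion (the llama.cpp PR uses its own harness; the model name and evaluation text here are placeholders): score a fixed text with the model and report exp(mean NLL), where lower is better:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Meta-Llama-3-8B"          # assumption: any causal LM works here
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

text = open("eval.txt").read()               # hypothetical evaluation text
ids = tok(text, return_tensors="pt").input_ids[:, :4096].to(model.device)

with torch.no_grad():
    loss = model(ids, labels=ids).loss       # mean negative log-likelihood
print(f"perplexity: {torch.exp(loss).item():.2f}")
```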
Links mentioned:
- Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine: Generalist foundation models such as GPT-4 have displayed surprising capabilities in a wide variety of domains and tasks. Yet, there is a prevalent assumption that they cannot match specialist capabil...
- Tweet from Rohan Paul (@rohanpaul_ai): Quantization is quite harmful for LLaMA 3 than for LLaMA 2. This PR in llama cpp repo investigates it well. (Perplexity measures how well the model can predict the next token with lower values being...
- gradientai/Llama-3-8B-Instruct-Gradient-1048k · Hugging Face: no description found
OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (7 messages):
- Exploring Huggingface's ZeroGPU: A member mentioned they have gained access to the Huggingface Zero project, inviting anyone to suggest tests to conduct using this new platform.
- ZeroGPU Provides Free Multi-GPU Access: They shared information about ZeroGPU, a beta feature on Huggingface that offers free GPU access and the ability to run Spaces on multiple GPUs, using Nvidia A100s. ZeroGPU optimizes GPU utilization by efficiently allocating and releasing resources as needed (a decorator sketch follows this list).
- Missed Opportunities: A couple of members expressed regret for not signing up for the ZeroGPU project earlier to take advantage of the early access for PRO subscribers.
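A minimal sketch of how a ZeroGPU Space requests a GPU via the documented `spaces` decorator; the diffusion pipeline is a placeholder workload:

```python
import gradio as gr
import spaces                      # provided inside ZeroGPU Spaces
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)

@spaces.GPU                        # a GPU is allocated only while this function runs
def generate(prompt):
    pipe.to("cuda")
    return pipe(prompt).images[0]

gr.Interface(generate, gr.Textbox(), gr.Image()).launch()
```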
Link mentioned: zero-gpu-explorers (ZeroGPU Explorers): no description found
OpenAccess AI Collective (axolotl) â· #general-help (11 messagesđ„):
- Llama-3-70B Finetuning in Question: A member was advised that fine-tuning `meta-llama/Meta-Llama-3-70B-Instruct` might degrade its performance since it's already fine-tuned; it's recommended to start with an 8B model before moving to the more complex 70B.
- Dataset Format Conversion Guide: Members suggested a simple method to convert a fine-tuning dataset from OpenAI's format to ShareGPT's format: replace "messages" with "conversations", "role" with "from", "content" with "value", "user" with "human", and "assistant" with "gpt".
- Fine-Tuning Learning Path Recommended: An experienced community member recommends that beginners fine-tune smaller models such as an 8B before attempting larger models like the 70B.
- Dataset Transformation Done Easily: Python code was provided to transform data from the given format into the one required by ShareGPT, using a dictionary for role mapping and a list comprehension (a sketch along those lines follows).
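The code itself wasn't reproduced in the recap; a minimal sketch of the described approach (dictionary role mapping plus a list comprehension, field names per the conversion guide above) might look like this:

```python
# Map OpenAI-style chat records to ShareGPT-style records.
ROLE_MAP = {"system": "system", "user": "human", "assistant": "gpt"}

def openai_to_sharegpt(record: dict) -> dict:
    return {
        "conversations": [
            {"from": ROLE_MAP[msg["role"]], "value": msg["content"]}
            for msg in record["messages"]
        ]
    }

example = {"messages": [{"role": "user", "content": "Hi"},
                        {"role": "assistant", "content": "Hello!"}]}
print(openai_to_sharegpt(example))
```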
Link mentioned: Axolotl - Conversation: no description found
OpenAccess AI Collective (axolotl) ▷ #rlhf (1 message):
gbourdin: add to my bookmarks. Thanks for this !
OpenAccess AI Collective (axolotl) ▷ #community-showcase (2 messages):
- Axolotl Fine-Tuning Made Easier: A member shared a tutorial that guides users through fine-tuning with `axolotl` using `dstack`, an open-source orchestrator that works with any cloud or pool of on-prem machines. The tutorial was contributed by an axolotl user.
- Community Approves: Another member expressed appreciation for the tutorial, mentioning that it looks easy to follow.
Link mentioned: dstack/examples/fine-tuning/axolotl/README.md at master · dstackai/dstack: An open-source container orchestration engine for running AI workloads in any cloud or data center. https://discord.gg/u8SmfwPpMd - dstackai/dstack
OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (10 messages🔥):
- LoRA vs QLoRA Clarified: The main distinction between LoRA and QLoRA is that while LoRA focuses on model adaptation via low-rank matrices, QLoRA combines this with quantization of the base model for further optimized deployment. LoRA adapts pre-trained models efficiently; QLoRA takes it a step further for resource-constrained environments.
- Trimming Axolotl Datasets to a Percentage: Trimming datasets in the Axolotl configuration to a specific percentage isn't a built-in feature and would require preprocessing or alterations to the dataset loading script; `DPODataset` could be modified with subsampling logic during dataset loading (a preprocessing sketch follows this list).
- Equating GPU and Micro Batch Sizes: It was questioned whether using 4x GPU & Micro Batch Size 4 is equivalent to 8x GPU & Micro Batch Size 2 for the final output. No specific answer was given in the channel, though arithmetically both yield the same effective batch of 16 sequences per optimizer step when gradient accumulation is equal.
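A hedged preprocessing sketch for the trimming question, using the Hugging Face `datasets` library that Axolotl builds on (file names are illustrative):

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="train.jsonl", split="train")

pct = 0.10  # keep 10% of the data
subset = ds.shuffle(seed=42).select(range(int(len(ds) * pct)))

# Write the trimmed dataset back out and point the Axolotl config at it.
subset.to_json("train_10pct.jsonl")
```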
Links mentioned:
- OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.
OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (39 messages🔥):
- Command-R Model Fine-tuning: Members discussed fine-tuning the command-r model within Axolotl. A user shared a pull request adding command-r support to Axolotl, but noted that it is untested and merging is not yet recommended.
- Format Adaptation for command-r: When asked about using command-r's instruct format, a suggestion was made to use `input_output` formats and pre-prepare them with the correct tokens. A more comprehensive guide on implementing uncommon formats is available in the input_output documentation.
- Sample Packing Feature Uncertainty: There is confusion regarding the sample packing feature, which packs small examples into larger ones for Axolotl. While the feature is desired by some users, it appears to require the modifications outlined in the untested pull request.
- Inexperienced with runpod Template: A user expressed uncertainty about integrating patch changes due to unfamiliarity with the runpod template. No clear solution was provided in the thread.
- Unclear Support for phi-3 Format: A user asked about Axolotl's support for the phi-3 format, but the bot response suggested phi-3 is not supported according to the current documentation; the compatibility of various models, including phi, with different features is listed, but phi-3 is not specifically mentioned.
Links mentioned:
- Feat: Add cohere (commandr) by NanoCode012 · Pull Request #1547 · OpenAccess-AI-Collective/axolotl: Description Motivation and Context How has this been tested? Untested! Screenshots (if appropriate) Types of changes Social Handles (Optional)
- OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.
- axolotl/README.md at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
Latent Space ▷ #ai-general-chat (80 messages🔥🔥):
- Exploring Memory for Autonomous Agents: A discussion touched on a GitHub project called Memary, created to serve as long-term memory for autonomous agents. The conversation clarified that while a knowledge graph might be used, Memary primarily functions through similarity searches over documents.
- Debate on Mysterious GPT-2 Chatbot: Conversation sparked around a perplexing GPT2-chatbot with GPT-4-level capabilities, featured on lmsys. Despite various analyses and speculations, the true origin of this model remains unclear, one possibility being a finetuned version of OpenAI's original GPT-2.
- Open-Source AI Faces Big Tech: A blog post from Prime Intellect highlighted the challenges open-source AI development faces in competing with closed-source counterparts that use large, interconnected GPU clusters. The post elaborates on decentralized training as a potential solution for open-source progress.
- Discussion on Roles of Agents and LLMs: A deep discussion took place regarding the conflation of autonomous agents with large language models (LLMs). The talk illustrated a shift toward using "modules" that concurrently build shared context/memory for reasoning and planning, rather than expecting LLMs to function as standalone autonomous units.
- Learning AI Foundations and Skills: A user asked about ways to learn AI from the ground up, seeking to understand basic concepts without committing to a specific field. Other members provided resources including YouTube tutorials on neural networks, introductory courses on AI engineering, and guidance on prompt engineering.
Links mentioned:
- Tweet from Alex Reibman (@AlexReibman): OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Ever since OpenInterpreter, we've all been wondering just how effective agents can be if you give them a...
- AI Engineering 101 and 201 Workshops: from AI Engineer Summit 2023
- Tweet from lmsys.org (@lmsysorg): hi @simonw, thanks a ton! We really value your feedback. Just to clarify, following our policy, we've partnered with several model developers to bring their new models to our platform for communi...
- Learn Prompting: Your Guide to Communicating with AI: Learn Prompting is the largest and most comprehensive course in prompt engineering available on the internet, with over 60 content modules, translated into 9 languages, and a thriving community.
- GPT-2?: Background https://chat.lmsys.org provides blind-tested user benchmarks for LLMs (and some MLLMs). One of the models recently available is GPT2-chatbot, which demonstrates capability greatly beyond an...
- State-of-the-art in Decentralized Training: This post explores various novel decentralized training approaches and how they can enable effective AI model training across globally distributed GPUs.
- Prompt Engineering Roadmap - roadmap.sh: Step by step guide to learn Prompt Engineering. We also have resources and short descriptions attached to the roadmap items so you can get everything you want to learn in one place.
- Tweet from mephistoooOOHHHHHHSHI- (@karan4d): Ok it's definitely using GPT-4 tokenizer so I'm betting it is 4.5 as well. Always fingerprint w anomalous tokens
- Tweet from albs - 3/staccs (@albfresco): my guess is this mysterious 'gpt2-chatbot' is literally OpenAI's gpt-2 from 2019 finetuned with modern assistant datasets. in which case that means their original pre-training is still am...
- Tweet from Mark Huang (@markatgradient): 1M context length Llama-3 8B Model. Enough said. Up on HF @ClementDelangue cc: @winglian @mattshumer_ Quoting Gradient (@Gradient_AI_) We've been in the kitchen cooking 🔥 Excited to ...
- Tweet from Marques Brownlee (@MKBHD): NEW VIDEO - Rabbit R1: Barely Reviewable https://youtu.be/ddTV12hErTc This is the pinnacle of a trend that's been annoying for years: Delivering barely finished products to win a "race" ...
- GitHub - xlang-ai/OSWorld: OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments: OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments - xlang-ai/OSWorld
- GitHub - kingjulio8238/memary: Longterm Memory for Autonomous Agents.: Longterm Memory for Autonomous Agents. . Contribute to kingjulio8238/memary development by creating an account on GitHub.
- Ep. 8 - ColBERT + ColBERTv2: late interaction at a reasonable inference cost: Andrew Yates (Assistant Professor at the University of Amsterdam) and Sergi Castella (Analyst at Zeta Alpha) discuss the two influential papers introducing Co...
- But what is a neural network? | Chapter 1, Deep learning: What are the neurons, why are there layers, and what is the math underlying it?Help fund future projects: https://www.patreon.com/3blue1brownWritten/interact...
- Tweet from Jesse Lyu (@jessechenglyu): get your r1 update to the latest version now - we addressed most of the issues we found so far and more fix/improvements incoming! idle battery life up to 5x better now. Quoting rabbit inc. (@rabb...
OpenInterpreter ▷ #general (21 messages🔥):
- Question on Launching OS Mode with Local Vision Model: A member asked how to start OS mode with a local vision model to try Moondream, but reported getting gibberish with the command `interpreter --os --local`.
- Discussion on Model Functionality: Another user mentioned using `llava` months ago and confirmed that it is possible to get a description of an image through OpenInterpreter without executing custom code.
- Integration Update for OpenInterpreter: A member announced they managed to integrate all OpenInterpreter outputs into MagicLLight, with a pull request to OpenInterpreter planned for a `stream_out` function hook and `external_input`. Code release for MagicLLight and AAA+ is expected after some cleanup.
- OpenInterpreter on Budget Hardware: The feasibility of running OpenInterpreter smoothly on a BeepyBerry (Raspberry Pi Zero) was questioned, with a link to a related YouTube video.
- Seeking Debugging Assistance for Bad Startups: A user sought help for debugging a bad startup, indicating the errors were vague. They were directed to share the errors so that the community could assist in troubleshooting.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
OpenInterpreter ▷ #O1 (20 messages🔥):
- Push Button Code Success: A member resolved an issue with an external push button not reacting by updating the `ButtonChecker` code and wiring the button to pin 25, offering a snippet of the revised code (a hedged sketch of the pattern appears after this list). Their fix was confirmed to be working by another community member.
- Speaker Connection Stability: In another hardware-related fix, it was recommended to use hot glue to secure the speaker wires and reduce stress on the connections when interfacing with pins for a project.
- Raising Speaker Volume Inquiry: A query was raised on how to increase speaker volume, with suggestions to try M5Unified or potentially an external amplifier.
- YouTuber Reviews Debated: There was a discussion about the relevance of YouTuber reviews of AI products like AI pins and the R1, questioning whether tech reviewers like MKBHD and Dave2d fully grasp the AI space, which differs from reviewing consumer electronics like phones or laptops.
- 01 Light Hardware with OS Mode: A member sought assistance getting OS mode to work with the current version of the 01 light hardware, mentioning successful connectivity to a Mac but no access to the screen.
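The member's actual `ButtonChecker` snippet isn't reproduced above; as a rough MicroPython sketch of the wiring-and-polling pattern (pin 25 comes from the discussion, everything else is illustrative and board-dependent, and the real 01 firmware is Arduino-based):

```python
from machine import Pin
import time

# Button wired to GPIO 25 with the internal pull-up:
# the pin reads 1 when idle and 0 when pressed.
button = Pin(25, Pin.IN, Pin.PULL_UP)

while True:
    if button.value() == 0:
        print("button pressed")
        time.sleep_ms(200)  # crude debounce
    time.sleep_ms(10)
```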
Links mentioned:
- no title found: no description found
- Rabbit R1: Barely Reviewable: AI in a Box. But a different box.Get a dbrand skin and screen protector at https://dbrand.com/rabbitMKBHD Merch: http://shop.MKBHD.comTech I'm using right no...
tinygrad (George Hotz) ▷ #general (10 messages🔥):
- Tinygrad Inquiry: A user asked what tinygrad is, and another member provided a link to the tinygrad GitHub repository defining it as a project that those who like PyTorch and micrograd will love.
- Discord Discovery Mystery: One member voiced curiosity about how another stumbled upon the Discord server, to which the latter replied uncertainly, indicating a lack of knowledge about their discovery method.
- Seeking Bounty Guidance: A user sought help for two bounties involving "Mean of symbolic shape" and "Symbolic arrange" and was looking for references to understand and solve them.
- Backward Pass Optimization Issue: A member was investigating issue #3572 related to backward passes with 2 reduce operations and inquired about how to generate graph diagrams to illustrate the problem.
- Graph Diagram Generation for Tinygrad: In response to the query about generating graph diagrams to address the backward pass issue, a member pointed to the `GRAPH=1` environment variable as the way to produce them (a usage sketch follows this list).
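A small usage sketch of that suggestion (hedged: the exact output location and rendering depend on the tinygrad version; the key point is setting the variable before tinygrad is imported):

```python
import os
os.environ["GRAPH"] = "1"  # must be set before tinygrad reads its env flags

from tinygrad.tensor import Tensor

x = Tensor.randn(4, 4, requires_grad=True)
(x * x).sum().backward()  # ops are traced and dumped as a graphviz diagram
```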
Links mentioned:
- tensor variable by geohot · Pull Request #4362 · tinygrad/tinygrad: no description found
- GitHub - tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️
tinygrad (George Hotz) ▷ #learn-tinygrad (29 messages🔥):
- Exploring TinyGradâs Learning Resources: Members discussed resources for learning AI development with TinyGrad; links to MicroGrad GitHub repository and MiniTorch were shared, with MiniTorch highlighted as a teaching tool for understanding deep learning systems.
- TinyGrad Quick Start Guidance Shared: A user recommended the âtinygrad Quick Start Guideâ for anyone looking to learn AI, especially with TinyGrad, as it provides a basic overview of the high-level API that TinyGrad offers for model development.
- Symbolic Mean Bounty Challenge in TinyGrad: Discussions revolved around implementing a symbolic mean operation in TinyGrad, with considerations about LazyBuffer's need to handle data of type Variable and whether it should allocate memory.
- Pull Request for Symbolic Execution in TinyGrad: A link to a previous pull request was shared to illustrate the mechanism for symbolic code generation and execution in TinyGrad, hinting at how variable caching might be useful for operations like `sum` and `mean`.
- Developing Symbolic Mean with Variables: The conversation continued with the development of symbolic mean, focusing on the need to represent tensor lengths symbolically and the potential for `Const` to support variables in the input buffer. Links to a comparison of the master and feature branches on GitHub, tinygrad symbolic-mean-var-pull, and further GitHub changes by gh were shared as part of solving this challenge.
Links mentioned:
- Quickstart - tinygrad docs: no description found
- Comparing tinygrad:master...davidjanoskyrepo:symbolic-mean-var-pull · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️
- Comparing 86d90511cee2^...97a2d44d9840 · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️
- GitHub - unknownusername504/MicroGrad: Contribute to unknownusername504/MicroGrad development by creating an account on GitHub.
- MiniTorch: no description found
- rename Scalar to ConstType and cast_scalar to as_const (#3946) · tinygrad/tinygrad@77589bc: prereq cleanup to make const arg same python type as dtype
- symbolic codegen and exec by chenyuxyz · Pull Request #1552 · tinygrad/tinygrad: part of #1353 , codegen and exec to implement realize for symbolic inputs. The combined var_vals are passed into kernel function directly. I have implemented the backend for CLANG, GPU, METAL. glob...
Cohere ▷ #general (34 messages🔥):
- Single URL Constraint in Command-R: In a discussion about the web-search tool in API Command R+, members clarified that currently only one website can be used with the tool's `site` option, suggesting a workaround of running an API call for each individual website (see the sketch after this list).
- Lack of Multi-step Connectors: Cohere confirmed that connectors cannot currently be used with multi-step tool use in Command-R.
- Hopes for Future Command-R Features: A member suggested desirable enhancements for Command-R focused on Connectors, such as using multiple websites in `web_search`, sending extra parameters to custom connectors for more granular control, and enabling a `use_rerank` option to rerank automatically. A helpful link to the documentation was shared: Cohere Chat Documentation.
- Questions on Model Availability: A query was posed about the availability of the "Generate" option for fine-tuning models, since it was noticed to be missing from the dashboard, leading to speculation about whether it would be returning.
- Strategies for Efficient Embedding: A member inquired about strategies for keeping data updated for embedding efficiently, touching on the need for cost-effective methods to reindex only chunks of data that have changed.
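A minimal sketch of the per-site workaround, assuming the Python `cohere` client and the hosted `web-search` connector (connector options follow the Cohere docs of the time; the model name and response fields are assumptions):

```python
import cohere

co = cohere.Client("YOUR_API_KEY")
sites = ["docs.cohere.com", "arxiv.org"]

# One call per site, since the web-search connector takes a single `site` option.
for site in sites:
    resp = co.chat(
        model="command-r-plus",
        message="What is multi-step tool use?",
        connectors=[{"id": "web-search", "options": {"site": site}}],
    )
    print(site, "->", resp.text[:100])
```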
Link mentioned: Chat API Reference - Cohere Docs: no description found
Cohere ▷ #collab-opps (2 messages):
- Swedish Salutations: A member from Stockholm, Sweden mentioned using Cohere in their company.
- Nordic Collaboration: Another member highlighted their connection to both Norway and Sweden through their company, Omegapoint.
LangChain AI ▷ #general (12 messages🔥):
- Gemini Model Exploration: A member is seeking someone with experience in Gemini 1.0 or 1.5 models to discuss specifics privately via direct message.
- Seeking LLM Observability Tools: There's a request for recommendations on Large Language Model (LLM) observability tools. The member is considering Arize Phoenix or Langfuse, with a preference for a self-hosted, open-source option compatible with LlamaIndex.
- OpenAI and SQL Security: A member inquires about connecting OpenAI directly to an SQL server without using LangChain, prioritizing security in the process.
- Leveraging LangGraph with autoawq: There is a discussion on integrating autoawq with LangGraph to use exllamav2 kernels for high inference speeds when powering AI agents.
- PDF Content Extraction Challenge: A member new to LangChain and AI programming is seeking advice on improving results when splitting a single table that spans multiple pages in a PDF, mentioning unsatisfactory results using `unstructured` for AI-driven PDF content extraction.
LangChain AI ▷ #langserve (2 messages):
- AzureSearchVectorStoreRetriever Async Issue: A user encountered an error because AzureSearchVectorStoreRetriever does not support async operations and asked about possible solutions. Options discussed included requesting that langserve implement the feature or creating an async wrapper around the synchronous retrieve function (a wrapper sketch follows this list).
- Using Google Drive Libraries: Another user suggested utilizing the Google Drive libraries for a function, also mentioning the requirement to set the drive key as an environment variable. It was noted that these libraries had been removed and then re-added in the past.
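A minimal sketch of such a wrapper, assuming a synchronous retriever exposing `get_relevant_documents` (the retriever method name per LangChain's interface at the time; treat the details as illustrative):

```python
import asyncio

class AsyncRetrieverWrapper:
    """Run a synchronous retriever in a worker thread so async callers don't block."""

    def __init__(self, sync_retriever):
        self.sync_retriever = sync_retriever

    async def aget_relevant_documents(self, query: str):
        # asyncio.to_thread keeps the event loop free while the blocking call runs.
        return await asyncio.to_thread(
            self.sync_retriever.get_relevant_documents, query
        )
```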
LangChain AI ▷ #share-your-work (8 messages🔥):
- A Trip Down Memory Lane with GPT-1: A blogger has revisited the original GPT-1 model, providing insights into how it laid the groundwork for current LLMs and noting its similarities with models like Mistral-7B. The blog includes discussions on positional embeddings and Conv1D within the transformer block, available at Revisiting GPT-1: The Spark That Ignited LLMs.
- Showcasing LangChain on Airbnb: A demonstration video titled "D-ID Airbnb Use Case: A RAG Agent Demo using Ollama and Langchain with code on Github" illustrates an innovative live avatar Q&A for property sites, powered by LangChain with a collection of 150 QA pairs. Check out the demo on YouTube.
- Serve Up Answers with a Pizza Bot: Another use case for LangChain is presented in a video showcasing a Pizza Bot with a live avatar interface. See this mobile-friendly application in action on YouTube.
- No-Code Automation for Code Maintenance: An announcement for a no-code platform called Autonoma demonstrates its purpose of automating code-improvement tasks such as input validation, error handling, and testing; it integrates with GitHub and is available as a free demo. Test these agents through Autonoma Free Demo.
- Introducing VectorDB Plugin for LM Studio: A GitHub repository has been shared for a plugin named VectorDB, which creates a ChromaDB vector database to work alongside LM Studio in server mode. The repository can be found at VectorDB Plugin for LM Studio on GitHub.
- QuickVid: AI-Powered YouTube Summarization Tool: QuickVid, a new tool that provides fast summaries and fact verification for YouTube videos, has been launched. Try QuickVid to enhance your YouTube experience with concise, informed summaries at QuickVid.
- Tutorial on Creating Webloader RAG Applications: A Medium article details building robust webloader RAG applications using Groq, Langchain, and Datastax. The guide is accessible at Building Powerful Webloader RAG Applications with Groq, Langchain, and Datastax.
Links mentioned:
- Revisiting GPT-1: The spark that ignited the fire of LLMs: A Comprehensive Look at GPT-1's Contribution to the Development of Modern LLMs
- QuickVid: no description found
- GitGud: no description found
- D-ID Airbnb Use Case: A RAG Agent Demo using Ollama and Langchain with code on Github: A demo to help illustrate practical use cases for live avatar assistants for business... I will do a video for the detailed code review so you can try it... ...
- GitHub - BBC-Esq/VectorDB-Plugin-for-LM-Studio: Plugin that creates a ChromaDB vector database to work with LM Studio running in server mode!: Plugin that creates a ChromaDB vector database to work with LM Studio running in server mode! - BBC-Esq/VectorDB-Plugin-for-LM-Studio
LangChain AI ▷ #tutorials (2 messages):
- Bonjour from Paris: A member shares a YouTube video titled "Agent RAG: LangChain et LlamaIndex portés par Mistral Large - Le vent du changement", demonstrating the creation of an advanced RAG assistant using LangChain, Mistral Large, and LlamaIndex. The video is aimed at the French-speaking community, and the code for the app is available in the video's description on GitHub.
- DIY Llama3 RAG Assistant: Another member presents a tutorial on training llama3 with private knowledge to build an agentic RAG, in a YouTube video titled "'I want Llama3 to perform 10x with my private knowledge' - Local Agentic RAG w/ llama3". The video guides viewers through enhancing llama3's performance using their own data.
Links mentioned:
- "I want Llama3 to perform 10x with my private knowledge" - Local Agentic RAG w/ llama3: Advanced RAG 101 - build agentic RAG with llama3Get free HubSpot report of how AI is redefining startup GTM strategy: https://clickhubspot.com/4hxđ Links- F...
- Agent RAG: LangChain et LlamaIndex portés par Mistral Large - Le vent du changement: In this new video, I present the development of a RAG assistant built with agents using Mistral, Langchain, and LlamaIndex. The code ...
Alignment Lab AI ▷ #ai-and-ml-discussion (2 messages):
- Inappropriate Content Alert: A post promising free leaks of content from OnlyFans featuring 18+ Teen Girls contained a Discord link. The message also included emojis and an `@everyone` tag to draw broad attention.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #programming-help (3 messages):
- Inappropriate Content Alert: A message was posted containing links to explicit content, potentially violating Discord's community guidelines. The message promoted free access to content involving underage individuals, which is illegal and problematic.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #looking-for-collabs (2 messages):
The only message provided does not pertain to AI collaboration, research, or other topics relevant to the "looking-for-collabs" channel and appears to be spam, so there is no content to summarize.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #general-chat (2 messages):
- Inappropriate Content Alert: A message promoting adult content and so-called "OnlyFans leaks" was posted, with a Discord invite link provided. This content is clearly inappropriate for the channel and may violate community guidelines.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #landmark-dev (1 message):
- Spam Alert: A spam message promoting adult content was posted, including a Discord invitation link. It was unrelated to the channel's focus and may require moderation action.
Link mentioned: Join the e-girl paradise // +18 Discord Server!: Check out the e-girl paradise // +18 community on Discord - hang out with 11801 other members and enjoy free voice and text chat.
Alignment Lab AI ▷ #landmark-evaluation (1 message):
- Inappropriate Content Alert: A user posted a message containing explicit content and an invitation link, promoting access to what appears to be private or sensitive media involving underage individuals. The message includes emojis and a Discord invite URL.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #open-orca-community-chat (2 messages):
- Inappropriate Content Alert: A message promoting 18+ content and OnlyFans leaks was posted, including an invitation link and emojis suggesting adult material. The content of the message is against Discord's community guidelines.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #leaderboard (1 message):
- Inappropriate Content Alert: A Discord user posted a message promoting adult content, including a mention of "18+ Teen Girls and onlyfans leaks for free", along with an invitation link to another server. The user utilized emojis and tagged `@everyone` to draw attention.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #looking-for-workers (2 messages):
- Inappropriate Content Warning: A message was posted promoting 18+ Teen Girls and OnlyFans leaks with a Discord invite link. This type of content is likely against the platform's rules and may warrant moderation action.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #looking-for-work (2 messages):
- Inappropriate Content Alert: The message suggests sharing of leaked content from OnlyFans involving teen girls, accompanied by a Discord invite link. This post raises serious concerns regarding legality and ethics.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #join-in (2 messages):
- Inappropriate Content Alert: A message was posted that promoted adult content including "18+ Teen Girls and onlyfans leaks". The post included an emoji of a peach and the underage sign, along with a Discord invitation link.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #fasteval-dev (2 messages):
- Inappropriate Content Alert: A message was posted promoting 18+ Teen Girls and OnlyFans leaks with a Discord invite link. The content appears to be explicit and not suitable for this professional setting.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
Alignment Lab AI ▷ #qa (2 messages):
- Inappropriate Content Alert: A user posted a message promoting adult content including "18+ Teen Girls" and "onlyfans leaks" with a Discord invite link (not clicked or verified). The message uses emojis and tags `@everyone` to attract attention.
Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
AI Stack Devs (Yoko Li) ▷ #ai-companion (1 message):
- Concerns Over Criminalizing Coping Mechanisms: A member expressed strong concern regarding criminalizing an unspecified activity that might be the last coping mechanism for men who have suffered from severe personal and legal setbacks. There is a fear that such measures could push these individuals towards extreme actions due to feeling marginalized by society.
AI Stack Devs (Yoko Li) ▷ #events (2 messages):
- Game Jam Bonanza with Rosebud AI: Rosebud AI announces a Game Jam in collaboration with Week of AI, inviting participants to create 2D browser-based games with Phaser JS around the theme of Education and AI. A $500 prize pool is up for grabs, and you can find out how to join here.
- AIxGames Meetup in SF: An AIxGames meetup is scheduled for this Thursday in San Francisco to connect people working with AI in gaming. There are spots for 160 people, and you can RSVP and check the location here, with a call for demo presentations accessible via this form.
Link mentioned: RSVP to AIxGames Meetup | Partiful: AI is already changing the gaming landscape, and is probably going to change it a lot more. We want to gather as many people working at the intersection of AI and Gaming as we can. Whether it is on …
AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (8 messages🔥):
- Revolutionizing NPC Interactions with LLMs: A user announced their release of LLM-powered NPC models and an inference stack to enhance action spaces and simplify API calls, found at GigaxGames on GitHub. The solution includes a single LLM call for complex NPC actions, open weights on Huggingface's Hub, and an API access offer (with a link that appears to be broken).
- Overcoming LLM Challenges for Game Development: In pursuit of runtime speeds suitable for gameplay, they faced multiple issues, such as NPCs breaking the fourth wall during `speak` commands and missing details in large prompts. The user suggests output compression, minimizing model calls, and leveraging smaller models can significantly improve NPC performance.
- Anticipating a Deep Dive into LLM-Enhanced NPCs: The user signaled an intent to write a blog post about the struggles and insights from fine-tuning LLMs for NPC behavior.
- Peek into a Peer's Journey with NPC Development: Another user shared that their project had also encountered challenges with existing models, noting that Claude 3 performed better, possibly owing to its "empathetic" training background. They are currently exploring a strategy involving function calling with smaller prompts and are interested in the outputs of such an approach.
Links mentioned:
- GitHub - GigaxGames/gigax: LLM-powered NPCs running on your hardware: LLM-powered NPCs running on your hardware. Contribute to GigaxGames/gigax development by creating an account on GitHub.
- Form - Tally: Made with Tally, the simplest way to create forms.
AI Stack Devs (Yoko Li) ▷ #ai-town-dev (13 messages🔥):
- Local Setup Achieved with Ease: A member confirmed they successfully ran the setup locally and found the process very straightforward.
- Kudos for Member Contribution: A member expressed appreciation for the excellent work of another community member.
- Stuck on Windows: One member hit an issue when cloning the repo on Windows, getting stuck at "Checking for index or schema changes…". It was clarified that Convex local does not support Windows.
- Alternative Commands for Logs and Development: It was suggested to use `just convex dev` for a separate development sync and `just convex logs` to keep tabs on logs, with command options for tailing logs and verbose output.
- Windows Compatibility Workaround: Members discussed workarounds for the lack of Windows support in Convex local, such as using WSL (Windows Subsystem for Linux) or Docker, and mentioned that Windows compilation is in progress.
Skunkworks AI ▷ #general (15 messages🔥):
- Exploring HaystackDB Embeddings: A user referenced HaystackDB on GitHub, questioning whether it uses 2-bit embeddings.
- Understanding Binary Quantized Indexing: Clarification was provided that Binary Quantized (BQ) indexing is designed to create a smaller index for similarity search, contributing to a more efficient storage and search mechanism.
- Challenges in Fine-Tuning LLaMA-3: Members express difficulties with fine-tuning LLaMA-3, noting issues such as the model not generating the EOS token (a common mitigation is sketched after this list) and the embedding layer presenting challenges when loaded in different bit formats.
- Perplexity Fine-Tuning Troubles: Conversations indicate that fine-tuning for perplexity on LLaMA-3 may not yield results better than the original models, with suggestions that the tokenizer could be contributing to the issues.
- Potential Breakthrough with LLaMA-3 Fine-Tuning: A group member shared success in fine-tuning LLaMA-3 by utilizing LLaMA-3 specific prompt formatting, linking to a relevant GitHub pull request for further information.
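One common mitigation for the missing-EOS issue, sketched here under the assumption of a Hugging Face `transformers` tokenizer (not necessarily the exact fix discussed in-channel), is to define a pad token and ensure every training example actually ends with EOS:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token

def to_training_text(example: str) -> str:
    # If EOS never appears in the training targets, the model never learns to stop.
    return example + tok.eos_token
```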
Links mentioned:
- GitHub - carsonpo/haystackdb: Contribute to carsonpo/haystackdb development by creating an account on GitHub.
- feat: Add LLaMA-3 instruct prompt strategies for fine-tuning by 0-hero · Pull Request #1553 · OpenAccess-AI-Collective/axolotl: Description This builds on top of and includes the changes in the below PR's #1542 #1539 Fastchat PR from @TJ-Solergibert needs to be merged before merging this lm-sys/FastChat#3257 Motivatio...
Skunkworks AI ▷ #off-topic (1 message):
oleegg: https://youtu.be/tYzMYcUty6s?si=t2utqcq36PHbk9da
Mozilla AI ▷ #announcements (1 message):
- Mozilla AI is on a Hiring Spree: Mozilla AI has announced open positions and is on the lookout for new talent. Check out the opportunities and consider applying here.
- Evaluate Models with Lm-buddy: An open-source tool named Lm-buddy has been introduced to help evaluate language models more effectively. The tool can be explored and contributed to via the link provided here.
- Prometheus Puts Local LLMs on the Bench: A project called Prometheus demonstrates the use of local Large Language Models (LLMs) in the role of a judge. This application can be discussed and explored further in the dedicated channel linked here.
Mozilla AI ▷ #llamafile (13 messages🔥):
- AI Token Generation Speed Inquiry: A member inquired about the efficiency of token generation in llama.cpp/llamafile, noting that their implementation of llama2 inference spends 95% of its time on matrix-vector multiplications. They wondered if loop unrolling in llama.cpp could account for its 30% faster performance, as they observed both looping and vectorization in the disassembly.
- LLaMA Naming Mix-Up: One user experienced a humorous mix-up with message parameters, setting themselves as âZâ and then forgetting about it, leading to some confusion when messages appeared as if LLaMA was talking to itself.
- Pseudonymous Intrusion Causes Confusion: Another user recounted an unusual event where someone joined a chat under the name âkimkardashian,â causing a bizarre situation. However, the anomaly could not be replicated in subsequent runs.
- Technology Integration Troubles: A user struggled to integrate LLaMA with a Plush-for-comfyUI node. Despite the node functioning with other OpenAI endpoints, it failed to operate correctly with llamafile.
- LLaMA3 Compatibility and Support Communication: There's an acknowledged issue with running LLaMA3:8b on an M1 MacBook Air specifically with llamafile, whereas it runs without problems on Ollama. A pledge was made to prioritize M1 compatibility testing once other ongoing LLaMA3 issues are resolved.
Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (1 message):
Only a single message without additional context was available from the "ideas-and-feedback" channel, so no summary could be produced.
Interconnects (Nathan Lambert) ▷ #news (4 messages):
- Exploring OLMo with Hanna Hajishirzi: A recent talk by Hanna Hajishirzi from AI2 on "OLMo: Findings of Training an Open LM" has been shared, held at the Open-Source Generative AI Workshop at Cornell Tech. The slides for the talk can be accessed here.
- Intensity of the Information Flow: A member revealed that Hanna Hajishirzi is their manager and moves at an incredibly fast pace, which hints at the depth and density of her lectures.
- OLMo Presentation Overwhelming but Impressive: Another member found the content of Hanna's 25-minute talk, covering topics like OLMo, Dolma, and Tulu, quite vast and a bit overwhelming, yet acknowledged her impressive profile and the value such information may have for students.
Link mentioned: Hanna Hajishirzi (AI2) - OLMo: Findings of Training an Open LM: Talk from the Open-Source Generative AI Workshop at Cornell Tech. Speaker: https://homes.cs.washington.edu/~hannaneh/Slides - https://drive.google.com/file/dâŠ
Interconnects (Nathan Lambert) ▷ #reads (2 messages):
- Insights from John Schulman through Gist: A GitHub Gist provided valuable insights, summarizing a talk by John Schulman on reinforcement learning for language-model-based systems.
- Questioning the Utility of AI Leaderboards: A blog post by Sayash Kapoor and Benedikt Stroebl argues there is currently no accurate method to determine the best AI for code generation. They highlight that the LLM debugger (LDB), while topping the HumanEval leaderboard for code generation, is an expensive agent because it relies on running costly models like GPT-4.
Links mentioned:
- AI leaderboards are no longer useful. It's time to switch to Pareto curves.: What spending $2,000 can tell us about evaluating AI agents
- rl-for-llms.md: GitHub Gist: instantly share code, notes, and snippets.
Interconnects (Nathan Lambert) ▷ #posts (1 message):
SnailBot News: <@&1216534966205284433>
LLM Perf Enthusiasts AI ▷ #jobs (1 message):
- AI Engineer Wanted at Renowned AI-Powered Gamma: Gamma, ranked #16 on a16z's top 100 consumer AI apps, is looking for an AI engineer to innovate in presentation and website design through AI. The role includes prompt engineering, metrics/evaluations, fine-tuning, and building features with cutting-edge models; job details are available at Gamma Careers.
- Pushing the Limits of Large Language Models: Candidates without extensive engineering experience are considered if they possess practical expertise in maximizing the potential of Large Language Models (LLMs). The position is based in San Francisco and requires in-person collaboration.
- Gamma's Impressive AI-Powered Growth and Culture: Gamma boasts over 10 million users grown organically, is profitable with $10M+ in funding, operates with a lean 16-member team, and runs a hybrid workweek from its San Francisco office.
- Inventive Content Creation at Scale: Aiming to simplify content creation, Gamma generates over a million images and processes millions of LLM requests daily, seeking to eliminate the complexity of crafting engaging presentations and websites.
Link mentioned: AI Engineer: AI Engineer San Francisco Click here to apply
LLM Perf Enthusiasts AI ▷ #openai (3 messages):
- Speculation on GPT-4.5 Leak: A tweet by @phill__1 sparked discussion by suggesting the gpt2-chatbot feels like GPT-4.5, boasting "insane domain knowledge". The link to the tweet: phill__1's observation.
- Community Buzzing About Potential Leak: Members in the channel expressed belief that the gpt2-chatbot could be an inadvertent preview of GPT-4.5.
- Concise Praise for the Mystery Bot: A terse endorsement was shared by a member, simply stating, "It's good".
Link mentioned: Tweet from Phil (@phill__1): Whatever gpt2-chatbot might be, it definitely feels like gpt4.5. It has insane domain knowledge I have never seen before
Datasette - LLM (@SimonW) ▷ #llm (3 messages):
- Custom Grammar for Code-Generation Talk: A user showed interest in passing a custom grammar, potentially as a model-specific option, to focus on semantic errors in code generation rather than syntactic ones.
- User Experience Brainstorm for Datasette: Ideas were sought for a UX design on Datasette's front page that would let users select options from a drop-down, like choosing a country to generate a summary table.
- Direct Data Access via Dropdown Selection: A member proposed two UX approaches: one updating the URL on a selection event to direct the user to the relevant data, the other letting users "build" the homepage by updating canned queries based on their selections.
DiscoResearch ▷ #general (1 message):
- Fast Loading on Local Machine: Discussion revolved around the observation that a process loads in 3 seconds when run directly on the machine, yet there is an issue when the same work is submitted as a job, suggesting storage may not be the contributing factor in the job-submission context.
DiscoResearch ▷ #benchmark_dev (1 message):
le_mess: llama 3 seems to beat gpt4 on scandeval https://scandeval.com/german-nlg/