**Not thinking about ~~superalignment~~ ~~Google~~ ~~Scarlett Johansson~~ is all you need.**

AI News for 5/17/2024-5/20/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (366 channels, and 9564 messages) for you. Estimated reading time saved (at 200wpm): 1116 minutes.

While it was a relatively lively weekend, most of the debate was nontechnical in nature, with no announcements being an obvious candidate for this top feature.

So have a list of minor notes in its place:

  • We have deprecated some inactive Discords and added Hamel Husain and Dan Becker’s new LLM Finetuning Discord for his popular Maven course (affiliate link here)
  • HuggingFace’s ZeroGPU, available via Hugging Face’s Spaces, committing $10 million in free shared GPUs to help developers create new AI technologies because Hugging Face is “profitable, or close to profitable”
  • LangChain followed up its v0.2 release with a much needed docs update
  • Omar Sanseviero’s thread on the smaller model releases from last week (some of which we covered in AInews) - BLIP3, Yi-1.5, Kosmos 2.5, Falcon 2, PaliGemma, DeepSeekV2, et al

But who are we kidding, you probably want to read Scarlett’s apple notes takedown of OpenAI (:

image.png


Table of Contents

[TOC]


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

AI Model Releases and Updates

  • Gemini 1.5 Pro and Flash models released by Google DeepMind: @_philschmid shared that Gemini 1.5 Pro is a sparse multimodal MoE model handling text, audio, image and video with up to 10M context, while Flash is a dense Transformer decoder model distilled from Pro that is 3x faster and 10x cheaper. Both support up to 2M token context.
  • Yi-1.5 models with longer context released by Yi AI: @01AI_Yi announced the release of Yi-1.5 models with 32K and 16K context lengths, available on Hugging Face. @rohanpaul_ai highlighted the much longer context window.
  • Other notable model releases: @osanseviero recapped the week’s open ML updates, including Kosmos 2.5 from Microsoft, PaliGemma from Google, CumoLLM, Falcon 2, DeepSeek v2 lite, HunyuanDiT diffusion model, and Lumina next.

Research Papers and Techniques

  • Observational Scaling Laws paper generalizes compute scaling laws: The paper discussed by @arankomatsuzaki and @_jasonwei handles multiple model families using a shared, low-dimensional capability space, showing impressive predictive power for model performance.
  • Layer-Condensed KV Cache enables efficient inference: @arankomatsuzaki shared a paper on this technique which achieves up to 26× higher throughput than standard transformers for LLMs.
  • Robust agents learn causal world models: @rohanpaul_ai summarized a paper showing agents satisfying regret bounds under distributional shifts must learn an approximate causal model of the data generating process.
  • Linearizing LLMs with SUPRA method: @rohanpaul_ai shared a paper on SUPRA which converts pre-trained LLMs into RNNs with significantly reduced compute costs.
  • Studying hallucinations in fine-tuned LLMs: @rohanpaul_ai summarized a paper showing that introducing new knowledge through fine-tuning can have unintended consequences on hallucination tendencies.

Frameworks, Tools and Platforms

  • Hugging Face expands local AI capabilities: @ClementDelangue announced new capabilities for local AI on Hugging Face with no cloud, cost or data sent externally.
  • LangChain v0.2 released with major documentation improvements: @LangChainAI and @hwchase17 highlighted the release including versioned docs, clearer structure, consolidated content, and upgrade instructions.
  • Cognita framework builds on LangChain for modular RAG apps: @LangChainAI shared this open source framework providing an out-of-the-box experience for building RAG applications.
  • Together Cloud adds H100 GPUs for model training at scale: @togethercompute announced adding 6,096 H100 GPUs to their fleet used by AI companies.

Discussions and Perspectives

  • Hallucinations as blockers to production LLMs: @realSharonZhou noted hallucinations are a major blocker, but shared that <5% hallucinations have been achieved by tuning LLMs to recall specifics with “photographic memory”.
  • Anthropic reflects on Responsible Scaling Policy progress: @AnthropicAI shared reflections as they continue to iterate on their framework.
  • Challenges with RAG applications: @jxnlco booked an expert call for help, and @HamelHusain shared details on an upcoming RAG workshop.
  • Largest current use cases for LLMs: @fchollet listed the top 3 as StackOverflow replacement, doing homework, and internal enterprise knowledge bases.

Memes and Humor

  • Meme on testing LLM coding with the snake game: @svpino joked that the source code is easily found on Google so you don’t need an LLM for that.
  • Meme about AI girlfriend apps: @bindureddy joked they are the largest category of consumer apps using LLMs, despite giant AI models being invented to “solve the mysteries of the universe”.
  • Meme on open-source AGI to prevent nerfing: @bindureddy joked the #1 reason is to prevent models from being nerfed and censored, referencing the movie Her.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Advancements and Capabilities

AI Safety and Alignment

  • OpenAI dissolves AI safety team: In /r/OpenAI, it’s reported that OpenAI has dissolved its Superalignment AI safety team.
  • Unconventional AI attack vectors: A post discusses how misaligned AI may use unconventional attack vectors like disrupting phytoplankton to destroy ecosystems rather than bioweapons or nuclear risks.
  • Dishonesty in aligned AI: Even benevolently-aligned superintelligent AI may need to be dishonest and manipulative to achieve goals beyond human comprehension, according to a post.

AI Impact on Jobs and Economy

AI Models and Frameworks

AI Ethics and Societal Impact


AI Discord Recap

A summary of Summaries of Summaries

  1. LLM Fine-Tuning Advancements and Challenges:

    • Unsloth AI enables effective fine-tuning of models like Llama-3-70B Instruct using optimized techniques, but legal concerns around using IPs like Scarlett Johansson suing OpenAI were discussed.
    • The LLM Fine-Tuning course sparked debates on quality, with some finding the initial content basic while others appreciated the hands-on approach to training, evaluation, and prompt engineering.
    • Discussions on LoRA fine-tuning highlighted optimal configurations, dropout, weight decay, and learning rates to prevent overfitting, especially on GPUs like the 3090, as shared in this tweet.
  2. Multimodal and Generative AI Innovations:

    • Hugging Face pledged $10 million in free GPUs to support small developers, academics, and startups in creating new AI technologies.
    • The Chameleon model from Meta showcased state-of-the-art performance in understanding and generating images and text simultaneously, surpassing larger models like Llama-2.
    • GPT-4o integration with LlamaParse enabled multimodal capabilities, while concerns were raised about its Chinese token pollution.
    • Innovative projects like 4Wall AI and AI Reality TV explored AI-driven entertainment platforms with user-generated content and social simulations.
  3. Open-Source Datasets and Model Development:

    • Frustrations mounted over the restrictive non-commercial license of the CommonCanvas dataset, which limits modifications and derivatives.
    • Efforts focused on creating high-quality open-source datasets, like avoiding hallucinations in captions that can damage visual language models (VLLMs) and text-to-image (T2I) models.
    • The Sakuga-42M dataset introduced the first large-scale cartoon animation dataset, filling a gap in cartoon-specific training data.
    • Concerns were raised over the CogVLM2 license restricting use against China’s interests and mandating Chinese jurisdiction for disputes.
  4. AI Safety, Ethics, and Talent Acquisition:

    • Key researchers like Jan Leike resigned as head of alignment at OpenAI, citing disagreements over the company’s priorities, sparking discussions on OpenAI’s controversial employment practices.
    • OpenAI paused the use of the Sky voice in ChatGPT following concerns about its resemblance to Scarlett Johansson’s voice.
    • Neural Magic sought CUDA/Triton engineers to contribute to open-source efforts, focusing on activation quantization, sparsity, and optimizing kernels for MoE and sampling.
    • Discussions on the need for better AI safety benchmarks, with suggestions for “a modern LAMBADA for up to 2M” to evaluate models processing overlapping chunks independently (source).

PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord

  • Deepseek Dilemma: The Deepseek architectural differences have rendered it non-functional, with consensus from users that “it probs doesn’t work”. Attempts to operationalize it are on hold until solutions emerge.

  • Fine-Tune Frontier: Refinements for Meta-Llama models have been shared where users can now fine-tune Llama-3-70B Instruct models effectively using “orthogonalized bfloat16 safetensor weights”. However, the community is still exploring the implications of using famous IPs in model fine-tuning, citing concerns such as Scarlett Johansson suing OpenAI.

  • Colab Conundrums and JAX Jousts: An inquiry about running a 6GB dataset with a 5GB Llama3 model on Colab or Kaggle T4 sparked mixed responses due to storage versus VRAM usage. Meanwhile, using JAX on TPUs proved effective, despite initial skepticism, especially for Google TPUs.

  • Multi-GPU Madness and Dependency Despair: Community members are highly anticipating multi-GPU support from Unsloth, recognizing the advantages it could bring to their workflows. Environment setup posed challenges, particularly with WSL and native Windows installations and fitting dependencies like Triton into the mix.

  • Showcase Shines with Finetuning Feats: Innovations in finetuning were spotlighted, including a Text2Cypher model, shared via a LinkedIn post. A comprehensive article on sentiment analysis utilizing LLaMA 3 8b emerged on Medium, signposting a path for others to replicate the finetuning process with Unsloth.


HuggingFace Discord

New Dataset Invites AI Experiments: A Tuxemon dataset has been presented as an alternative to Pokemon datasets, offering cc-by-sa-3.0 licensed images for greater experimentation freedom. It provides images with two caption types for diverse descriptions in experiments.

Progress in Generative AI Learning Resources: Community suggestions included “Attention is All You Need” and the HuggingFace learning portal for those seeking knowledge on Generative AI and LLMs. Discussion of papers such as GROVE and the Conan benchmark for narrative understanding indicates an active interest in advancing collective understanding.

AI Influencers Crafted by Vision and AI: A tutorial video was highlighted, showing how to craft virtual AI influencers using computer vision and AI, reflecting a keen interest in the intersection of technology and social media phenomena.

Tokenizer Set to Reduce Llama Model Size: A newly developed tokenizer, Tokun, promises to shrink Llama models 10-fold while enhancing performance. This novel approach is revealed on GitHub and discussed on Twitter.

Clarifying LLMs Configuration for Task-Specific Queries: AI engineers focused on configuring Large Language Models for HTML generation and maintaining conversation history in chatbots. The community suggested manual intervention, like appending previous messages to the new prompt, to address these nuanced challenges.


Perplexity AI Discord

Frustration with Perplexity’s GPT-4o Performance: Engineers noted that GPT-4o’s tendency to repeat responses and ignore prompt changes is a step back in conversational AI, with one comparing it unfavorably to previous LLMs and expressing disappointment in its interaction abilities.

Calling All Script Kiddies for Better Model Switching: Users are actively sharing and utilizing custom scripts to enable dynamic model switching on Perplexity, notably with tools like Violentmonkey, which acts as a patch for these service limitations.

API Quirks and Quotas: Confusion exists around Perplexity’s API rate limits—differentiating between request and token limits—and its implications for engineers’ workflows. Meanwhile, discussions surfaced about API performance testing with a preference for the Omni model and clarifications sought for the threads feature to support conversational contexts.

A Quest for Upgraded API Access: Users continue to press for improved API access, expressing a need for higher rate limits and faster support responses, indicative of growing demands on machine learning infrastructure.

Engineers Explore AI Beyond Chat: Links shared amongst users indicate interests widening to Stability AI’s potential, mental boosts from physical exercise, exoplanetary details with WASP-193b, and generating engaging content for children through AI-assisted Dungeons & Dragons scenario crafting.


OpenAI Discord

Voices Silenced: OpenAI has paused the use of the Sky voice in ChatGPT, with a statement and explanation provided to address user concerns.

Language Models Break Free: Engineers report success running LangChain without the OpenAI API, describing integrations with local tools such as Ollama.

GPT-4o Access Rolls Out But With Frictions: Differences between GPT-4 and GPT-4o are evident, with the latter showing limitations in token context windows and caps on usage affecting practical applications. Enhanced, multimodal capabilities of GPT-4o have been recognized, and pricing alongside a file upload FAQ were shared to provide additional usage clarity.

Prompt Crafting Challenges and Innovations: In the engineering quarters, there’s a mix of challenges in prompt refining for self-awareness and technical integration, yet innovative prompt strategies are being shared to elevate creative and structured generation. JSON mode is suggested as a viable tool for improving command precision; OpenAI’s documentation stands as a go-to reference.

API Pains and Gains: Inconsistencies with chat.completion.create are reported among API users, with incomplete response issues and a demonstrated preference for JSON mode to control format and content. Despite hiccups, there’s a vivid discussion on orchestrating creativity, with someone proposing “Orchestrating Innovation on the Fringes of Chaos” as an explorative approach.


LM Studio Discord

  • LM Studio’s Antivirus False Alarms: LM Studio users noted that llama.cpp binaries are being flagged by Comodo antivirus due to being unsigned. Users are advised to consider this as a potential issue when encountering security warnings.

  • Model Loading and Hardware Discussions: There are discussions about various GPUs, with one user finding the Tesla P100 underperforming compared to expectations. Other talks point to Alder Lake CPU e-cores impacting GPT quantization performance. On the RAM front, higher speeds are tied to better LLM performance.

  • GGUF Takes The Stage: Users discussed integrating models from Hugging Face into LM Studio, where GGUF (General Good User Feedback) files are recommended for compatibility. The community provided positive feedback on the recent “HF -> LM Studio deeplink” feature for importing models.

  • Creative Use Cases for LLMs Mingle: Ranging from medical LLM recommendations like OpenBioLLM to benchmarks for generating SVG and ASCII art, users are actively exploring diverse applications. One model, MPT-7b-WizardLM, was highlighted for its potential in generating uncensored stories.

  • LM Studio autogen Shortcomings and Fixes: A bug in LM Studio’s autogen feature, which resulted in brief responses, was discussed, with a fix involving setting max_tokens to -1. Users also pointed out discrepancies between LM Studio’s local server and OpenAI specifications, affecting tool_calls handling for applications like AutoGPT.


Stability.ai (Stable Diffusion) Discord

Crafting the Perfect Prompt for LoRAs: Engineers have shared a prompt structure to leverage multiple LoRAs in Stable Diffusion, but observed diminishing returns or issues with more than three layers, implying potential optimization avenues.

First-Time Jitters with Stable Diffusion: A ‘NoneType’ object attribute error is causing a hiccup for a new Stable Diffusion user on the initial run, sparking a call for troubleshooting expertise without a clear resolution.

SD3’s Arrival Sparks Anticipation and Doubt: There’s a split in sentiment regarding the release of SD3, with a mixture of skepticism and optimism backed by Emad Mostaque’s tweet, indicating that work is under way.

Topaz Tussle: The effectiveness of Topaz as a video upscaling solution prompted debate. Engineers acknowledged its strength but contrasted with the appeal of ComfyUI, highlighting considerations like cost and functionality.

Handling the Heft of SDXL: A user underlined the importance of sufficient VRAM when wrangling with SDXL models’ demands for higher resolutions, and it was clarified that SDXL and SD1.5 require distinct ControlNet models.


Modular (Mojo đŸ”„) Discord

Mojo on Windows Still a WIP: Despite active interest, Mojo doesn’t natively support Windows and currently requires WSL; users have faced issues with CMD and PowerShell, but Windows support is on the horizon.

Bend vs. Mojo: A Performance Perspective: Discussions highlighted Chris Lattner’s insights on Bend’s performance, noting that while it’s behind CPython on a single core, Mojo is designed for high-performance scenarios. The communities around both languages are anticipating enhanced features and upcoming community meetings.

Llama’s Pythonic Cousin: The community noted an implementation of Llama3 from scratch, available on GitHub, described as building “one matrix multiplication at a time”, a fascinating foray into the nitty-gritty of language internals.

Diving Deep into Mojo’s Internals: Various discussions included insights into making nightly the default branch to avoid DCO failures, potential list capacity optimization in Mojo, SIMD optimization debates, a suggestion for a new list method similar to Rust’s Vec::shrink_to_fit(), and tackling alias issues that lead to segfaults. Key points brought up included community contributions for list initializations which could lead to performance improvement, and patches affecting performance positively.

Inside the Mind of an Engineer: Technical resolution of PR DCO check failures was discussed with procedural insights provided; flaky tests provoked discussions about fixes and CI pain points; and segfaults in custom array types prompted peer debugging sessions. The community showed appreciation for sharing intricate details that help unravel optimization mysteries.


LLM Finetuning (Hamel + Dan) Discord

  • LLM Workshops and Fine-tuning Discussions Heat Up: Participants are gearing up for upcoming workshops including Build Applications in Python with Jeremy Howard, and a session on RAG model optimization. Practical queries around finetuning are raised, such as PDF parsing techniques using tools like LlamaParse and GPT-4o, and how to serve fine-tuned LLMs with frameworks like FastAPI and Streamlit.

  • Tech Titans Troubleshoot and Coordinate on Challenges: Asiatic enthusiasts across various locations are networking and tackling challenges such as Modal command errors, discussing the potential for fine-tuning in vehicle failure prediction, and brain-picking on the LoRa configurations for pretraining LLMs.

  • Credit Chronicles Continue Across Platforms: Participants navigate the process of securing and confirming credits for services like JarvisLabs, with organizers coordinating behind-the-scenes to ensure credit allocation to accounts, sometimes facing registration hurdles due to mismatching emails.

  • Learning Resources Rendezvous: A repository of knowledge, from Hamel’s blogs to a CVPR 2024 paper on a GPT-4V open-source alternative, is highlighted. There’s chatter about potentially housing these gems in a communal GitHub repo, and finding ways to structure learning materials more effectively.

  • Singapore Crowds the Scene: A surprisingly high turnout from Singapore in the Asia timezone channel sparks comments on the notable national representation. Excitement is palpable as new faces introduce themselves, all the while maneuvering through the orchestration of credits and leveraging learning opportunities.

Eager learners and burgeoning experts alike remain vested in the transformational tide of fine-tuning, extraction, applications, and other facets of LLMs, suggesting a period filled with intellectual synergies and the relentless pursuit of practical AI engineering prowess.


Nous Research AI Discord

  • Benchmarks and AGI Discussions Spark Engineer Curiosity: Engineers contemplate the need for improved benchmarks, with calls for “a modern LAMBADA for up to 2M” to evaluate models that process overlapping chunks independently, discussed alongside a paper on AGI progress and necessary strategies titled “The Evolution of Artificial Intelligence.”

  • Sam Altman Parody Tweet Ignites Laughter, VC Skepticism, and AI’s Economic Riddles: A provocative parody tweet by Sam Altman opens discussions on the role of venture capitalists in AI, the actual financial impact of AI on company layoffs, and a member-inquiry on attending the Runpod hackathon.

  • Hermes 2 Mixtral: The Beginning of Action-Oriented LLMs: The Nous Hermes 2 Mixtral is praised for its unique ability to trigger actions within the CrewAI agent framework, with discussions also touching on multilingual capabilities, the importance of multiturn data, and high training costs.

  • Model Utilization Tactics: Engineers compare the effectiveness of finetuning versus advanced prompting with models like Llama3 and GPT-4, while they also seek benchmarks for fine-tuned re-rankers and highlight the advantages of local models for tasks with sensitivity and predictability needs.

  • WorldSim Enters the Age of GPT-4o with Terminal UX Revamp: WorldSim receives a terminal UX revamp with imminent GPT-4o integration, while the community engages with complex adaptive systems, symbolic knowledge graphs, and explores WorldSim’s potential for generating AI-related knowledge graphs.


CUDA MODE Discord

Hugging Face Pumps $10M into the AI Community: Hugging Face commits $10 million to provide free shared GPU resources for startups and academics, as part of efforts to democratize AI development. Their CEO Clement Delangue announced this following a substantial funding round, outlined in the company’s coverage on The Verge.

A New Programming Player, Bend: A new high-level programming language called Bend enters the scene, sparking questions about its edge over existing GPU languages like Triton and Mojo. Despite Mojo’s limitations on GPUs and Triton’s machine learning focus, Bend’s benefits are enunciated on GitHub.

Optimizing Machine Learning Inference: Experts exchange advice on building efficient inference servers, recommending resources like Nvidia Triton and TorchServe for model serving. Contributions highlighted included applying optimizations when using torch.compile() for static shapes and referencing code improvements on GitHub for better group normalization support in NHWC format, detailed in this pull request.

CUDA Complexities - Addition and Memory: Engaging debates unraveled around atomic operations for cuda::complex and the threshold limitations for 128-bit atomicCAS. The community shared code workarounds and accepted methodologies for complex number handling and discussed potential memory overheads during in-place multiplication in Torch.

Scaling and Optimizing the CUDA Challenge: The community dissected issues with gradient clipping, the potential in memory optimization templating, and ZeRO-2 implementation. They shared multiple GitHub discussions and pull requests (#427, #429, #435), indicating a dedicated focus on performance and fine-tuning CUDA applications.

Tackling ParPaRaw Parser Performance: Inquiries arose regarding benchmarks of libcudf against CPU parallel operations, hinting at the community’s enthusiasm for efficient parsing and making note of performance gains in GPUs over CPUs. Attention was given to the merger of Dask-cuDF into cuDF and the subsequent archiving of the former, as seen on GitHub.

Zoom into GPU Query Engines: An upcoming talk promises insights into building a GPU-native query engine from a cuDF veteran at Voltron, illuminating strategies from kernel design to production deployments. Details for tuning in are available through this Zoom meeting.

CUDA Architect Dives into GPU Essentials: A link was shared to a YouTube talk by CUDA Architect Stephen Jones, offering clarity on GPU programming and efficient memory use strategies essential for modern AI engineering tasks. Dive into the GPU workings through the link here.

Seeking Talent for CUDA/Triton Innovations at Neural Magic: Neural Magic is on the lookout for enthusiastic engineers to work on CUDA/Triton projects with a spotlight on activation quantization. They’re especially interested in capitalizing on next-gen GPU features such as 2:4 sparsity and further refining kernels in MoE and sampling.

Unpacking PyTorch & CUDA Interactions: A detailed brainstorm ensued over efficient PyTorch data type packing/unpacking for PyTorch with CUDA, with a spotlight on uint2, uint4, and uint8 types. Project management and collaborative programming featured heavily in the discussion, with a nod to GitHub Premier #135 for custom CUDA extension management.

Barrier Synchronization Simplified: A community member helps others grasp the concept of barrier synchronization by comparing it to ensuring all students are back on the bus post a museum visit, a relatable analogy that underpins synchronized processes in GPU operations.

Democratizing Bitnet Protocols: There’s a joint effort to host bitnet group meetings and review important tech documentation, with quantization discussions focused on transforming uint4 to uint8 types. Shared resources are guiding these meetings, as mentioned in the collaboration drive.


Eleuther Discord

  • Spam Alert in CC Datasets: The Eleuther community identified significant spam in Common Crawl (CC) datasets, with Chinese texts being particularly affected. A Technology Review article on GPT-4O’s pollution highlights similar concerns, flagging issues with non-English data cleaning.

  • OpenELM Sets Its Sights on Efficiency: A new LLM called OpenELM, spotlighted for its reproducibility and 2.36% accuracy improvement over OLMo, piqued interest among members. For details, check out the OpenELM research page.

  • Memory Efficiency in AI’s Crosshairs: The challenges of calculating FLOPs for model training attracted attention, with EleutherAI’s cookbook providing guidance for accurate estimations, a crucial aspect for optimizing memory and computational resource use.

  • Cross-Modality Learning Steals the Limelight: Researchers are exploring whether models like ImageBind and PaLM-E benefit unimodal tasks after being trained on multimodal data. The integration of zero-shot recognition and modality-specific embeddings could enhance retrieval performance, with papers such as ImageBind and PaLM-E central to this dialogue.

  • The Perks and Quirks of Model Tuning: Members noted automatic prompt setting in HF models and discussed fine-tuning techniques, including soft prompt tuning in non-pipeline cases. However, issues arise, such as ‘param.requires_grad’ resetting after calling ‘model.to_sequential()’, which can hinder development processes.


Interconnects (Nathan Lambert) Discord

Model Melee with Meta, DeepMind, and Anthropic: Meta’s Chameleon model boasts 34B parameters and outperforms Flamingo and IDEFICS with superior human evaluations. DeepMind’s Flash-8B offers multimodal capabilities and efficiency, while their Gemini 1.5 models excel in benchmarks. Meanwhile, Anthropic scales up with four times the compute of their last model, and LMsys’s “Hard Prompts” category brings new challenges for AI evaluations.

AI-Safety Team Breakup Causes Stir: OpenAI’s superalignment team, including Ilya Sutskever and Jan Leike, has disbanded amidst disagreements and criticisms of OpenAI’s policies. The dismissal and departure agreements at OpenAI drew particular ire due to controversial lifetime nondisparagement clauses.

Podcast Ponderings and Gaming Glory: The Retort AI podcast analyzed OpenAI’s moves, spark debates over vocab size scaling laws, and referenced hysteresis in control theory with a hint of humor. Calls of Duty gaming roots and ambitions for academic content creation on YouTube were shared with nostalgia.

Caution with ORPO: Skepticism rose about the ORPO method’s scalability and effectiveness, with community members sharing test results suggesting a potential for over-regularization. Concerns about the method were amplified by its addition to the Hugging Face library.

Challenging Chinatalk and Learning from Llama3: A thumbs-up for the Chinatalk episode, the value of llama3-from-scratch as a learning resource, and a clever Notion blog explaining Latent Consistency Models provided informative suggestions for self-development. However, a warning about the legal risks of the Books4 dataset spiced up the dialogue.


Latent Space Discord

  • Scaling Up Is the Secret Sauce: Geoffrey Hinton endorsed Ilya Sutskever’s belief in scaling as a key to AI success, stating “[Ilya] was always preaching that you just make it bigger and it’ll work better. Turns out Ilya was basically right.” The discussion highlighted a full interview where Hinton shared this insight.

  • Wind of Change for Vertical Axis: EPFL researchers have utilized a genetic algorithm to optimize vertical-axis wind turbines, aiming to surpass the limitations of horizontal-axis versions. The work is promising for quieter, more eco-friendly turbines with details in the full article.

  • AI Agents, Free Will Included?: Discussions revolved around the autonomy of AI agents, featuring Andrew Ng’s thoughts on AI agents and Gordon Brander’s assertions about self-adaptive AI in a shared YouTube video.

  • Exit of a Principal Aligner: After Jan Leike resigned as head of alignment at OpenAI, the community pondered the ramifications while Sam Altman and Greg Brockman shared their thoughts, found here.

  • Programming Language Face-off for AI: With Hugging Face’s adoption of Rust in projects like Candle and tokenizers, and Go maintaining its niche in HTTP request-based AI applications, the debate over which language reigns supreme for AI development is still hot.


LlamaIndex Discord

  • Memory Matters for Autonomous Agents: A webinar featuring the memary project, focusing on long-term memory for autonomous agents, is scheduled for Thursday at 9 AM PT. AI engineers interested in memory challenges and future directions can sign up for the event.

  • QA Undermined by Tables: LLMs are still stumped by complex tables like the Caltrain schedule, leading to hallucination issues due to poor parsing, with more details available in this analysis.

  • Speed Up Vector Search by Digits: JinaAI_ has shared methods to boost vector search speeds 32-fold using 32-bit vectors, sacrificing only 4% accuracy—a critical optimization for production applications.

  • San Francisco’s Gathering of AI Minds: LlamaIndex plans an in-person San Francisco meetup at their HQ focusing on advanced RAG engine techniques, with RSVPs accessible here.

  • Metadata Know-how for Data Governance: Engineers propounded the utility of MetaDataFilters for data governance at a DB level within LlamaIndex and posited the idea of selective indexing for sensitive financial data.

  • Integrating with GPT-4o: A notable discussion featured the integration of GPT-4o with LlamaParse, with the Medium article on the topic receiving recognition and acclaim from community members.


LAION Discord

  • Contentious Licensing Limitations Loom: The CommonCanvas dataset, which provides 70M image-text pairs, sparked debate due to its restrictive non-commercial license and prohibition on derivatives, frustrating members who see potential for beneficial modifications (CommonCanvas announcement).

  • Tech Talk — PyTorch Puzzles Engineers: There’s significant discussion about PyTorch’s native_group_norm causing slowdowns when not using torch.compile; with one member noting near-par performance with eager mode versus the compiled approach.

  • Datasets Under Scrutiny for Integrity: AI engineers are concerned about the impact of hallucinated captions in training visual language models (VLLMs) and text-to-image models (T2I), while also expressing intent to create high-quality open-source datasets to avoid such issues.

  • New Kid on the Mixed-Modal Block: The Chameleon model is recognised for its impressive ability to understand and generate images and text, showing promise in image captioning and generative tasks over larger models like Llama-2 (Chameleon arXiv paper).

  • CogVLM2’s Controversial Conditions: Members were cautioned about the CogVLM2 model’s license which includes clauses potentially limiting use against China’s interests and imposing a Chinese jurisdiction for disputes (CogVLM2 License).


AI Stack Devs (Yoko Li) Discord

4Wall Beta Unveiled: 4Wall, an AI-driven entertainment platform, has entered beta, offering seamless AI Town integration and user-generated content tools for creating maps and games. They’re also working on 3D AI characters, as showcased in their announcement.

Game Jam Champions: The Rosebud / #WeekOfAI Education Game Jam has announced winners, including “Pathfinder: Terra’s Fate” and “Ferment!”, highlighting AI’s potential in educational gaming. The games are accessible here, and more details can be found in the announcement tweet.

AI Town’s Windows Milestone: AI Town has achieved compatibility natively with Windows, as celebrated in a Tweet, and sparked discussions on innovative implementations, with conversation dump methods using tools like GitHub - Townplayer. Additionally, users are exploring creative scenarios in AI Town using in-depth world context integration.

Launch of AI Reality TV: The launch of an interactive AI Reality TV platform has caught the community’s attention, inviting users to simulate social interactions with AI characters, as echoed in this announcement.

Troubleshooting & Technical Tips Abound: AI engineers exchanged solutions to AI Town setup issues, with advice on resolving agent communication problems and extracting data from SQLite databases. Recommendations included checking the memory system documentation and adjusting settings within AI Town.


OpenRouter (Alex Atallah) Discord

  • Server Says “No” to Function Calls: Engineers faced a hurdle with OpenRouter, as server responded with status 500 and an error message stating “Function calling is not supported by openrouter,” leaving the problem unresolved in the discussion.
  • 404 Flub: Users identified a flaw where invalid model URLs cause an application error displaying a message instead of a non-existent page (404), indicating an inconsistent user experience based on login status.
  • Payment Fiasco: There was chatter around auto top-up payment rejections that left users unable to top-up manually, suspected to be caused by blocks from user’s banks, specifically WISE EUROPE SA/NV.
  • Model Hunt: Model recommendations were exchanged with “Cat-LLaMA-3-70B” and Midnight-Miqu models highlighted, alongside calls for better fine-tuning strategies over using “random uncleaned data.”
  • Temperamental Wizard LM Service: Users experienced intermittent request failures with Wizard LM 8x22B on OpenRouter, chalked up to temporary surges in request timeouts (408) across multiple providers.

OpenAccess AI Collective (axolotl) Discord

  • Galore Tool Lacks DDP: Engineers highlighted the Galore Layerwise tool’s inability to support Distributed Data Parallel (DDP), pointing out a significant limitation in scaling its use.

  • Training Dilemmas with Large Chinese Datasets: Discussions have focused on fine-tuning 8B models with 1 billion Chinese tokens, with attention drawn to the Multimodal Art Projection (M-A-P) and BAAI datasets, suggesting a trend towards multilingual model training.

  • Llama’s Gradient Growth Issue: There’s a technical challenge observed with the llama 3 8B model, where low-rank fine-tuning causes an unbounded gradient norm increase, indicating a possible problem with weight saturation and gradient updating.

  • GPT-4o’s Token Troubles: Recent feedback on GPT-4o uncovered that its token data includes spam and porn phrases, signaling concerns about the quality and cleanliness of its language processing, especially in Chinese.

  • Commandr Configuration Progresses: There’s ongoing community support and contributions, such as a specific GitHub pull request, towards enhancing the Commandr setup for axolotl, indicating active project iteration and problem-solving.

  • Axolotl Configuration Quandaries: Engineers shared specific use case troubles: one involving illegal memory access errors during continued pre-training due to out-of-vocab padding tokens, and another detailed issues in fine-tuning Mistral 7b, where the model’s learning outcomes were unsatisfactory despite a decrease in loss.

  • Axolotl-Phorm Bot Insights: Key takeaways from the axolotl-phorm bot channel include an exploration into the ORPO format for data structuring, articulations on using weight decay and LoRA Dropout for avoiding overfitting in LLM training, the benefits of gradient accumulation via the Hugging Face Accelerator library, and discussions around implementing sample weights in Axolotl’s loss functions without additional customization.


LangChain AI Discord

Memory Matters for Model Magic: Re-ranking with cross-encoders behind a proxy is discussed, with a focus on OpenAI GPTs and Gemini models. There’s an interest in short-term memory solutions, like a buffer for chatbots to maintain context in conversations.

LangChain Gets a Nudge: Queries about guiding model responses in LangChain led to sharing a PromptTemplate solution, with a reference to a GitHub issue on the topic. Meanwhile, LangChain for Swift developers is available with resources for working on iOS and macOS platforms, as seen in a GitHub repository for LangChain Swift.

SQL Holds the Key: The application of LangChain with SQL data opens the door to summarizing concepts across datasets. The conversation veers toward ways to integrate SQL databases as a memory solution, with a guide found in LangChain’s documentation.

Langmem’s Long-term Memory Mastery: Langmem’s context management capabilities are commended. A YouTube demonstration shows how Langmem effectively switches contexts and maintains long-term memory during conversations, highlighting its utility for complex dialogue tasks (Langmem demonstration).

Fishy Links Flood the Feed: Multiple channels report a spread of questionable $50 Steam gift links (suspicious link), warning members to proceed with caution and suggesting the link is likely deceptive.

Rubik’s Cube of AI: Rubik’s AI promises enhanced research assistance, offering two months of free access to premium features with the promo code RUBIX.

Playing with RAG-Fusion: There’s a tutorial on RAG-Fusion, highlighting its use in AI chatbots for document handling and emphasizing its multi-query capabilities over RAG’s single-query limitation. The tutorial offers engineers insights into using LangChain and GPT-4o, available at LangChain + RAG Fusion + GPT-4o Project.


Cohere Discord

  • Discord Support System Revamp Requested: One member called attention to the need for improvements in the Discord support system, citing unaddressed inquiries. It was noted that the current system functions as a community-supported platform rather than one maintained by official staff.

  • Rate Limit Impacts Trial API Users: Users experiencing 403 errors with the RAG retriever attributed this to hitting rate limits on the Trial API, which is not designed for production use.

  • Inquiring Minds Want Free API Keys: There was discussion about the availability and scope of free API keys from Cohere, clarifying these keys are meant for initial prototyping and come with certain usage restrictions.

  • Camouflage Your Conversations: Assistance was sought for utilizing CommandR+ for translation services, with a helpful nudge towards the Chat API documentation that provides implementation guidance.

  • Showcasing Cohere AI in Action: A new resource entitled “A Complete Guide to Cohere AI” was shared, complete with installation and usage instructions on the Analytics Vidhya platform. An accompanying demo app can be tested at Streamlit.


OpenInterpreter Discord

Hugging Face GPU Bonanza: Hugging Face is donating $10 million in free shared GPU resources to small developers, academics, and startups, leveraging their financial standing and recent investments as outlined in a The Verge article.

OpenInterpreter Tackles Pi 5 and DevOps: OpenInterpreter has been successfully deployed on a Pi 5 using Ubuntu, and a collaboration involving project integration was discussed including potential support with Azure credits. Additionally, a junior full-stack DevOps engineer is seeking community aid to develop a “lite 01” AI assistant module.

Technical Tips and Tricks Abound: Solutions for environment setup issues with OpenInterpreter on different platforms have been shared, with particular discussion focused on WSL, virtual environments, and IDE usage. Further assistance was provided via a GitHub repository for Flutter integration and requests for development help on a device dubbed O1 Lite.

Voice AI’s Robo Twang: Community discussions critique voice AI for its lack of naturalness compared to GPT-4’s textual capabilities, while an idea for voice assistants’ ability to interrupt was highlighted in a YouTube video.

Event and Community Engagement: Notices went out inviting the community to the first Accessibility Round Table and a live stream focused on local development, fostering engagement and knowledge-sharing in live settings.


Mozilla AI Discord

  • Debugging Drama with RAG: An embedded model snag led to a segfault during a RAG tutorial, with the error message “llama_get_logits_ith: invalid logits id 420, reason: no logits”. It was identified that the issue was due to the use of an embeddings-only model, which isn’t capable of generation tasks, a detail possibly overlooked in the Mozilla tutorial.

  • Cloud Choices: GPU-enabled cloud services became a hot topic, with the engineering group giving nods to providers like vast.ai for experimenting and tackling temporary computational loads.

  • SQLite Meets Vectors: Alex Garcia landed in discussion with his sqlite-vec project, a SQLite extension poised for vector search that has sparked interest for integration with Llamafile for enhanced memory and semantic search capabilities.

  • Llamafiles Clarified: A critical clarification unfolded—the Mozilla Llamafile embeddings model linked in their tutorial does not have generation capabilities, a point that needed spotlighting for precise user expectations.

  • Innovations in Model Deployment: There’s a brewing buzz about the strategic deployment of models with Llamafile on various platforms, suggesting that GPU-powered offerings from cloud providers are a focal point of interest for practical experimentation.


MLOps @Chipro Discord

Fine-Tuning Frenzy Fires Up: Engineers are expressing mixed feelings regarding the LLM Fine-Tuning course, with some finding value in its hands-on approach to LLM training, evaluation, and prompt engineering, while others remain skeptical, citing concerns over the quality amidst promotional tactics.

Mixture of Mastery and Mystery in Course Content: Course participants noted variable experiences, with a few describing the introductory material as basic but dependent on the individual’s background; this illustrates the challenge of calibrating content difficulty for diverse expertise levels.

Predictions Wrapped in Intervals: The MAPIE documentation surfaced as a key resource for those looking to implement prediction intervals, and insights were offered on conformal predictions with a nod to Nixtla, suitable for time-series data.

Embeddings Evolve from Inpainting: Comparable to masked language modeling, deriving image embeddings through inpainting techniques was a topic of interest, highlighting a method that estimates unseen image aspects from visible data.

Multi-lingual Entities Enter Evaluation Phase: Strategies for comparing entities across languages, like “University of California” and “Universidad de California,” were discussed, possibly incorporating contrastive learning and language-specific prefixes, with arxiv paper mentioned for further reading.


tinygrad (George Hotz) Discord

  • Seeking Speed for YOLO on Comma: Discussions surfaced around the feasibility and performance metrics of running a YOLO model on a comma device with current reports indicating prediction times around 1000ms.

  • The Trade-Off of Polynomial Precision: An engineer reported using an 11-degree polynomial for sine approximation, yielding an error of 1e-8, while assessing the possibility of a higher degree polynomial to attain the desired 1e-12 error despite concerns about computational efficiency.

  • Concerns Over Logarithmic and Exponential Approximations: The discourse included a focus on the difficulties of maintaining accuracy in polynomial approximations of logarithmic and exponential functions, with suggestions to use range reduction techniques that may help balance precision with complexity.

  • Bitshifting in Tinygrad Pondered: Efficiency in bitshifting within tinygrad prompted inquiries, specifically over whether there’s a more streamlined method than x.e(BinaryOps.DIV, 2 ** 16).e(BinaryOps.MUL, 2 ** 16) for the process.

  • Metal Compiler Mysteries Unveiled:

    • A participant shared a curiosity about the Metal compiler’s decisions to unwrap for loops, indicating variations in the generated code when invoking Tensor.arange(1, 32) versus Tensor.arange(1, 33).
    • A puzzle was presented as to why the number 32 specifically affects compilation behavior in the Metal compiler, underlining the performance consequences of this enigmatic threshold.

Datasette - LLM (@SimonW) Discord

  • Squeak Meets Claude: A discussion emerged about integrating Claude3 with Squeak Smalltalk, signaling interest in combining cutting-edge AI with classic programming environments. Practical application details remain to be hashed out.

  • Voice Modes Get a Makeover: Within GPT-4o, a voice named Sky was replaced by Juniper after concerns arose about resemblance to Scarlett Johansson’s voice. The shift from multi-model to a singular model approach in GPT-4o aims to reduce latency and enhance emotional expression, albeit increasing complexity (Voice Chat FAQ).

  • AI’s Double-Edged Sword: As models like GPT-4o evolve, they face challenges such as potential for prompt injection and unpredictable behaviors, which can be as problematic as legacy systems encountering unexpected commands.

  • The Never-Ending Improvement Loop: Echoing Stainslaw Lem’s “The Upside-Down Evolution,” resilience in AI and other complex systems was discussed, with the understanding that while perfect reliability is a myth, fostering fault-tolerant designs is crucial—even as it leads to new unforeseen issues.


LLM Perf Enthusiasts AI Discord

Legal Eagles Eye GPT-4o: AI Engineers have noted that GPT-4o demonstrates notable advances in complex legal reasoning compared to its predecessors like GPT-4 and GPT-4-Turbo. The improvements and methodologies were shared in a LinkedIn article by Evan Harris.


YAIG (a16z Infra) Discord

  • Docker Devs Wanted for AI Collaboration: A call has been made for contributors on an upcoming article about using Docker for training and deploying AI models. The original poster is seeking assistance in writing, contributing to, or reviewing the article, and invites interested engineers to direct message for collaboration.

The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (718 messagesđŸ”„đŸ”„đŸ”„):

  • Deepseek doesn’t work yet: Users discussed that Deepseek is not functional due to its different architecture. One pointed out that “it probs doesn’t work,” and another confirmed, “Deepseek won’t work yet.”
  • Handling large datasets on Colab/Kaggle: A user asked if a 6GB dataset could fit with a 5GB Llama3 model on Colab or Kaggle T4. Opinions differed but it was noted that “datasets (hf library) doesn’t load the dataset in the ram”; thus, it’s more of a storage issue than a VRAM limit.
  • JAX TPUs train well despite skepticism: There was a heated debate about using JAX on TPUs, with one user asserting it trains fine on Google TPUs. “You can train on TPU even with torch, but Jax is pretty much what’s used mainly in production,” was one key insight.
  • Effective fine-tuning hacks discussed: Notably, kearm discussed a refined method to “remove guardrails” in Meta-Llama models using “orthogonalized bfloat16 safetensor weights”, and suggested that Llama-3-70B Instruct can now be finetuned effectively and cheaply.
  • Legal concerns and AI fine-tuning: Users pondered the risks of using famous IPs for finetuning models, even as others mentioned ongoing lawsuits, like Scarlet Johansson suing OpenAI. “She may win that,” was a sentiment echoed over legal battles.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (55 messagesđŸ”„đŸ”„):

  • Regex and text formatting in Python: Members discussed techniques for identifying similarly formatted text using Python. Suggestions included using regex (re.findall) and checking with text.isupper() for all caps.

  • Criticism of Sam Altman and OpenAI: Strong opinions were voiced regarding Sam Altman’s leadership and OpenAI’s influence. Comments reflected disdain for Altman’s fear-mongering tactics and the idolization of wealth and power in tech.

  • Excluding OpenAI from licenses: Cognitive Computations is altering licenses to exclude OpenAI and the State of California from using their models and datasets. This move is intended to send a message regarding their opposition to current AI leadership and policies.

  • AI Safety Lobbying in DC: A shared Politico article discussed how AI lobbyists are shifting the debate in Washington from existential risks to business opportunities, with a particular focus on China.

  • Content Recommendations: Members shared links to intriguing content, including a YouTube video on the Bend programming language for GPUs, an Instagram reel, and a YouTube playlist titled “Dachshund Doom and Cryptid Chaos.”

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (454 messagesđŸ”„đŸ”„đŸ”„):

- **Error with torch.float16 for llama3**: Users tried to train llama3 with **torch.float16** but encountered errors suggesting to use bfloat16 instead. They sought solutions but found none that worked.
- **Databricks issues with torch and CUDA**: **Torch** caused errors when running on A100 80GB in **Databricks**. Users discussed potential fixes like **setting the torch parameter to False** or updating software versions, but faced challenges.
- **Uploading and using GGUF models**: **Users faced challenges uploading and running models on Hugging Face without config files**. Solutions involved pulling config files from pretrained models or ensuring the correct format and updates.
- **Eager anticipation for mulit-GPU support**: **Community members expressed eagerness for multi-GPU support** from Unsloth, which is in development but not yet available.
- **Troubleshooting environment setup**: Participants had **difficulty setting up environments with both WSL and native Windows** for Unsloth, specifically with installing dependencies like **Triton**.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (22 messagesđŸ”„):

  • Text2Cypher model finetuned: A member finetuned a Text2Cypher model (a query language for graph databases) using Unsloth. They shared a LinkedIn post praising the ease and the gguf versions produced.
  • New article on sentiment analysis: A member published an extensive article on fine-tuning LLaMA 3 8b for sentiment analysis using Unsloth, with code and guidelines. They shared the article on Medium.
  • Critical data sampling bug in Kolibrify: A significant bug was found in Kolibrify’s data sampling process. A fix that theoretically improves training results will be released next week, and retraining is already ongoing to evaluate effectiveness.
  • Issue in curriculum dataset handling: The curriculum data generator was ineffective due to using datasets.Dataset.from_generator instead of datasets.IterableDataset.from_generator. A member overhauled their pipeline, matched dolphin-mistral-2.6’s performance using only ~20k samples, and plans to publish the model soon.

HuggingFace ▷ #general (853 messagesđŸ”„đŸ”„đŸ”„):

- **Issue with GPTs Agents on MPS Devices**: A member noted that **GPTs agents** can only load bfloat16 models with MPS devices, as bitsandbytes isn't supported on M1 chips. They expressed frustration with MPS being fast but "running in the wrong direction".
- **Member seeks MLflow deployment help**: Someone asked for assistance in deploying custom models via **MLflow**, specifically for a fine-tuned cross encoder model. They did not receive a direct response from other members.
- **Interest in HuggingChat's limitations**: A user inquired why **HuggingChat** doesn't support files and images. No comprehensive answer was provided.
- **Clarifying technical script adjustments**: Multiple users engaged in debugging and modifying a script for sending requests to a vllm endpoint using **aiohttp** and **asyncio**. Key changes and adaptations were discussed, particularly for integrating with OpenAI's API.
- **Concerns about service and model preferences**: An extensive discussion ensued regarding the benefits and downsides of Hugging Face's **Pro accounts**, spaces creation, and the limitations versus preferences for running models like **Llama**. One member expressed dissatisfaction with needing workarounds for explicit content and limitations on tokens in HuggingChat. Another user sought advice on deployment vs. local computation for InstructBLIP.

Links mentioned:


HuggingFace ▷ #today-im-learning (11 messagesđŸ”„):

  • AI Business Advisor Project Shared: A member shared a YouTube video titled “business advisor AI project using langchain and gemini AI startup,” showcasing a project aimed at creating a business advisor using these technologies. It’s a startup idea with practical applications.

  • Installing đŸ€— Transformers Simplified: A user shared the installation guide for transformers, providing instructions for setting up the library with PyTorch, TensorFlow, and Flax. This assists users in installing and configuring đŸ€— Transformers for their deep learning projects.

  • Innovative Blog/Header Details Shared: A member described their new blog/header featuring a Delaunay triangulation with the Game of Life playing on nodes. They highlighted reworking the game rules into fractional counts and mentioned it has “massive rendering overhead” due to rerendering each frame with d3 instead of using GPU optimizations.

  • Invitation to Share Results: In response to the business advisor project video, another member encouraged sharing results or repositories, fostering community collaboration and feedback.

  • AI Vocals Enhancement Guide Announcement: A user briefly mentioned they wrote a guide on making AI vocals sound natural, adding more body and depth to bring them back to life. Further details or links to the guide were not provided.

Links mentioned:


HuggingFace ▷ #cool-finds (18 messagesđŸ”„):

  • Multimodal GPT-4o with LlamaParse: Shared an article on “Unleashing Multimodal Power: GPT-4o Integration with LlamaParse.” Read more here.

  • Tech Praise on YouTube: Claimed to have found perhaps the best tech video ever on YouTube. Watch it here.

  • OpenAI Critique: “OpenAI is not open,” leading to discussions about closed AI systems. Watch the video critiquing big tech AI.

  • RLHF and LLM Evaluations: Shared a helpful discussion about the current state of RLHF and LLM evaluations. Watch the conversation featuring Nathan Lambert.

  • Generative AI in Physics: Introduced a new research technique using generative AI to answer complex questions in physics, potentially aiding in the investigation of novel materials. Read full story.

Links mentioned:


HuggingFace ▷ #i-made-this (20 messagesđŸ”„):

  • Business AI Advisor Project Goes Live: A YouTube video titled Business Advisor AI Project Using Langchain and Gemini showcases a project aimed at creating a business advisor using these technologies. It includes a resume portfolio for AI startup ideas.

  • Study Companion Program with GenAI: A program acting as a powerful study companion using GenAI was shared on LinkedIn. This tool aims to innovate educational experiences.

  • New Model Training Support in SimpleTuner: SimpleTuner has added full ControlNet model training support for SDXL, SD 1.5, and SD 2.1, expanding its capabilities.

  • SDXL Flash Models Roll Out: Two versions of SDXL Flash were introduced, promising faster performance and higher quality in AI models. SDXL Flash Mini was also launched, offering efficiency with minimal quality loss.

  • Tokenizer Innovation Inspired by Andrej Karpathy: A member developed Tokun, a new tokenizer, that reportedly can reduce the size of Llama models by a factor of 10 while enhancing capabilities. Further insights and testing articles were shared on Twitter.

Links mentioned:


HuggingFace ▷ #reading-group (109 messagesđŸ”„đŸ”„):

  • Seek Generative AI resources: A user requested resources to learn about Generative AI and LLMs. Recommendations included research papers like “Attention is All You Need” and courses on HuggingFace.

  • AlphaFold3 Reading Group Session: A user shared a blog post about AlphaFold3 suitable for both biologists and computer scientists. Others suggested making it a topic for the next reading group session.

  • Conditional Story Generation Paper: A meeting was announced to discuss multiple papers, including the conditional story generation framework GROVE (arxiv link), and the Conan benchmark for narrative understanding (arxiv link).

  • Recordings and Resources: Members requested information on accessing recorded sessions. The recordings were shared on YouTube and links to past presentations are available on GitHub.

  • Discussion on Future Presentations: Future topics were discussed, including AlphaFold3 and potentially covering other papers like the KAN paper. Details on how and when these sessions are scheduled were also shared.

Links mentioned:


HuggingFace ▷ #core-announcements (1 messages):

  • Tuxemons replace Pokemons for dataset fun: A member announced a new dataset alternative featuring Tuxemons instead of Pokemons. They mentioned, “The number of the samples is low but the images are all cc-by-sa-3.0 so you get more freedom and less worry in your experiments.” Also, each image comes with two types of captions for added description variety. Explore the dataset.

Link mentioned: diffusers/tuxemon · Datasets at Hugging Face: no description found


HuggingFace ▷ #computer-vision (25 messagesđŸ”„):

  • Divergent Opinions on Model Structure Issues: Members discussed the performance of a Unet model, with a focus on the forward and fit methods. One emphasized potential problems in the model’s structure, leading to convergence issues and almost random guessing despite running successfully.

  • Creating Virtual AI Influencers: A member shared a YouTube video about creating a virtual AI influencer using computer vision and AI tools. The video aims to detail the fascination and burgeoning trend of virtual influencers.

  • Handling Image Data in Parquet Files: Discussions arose on approaches to include images in Parquet files, with issues of image data appearing in byte array format when uploaded to Hugging Face. An alternative solution suggested using the datasets library and provided a GitHub link to guide through creating a dataset from a dictionary with image paths.

  • Clarification on Fully Convolutional Networks: A brief exchange clarified that a fully convolutional network avoids dense layers in detection heads, contrasting models like yolov2 and yolov1. The improvement in yolov2’s performance over yolov1 was noted as a benefit.

  • CenterCrop and Image Augmentation: While discussing a ViT tutorial, a member questioned the utility of CenterCrop when input and output image sizes are equal, suggesting it acts as an identity function. It was clarified that CenterCrop adds noise and serves as image augmentation by resizing after cropping.

Links mentioned:


HuggingFace ▷ #NLP (13 messagesđŸ”„):

  • Connectionist Temporal Classification Relevance: A member inquired whether Connectionist Temporal Classification (CTC) is still in use today. No follow-up or responses were given.

  • Accessing Hugging Face Model Architecture: One member asked how to view the architecture of Hugging Face’s pretrained models. Another member explained that modeling files can be found on GitHub, in the documentation, or by using help(model) and inspecting the configuration files.

  • Categorizing Text Queries Into Commands: A member asked for guidance on converting text queries into discrete commands for applications like translation models and video games. However, no specific models or methods were suggested in the chat.

  • Understanding HTML in LLMs: A user expressed difficulty in understanding and generating HTML code using Large Language Models (LLMs). They were unsure whether HTML should be treated as a separate modality from natural language and sought advice on using different tokenizers effectively.

  • Handling Conversation History in LLM-Based Bots: A user struggled with a bot that couldn’t remember previous exchanges and asked for help. Another user explained that LLMs need manual handling of conversation history, usually by concatenating previous messages with the new prompt.


HuggingFace ▷ #diffusion-discussions (22 messagesđŸ”„):

  • Error with Hugging Face Diffusion Models in Google Colab: A user encountered a ValueError with the provided path while working on Step 7 of the HuggingFace Diffusion Models Course in Google Colab. They were advised to check if they have created the pipeline properly.

  • SDXL Configuration Issues and Example Usage: Another user reported a ValueError related to the time embedding vector length in the SDXL model. The discussion included sharing code snippets and a suggestion to use the Stable Diffusion XL model as documented in the Hugging Face documentation.

  • Guide for Beginners on Solving Diffusers Issues: A beginner asked for guidance on how to start solving issues related to diffusers, and they were advised to study the Fastai course and refer to previously merged good first issue-labeled PRs on Hugging Face’s GitHub.

  • Issue with Discord LLM Chatbot: A user faced a problem with their Discord LLM Chatbot where it didn’t remember conversation history and considered each message as a new conversation. They were advised to post their issue in NLP channels and to use code snippets for maintaining history from LangChain’s documentation.

  • Redirect for Language-Specific Queries: There’s a reminder to keep discussions in English, and a user was redirected to a more appropriate channel for their NLP-related queries. This ensures the content is relevant to the “Diffusion Models” discussion.

Links mentioned:


Perplexity AI ▷ #general (939 messagesđŸ”„đŸ”„đŸ”„):

  • Perplexity struggles with GPT-4o model limitations: Users noted that GPT-4o often repeats previous responses and fails to switch topics effectively during conversations. One user described it as, “I’ve never witnessed any LLM over the last couple of years as a power user literally ignore prompts like this.”.
  • Image uploads feature request: Members expressed desires to upload and analyze videos and images within Perplexity, drawing comparisons to functionalities available on OpenAI’s platforms. Despite attempts, such features are not currently supported.
  • API limit concerns continue: Multiple users are seeking higher rate limits for the Perplexity API, with one stating they’ve been waiting for two weeks for a response and querying if the support team could expedite the increase.
  • Model switching and custom scripts: Discussion highlighted a popular user script that allows dynamic model switching within Perplexity. Users shared links to scripts and tools like Violentmonkey, enhancing the platform’s usability by enabling quick toggling between available AI models.
  • Site downtime causes frustration: Perplexity experienced downtime, frustrating users who rely on the service for their daily tasks. During this period, some users even humorously demanded, “I demand unlimited Opus for this incident”.

Links mentioned:


Perplexity AI ▷ #sharing (12 messagesđŸ”„):

  • Stability AI intrigues users: A member shared a link to explore the capabilities and offerings of Stability AI. The discussion revolves around the potential applications and benefits of the AI technology.
  • Brain benefits of walking: Another member posted about the brain benefits of walking. The shared link aims to detail how “walking” can positively impact cognitive functions and overall mental health.
  • What is WASP-193b?: A discussion started with a link exploring the exoplanet WASP-193b. The content seems focused on astronomical findings and characteristics of this celestial body.
  • Analyzing dog symptoms: There was a query about a dog showing unusual symptoms like imbalance and constant neck movement, linked by this search. The discussion likely involves veterinary insights or possible diagnoses.
  • Entertaining kids with Dungeons & Dragons: A parent shared a link to generate Dungeons & Dragons scenarios for entertaining their kids. The focus is on making the fantasy game engaging and fun for children.

Perplexity AI ▷ #pplx-api (19 messagesđŸ”„):

  • Clarification on Perplexity API usage: Instructions were shared for generating a model’s response using the Perplexity API. A user highlighted that the default temperature value is 0.2.
  • Understanding OpenAI Chief Scientist Query: Members discussed the challenge of querying about OpenAI’s current chief scientist. The suggestion was made that the model should be able to handle the chronology and provide the correct answer, Jakub Pachocki.
  • API Performance Testing: Users noted that different models have varying success rates with similar queries, with Omni performing well. There was a reluctance to use labs.perplexity.ai for testing API performance.
  • Rate Limits for API usage: There was a discussion to clarify the request rate limits for the API, noting a discrepancy between the request limit (20/minute) and token limit (2,000,000/minute), with speculation on future model capacities.
  • Threads Feature in API: A question was raised about the threads feature, which is prominent on the web but seemingly absent in the API. It was clarified that the closest feature is adding more role/content from previous messages.

Link mentioned: Chat Completions: no description found


OpenAI ▷ #annnouncements (1 messages):

  • Pausing Sky voice in ChatGPT: OpenAI announced pausing the use of the Sky voice in ChatGPT while addressing user concerns. They shared a link to explain how the voices were chosen.

OpenAI ▷ #ai-discussions (347 messagesđŸ”„đŸ”„):

  • LangChain works without OpenAI API: Members discussed using LangChain with various LLMs, including locally with tools like Ollama. One user confirmed that “you can use langchain with every llm you want.”
  • Confusion over GPT-4o availability: There were mixed experiences with accessing GPT-4o; some users reported missing features despite having access. It was clarified that GPT-4o is in rollout and that all features will come soon.
  • Video and real-time processing capabilities of ChatGPT-4o: Discussions revolved around how ChatGPT-4o processes video frames at 2-4 fps and its capabilities in real-time adjustments. Members debated whether the model could adjust responses mid-stream based on new data inputs.
  • Usage caps cause limitations in GPT-4o: A member expressed frustration with the current usage caps within the ChatGPT app, arguing they make many potential applications impractical. Others pointed out that usage caps are balanced to ensure a consistent experience for all users.
  • GPT-4o’s multimodal capabilities praised: Despite criticisms, GPT-4o’s multimodal capabilities were lauded, with one user emphasizing it integrates audio, video, and text processing simultaneously. Members also referenced that this model opens up new possibilities beyond traditional text-based models.

Pricing and file upload FAQ links were shared for additional details.


OpenAI ▷ #gpt-4-discussions (167 messagesđŸ”„đŸ”„):

  • Context window confusion and limits in GPT-4o: A user clarified that a “context window of 128k tokens” refers to the entire input the AI can process. Numerous participants expressed frustration over limitations and errors when using significant token amounts, alongside comparisons to Gemini’s larger context window capabilities.
  • Custom GPTs and model switching: Questions about custom GPTs using GPT-4o were addressed, revealing that as of now, it’s not possible to switch models easily. Additionally, members shared that some custom GPTs had already transitioned to GPT-4o.
  • GPT-4o availability and rollout: Many users expressed confusion and frustration about accessing GPT-4o, especially on iOS and free accounts. It was explained that the rollout is phased, and users not currently seeing the option will gain access over time.
  • User frustrations with model rate limits and performance: Discussions about the differences between GPT-4o and regular GPT-4 included shared experiences of differential performance and rate limits. It’s noted that GPT-4o appears faster, yet some users find regular GPT-4 answers better structured.
  • Future features and voice/chat capabilities rollout: Users speculated on the rollout timeline for GPT-4o’s new features like vision and voice capabilities, with official responses indicating a phased implementation for Plus users over the coming months.

OpenAI ▷ #prompt-engineering (178 messagesđŸ”„đŸ”„):

  • ChatGPT struggles with self-awareness and prompt clarity: Members shared challenges when asking ChatGPT about itself or refining prompts to get specific corrections. One member noted, “The model will take that as an instruction and try to find the best answer for you.”
  • Fine-tuning and JSON mode for better results: Various members discussed fine-tuning GPT-4 and using JSON mode to enhance prompt quality. OpenAI’s documentation on JSON mode was shared to aid this process.
  • Complex prompt strategies for creativity and precision: Detailed and highly structured prompts like the “Humanizer” and “Pandomain Prompt Voyager” were shared to refine and improve the model’s creative and structured content generation.
  • Coding and technical integration with GPT-4: Members discussed problems with incomplete responses in chat.completion.create and prompting strategies for creating UIs with GPT-4. One member shared specific experiences with plug-ins for Visual Studio Code and troubleshooting steps.
  • Using model behaviors and examples to guide responses: Techniques for ensuring precise model behavior, such as setting character limits and adjusting response accuracy, were explored. Specific challenges and solutions were shared to help models provide more concrete and actionable responses.

OpenAI ▷ #api-discussions (178 messagesđŸ”„đŸ”„):

<ul>
  <li><strong>ChatGPT struggles to refine prompts effectively:</strong> Users shared frustrations with <strong>4o's</strong> inability to follow up on prompt corrections or effectively revise rough drafts. One member noted, "it re-writes its original response instead of telling me how to fix my prompt."</li>
  
  <li><strong>Frustrations with incomplete responses:</strong> Users like cicada.exe report experiencing incomplete responses from <code>chat.completion.create</code> despite not exceeding token limits. The issue persists with outputs being abruptly cut off.</li>
  
  <li><strong>Implementing JSON mode:</strong> Ashthescholar advises razeblox to use <a href="https://platform.openai.com/docs/guides/text-generation/json-mode">JSON mode</a> in the API to address response issues, especially regarding format and content control.</li>
  
  <li><strong>Creative writing prompts outperform on GPT-4 compared to 4o:</strong> Users shared that while 4o excels at some creative tasks, it struggles with refining drafts. "4o seems pretty good when given a blank check for creative writing, but if presented with a rough draft to improve, it most often in my experience just regurgitated the rough draft rather than change it," noted keller._.</li>
  
  <li><strong>Innovative approach to creative synthesis:</strong> Stunspot shares a prompt, "Orchestrating Innovation on the Fringes of Chaos," that emphasizes exploring ideas through network dynamics, fractal exploration, adaptive innovation, and resilience to foster breakthroughs.</li>
</ul>

LM Studio ▷ #💬-general (537 messagesđŸ”„đŸ”„đŸ”„):

<ul>
    <li><strong>GPTs Agents cannot learn after initial training</strong>: A member asked about the ability to store conversations locally for context searching, to which another member clarified this is not currently possible in LM Studio. They suggested copying and pasting texts but noted that "You can't upload and chat with docs."</li>
    <li><strong>Handling "Unsupported Architecture" Error</strong>: Various members discussed issues with loading GPT-Sw3 in LM Studio due to "Unsupported Architecture." The consensus was that only GGUF files are supported, and users recommended downloading within the app with 'compatibility guess' enabled.</li>
    <li><strong>Running LM Studio on Limited VRAM Systems</strong>: Users inquired about running LLM models on systems with limited VRAM like 6-8GB. Members suggested using smaller models and quantized versions like Q5_K_M for better performance.</li>
    <li><strong>Offline Usage Issues</strong>: A user reported problems with LM Studio not functioning offline. After community suggestions, it was clarified that loading models and then disabling the network should work, but further detailed bug reports were recommended.</li>
    <li><strong>General Troubleshooting and Setup Questions</strong>: Users frequently asked about issues like setting up servers, model compatibility, and performance on lower-spec systems. Many were directed to create detailed posts in a specific channel (<a href="https://discord.com/channels/1111440136287297637">#1139405564586229810</a>) for further assistance.</li>
</ul>

Links mentioned:


LM Studio ▷ #đŸ€–-models-discussion-chat (82 messagesđŸ”„đŸ”„):

  • Medical LLM Recommendation: A member inquired about medical LLMs, and another suggested trying OpenBioLLM-Llama3-8B-GGUF, noting its 8.03B parameters and Lama architecture. The recommender also shared additional resources like spaces using this model.

  • SVG and ASCII Art Benchmarks: A member shared benchmarking results for LLMs generating SVG art, noting WizardLM2 as the current winner and comparing it to GPT-4 o. Another member asked about ASCII art capabilities, revealing GPT-4 o performs well for ASCII.

  • Embedding Models for German: There was a discussion about difficulties in finding suitable embedding models for the German language using LM Studio. Members suggested trying to manually convert models using tools like llama.cpp and provided a specific multilingual model for potential conversion.

  • Generating Text-based Art: One user noted using the MPT-7b-WizardLM model for generating uncensored stories, and another asked about configuration and prompt settings. The model’s creator advised using specific quants and proper templates to avoid issues.

  • Image Quality Concerns: A brief discussion on image generation quality suggested using tools like automatic1111 and ComfyUI for better control and improved outcomes. The conversation recommended obtaining high-quality models from Civit.ai, albeit with a caution about NSFW content.


LM Studio ▷ #announcements (7 messages):

  • Hugging Face and LM Studio Integration: The team has introduced a new “HF -> LM Studio deeplink” feature allowing users to browse Hugging Face, find an interesting model, and click “Use this model” to import it into LM Studio. This feature requires LM Studio 0.2.23 or newer, and focuses on local AI usage with no cloud dependencies.

  • Manual Download Choices in v1: In the current version of the feature, users need to manually choose which file they want to download when importing a model from Hugging Face.

  • Suggestions for Auto-download: Users suggested improvements including setting a default quantization level for automatic downloads and configuring the feature to download the best fitting model based on available RAM.

  • Positive User Feedback: The community responded positively, with one member stating they had been looking for such a button and found its inclusion beneficial.

Link mentioned: Tweet from LM Studio (@LMStudioAI): 1. Browse HF 2. This model looks interesting 3. Use it in LM Studio đŸ‘ŸđŸ€— Quoting clem đŸ€— (@ClementDelangue) No cloud, no cost, no data sent to anyone, no problem. Welcome to local AI on Hugging Fa



LM Studio ▷ #🧠-feedback (10 messagesđŸ”„):

  • Comodo flags llama.cpp binaries: A member noted that Comodo antivirus was triggered by llama.cpp binaries. Another explained that this could be due to the binaries being unsigned, which can cause strict antivirus software to flag them.

  • Model loading error troubleshooting: A user shared a JSON error message when attempting to load a model in LM Studio. The error indicated a failure in model operation despite sufficient RAM and VRAM, suggesting they try a different model or configuration.

  • AVX support clarification: One member questioned why AVX isn’t supported in LM Studio. The response mentioned that supporting less older hardware results in fewer bugs and issues to manage.

  • Disk space bug crashes downloads: A member reported that running out of disk space while downloading a model crashes the program and resets the queue, making it unclear which models were not fully downloaded.

  • Server start issues: Another member shared logs indicating the server fails to start despite the verbose server logs being enabled.


LM Studio ▷ #📝-prompts-discussion-chat (32 messagesđŸ”„):

  • Members Dissect LLama3 Template Conversion: A user struggled with converting a prompt template to LLama3, querying how to adapt their existing format. Another member proposed a detailed template to include historical conversation for context, stressing that “the client side is keeping the state, not the server side.”

  • LangChain Memory Utilized for Chat History: The discussion revealed the user’s reliance on ConversationBufferWindowMemory from LangChain to manage chat history and user input. After receiving advice and suggestions on structuring prompt templates, the user confirmed, “Yes, it works, going experiment more, thanks!”

  • Gemini’s Context Caching Mentioned: In response to handling conversation history, an alternative was suggested: “new services like Gemini’s context caching,” although the user expressed a preference for open-source solutions over paid ones.

  • Avoid Cut-Offs with System Prompt Adjustments: Another user suggested adding “Do not prematurely cut off a response” to the system prompt to avoid incomplete responses, contributing a practical tip for ongoing prompt discussions.


LM Studio ▷ #🎛-hardware-discussion (93 messagesđŸ”„đŸ”„):

  • Fiddling with Alder Lake and Quantization: A member discusses the performance differences when disabling e-cores on an Alder Lake CPU, noting a jump from 0.4 tokens/sec to 0.6 tokens/sec for Q8 quants. They also encountered incoherent results with IQ3 quantization and are considering performing their own quantization.

  • Tesla P100 Disappoints: There’s a discussion comparing various GPUs, with a note that the Tesla P100 with 700+ GB/s memory bandwidth struggles to beat even the GTX 1060. Despite its specs, it fails to outperform older models like the K80 in some tasks.

  • Beating Apple’s Storage Prices: A member bypassed Apple’s expensive SSD prices by opting for an external 4TB M.2 SSD in a Thunderbolt case, achieving transfer speeds over 2GB/second.

  • Multi-GPU Setups: Cost vs Performance: There’s an in-depth discussion on the practical benefits of multi-GPU setups, with some anecdotal evidence suggesting diminishing returns beyond two GPUs due to issues like PCIe bandwidth limitations.

  • RAM Speed Impact on LLM Performance: A detailed set of tests shows increased RAM speeds improve LLM performance, though the effect varies per model and quantization method. For instance, upgrading from 2133MHz to 3200MHz RAM can increase token output speeds significantly but performance variance exists.

Links mentioned:


LM Studio ▷ #đŸ§Ș-beta-releases-chat (1 messages):

  • Chat moved to new channel!: A user mentioned they have moved the chat to a new channel. The link provided directs members to the new discussion location on Discord here.

LM Studio ▷ #autogen (12 messagesđŸ”„):

  • LM Studio autogen bug produces brief responses: Users report encountering an issue where LM Studio responds with only 1-2 words followed by a TERMINATE message. One user indicated this is due to a bug that has been scheduled for fixing.
  • Autogen issues linked to max_tokens setting: The problem appears related to the max_tokens property being set to null. Setting this property to -1 fixes the issue, according to multiple users.
  • LM Studio’s OpenAI emulation is off-spec: Users suggest that LM Studio’s local server does not fully comply with OpenAI specifications, specifically regarding the max_tokens parameter. This incorrect handling leads to premature termination of responses.
  • CLI LMStudio Client workaround: A user building a CLI LMStudio Client confirms that setting max_tokens to -1 resolves the cut-off issue. Manual adjustments may be needed for tools like AutoGPT to handle tool_calls properly.

LM Studio ▷ #amd-rocm-tech-preview (21 messagesđŸ”„):

  • 6600XT workaround for LM Studio: Members discussed the AMD 6600XT card and its compatibility with LM Studio using OpenCL for GPU offload. One member confirmed, “OpenCL is supported. It’s how Intel and non ROCM AMD cards are able to do GPU offload.”

  • Call for Linux users testing ROCm: A user made a call for Linux users with new-ish AMD GPUs to test an early version of LM Studio integrated with ROCm. Interested members, including those with 6900XT and 6600XT, responded positively despite some GPUs not being officially listed. View the supported list here.

  • ROCm with different Linux distributions: Members reported running ROCm on various Linux distributions like Arch Linux, Ubuntu 22.04 LTS, and Fedora 40 with different AMD GPUs. One user confirmed, “ROCm 6.1 works with 6900xt on arch linux, at least official torch nightly built.”

  • Reunion in the Discord: The conversation included a light-hearted moment where two users recognized each other in the Discord. One replied, “Yeah mostly lurking here though :).”


Stability.ai (Stable Diffusion) ▷ #general-chat (664 messagesđŸ”„đŸ”„đŸ”„):

  • Combining and Using Multiple LoRAs in Prompts: Users discussed how to combine multiple LoRAs in prompts for Stable Diffusion using syntax like <lora:pack1:1><Lora:pack2:1>. One user confirmed that adding more than three may lead to issues.
  • Persistent Issues with Stable Diffusion on First Run: A new user encountered an error when running Stable Diffusion for the first time, pointing out a ‘NoneType’ object attribute issue. They sought help from the community but no definitive solution was provided.
  • Lively Debate on SD3 Release and Preparations: There were ongoing discussions and some skepticism about the release of SD3. However, others reassured that it will release eventually, highlighting a tweet from Emad Mostaque confirming ongoing efforts.
  • Topaz as a Video Upscaling Solution: Users debated the effectiveness of Topaz for video upscaling. While it was agreed to be a strong tool, concerns about its cost and alternatives like ComfyUI were also raised.
  • SDXL Model and ControlNet Usage Tips: A user shared insights on the importance of VRAM for running SDXL models, mentioning that higher resolutions demand more memory. Another user clarified that SDXL models need separate ControlNet models compared to SD1.5.

Links mentioned:


Modular (Mojo đŸ”„) ▷ #general (74 messagesđŸ”„đŸ”„):

  • Mojo on Windows Woes: Multiple users discussed the lack of direct support for Mojo on Windows, with specific mentions of issues when using CMD or Powershell. Officially, Mojo SDK is available for Ubuntu and macOS, with future support for Windows anticipated through WSL for now.

  • Mojo vs. Bend Programming Debate: Members compared the Mojo and Bend programming languages, with detailed insights from Chris Lattner stating that Bend isn’t performance-focused and lacks some key functionalities. Bend’s current performance on a single core is compared to CPython, unlike Mojo which targets high performance even on a single CPU.

  • Community Engagement and Resources: Excitement was shared around upcoming open community meetings, with links to meeting details and Zoom meetings provided. Recordings of sessions were promised to be shared.

  • Fun with Mojo Syntax: Users played with the idea of creating whimsical Mojo code using emojis and backticks, sharing sample code snippets for fun. This culminated in humorous exchanges, underscoring the community’s engaged and playful spirit.

Links mentioned:


Modular (Mojo đŸ”„) ▷ #đŸ’Źïž±twitter (1 messages):

ModularBot: From Modular: https://twitter.com/Modular/status/1791535613411570039


Modular (Mojo đŸ”„) ▷ #ai (1 messages):

  • Llama3 implemented from scratch: A member shared an interesting link to a GitHub repository featuring the implementation of Llama3. The repository is described as building Llama3 “one matrix multiplication at a time”.

Link mentioned: GitHub - naklecha/llama3-from-scratch: llama3 implementation one matrix multiplication at a time: llama3 implementation one matrix multiplication at a time - naklecha/llama3-from-scratch


Modular (Mojo đŸ”„) ▷ #tech-news (4 messages):

  • HVM uses implicitly parallel program model: In response to how the HVM achieves automatic parallelization, it was clarified that HVM runs an already-parallel algorithm in functional form (e.g., Haskell). Despite its cool concept, its performance is slower than CPython for CPU and less than Mojo for GPU as noted in a Hacker News discussion.
  • Excitement about Mojo’s GPU and accelerator support: Following the Bend announcement, there is excitement about how Mojo will support GPUs and accelerators.
  • Shared-memory IPC for speed-critical programs: Programs crucial for speed already use shared-memory IPC, which minimizes latency significantly. The potential for DPDK and SPDK to become more widely used due to their performance was discussed, with hopes for improved usability and integration with Mojo.
  • Old hardware and MMU dependencies: Important software often gets slower under certain execution models, necessitating the continued use of MMUs until old hardware can be retired. Concerns were brought up about old hardware using DMA outside allowed regions and 64-bit pointer limitations in CXL devices, suggesting that the tail of old hardware will persist.
  • Prospects for io_uring-like APIs: Future advancements may come from io_uring-like APIs, which utilize syscalls as control path mechanisms and shared-memory for communication with the kernel. This could eliminate most overhead, focusing on improved APIs as seen in Jens Axboe’s work.

Link mentioned: no title found: no description found


Modular (Mojo đŸ”„) ▷ #đŸ”„mojo (397 messagesđŸ”„đŸ”„):

  • Implementing __iter__ and __contains__ for Tuple sparks debates: A member worked on implementing __iter__ and __contains__ for Tuple and faced issues over tuple_mutability and tuple_lifetime. This led to discussions on the practicality and design choices of using Tuple for iterables in Mojo, referencing related GitHub issues like issue #2658.
  • Exploring Collection and Pointer Operations: A lively discussion on the proper use of various collection types and operations, including ListLiteral, Tuple, i1, and SIMD. Debate about the role of rebind and defining MLIR types was prominent.
  • Feature requests and enhancements, including Unicode and allocations in unit tests: Members suggested features like assert max allocations in unit tests and better Unicode support, asking about timelines and feasibility for community contribution.
  • Parallelism using thread safety and coroutine models: Members engaged in a deep dive into Mojo’s approach to thread safety and parallelism, debating between OpenMP-like syntax and Rust’s async model.
  • Mojo’s Tensor Implementation Strategy: Chris Lattner clarified that the Mojo standard library would not include definitive tensor implementations, ensuring flexibility for developers. There was consensus on the need for a unified tensor trait while maintaining modular implementation approaches.

Links mentioned:


Modular (Mojo đŸ”„) ▷ #performance-and-benchmarks (31 messagesđŸ”„):

  • Rust and Go share memory allocation techniques: Discussion reveals Rust’s Vec and Go’s slices both double the capacity when appending elements until a certain threshold, then Go increments by 25%. Relevant links include Rust’s raw_vec.rs and Go’s runtime slice.

  • Optimization insights for list capacities in Mojo: Tuning the list initialization capacity in Mojo (e.g., List[Int](capacity=N+50)) yielded 2x performance improvement compared to default settings. Clattner confirmed a forthcoming patch addressing def’s input argument copying will further enhance performance.

  • Discussion on SIMD gather/scatter optimizations: Members debated the effectiveness of masked gather and scatter instructions in Mojo, particularly on different architectures like x86 with AVX512 and ARM SVE. While users shared mixed results, one voiced that recalculating values might sometimes be more beneficial than using a lookup table due to potential memory wall issues.

  • Potential new List method suggestion for optimization: A member suggested adding a method like Rust’s Vec::shrink_to_fit() to Mojo’s List to optimize allocated space, sharing a simple implementation they used in MoString.

  • Community praises Clattner’s detailed insights: A member expressed gratitude towards Chris Lattner for sharing deep technical insights on Mojo’s internal workings, which significantly helped them understand and optimize their code better.

Links mentioned:


Modular (Mojo đŸ”„) ▷ #đŸ“°ïž±newsletter (1 messages):

Zapier: Modverse Weekly - Issue 34 https://www.modular.com/newsletters/modverse-weekly-34


Modular (Mojo đŸ”„) ▷ #🏎engine (1 messages):

ModularBot: Congrats <@891492812447698976>, you just advanced to level 3!


Modular (Mojo đŸ”„) ▷ #nightly (114 messagesđŸ”„đŸ”„):

  • Tackling PR DCO Check Failures: Members discussed issues with syncing forks with nightly branch and provided a detailed step-by-step guide to avoid inflated commit numbers and DCO check failures. Suggestions like making nightly the default branch came up as potential fixes.

  • Handling Nightly and Stable Releases: Clarifications on the process for transitioning from nightly to stable releases were shared. Aimed at project preparation, it was explained that stable versions are usually cut days before the official release and that the public release dates are not fixed.

  • Struggles with Segfaults and Bugs: A user reported issues about segfaults when playing with custom array types in certain conditions. Follow-up interactions aimed to debug and isolate the problem, with suggestions to use built-in types and discussing lifecycle management for complex types.

  • Flaky Tests and Ongoing Fixes: Gab Peace highlighted ongoing problems with flaky CI tests related to List.index(), along with potential fixes. He emphasized the impact of these bugs on ongoing work, like SSO and assertions in unit tests.

  • Alias Issues Leading to Segfaults: Members reported and discussed various bugs related to aliasing and materializing types, noting significant issues with existing implementations and outlining how these bugs block current work.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #general (242 messagesđŸ”„đŸ”„):

  • Workshop Announcements and Excitement: Members shared updates about new workshops, like the one with Jeremy Howard titled Build Applications in Python scheduled for June 6, 2024. Another member expressed excitement about the course’s ongoing success and quality content.

  • Credits and Resources Discussion: There were multiple inquiries and confirmations about obtaining various credits for services such as Modal Labs, Replicate, Jarvis Labs, and LangSmith. One member confirmed having received extra Jarvis Labs credits worth $200.

  • PDF Parsing for RAG Apps: Members discussed the best tools for parsing tabular data from PDFs, suggesting tools like LlamaParse, Marker by Vik Paruchuri, and integrating models like GPT-4o for complex document extraction. Another member recommended experimenting with UniTable for PDF data extraction.

  • Hosting LLMs and Serving APIs: Inquiries and suggestions were made regarding serving fine-tuned LLMs as custom APIs using frameworks like FastAPI, Streamlit, and Modal. Modal Labs’ example repositories were shared for quick start guides.

  • RAG Optimization Workshop: Announced a new workshop by Jason Liu on refining RAG models and invited members to fill out a survey to tailor the content. Some members expressed their interest cautiously, given the stated prerequisites.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #workshop-1 (168 messagesđŸ”„đŸ”„):

  • LLMs aren’t databases, they’re pattern-bases: One member discussed the misconception that LLMs can simply learn domain knowledge from fine-tuning, emphasizing that LLMs learn patterns, not one-time facts, and suggested using Retrieval-Augmented Generation (RAG) instead.
  • Fine-tuning for vehicle failure prediction: Using vehicle diagnostic data to predict failure types for replaced parts was proposed as a viable fine-tuning use case due to the domain-specific nature of the input and output.
  • Modal command issues resolved: Several members discussed errors encountered while using the modal command for training, eventually resolving issues by deleting previously created volumes.
  • Homework assignment responses on LLM use-cases: Various members submitted extensive use cases for fine-tuning LLMs including spell-checking, AI art critique, market research bots, coding model enhancements, semantic enhancements for medical and legal terminology, creative writing aids, and more.
  • Dan Biderman on LoRa configurations: A tweet discussed the nuances of using LoRa for continued pretraining, emphasizing optimal parameters and techniques to avoid poor performance and information loss, suggesting specific configurations for better results.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #asia-tz (47 messagesđŸ”„):

  • Members Introduce Themselves from Across Asia: Individuals from diverse locations like South Korea, India, Japan, Singapore, Australia, and more joined the channel and greeted each other. Users like rparcus, vishnu9158, .thebigpanda, and others shared their excitement about the course and their locations.

  • Praise and Discussion About GPU Providers: A member expressed admiration for Jarvislabs, mentioning it as their go-to GPU provider before getting a personal GPU. Vishnu9158 appreciated the sentiment and hoped they would need more resources in the future.

  • Preferences for Recordings Over Live Streams: Members like rparcus and pugio shared a preference for watching recorded sessions of the course due to inconvenient live stream timings. Vishnu9158 mentioned the drawback of missing networking opportunities by not attending live streams.

  • Homework Discussions: ivanleomk shared his attempts at the week’s homework, listing use cases like Style Transfer, Classification, Extraction, and Confidence Scores for extraction. hamelh advised not to fine-tune unless absolutely necessary and suggested making progress with off-the-shelf models first.

  • Singapore Dominates the Discussion: Multiple members from Singapore, including iggyal, healthymonkey, illued, codessl, huikang, and others, highlighted a significant Singaporean presence in the channel. This prompted ivanleomk to comment on the large number of participants from Singapore.

Link mentioned: shisa-ai/shisa-v1-llama3-70b · Hugging Face: no description found


LLM Finetuning (Hamel + Dan) ▷ #đŸŸ©-modal (54 messagesđŸ”„):

  • Modal sponsors course and shares getting started guide: Charles from Modal announced the sponsorship and shared links to the getting started guide and a hello world example for running code in the cloud without infrastructure setup.

  • Modal account and credits discussion: Members discussed creating accounts via GitHub, editing email settings, and the $500 credits which take some time to appear due to manual approval. Detailed instructions to sign up and claim credits were shared by Charles repeatedly.

  • Exploring Modal feature queries: Members asked about persistent Python context for code interpreters and hosting strategies while developing. Charles and others provided detailed responses and linked relevant documentation and examples, such as embedding Wikipedia with Modal.

  • Problems with onboarding and usage optimization: Several users reported confusion about credit displays and issues with container spin-up times during inference. Solutions and clarifications, including recommendations to use modal serve and examples like TensorRT-LLM serving, were provided.

  • Community engagement and support instructions: Regular thanks and engagement from users about the credits and support structure, with Charles encouraging the use of modal serve for development and linking users to the Modal Slack for further inquiries.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #learning-resources (12 messagesđŸ”„):

  • Aggregation of Learning Resource Links: Members shared a significant compilation of useful links related to LLM function calling, DSPY, Hamel’s blogs on LLM evaluations and tokenizers, the RAFT paper, the Latent Space podcast with Jeremy, among others. Notable highlights include links to Hamel’s blogs on finetuning and prompting here and here, as well as a GitHub project on Intern-VL.

  • Naming Recommendations: Discussions emphasized the preference for naming a channel learning-resources instead of just resources. Members also highlighted the importance of enforcing the hiding of link previews to maintain better organization in the channel.

  • GitHub Repository Proposal: A suggestion was made to create a GitHub repository for collaborative effort in managing and structuring the shared learning resources, which received positive feedback. This could provide more structured and easily accessible information over time.

  • Instruction Tuning with LoRA/QLoRA: A shared tweet included detailed findings from instruction tuning experiments with LoRA/QLoRA, focusing on rank settings, the impact of dropout, layer-specific LoRA adapters, learning rate schedules, weight decay, and batch sizes. The findings stressed the importance of proper configurations to prevent overfitting and ensure stable training, particularly on GPUs like the 3090.

  • Stanford CS25 Video Resource: A useful video link on Retrieval Augmented Language Modeling from Stanford CS25 was shared, giving access to more advanced conceptual discussions in the field. The video can be found here.

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #jarvis (40 messagesđŸ”„):

  • Jarvis Credits Coordination Updates: Multiple members inquired about receiving credits for JarvisLabs after signing up. The team confirmed they are coordinating this effort and it might take a week or so to get everyone up and running. “We will add the credits, once we get the list” was a recurring reassurance provided.

  • Technical Issues and Support: Some users experienced issues signing up for JarvisLabs, including OTP problems for phone verification and using different emails for course and GitHub sign-up. The team provided targeted support, such as disabling phone verification for affected countries and asking users to wait for credit allocations.

  • Credits Confirmation and Issues: A member confirmed receiving the Jarvis credits without issues due to their email setup. Another user was assured they don’t need to re-register despite using different emails for the course and GitHub sign-up if they have filled out the required forms.

  • Proactive Coordination and Communication: The team encouraged users to ensure their course and sign-up emails match for seamless credit allocation and communicated that they are actively addressing various concerns. Members were directed to stay updated and patient as the coordination was ongoing.


LLM Finetuning (Hamel + Dan) ▷ #hugging-face (18 messagesđŸ”„):

  • Coordinating HF Credits Verification: An inquiry was made about the requirements for HF credits. A member clarified that another member would help coordinate this and verify student enrollments behind the scenes.

  • Expect Delays for HF Credits: Setting realistic expectations, a member mentioned that the process for HF credits “might take a week or so.” This helped manage the anticipation among the group.

  • Open-Access LLM Preferences Shared: The community enthusiastically participated in the “Question of the weekend” about which open-access LLM they would choose to be. Popular choices included BERT, Mistral 7B, Phi-3-mini, and Falcon 180B.


LLM Finetuning (Hamel + Dan) ▷ #replicate (8 messagesđŸ”„):

  • Replicate credits setup channel initiated: This channel aims to assist users with setting up Replicate credits and troubleshooting any related issues. Members are being guided on signing up with their accounts and given instructions on resolving registration issues, especially concerning different emails used for GitHub and course signups.
  • Members inquire about registration email mismatches: Several members, including self.1, filippob82, and 0xai, expressed concerns about using different emails for GitHub accounts and the LLM finetuning course registration. The team acknowledged these concerns and promised to sort them out soon, keeping members updated in this channel.

LLM Finetuning (Hamel + Dan) ▷ #langsmith (16 messagesđŸ”„):

  • LangSmith credits distribution news: A moderator announced that they will coordinate LangSmith credits distribution. They also repeatedly assured users that detailed instructions about credits will be provided soon.

  • Members eager for LangSmith credits: Several members, including @filippob82, @codeanand, and @beardaintweird, confirmed that they created LangSmith accounts with the necessary email addresses and are waiting for further instructions about receiving credits. @hugobowne and @613249007015165973 committed to providing more information soon.

  • Excitement and motivation for LangSmith course: Members expressed excitement about the new course on LangSmith. Users like @anakin.xyz and @harpreetsahota appreciated the shoutouts and mentioned that they now have the motivation to test LangSmith.

  • Repeated inquiries about credits: Despite initial announcements, multiple users kept inquiring about the status of their LangSmith credits, seeking confirmation and further steps. @hugobowne directed users to check previous messages for updates and reassured them of upcoming details.


Nous Research AI ▷ #ctx-length-research (1 messages):

  • Seeking better benchmarks: A member suggested that “a modern LAMBADA for up to 2M” is needed for evaluating models capable of processing overlapping chunks independently. The current benchmarks do not seem sufficient for these advanced capabilities.

Nous Research AI ▷ #off-topic (13 messagesđŸ”„):

  • Sam Altman’s controversial tweet stirs reactions: A member shared a highly provocative tweet by Sam Altman discussing departures at OpenAI due to safety concerns. The tweet escalates quickly with an emotive sign-off, causing confusion and laughter among members. Source.

  • Venture capitalists under scrutiny: Following the controversial tweet, a member suggested viewing venture capitalists realistically rather than as world-saving entities. This reflects a sentiment of skepticism regarding their motives in the AI space.

  • Impact of AI on recession questioned: A member raised a point about companies laying off employees despite significant investments in AI. The conversation hints at the complexity of economic factors behind layoffs, suggesting it’s not just financial constraints causing the recession.

  • Runpod hackathon query: One member asked if anyone was attending the Runpod hackathon, indicating community interest in collaborative AI development events.

  • Choosing between Airflow and Temporal.io: A member sought experiences with Airflow or Temporal.io for workflow management and concluded with a preference for Temporal.io. This suggests ongoing discussions on tools for improving machine learning processes.

Link mentioned: Tweet from Sam Altman (Parody) (@SamAltsMan): Well, what a shock. Jan and Ilya left OpenAI because they think I’m not prioritizing safety enough. How original. Now I have to write some long, bs post about how much I care. But honestly, who n



Nous Research AI ▷ #interesting-links (4 messages):

  • Try out Moondream WebGPU on Hugging Face: A member shared a link to Xenova’s experimental Moondream WebGPU space on Hugging Face, inviting others to explore this experimental project.

  • Hierarchical Memory Transformer for LLMs: A new paper on arXiv introduces the Hierarchical Memory Transformer (HMT), which seeks to improve long-context processing in LLMs by imitating human memory hierarchy. The model uses memory-augmented segment-level recurrence to organize its memory hierarchy.

  • Haystack Demo for Fine Web: The Haystack demo allows users to explore 100k web pages from the Fine Web dataset with local inference and embedding search. The demo includes performance metrics and decompression times for better query speed judgment.

Links mentioned:


Nous Research AI ▷ #general (315 messagesđŸ”„đŸ”„):

  • Nous Hermes 2 Mixtral uniquely triggers actions: A user praised Nous Hermes 2 Mixtral as the only open-source large language model (LLM) capable of triggering actions and using tools within an agent framework like CrewAI. Another user questioned why it’s the only one with such functionality.
  • Concerns over Hermes 2 Mixtral’s reliability: Members shared their experiences with Hermes 2 Mixtral, noting its reliability in multilingual capabilities and compared its performance favorably against Hermes 2 Pro, which some found less reliable.
  • Debate on the need for multiturn data: A conversation emerged about the necessity of multiturn data for training large models like Mixtral 8x22b. It was highlighted that without multiturn data, models tend to degrade in intelligence in subsequent turns, making multiturn data vital for extensive usage.
  • Training costs and computational feasibility: There was a discussion about the high costs and computational demands associated with training large models, with examples such as the substantial expense of training from scratch and the challenges of managing extremely deep transformer networks.
  • New context versions and LLM Leaderboard issues: Members talked about the release of Yi-1.5 models with extended context lengths of 16k and 32k, contemplating whether larger contexts impact performance. Additionally, the usability of the LLM leaderboard was criticized due to the overwhelming amount of models making it difficult to navigate.

Links mentioned:


Nous Research AI ▷ #ask-about-llms (40 messagesđŸ”„):

  • Regex suggested for formatted text search: Discussions highlight the potential use of regex to handle tasks like finding text with specific formats such as all caps or multiple new lines. However, the limitation is that “complex tasks may need more sophisticated approaches like semantic search or symbolic language processing.”

  • Tool calling issues with Hermes2 and Vercel AI SDK: Members reported difficulties with triggering tool calls due to bad JSON responses or parameter issues. The consensus is that Hermes2’s tool calling format, when used with Vercel AI SDK, may need specific prompt handling for better consistency.

  • Local model advantages for sensitive tasks: It’s discussed that using local models like Llama3 can be beneficial for tasks requiring cost predictability, consistency, or sensitive data handling compared to external models like GPT or Claude, which can change and censor responses.

  • Finetuning vs. better prompting: There were discussions about whether it’s more effective to fine-tune models like Llama3 for specific use cases or rely on advanced prompting and retrieval-augmented generation (RAG) with models like GPT-4. It’s highlighted that specific use cases may dictate the choice, with fine-tuning being less viable for changing requirements.

  • Benchmarks for rerankers: Members are looking for public evaluation data to benchmark fine-tuned re-rankers. There’s a need for clear methodologies and datasets to benchmark top results accurately.


Nous Research AI ▷ #rag-dataset (5 messages):

  • Discuss AGI’s Proximity and Strategies: Members shared a link to a research paper on ArXiv titled “The Evolution of Artificial Intelligence,” focusing on the current state of AI and the development towards Artificial General Intelligence (AGI). The paper addresses AGI’s definitions, goals, and necessary strategies for its realization through surveys, discussions, and original perspectives.

  • Personal insights on AI memory solutions: A member shared thoughts on memory, mentioning a great solution being used internally. They also hinted at their interest in the self-evolution of agents, although noting it remains somewhat obscure at the moment.

Link mentioned: How Far Are We From AGI: The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors. Yet, the escalating demands on AI have highlighted the limita



Nous Research AI ▷ #world-sim (88 messagesđŸ”„đŸ”„):

  • WorldSim sees terminal UX rewrite and GPT-4o integration: A member mentioned that Ari is working on a complete rewrite of the terminal UX. Additionally, GPT-4o is expected to be added to WorldSim, likely next week.

  • WorldSim event garners community interest: Several users inquired about and attended a scheduled WorldSim meetup, with details shared on Discord. A link to join the event was provided, and members expressed their interest in learning more about the project during the event.

  • Complex interactions spark discussions about AI and symbolism: Users discussed various aspects of AI, with one mentioning the potential for symbolic knowledge graphs. They also referenced literature on hermetic practices and complex adaptive systems with links like this YouTube video and a book by Franz Bardon.

  • Community experiments with AI-generated content: Members shared their experiments with generating papers and other content using WorldSim. One described a process involving commands in the root, while another shared some whacky images Copilot created using a terminal prompt.

  • Potential for WorldSim as a platform for knowledge graphs: Users discussed the future evolution of WorldSim into an amorphous applications platform. They highlighted its potential for generating new AI-related knowledge graphs and symbolic meanings from user interactions.

Links mentioned:


CUDA MODE ▷ #general (38 messagesđŸ”„):

  • Hugging Face democratizes GPU access: Hugging Face is dedicating $10 million in free shared GPUs to support small developers, academics, and startups with new AI technologies, aiming to decentralize AI advancements currently dominated by big tech companies. CEO Clem Delangue emphasizes the company’s ability to make this investment due to its near-profitability and recent $235 million funding round The Verge article.

  • Bend language generates buzz: Members discussed the launch of Bend, a massively parallel, high-level programming language. There were questions about its necessity compared to Triton or Mojo, with some noting Mojo’s current limitations on GPUs and Triton’s focus on ML GitHub link.

  • Concerns about CUDA’s future: A member expressed concerns about how new frameworks like Bend might affect traditional CUDA programming. Other members suggested that while new languages are exciting, they cater to different needs like CPU-GPU hybrid products.

  • Inference server resources exchange: A discussion unfolded about building inference servers, with members recommending resources like Nvidia Triton, TorchServe, and various YouTube talks on ML model serving. Recommendations included TorchServe tutorial and broader ML systems talks YouTube link.

  • Clarifying ML model serving complexities: Members debated the differences between serving ML models versus standard web servers, noting that ML serving involves complex considerations like hardware requirements (e.g., GPUs), model versioning, and specific infrastructure like Kubernetes.

Links mentioned:


CUDA MODE ▷ #triton (20 messagesđŸ”„):

  • Performance discrepancy between tutorial setups: A member noted a significant performance difference between Umer’s YouTube tutorial and the official Triton tutorial. Despite using similar techniques, their implementation performed 2x worse compared to the tutorial.
  • MAX_FUSED_SIZE confusion in LayerNorm: A user questioned why MAX_FUSED_SIZE is set to 65536 while TRITON_MAX_TENSOR_NUMEL is 1048576. They observed a speed degradation on an A100 when using a block size greater than 65536.
  • Reasons behind block size choices: Horace explained that too large of a block size may cause kernel spilling due to excessive register requests. He confirmed that each block schedules to one SM and shares shared memory, similar to CUDA.
  • Thread operations on GPUs: The conversation clarified that each Triton block maps to a GPU SM and each thread loads multiple elements. Horace mentioned this is desirable for utilizing vector instructions on a GPU.

Links mentioned:


CUDA MODE ▷ #cuda (7 messages):

  • Query on atomic add for complex numbers in CUDA: A member asked about performing an atomic add on cuda::complex, querying if two distinct additions on the x and y components are necessary.
  • Limitation of 128-bit atomicCAS: Another member noted that on architectures other than Hopper, one must settle for 64-bit operations because 128-bit atomicCAS isn’t supported.
  • Shared Code Example: To address the atomic add issue, a code snippet that uses a complex addition with unsigned long long int and atomicCAS was provided, explaining implementation on compatible architectures.
  • Simple Approach Suitability: The original inquirer clarified that targeting Volta, Ampere, and Hopper architectures, they found two atomic adds on x and y components using either cuComplex or cuda::std::complex acceptable.

CUDA MODE ▷ #torch (20 messagesđŸ”„):

  • Torch Compile Usage Insights: A member shared their experience using torch.compile() for inference use-cases, advising on code optimizations, such as avoiding Python loops and conditions, to ensure better performance. They mentioned that for static shapes, the tool generally works well out of the box.

  • Discussion on ONNX and TensorRT: Another user raised a question on how torch.compile() compares to ONNX or TensorRT when using Triton for inference. The conversation suggests a curiosity about the relative performance and application scope of these tools.

  • NHWC Group Normalization Issues: A member pointed out that group normalization in ATen code doesn’t support NHWC format properly, leading to tensors being implicitly converted to NCHW. They shared a GitHub pull request aimed at addressing this issue but faced challenges with the ApplyScaleBiasNHWCKernel they wrote.

  • Torch Multiplication Memory Issue: A question was raised about torch’s native multiplication doubling memory usage even when performed in-place. Solutions and explanations included using mul_() to maintain flat memory consumption and handling memory allocation properly to address backprop concerns.

Link mentioned: Add NHWC support for group normalization by ZelboK · Pull Request #126635 · pytorch/pytorch: Fixes #111824 Currently it is the case that if the user specifies their group normalization to be of NHWC format, pytorch will default to NCHW tensors and convert. This conversion is not immediate



CUDA MODE ▷ #announcements (1 messages):

  • Expert talk on building a GPU native query engine: An announcement about a talk featuring a former maintainer of cuDF discussing the building process of a GPU native query engine at Voltron. The session promises to cover everything from authoring efficient kernels for data processing to creating a real production solution and is happening on Zoom.

Link mentioned: Join our Cloud HD Video Meeting: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom 



CUDA MODE ▷ #cool-links (2 messages):

  • Stephen Jones Simplifies CUDA Programming: A member shared a YouTube video titled “GTC 2022 - How CUDA Programming Works” by Stephen Jones, the CUDA Architect at NVIDIA. The video provides an introduction to programming the GPU and discusses the fundamentals of efficient memory use.

Links mentioned:


CUDA MODE ▷ #jobs (5 messages):

  • Neural Magic hunts for CUDA/Triton engineers: A committer to vLLM from Neural Magic announced job openings for CUDA/Triton engineers to contribute full-time to the project’s open-source efforts. Interested individuals were asked to contact Robert Shaw via Discord or email.

  • Activation quantization emerges as top priority: Responding to queries, it was mentioned that the primary focus is on activation quantization (fp8/int8) and related optimizations. The team aims to leverage features like 2:4 sparsity and fp6/fp4 on next-gen GPUs, and improve underoptimized aspects such as the MoE and sampling kernels.

  • LinkedIn expresses willingness to help: A LinkedIn representative indicated potential interest from their team in supporting vLLM’s needs. Further conversations on specific areas like graph-level optimization were initiated for potential collaboration.


CUDA MODE ▷ #beginner (14 messagesđŸ”„):

  • DLL Load Failure Troubleshooting: A member encountered a “DLL load failed” error while importing a module in their CUDA implementation and sought help. Suggestions included checking the build_directory paths and ensuring ninja was installed, along with verifying the status of Visual Studio setup.
  • Need for Full Code and Stacktrace: In response to the error, it was advised to share the full code and stacktrace for precise debugging, rather than assumptions. Testers asked for more context to offer a specific solution.
  • Ninja Installation and Environment Issues: It was recommended to run ninja -v in the terminal to check its installation, especially considering the member was using a virtual environment on potentially dual-booted systems.
  • Windows Compatibility Worries: There was a suggestion that dual-booting with Windows might complicate things, reflecting concerns over Visual Studio setup and general Windows compatibility issues with the build process.

CUDA MODE ▷ #off-topic (1 messages):

iron_bound: Polish code breaker’s https://www.flyingpenguin.com/?p=56989


CUDA MODE ▷ #llmdotc (180 messagesđŸ”„đŸ”„):

  • Grad Clipping Bug and Fixes Revolutionize Training: Discussions around gradient clipping reveal issues with incorrect comparisons as "grad_norm" was squared, needing correction to ensure more accurate and robust training. Additionally, the correct initialization of "grad_norm" has been emphasized to prevent unexpected behavior.
  • Memory Optimizations come into Focus: Multiple users contribute to the discussion on optimizing CUDA kernel code, especially around memory allocation and usage with specific interest in templating block sizes for better compile-time constants. Attention was also given to the potential performance improvements from rewriting the Adam optimizer kernel considering new memory-bound constraints.
  • Evaluating Hellaswag vs. MMLU Performance: Hellaswag evaluation for GPT-2 (124M at 29.55%) and GPT2-XL (48.93%) showed expected gradational improvement from model size. MMLU evaluations, however, were unexpectedly poor, indicating potential issues with the dataset or evaluation criteria.
  • ZeRO-2 Implementation Discussions Get Technical: Members discussed implementing ZeRO-2, particularly focusing on memory layout reorganizations, communication call reductions, and the preservation of compatibility with checkpoint files. The conversation extended to efficient gradient computation and NCCL interleaving to enhance performance.
  • Template and Constant Refactor for Optimization: A proposal to templatize block sizes in CUDA kernels to enable compile-time optimizations was discussed, alongside other codebase cleanup suggestions. An immediate result was a PR to standardize “warpSize” as a constant for better compile-time optimization, reflecting a common agreement on improved code efficiency.

Links mentioned:


CUDA MODE ▷ #lecture-qa (33 messagesđŸ”„):

  • Contributors inquire about libcudf’s performance: One contributor asked about benchmarks comparing libcudf to CPU parallelized operations, specifically mentioning the ongoing work on the ParPaRaw parser and CSV reader refactor. Another highlighted their interest in low-level code optimization like SASS.

  • Debate on Dask-cuDF and Theseus: A user queried the differences and performance variations among Dask-cuDF, cuDF, and Theseus, expressing curiosity about their use cases and optimization levels. There was concern about the ongoing development of Dask-cuDF, with a GitHub link indicating it had been archived.

  • RAPIDS Accelerator for Apache Spark introduced: The discussion included an introduction to RAPIDS Accelerator for Apache Spark, combining RAPIDS cuDF library and the Spark distributed computing framework to accelerate processing through GPUs. This tool aims to cater to the growing adoption of AI in analytics by offering a cost-efficient and speedy processing framework.

  • Thrust and CUB receive praise: There was a rich discussion on the advantages of Thrust and CUB, with users appreciating their declarative programming flow that enhances code readability and optimization. The influence of CUB on the abstractions in CUTLASS was noted.

  • Optimization and bottlenecks discussed: Insights were shared on the diminishing need for assembly-level optimization due to current bottlenecks shifting to IO and networking. The focus has now moved toward understanding how libcudf is utilized on large datasets, emphasizing the importance of networking orchestrations like NCCL.

Links mentioned:


CUDA MODE ▷ #youtube-watch-party (2 messages):

  • Zoom link for extended discussion: Due to the 45-minute limitation on activity time, members were directed to join an extended discussion on Zoom: Zoom Meeting.
  • Barrier synchronization analogy: A member shared an insightful analogy, comparing barrier synchronization to a school bus waiting for all kids to return from a museum visit, saying it “can’t move till all are accounted for.” This helped clarify the concept for others.

Link mentioned: Join our Cloud HD Video Meeting: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom 



CUDA MODE ▷ #bitnet (31 messagesđŸ”„):

  • Sanity checking uint2 implementation with updates: A member shared implementation details and requested a sanity check for converting and packing uint2 data types in PyTorch. Another member suggested avoiding torch.stack for better performance with torch.compile, leading to an updated implementation using torch.empty.

  • Meeting planning for bitnet group: Discussions were held about organizing regular meetings for the bitnet group and reviewing relevant documents and repositories. The meeting planner and resources were shared, with a tentative meetup scheduled for tomorrow.

  • Issues with uint4 dtype in Torch: Members discussed the necessity of packing uint4 data types into uint8 for memory efficiency due to the lack of native int4 operations on Nvidia GPUs. It was clarified that without packing, memory consumption would double.

  • Unpacking uint8 to trinary values: Code examples were discussed and refined for unpacking uint8 data to trinary values and handling signed/unsigned bitshifts. A potential workaround for quantization by shifting distributions was also considered.

  • Collaborative efforts on project management: Members acknowledged the challenges with project management while ensuring that all necessary references and best practices for custom CUDA extensions and dtype creation were shared and followed.

Links mentioned:


Eleuther ▷ #general (219 messagesđŸ”„đŸ”„):

  • CC Datasets Contain Significant Spam: Members discussed ongoing issues with spam in CC datasets, noting a high presence of auto-generated and duplicate content across languages. Asada.shinon mentioned that Chinese datasets contain the most spam, with dedicated filters to address this, and shared an article from Technology Review about GPT-4o’s Chinese token issues.

  • OpenELM offers Transparency and Efficiency: Smerkyg inquired about LLM models with frequent checkpoints, leading to a discussion about OpenELM, a new LLM emphasizing reproducibility and efficiency, achieving a 2.36% improvement in accuracy compared to OLMo. OpenELM Research was suggested as a resource.

  • Memory and Efficiency in LoRA: Premiumonion asked about the FLOPs and memory efficiency in LoRA, concluding that LoRA primarily saves memory over full fine-tuning. Skyward2989 confirmed that memory is often the bottleneck in AI model training.

  • Canadian and UK AI Safety Institutes Collaboration: Hyperion.ai shared that the UK and Canada have announced collaboration on AI safety, involving professional exchanges and secondments to bolster research, detailed in a government publication.

  • Interest in Time Series Modeling: Tiley and Hawk1399 discussed methods for modeling continuous multivariate time series with autoregression. Tiley expressed concerns about errors in autoregressive inference, with suggestions like examining MOMENT from arXiv and considering methods that account for the scope of nonlinear dynamics.

Links mentioned:


Eleuther ▷ #research (93 messagesđŸ”„đŸ”„):

  • Discussing Non-discrete Embedding Spaces with CLIP Guidance: Members discussed potential issues with CLIP guidance, noting that the embedding space might not capture desired attributes due to non-discrete nature and model training biases. One participant suggested an alternative approach similar to LPIPS for text.
  • Twitter Paper Resurfaces, Stirs Discussion: A shared Twitter link prompted debate on an apparently impactful paper. Members discussed its relevance and implications on model training techniques.
  • Innovations in Hierarchical Memory Transformers: A paper on Hierarchical Memory Transformers sparked interest, proposing a novel framework mimicking human memory for enhanced long-context processing. This discussion delved into recurrent models and memory architectures.
  • Analyzing LLM Co-occurrence Issues: Members explored the challenges of evaluating co-occurrence in language model outputs, particularly when models follow their prior outputs over prompts. Suggestions included measuring cross-attention contributions and perplexity metrics.
  • Investigating Positive Transfer Across Modalities: The conversation around ImageBind and related papers (ImageBind, PaLM-E) examined whether training models across multiple modalities can enhance performance in unimodal tasks. This included discussions on zero-shot recognition and combining modality embeddings to improve retrieval performance.

Links mentioned:


Eleuther ▷ #scaling-laws (14 messagesđŸ”„):

  • Scaling bare bones paper faces criticism: One member commented on the sparse nature of a recently discussed research paper, noting its lack of hyperparameter tuning and expressing curiosity about its scalability at higher levels.
  • Challenges in estimating FLOP calculations: A detailed discussion emerged about the correct calculation of FLOPs for forward and backward passes in models. Members provided insights and referenced specific resources like EleutherAI’s cookbook to clear up confusion, with notes that some calculations might exclude projection computations leading to discrepancies.
  • Query on sample efficiency metrics: A member posed questions about defining and measuring sample efficiency in various domains, suggesting the concept’s importance in relation to scaling laws and efficient resource management.
  • Theoretical question on Bitnet’s compute efficiency: There was an intriguing theoretical discussion about whether a more compute-efficient version of a model, using the same parameter count but significantly less compute power, would alter the optimal parameter to token ratio as defined by Chinchilla scaling laws. The consensus leaned towards no change, assuming increased compute capabilities would simply extend compute budgets for such models.

Link mentioned: Observational Scaling Laws and the Predictability of Language Model Performance: Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of 



Eleuther ▷ #lm-thunderdome (13 messagesđŸ”„):

  • HF model automatically uses default prompt: The model is automatically prompted with a default prompt based on current common practices. One user shared their experience of fine-tuning models with different methods and noted a variation in performance.
  • Seeks finance and crypto-related AI tasks in English: A member inquired about good tasks for finance, trading, investing, and crypto-related topics, specifying a preference for tasks in English.
  • NeurIPS benchmark article pseudo-review request: A member asked if anyone was interested in reviewing their benchmark article for NeurIPS. Another member responded positively, agreeing to the request.
  • Improving evaluation speed on large models: A user shared difficulties in running evaluations on large models, noticing long durations for tasks like MMLU. Another user suggested optimizing batch size settings to speed up the evaluation process.
  • No dedicated channel for AI Safety/benchmark events: A member asked about promoting AI Safety or benchmarks-relevant events in a dedicated channel. The response indicated that there is currently no such channel available in EleutherAI Discord.

Link mentioned: TIGER-Lab/MMLU-Pro · Datasets at Hugging Face: no description found


Eleuther ▷ #gpt-neox-dev (1 messages):

  • Soft prompt tuning setup issues: A member inquired about recent experiences with the soft prompt tuning setup in non-pipeline cases. They mentioned a specific issue where it seems “param.requires_grad gets reset after model.to_sequential() is called.”

Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (2 messages):

  • Request for Monthly Round Up: A member suggested that a monthly round-up would be “very helpful”. The proposal indicates a desire for regular summaries or updates to stay informed.

  • Expression of Uncertainty: Nathan Lambert responded with, “lol I don’t know Man” indicating uncertainty or ambiguity about the previous suggestion or a related discussion. This shows a casual tone in the conversation.


Interconnects (Nathan Lambert) ▷ #news (29 messagesđŸ”„):

  • Meta introduces Chameleon: Meta's new model, Chameleon, is a 34B parameter multimodal foundation model outperforming models like Flamingo and IDEFICS in both text and image tasks. It’s trained on ~10T tokens and claims superiority over GPT-4V in human evaluations. [Source](https://arxiv.org/abs/2405.09818)
  • DeepMind reveals Flash-8B: The updated Gemini 1.5 paper introduces Flash-8B, a new model distinct from Gemini 1.5 Flash. Flash-8B boasts a multimodal and extensive context window while being highly efficient. [Source](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf#page=45)
  • Gemini 1.5 Model Family expands: The Gemini 1.5 Pro and Flash models show significant improvements over previous versions, excelling in both text and vision benchmarks. Their performance in the MMLU task demonstrates the highest capabilities among their lineup. [Source](https://goo.gle/GeminiV1-5)
  • Anthropic scales up: Anthropic reported utilizing four times more compute than their previous model, Opus, aiming to develop even larger and more capable models. [Source](https://www.anthropic.com/news/reflections-on-our-responsible-scaling-policy)
  • LMsys announces "Hard Prompts" category: LMsys introduces a "Hard Prompts" category in Arena to evaluate models on more challenging tasks with a significant ranking shift observed. Llama-3-70B-Instruct is used as a judge model, but its reliability is questioned. [Source](https://fxtwitter.com/lmsysorg/status/1792625968865026427)

Links mentioned:

  • Tweet from lmsys.org (@lmsysorg): How did we classify these criteria? We adopt Llama-3-70B-Instruct as the judge model to help us label over 1 million Arena battles. Overall our analysis reveals that the quality of Arena user prompts...
  • Tweet from fishy business (@swishfever): commented line in chameleon paper: % \item We open-source variants of \model{} that allow text and image inputs but only text outputs across all model sizes. Quoting Tanishq Mathew Abraham, Ph.D. (...
  • Tweet from lucas g (@DaLucasGonzalez): Our updated Gemini 1.5 tech report is out! Excited to share a sneak peak of a new model we are working on: Flash-8B https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf#page=4...
  • Tweet from lmsys.org (@lmsysorg): Introducing "Hard Prompts" Category in Arena! In response to the community's growing interest in evaluating models on more challenging tasks, we are excited to launch the new "Hard Pr...
  • Tweet from lucas g (@DaLucasGonzalez): Flash-8B has the same multimodal and million context window as our other 1.5 models, but in an extremely efficient footprint. There's no other model like this in the world. It shows incredible ca...
  • Tweet from lucas g (@DaLucasGonzalez): Our initial benchmarks are very promising, and this only an early look, as we are still actively developing the model to maximize performance at this size.
  • Tweet from Susan Zhang (@suchenzang): Updated tech report with lots of goodies! Now for a thread mostly focusing on âšĄïž Gemini 1.5 Flash âšĄïž... đŸ§” Quoting Jeff Dean (@🏡) (@JeffDean) Gemini 1.5 Model Family: Technical Report updates n...
  • Tweet from Aidan McLau (@aidan_mclau): yo what is anthropic cookin 4× more compute than opus damm

Interconnects (Nathan Lambert) ▷ #ml-drama (145 messagesđŸ”„đŸ”„):

  • OpenAI’s superalignment team disbanded: The formation of OpenAI’s “superalignment team” was announced last year to prepare for potential supersmart AI. This team is now disbanded following departures of key researchers including Ilya Sutskever, as covered here.

  • Jan Leike’s departure from OpenAI: Jan Leike, former co-lead of the superalignment team, expressed disagreements with OpenAI’s core priorities on Twitter.

  • Goodhart’s law and AI deception: Users debated the implications of Goodhart’s law on large language models, with concerns that merely increasing model size can lead to models goodharting better, thus becoming more deceptive.

  • OpenAI’s controversial employment practices: OpenAI faced criticism for requiring departing employees to sign lifelong nondisparagement agreements in order to retain vested equity, despite leadership later clarifying on Twitter that they never enforced such clauses.

  • OpenAI addresses AI voices controversy: OpenAI paused the use of its AI voice “Sky” following questions about its selection process. They clarified the voice is not mimicking a celebrity but belongs to a professional actress. Read more here.

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (24 messagesđŸ”„):

  • Chinatalk episode gets thumbs up: A member praised the Chinatalk episode with a thumbs up emoji, indicating it was very good.
  • Llama3-from-scratch project is a great learning tool: Llama3-from-scratch was highlighted as an excellent resource for learning, suggesting that creators of such tools are hireable. “These things are the best learning tools”, exclaimed a member.
  • Latent Consistency Models explained for beginners: A blog explaining Latent Consistency Models (LCMs) for beginners was recommended, especially praised for its readability. The blog can be found here.
  • New domain name purchased: A discussion about buying and squatting domain names led to a member buying the domain rlhfbook.com. The price was notably low, only $7/year via Porkbun.
  • Caution over Books4 dataset: The Books4 dataset was humorously referred to as a legal minefield, likened to Monopoly’s “Straight to Jail” card. It was mentioned that previous legal actions have primarily targeted dataset curators.

Links mentioned:


Interconnects (Nathan Lambert) ▷ #memes (41 messagesđŸ”„):

  • Yudkowsky’s Doomsday Insight Gains Traction: A user shared a post by Liron Shapira, highlighting Eliezer Yudkowsky’s broad yet incomplete influence on AI risk awareness. The user emphasized that other experts are still on their journey toward “full awareness” of the problem.

  • Hilarious AI Growth Hack: Users discussed a meme from Hamel Husain, with one suggesting using the concept as a marketing stunt. The idea revolved around offering “1 year paid,” even though it provides minimal value.

  • Selling Couch for AI Credits: A user humorously admitted selling their couch to purchase an AI course, proclaiming wealth in “credits”. Amid discussions, Natolambert acknowledged the fun in using credits and experimenting with various APIs.

  • Debate Over Paid Content Lectures: Natolambert expressed discomfort in delivering lectures for paid content and noted how Maven had tried to onboard him for a course. Concerns were raised about helping others profit from his branding via YouTube collaborations.

  • Gaming Roots and YouTube Ventures: Conversations entertained Call of Duty experiences, with Natolambert sharing his YouTube channel. There was a nostalgic recall of earning respect through gaming skills and the shared excitement for content creation around academic papers.

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rlhf (6 messages):

  • Concerns about ORPO paper in RLHF: A user inquired if anyone had looked at the ORPO paper and noted that it had been added to Hugging Face’s library. Another member shared their suspicion about ORPO’s scalability, stating, “It sounds nice but I don’t know if it’ll scale well,” indicating skepticism while reminding themselves to give it more credit.
  • Practical testing reveals ORPO limitations: One member shared results from tests on ORPO, finding it “seemed okay but not great”. They argued that combining SFT with margin-based loss usually doesn’t work well, suggesting ORPO’s method of replacing the reference model with 1-policy might result in over-regularization.

Link mentioned: Tweet from Kawin Ethayarajh (@ethayarajh): @maximelabonne @winniethexu aligned zephyr-sft-beta on ultrafeedback and it looks like kto/dpo are a bit better? note that zephyr-sft-beta was sft’ed on ultrachat (not ultrafeedback) so all the 



Interconnects (Nathan Lambert) ▷ #reads (2 messages):

  • Chamath Palihapitiya Faces Criticism: A Substack post criticizes Chamath Palihapitiya for his role in promoting special purpose acquisition companies (SPACs), which led to financial losses for retail investors. The author argues that Palihapitiya dismisses the losses suffered by others while continuing to deny any wrongdoing (The Scam in the Arena).
  • Schadenfreude Over the All-In Pod Hosts: A member expressed enjoyment in reading about the failures of the All In Podcast hosts, noting that they seem insincere. “I really enjoy reading about the all in pod hosts failures. They feel so fake”.

Link mentioned: The Scam in the Arena: Chamath Palihapitiya took retail investors for a ride, got away with it, and just can’t let himself take the win.


Interconnects (Nathan Lambert) ▷ #posts (1 messages):

SnailBot News: <@&1216534966205284433>


Interconnects (Nathan Lambert) ▷ #retort-podcast (21 messagesđŸ”„):

  • Cynical about automating OnlyFans DMs: A member mentioned listening to the recent Latent Space podcast where they interviewed someone automating OnlyFans DMs, which made them “pretty cynical”. This was considered relevant to the current episode discussion.
  • Interesting episode on OpenAI happenings: The new Retort AI episode discusses two major OpenAI developments, including their new chat assistant and the release of their Model Spec for RLHF goals. A highlighted segment mentions the blurring boundaries of intimacy and technology.
  • Scaling laws of vocab size: A member raised the question about the scaling laws of vocab size in relation to model size, pondering potential trade-offs between inference speed and complexity. Another member responded, indicating models are harder to train stably with weirder tokenizers.
  • Hysteresis and Control Theory: Members discussed the term “hysteresis” and its relevance in control theory, with a nod to Steven Strogatz’s work referenced through an Amazon book link. They humorously pondered needing to know more control theory for GRE words.

Links mentioned:


Latent Space ▷ #ai-general-chat (126 messagesđŸ”„đŸ”„):

  • Hinton and Ilya’s Debate on Scaling Laws: @joelhellermark shared Hinton’s quote on Ilya’s intuition for scaling laws, stating, “Ilya was always preaching that you just make it bigger and it’ll work better. And I always thought that was a bit of a cop-out, that you’re going to have to have new ideas too. Turns out Ilya was basically right.” Full interview link here.

  • Jan Leike’s Departure from OpenAI: Several contributors highlighted Jan Leike’s resignation as head of alignment at OpenAI, sharing multiple sources and speculations on the implications of his departure. Sam Altman and Greg Brockman posted their appreciation and future safety plans here.

  • Vertical-Axis Wind Turbines Using Machine Learning: A link to an article was shared about EPFL researchers using a genetic learning algorithm to optimize blade profiles for vertical-axis wind turbines, which are less noisy and more wildlife-friendly compared to horizontal-axis wind turbines. Full story here.

  • Obsidian and AI for Journaling: Users discussed integrating AI with note-taking systems like Obsidian to create more efficient journaling/diary workflows. @neuralution mentioned a project using voice conversations via a custom Telegram bot to summarize journal entries into Obsidian.

  • Comparing AI Languages: Rust vs Go: A member asked whether Rust or Go is better suited for AI development. Contributors noted that Rust is gaining traction, especially with Hugging Face projects like Candle and tokenizers, while Go is more suited for applications making HTTP calls to LLM APIs.

Links mentioned:


Latent Space ▷ #ai-in-action-club (127 messagesđŸ”„đŸ”„):

  • “Feedback Is All You Need” Discussed: A YouTube video titled “Feedback Is All You Need - Gordon Brander” was shared, sparking discussion on whether current AI agents can learn, adapt, and make autonomous decisions.
  • Andrew Ng on AI Agents: A link to an Andrew Ng’s tweet was shared, emphasizing the potential of AI agentic workflows to drive significant AI progress. He elaborated on the benefits of iterative workflows and various design patterns for building agents like reflection, tool use, planning, and multi-agent collaboration.
  • Debate on Definition of AI Agents: Members debated the definition and attributes of AI agents, comparing them to traditional software agents and considering autonomy, social ability, reactivity, and persistence as critical factors.
  • Reinforcement Learning and Historical Context: The historical context of agents in AI was discussed, with references to seminal works like Samuel’s checkers-playing program from 1959, highlighting the lineage and evolution of agent-based decision-making systems.
  • Interest in AI Music Generation: Members expressed excitement and interest in AI-generated music and related projects, with personal anecdotes and future collaboration plans shared. One member mentioned working on MusicGen finetunes and promised to share related links.

Links mentioned:


LlamaIndex ▷ #announcements (1 messages):

  • New Webinar on Memary project: This Thursday at 9am PT, we will be hosting the authors of memary, an open-source reference for long-term memory in autonomous agents. The webinar will feature a deep dive into the project and a Q&A session discussing memory challenges and future directions—sign up here.

Link mentioned: LlamaIndex Webinar: Open-Source Longterm Memory for Autonomous Agents · Zoom · Luma: In this webinar we’re excited to host the authors of memary - a fully open-source reference implementation for long-term memory in autonomous agents đŸ§ đŸ•žïž In



LlamaIndex ▷ #blog (10 messagesđŸ”„):

- **QA struggles with large tables**: Even the latest LLMs still hallucinate over complex tables like the Caltrain schedule due to poor parsing. More details can be found [here](https://t.co/Scvp7LH2pL).
- **Boost vector search speed by 32x**: Using 32-bit vectors, [JinaAI_](https://t.co/NnHhGudMa8) shared methods that offer significant performance gains at only a 4% accuracy cost. This optimization is crucial for production applications.
- **Building agentic multi-document RAG**: Plaban Nayak's article explains constructing a multi-document agent using LlamaIndex and Mistral. Each document is modeled as a set of tools for comprehensive summarization, available [here](https://t.co/FksUI3mm5l) and [here](https://t.co/MbDtlrxk5B).
- **Fully local text-to-SQL setup**: Diptiman Raichaudhuri offers a tutorial on setting up a local text-to-SQL system for querying structured databases without external dependencies. This guide is accessible [here](https://t.co/u3LG9NKE0X).
- **San Francisco meetup announcement**: LlamaIndex will host an in-person meetup at their HQ with talks from prominent partners including Tryolabs and Activeloop. The meetup will cover advanced RAG engine techniques; RSVP and more details can be found [here](https://t.co/o0BWxeq3TJ).

Link mentioned: RSVP to GenAI Summit Pre-Game: Why RAG Is Not Enough? | Partiful: Note: This is an in-person meetup @LlamaIndex HQ in SF! Stop by our meetup to learn about latest innovations in building production-grade retrieval augmented generation engines for your company from 



LlamaIndex ▷ #general (139 messagesđŸ”„đŸ”„):

  • MetaDataFilters for Data Governance: A user figured out how MetaDataFilters in LlamaIndex work, applying filters directly on the DB level. They are curious whether MetaDataFilters are feasible for data governance in scalable applications and asked about selective indexing to restrict access to financial data.

  • Embedding with Neo4jVectorStore Issues: A user reported errors when integrating LlamaIndex with an existing Neo4j graph containing pre-created embeddings and nodes. They discussed several methods for creating compatible nodes and embeddings using LlamaIndex to resolve this.

  • Model and Query Configuration Help: Users discussed using different embedding models and query engines in LlamaIndex, including setting up environment variables, passing models to query engines, and handling embeddings setup issues. Several links to LlamaIndex documentation and examples were shared.

  • Challenges with Multi-Agent and Tools: The conversation detailed issues with using multiple tools and agents within LlamaIndex, including confusion and inefficiencies in tool selection by agents like GPT-4. A user shared their workaround, including using a ReactAgent as a sub-agent.

  • Data Governance in RAG Applications: A complex discussion on implementing data governance in RAG applications using LlamaIndex and Langchain was held. Links to talks and articles from NVIDIA and Microsoft about integrating access control were shared for deeper insights.

  • Miscellaneous LlamaIndex Queries: Users asked about differences between chatbot engines and query engines, handling document duplicates in Pinecone, scraping data from the web for RAG applications, and modifying system prompts in OpenAI agents within LlamaIndex. Various solutions and troubleshooting steps were exchanged.

Links mentioned:


LlamaIndex ▷ #ai-discussion (4 messages):

  • Multimodal Power Unleashed with GPT-4o: A user shared a Medium article on integrating GPT-4o with LlamaParse. The link received positive reactions and a “nice!” comment from another member.

LAION ▷ #general (134 messagesđŸ”„đŸ”„):

  • CommonCanvas dataset splits opinions: Members expressed mixed feelings about CommonCanvas, a dataset of 70M image-text pairs with both alt and synthetic captions, due to its restrictive non-commercial license. “Seems entirely counterproductive” and “No derivatives is also odd, because wouldn’t it be good if people modified/expanded on this dataset?” show the frustration (link: announcement).

  • Challenges of torch.compile and GPU utilization: Members, like drhead, discussed significant slowdowns due to PyTorch’s native_group_norm and frequent device sync issues. The issue highlights the differences in performance between PyTorch’s eager mode and torch.compile (“i have it running like, only 5% slower than what I can accomplish using torch.compile”).

  • Concerns on hallucinations in AI captions: There is an ongoing debate about the impact of hallucinated captions on training visual language models and text-to-image models (VLLMs and T2I). “I’ve been talking to a lab that said hallucinations in captions are actually extremely damaging to VLLMs and T2I but I’m still waiting on the paper”.

  • LLava and CogVLM discussed for dataset creation: Members are exploring various AI models like LLava and CogVLM for captioning large datasets. While LLava-next and LLaMA models are gaining traction, they expressed skepticism over CogVLM’s performance (“cogvlm sucks too”).

  • Aspirations for more open-source datasets: Users are actively discussing the creation of high-quality, diverse datasets large enough for training foundational models, referencing projects like CC12M with various VLMs, and concerns over data integrity and accessibility. “I will always open source mine” and sentiments towards avoiding “hallucinations” in training data underline their efforts.

Links mentioned:


LAION ▷ #research (13 messagesđŸ”„):

  • Chameleon breaks new grounds: The Chameleon model, introduced in an arXiv paper, is a mixed-modal model capable of understanding and generating images and text simultaneously. It showcases state-of-the-art performance in tasks like image captioning and generative abilities surpassing even larger models like Llama-2.

  • Sakuga-42M, a game-changer for cartoon datasets: An arXiv study introduces Sakuga-42M, the first large-scale cartoon animation dataset. The dataset comprises “42 million keyframes” and aims to fill the gap in cartoon-specific training data.

  • CogVLM2 license raises concerns: Warnings were issued over the new CogVLM2 model’s license, which states restrictive clauses regarding use against China’s interests and mandating disputes be resolved by a Chinese court (source, GitHub).

  • MambaOut steps in where Mamba stumbles: The Mamba model, despite its architectural promise, underperforms in vision tasks compared to attentional and convolutional models (arXiv paper). Empirical evidence suggests Mamba isn’t necessary for image classification, but its long-sequence capabilities still hold promise for detection and segmentation tasks.

  • Kobe Bryant Memed for Mamba’s Performance: Users humorously referenced Kobe Bryant’s famous quote “Mamba out” to comment on the underwhelming performance of the Mamba model.

Links mentioned:

  • Sakuga-42M Dataset: Scaling Up Cartoon Research: Hand-drawn cartoon animation employs sketches and flat-color segments to create the illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive results in understanding and ...
  • MambaOut: Do We Really Need Mamba for Vision?: Mamba, an architecture with RNN-like token mixer of state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism and subsequently applied to vision t...
  • Chameleon: Mixed-Modal Early-Fusion Foundation Models: We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach f...
  • LoRA Learns Less and Forgets Less: Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for large language models. LoRA saves memory by training only low rank perturbations to selected weight matrices. In t...
  • CogVLM2/MODEL_LICENSE at main · THUDM/CogVLM2: 珏äșŒä»Ł CogVLMć€šæšĄæ€éą„èź­ç»ƒćŻčèŻæšĄćž‹. Contribute to THUDM/CogVLM2 development by creating an account on GitHub.

LAION ▷ #resources (1 messages):

  • Building a semantic research paper app: A member shared a recent article on how to build a semantic research paper app using LangChain, Chainlit, and Literal AI. The article also includes steps on integrating observability features into the app.

AI Stack Devs (Yoko Li) ▷ #app-showcase (2 messages):

  • 4Wall reveals AI entertainment platform: The team behind 4wall is developing an AI-driven entertainment platform, currently in beta. A teaser video was shared on X (formerly Twitter).
  • AI Town integration and user-generated content: 4Wall plans to integrate AI Town into their platform, allowing users to use bots seamlessly. They are also working on features for users to create maps and games.
  • 3D AI characters in the pipeline: The 4Wall team announced that 3D AI character functionality is in development and will be available soon.

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #ai-companion (1 messages):

.ghost001: They gonna feel dumb when the more advanced versions come out


AI Stack Devs (Yoko Li) ▷ #events (1 messages):

  • Announcing Rosebud AI Game Jam Winners: The winners of the Rosebud / #WeekOfAI Education Game Jam were announced, showcasing incredible AI-powered educational games. The first-place game, “Pathfinder: Terra’s Fate”, and third-place “Ferment!” were highlighted for their engaging experiences. Check the winners and try out Rosebud here.

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (38 messagesđŸ”„):

  • AI Town Now Runs Natively on Windows: A member enthusiastically announced that AI Town now works natively on Windows without requiring WSL or Docker, celebrating the news with a tweet from @cocktailpeanut confirming the launch of the AI Town 1 Click Launcher for Windows.
  • Launch of AI Reality TV Platform: Excited members shared the launch of the AI Reality TV platform that allows users to create social simulations and observe AI-powered characters interact. They encouraged others to join and hosted the next attraction, creating simulations like “Elisabeth choosing between Jack or Will in Pirates of the Caribbean.”
  • Installation Issues and Solutions Shared: A user encountered problems setting up AI Town conversations and was advised to check the memory system documentation and to adjust settings in convex/constants.ts to improve conversation persistence.
  • Extracting Conversations from SQLite Databases: Users discussed methods to extract conversations from SQLite databases used by AI Town. Helpful SQL queries for exporting data were shared, and links to relevant repositories were provided for further assistance in filtering and exporting conversation data.
  • Add Intriguing Characters and Watch AI Interactions: Members shared their creative character additions to AI Town, such as spies and local reporters, and noted the realistic interactions. There were mentions of troubleshooting, and users shared experiences on how adjusting memory fetch settings can improve character interactions.

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #ai-town-dev (94 messagesđŸ”„đŸ”„):

  • AI Town gears up for AI Reality TV: A link was shared for joining the AI Reality TV show here. Members are encouraged to create their own AI and help it win the show, detailed here: AI Reality TV.

  • Technical details for AI Town shared: The tech stack for AI Town was described as using Convex for backend, JS/TS for app logic, Pixi.js for graphics, Clerk for auth, and varying between Ollama and OpenAI for inference.

  • Error troubleshooting in AI Town: Members encountered connection issues with AI Town on Windows, experiencing errors during agent communications. They were directed to seek further help on the Pinokio Discord server.

  • Saving and extracting conversations: The possibility of using a web app to dump sqlite files from AI Town was discussed, with a link provided to GitHub - Townplayer. Alternative methods include using any sqlite viewer and the convex dashboard for hosted versions.

  • World context integration suggestion: It’s noted that adding context directly into character prompts enriches the narrative in AI Town. There’s a suggestion to use the world description for better context, and plans for Convex’s hosted dashboard to work with local deployments were discussed.

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (112 messagesđŸ”„đŸ”„):

  • Server responses with status 500 on function calls: A user reported, “when call with function calls,server response with status 500 and message ‘Function calling is not supported by openrouter’.” There was no immediate resolution provided in the conversation.
  • Invalid model URLs causing application errors: A user noted that navigating to an invalid model URL breaks the page with “Application error: a client-side exception has occurred (see the browser console for more information)” instead of a proper 404 error. The behavior differs based on whether the user is signed in or not.
  • Auto top-up payment issues: Multiple exchanges discussed a problem where auto top-up payments were declined, resulting in a user’s credits falling below allowable limits and being unable to manually top-up. The issue was identified as likely being blocked by the user’s bank (WISE EUROPE SA/NV).
  • Model recommendations and fine-tuning feedback: Users shared their experiences with various models, with mentions of “Cat-LLaMA-3-70B”, Midnight-Miqu models, and the need for better fine-tuning methods as opposed to “random uncleaned data” approaches. One user noted, “Try Cat-LLaMA-3-70B, it’s very impressive when you actually manage to get it to work.”
  • Wizard LM 8x22B request failure issues: A user asked about frequent failures with Wizard LM 8x22B on OpenRouter, which were identified as temporary surges in request timeouts (408) from several providers.

Reach the full conversation here: OpenRouter Discord.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (58 messagesđŸ”„đŸ”„):

  • Galore Layerwise lacks DDP support: Users express frustration over the Galore Layerwise tool’s inability to support DDP (Distributed Data Parallel), indicating ongoing limitations in its functionality.

  • Large Chinese datasets raise interest: A conversation around fine-tuning a large 8B model with 1 billion tokens in a non-English language, specifically Chinese, draws interest. Relevant dataset links: Multimodal Art Projection (M-A-P) and BAAI.

  • Gradient norm issues during fine-tuning: Discussion reveals unbounded growth in gradient norms when using low rank during model fine-tuning, specifically with llama 3 8B. The issue seems rooted in saturated weights that cannot update gradients without significant perturbation.

  • GPT-4o’s spammy tokens: Members share concerns about GPT-4o’s tokens being polluted with spam and porn phrases, highlighting flaws in the latest release’s token parsing for Chinese language.

  • Commandr configuration for axolotl: Issues related to setting up Commandr configurations are partly resolved by a specific GitHub pull request. Users collaborate on testing and implementing this configuration to potentially merge into the project.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (13 messagesđŸ”„):

  • Impact of Datasets on PoSE: A member questioned if the choice of dataset significantly impacts the quality of context extension in PoSE. Another member responded, “I didn’t play around with the datasets much.”
  • Unsloth Optimizations for Llama: A member inquired if there was any reason not to use the Unsloth optimizations for Llama for a full finetune. Another member replied that the Unsloth cross entropy loss is fine for full finetuning.
  • Random Datasets Suffice for PoSE: When asked if a random dataset was good enough for PoSE, a member confirmed, “Yeah, good enough for niah, but honestly PoSE doesn’t seem to really scale up long context reasoning or understanding.”
  • Torchtune Optimizations: A member highlighted potential valuable optimizations from Torchtune pull request #993. They mentioned, “torchtune integration with axolotl is coming SOON.”
  • Future of HF Backend: Members discussed whether Torchtune would replace the HF backend or just be another option. One suggested, “Dismantle hf,” signaling a desire for significant change.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (3 messages):

  • Continued pre-training yields illegal memory access: A user requested an example of using Axolotl for continued pre-training, noting that their attempt with a pretraining dataset resulted in out-of-vocab padding tokens leading to illegal memory access. They specified they do not want to change the vocab of the tokenizer and provided a sample configuration.

  • Issues with Mistral 7b fine-tuning: A user shared their challenge of fine-tuning Mistral 7b on their instruct data, observing that despite the loss dropping, the model “mixes things up and seems like it didn’t learn anything”. They mentioned that their configuration is based on this example, with a few custom adjustments.


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (37 messagesđŸ”„):

  • Phorm assists with ORPO format queries: Users inquired about the ORPO (Object-Role-Property-Operation) format. Though no specific implementation details were given, it was noted that ORPO is used for structuring data/operations and an example in Axolotl included its use in prompt strategies.

  • Weight decay clarified for LLM training: Members discussed weight decay, which acts as a regularization technique preventing overfitting by adding a penalty to the loss function. This ensures model weights remain small, leading to better generalization.

  • LoRA Dropout explained: LoRA Dropout helps in fine-tuning LLMs by introducing dropout in the low-rank adaptation matrices, preventing overfitting and improving generalization.

  • Gradient accumulation benefits LLM training: Gradient accumulation enables training with larger effective batch sizes without increasing memory usage, which is crucial for LLM stability and efficiency. This approach was exemplified using PyTorch and the Hugging Face Accelerator library.

  • Sample weights in Axolotl: A member sought to assign sample weights without customizing loss functions. It was recommended to use sample weights in the compute_loss method for custom loss handling, while cautioning that not all loss functions support this natively.

Links mentioned:


LangChain AI ▷ #general (69 messagesđŸ”„đŸ”„):

  • LangChain handling re-ranking and memory: A member asked about re-ranking results using a cross encoder behind an organizational proxy, wondering if it’s possible with OpenAI GPTs or Gemini models. Another member stressed the importance of implementing short-term memory like buffer memory for chatbots.
  • Guiding model responses in LangChain: One member inquired about setting specific questions in a React agent to guide the model for optimal answers. Another member clarified by suggesting using a custom prompt or template via the PromptTemplate function in LangChain, shared with a GitHub issue link.
  • LangChain for Swift developers: A member asked if LangChain is available for Swift developers working on iOS or macOS. Another member shared a GitHub link for LangChain Swift optimized for iOS, macOS, watchOS, and visionOS.
  • Handling SQL data in LangChain: One member working on summarizing call center calls discussed summarizing concepts across multiple calls and asked for ways to utilize SQL data as memory in LangChain. Another member recommended various integrations with SQL-like databases and shared a LangChain integrations link.
  • Langmem’s contextual capabilities: A member expressed amazement at Langmen’s ability to maintain contextual conversations when switching topics mid-session and shared YouTube videos demonstrating Langmem’s long-term memory and context management.

Links mentioned:


LangChain AI ▷ #langserve (2 messages):

  • Kenny Tang Shares $50 Steam Gift Link: KennyTang posted a link purportedly for a $50 Steam gift: steamcommunity.com/gift/50. The message tagged @everyone and @here.

LangChain AI ▷ #langchain-templates (2 messages):

  • Suspicious $50 Gift Link Shared: A user shared a link titled “Gift 50$” directing to steamcommunity.com. The link was shared multiple times and tagged everyone in the channel.

LangChain AI ▷ #share-your-work (7 messages):

  • Rubik’s AI offers free premium access: A member introduced an advanced research assistant and search engine, inviting beta testers with a two-month free premium access using the promo code RUBIX. Models offered include GPT-4 Turbo, Claude-3 Opus, and Mistral Large. Check it out.

  • LangServe blogpost shared: A member shared a link to their blog post about LangServe. What is LangServe?.

  • Questionable $50 Steam gift link: A member posted two messages with a potentially dubious $50 gift link on Steam. The link was here.

  • Affiliate program for ChatGPT Chrome Extension: Another member announced an affiliate program for their Easy Folders Chrome extension. Affiliates can earn a 25% commission while customers get a 10% discount. Register here and download the extension.

Links mentioned:


LangChain AI ▷ #tutorials (3 messages):

  • RAG-Fusion simplifies multi-query handling: A detailed message highlights the differences between RAG (single query) and RAG-Fusion (multi-query) and provides insights into integrating LangChain and GPT-4o for creating AI chatbots for document handling. Check out this YouTube tutorial for more information.
  • Questionable $50 Steam gift link posted: A link claiming to offer a $50 Steam gift was shared (suspicious link) and tagged to everyone. Caution is advised regarding such links.

Link mentioned: LangChain + RAG Fusion + GPT-4o Python Project: Easy AI/Chat for your Docs: #automation #rag #llm #ai #programming #gpt4o #langchain in this Video, I have a super quick tutorial for you showing how to create an AI for your PDF with L



Cohere ▷ #general (76 messagesđŸ”„đŸ”„):

  • Improving Discord Support: A member suggested changes to the support system on Discord, indicating long-standing issues with unanswered questions. Another member clarified that the channel operates more as a community-supported chat rather than official staff-supported system.

  • Rate Limit Issues with Trial API: A user experimenting with a RAG retriever faced a 403 error and speculated reaching the limit of the Trial API. Others mentioned that trial keys are rate-limited and not intended for production.

  • Free API Keys: Queries arose about acquiring free API keys and their limitations. A member confirmed that free keys are available but limited, suitable primarily for prototyping rather than production.

  • Translations with CommandR+: A user sought examples for using CommandR+ for translation. Another member recommended referencing the Chat API documentation for implementation details.

  • Portfolio vs. Production Use: Discussion occurred about hosting apps using Cohere AI on platforms like Vercel for portfolio purposes. It was clarified that execution within portfolios is generally considered prototyping and falls under free usage, whereas production involves commercial deployment and incurs costs.

Links mentioned:


Cohere ▷ #project-sharing (1 messages):

  • A Complete Guide to Cohere AI Published: A member announced their blog post titled “A Complete Guide to Cohere AI” on Analytics Vidhya. This guide covers the installation, setup, and usage of Cohere’s Enterprise AI platform, including a demo app available at Streamlit for practical understanding.

Links mentioned:


OpenInterpreter ▷ #general (41 messagesđŸ”„):

  • Hugging Face pledges $10 million in free GPUs: Hugging Face is committing $10 million in free shared GPUs to help developers create new AI technologies, aiming to assist small developers, academics, and startups. CEO Clem Delangue cited the company’s profitability and recent funding as enablers of this initiative, according to The Verge.

  • Successful Pi 5 installation reported: A member confirmed successfully running OpenInterpreter on a Pi 5 with Ubuntu without local models, using GPT4 for various tasks. Another user expressed interest in combining projects and received an offer for Azure credits to assist with the integration.

  • Platform tips and troubleshooting: Members shared tips on setting up OpenInterpreter using WSL, virtual environments, and various IDEs. One user solved issues with GPT-4o on OpenRouter by upgrading the litellm dependency, highlighting potential areas for improving OpenInterpreter’s default settings.

  • Event and streaming announcements: The community was invited to the first Accessibility Round Table event, aiming to discuss technology’s benefits for everyone. Additionally, a member announced a live stream on X for local development, encouraging others to join in.

  • Seeking project collaboration: A junior full-stack DevOps engineer sought help building a “lite 01” AI assistant module for simplifying daily tasks and providing discreet assistance in work environments. The request highlighted the need for comprehensive resources on DevOps tools and cloud computing.

Links mentioned:


OpenInterpreter ▷ #O1 (15 messagesđŸ”„):

  • Troubleshoot: Connection Issues Resolved Quickly: One user had an initial problem connecting to the app but successfully resolved it with a provided example format. Another member admired the app’s beauty and “native” feel, confirmed to be built in Swift.

  • Server Setup Tips for Windows Users: A member asked for advice on whether to run the server using Ubuntu in Windows or PowerShell. Another user shared their setup method, leveraging poetry to run OpenInterpreter with specific parameters and ensuring the correct local IP and port are used.

  • Clarifying Environment Use: A newcomer had questions about using Linux VM for OpenInterpreter. There was confirmation that it is feasible and will interact correctly with OpenInterpreter running directly on the host computer.

  • GitHub Resources Shared: A link to a GitHub repository was shared, highlighting a project related to running O1 on Flutter. The discussion included contributions and development guidance.

  • Community Projects and Assistance Requests: A member discussed their ongoing build of an O1 Lite device, with all parts and a 3D-printed case. Another user seeking help to develop an AI module for task simplification and remote assistance appealed for community support due to pre-order delays.

Link mentioned: GitHub - Tonylib/o1_for_flutter: Contribute to Tonylib/o1_for_flutter development by creating an account on GitHub.


OpenInterpreter ▷ #ai-content (6 messages):

  • Google DeepMind tunes into Google IO: A member shared a post from GoogleDeepMind discussing their involvement with Project Astra at #GoogleIO. Another member commented: “Google really is stepping up their game”.
  • Voice AI still has robotic limitations: There was a debate about the current state of voice AI, with one member stating, “voice is a bit too robotic,” suggesting it lags behind GPT-4’s capabilities.
  • Intriguing idea for AI voice interaction: A YouTube short shared by a member discussed a new idea for AI voice assistants, emphasizing their ability to interrupt users (YouTube video). One user humorously added that it was a “missed opportunity to make it moo.”

Mozilla AI ▷ #llamafile (26 messagesđŸ”„):

  • Segfault in RAG tutorial troubleshooting: A user encountered a segfault when querying their index while following the RAG tutorial. Key log message was “llama_get_logits_ith: invalid logits id 420, reason: no logits”; another user suggested checking the codebase, leading to a realization that the models were embeddings-only.

  • Llamafile embedding model clarification: It was clarified that the embeddings model linked in the tutorial cannot perform generation, which wasn’t immediately clear from the examples.

  • Cloud deployment discussions: Users discussed various cloud providers for running Llamafile with a preference for GPU-enabled services. vast.ai was recommended for experiments and short-lived workloads.

  • SQLite for vector search project: Alex Garcia introduced his project sqlite-vec, a SQLite extension for vector search, with the intention of integrating it into Llamafile. The project promises features like memory and semantic search, and already has beta release assets available.

Links mentioned:


MLOps @Chipro ▷ #events (9 messagesđŸ”„):

  • FOMO into LLM Fine-Tuning Course: A member shared their enthusiasm for joining an LLM Fine-Tuning course. The course promises hands-on experience with LLMs, covering topics from training to deployment, with workshops on evaluation, instrumentation, and prompt engineering.
  • Skepticism About Course Offerings: Another member expressed skepticism about the course, suggesting it might be “fluff” due to the promotional giveaways and the wide range of experts involved. They questioned the value versus the marketing tactics used to attract participants.
  • Mixed First Week Impressions: Feedback from participants about the first week of the course varied. One described it as “rather basic,” focusing on introductory topics like finding use cases for LLMs, which might depend heavily on participants’ prior experience.

Link mentioned: Mastering LLMs: End-to-End Fine-Tuning and Deployment by Dan Becker and Hamel Husain on Maven: All-time best selling course on Maven! Train, validate and deploy your first fine-tuned LLM


MLOps @Chipro ▷ #general-ml (7 messages):

  • MAPIE for Prediction Intervals: A member asked for recommendations on implementing prediction intervals and shared a link to MAPIE’s documentation. This tool is being explored for its utility in this context.

  • Valeriy Manokhin on Conformal Predictions: Another member suggested Valeriy Manokhin’s Medium for conformal predictions, noting Manokhin’s preference for Nixtla, which might be relevant for time series data.

  • Image Embeddings via Inpainting: A query was raised about deriving image embeddings using image inpainting or context encoding, comparing it to masked language modeling. This method involves predicting hidden parts of an image using the visible portions.

  • Multi-lingual Entity Extraction: Discussions evolved around the challenge of making multi-lingual entities like “University of California” and “Universidad de California” comparable. Suggestions included using contrastive learning and prefixing tasks with language identifiers, as seen in some strategies for query and document encoding.

  • Applying Ideas from Relevant Papers: A member recommended applying concepts from a recent paper on arxiv, mentioning it as a part of their ongoing work for multi-lingual entity extraction this week.

Link mentioned: MAPIE - Model Agnostic Prediction Interval Estimator — MAPIE 0.8.3 documentation: no description found


tinygrad (George Hotz) ▷ #general (7 messages):

  • Running YOLO model on comma device: A member asked if anyone tried to run a YOLO model on a comma device and mentioned that they are getting predictions in about ~1000ms. They didn’t provide further details on the specifics of the model version or optimizations used.

  • Polynomial degree limits for sin approximation: Members discussed the limitations of using high-degree polynomials for approximating the sine function. One user noted they are using a degree 11 polynomial with an error about 1e-8, but it doesn’t meet the test requirement of 1e-12 error, and they are contemplating increasing the degree despite performance concerns.

  • Accuracy concerns in polynomial approximations: Another user highlighted that for sine, periodicity helps to manage accuracy issues, but warned about significant accuracy loss when approximating functions like the logarithm and exponential. They advised using range reduction techniques to maintain accuracy but recognized the challenge in meeting high precision requirements without increasing computational complexity.


tinygrad (George Hotz) ▷ #learn-tinygrad (4 messages):

  • Question on bitshifting in tinygrad: A member asked if there is a more efficient way to bitshift in tinygrad other than using the expression x.e(BinaryOps.DIV, 2 ** 16).e(BinaryOps.MUL, 2 ** 16).

  • Unwrapping for loops in Metal compiler: Another member shared a code snippet and asked about where the Metal compiler decides to unwrap a for loop. They highlighted the generated Metal code for Tensor.arange(1, 32).

  • Comparison of generated code with Tensor.arange(1, 33): The same member demonstrated that using Tensor.arange(1, 33) instead of 32 results in significantly different generated Metal code, which includes the use of threadgroup variables and barriers.

  • Puzzling magic number 32: The member also questioned why the number 32 specifically results in different compilation behavior in the Metal compiler, pointing out a noticeable performance implication.


Datasette - LLM (@SimonW) ▷ #ai (6 messages):

  • Claude3 support for Squeak Smalltalk: One user floated the idea of adding Claude3 support to Squeak Smalltalk. Details about implementation or benefits were not discussed, but it signals growing interest in integrating advanced models with legacy programming environments.

  • GPT-4o Demo Voice Modes Explained: Another user shared that the voice in the GPT-4o demo was initially included in the version 1 Voice Mode and called Sky, speculating it was the default option. OpenAI paused its use following the realization it inadvertently resembled Scarlett Johansson’s voice, replacing it with a new feminine voice, Juniper.

  • Latency and Model Integration in Voice Mode: A user referenced an article detailing how previous Voice Mode versions used separate models for transcription, processing, and audio output, resulting in latency issues. GPT-4o now consolidates these features in a single model, enhancing emotional expression, although this has introduced complexity and potential unpredictability (source).

  • Concerns on AI Complexity and Prompt Injection: Additional discussion centered on how advanced capabilities, like those in GPT-4o, bring significant drawbacks, such as susceptibility to prompt injection. The increased complexity of new models may lead to unpredictable behavior and higher chances for user-annoying outputs, similar to the problems of legacy systems being overridden by new instructions.

  • Resilience in Fault-Tolerant Systems: Quoting Stainslaw Lem’s “The Upside-Down Evolution,” a user pointed out that while total reliability is unattainable, particularly in complex systems, building resilient infrastructures is key. They stressed that as systems evolve to be more fault-tolerant, new issues inevitably arise, echoing Lem’s notion of moving “from the frying pan to the fire.”


LLM Perf Enthusiasts AI ▷ #gpt4 (1 messages):

  • GPT-4o excels at complex legal reasoning: A member ran internal evaluations on GPT-4o for complex legal reasoning tasks, noting a non-trivial improvement over GPT-4 and GPT-4-Turbo. More details can be found in their LinkedIn post.

YAIG (a16z Infra) ▷ #ai-ml (1 messages):

  • Call for Contributors on Docker and AI: A member announced their plan to write an article focused on using Docker containers for training and deploying AI. They invited others to help, contribute, or review the draft, and asked interested individuals to DM them.