AI News for 3/14/2025-3/17/2025. We checked 7 subreddits, 433 Twitters and 28 Discords (223 channels, and 9014 messages) for you. Estimated reading time saved (at 200wpm): 990 minutes. You can now tag @smol_ai for AINews discussions!

We briefly mentioned Cohere's Command A launch last week, but since the announcement was comparatively light on broadly comparable benchmarks (there were some, but the selective, self reported, comparisons to DeepSeek V3 and GPT-4o couldnt really contextualize Command A among either SOTA open source or overall SOTA-for-size), it was hard to tell where it would rank in terms of lasting impact.

With today's LMArena result, that is no longer in question:

As Aidan Gomez points out, Command A actually increases 2 spots in rankings with the Style Control modifier (explored on their LS podcast).

There are many other notable subtle points that make Command A a particularly attractive candidate to include in one's open models arsenal, including the unusually long 256k context window, multilingual capabilities, and focus on optimizing for a 2-H100 serving footprint.

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

Large Language Models (LLMs) and Model Releases

Mistral AI Small 3.1 release (multimodal, multilingual, Apache 2.0 license): @sophiamyang announced the release of Mistral AI Small 3.1, highlighting its lightweight nature (runs on a single RTX 4090 or a Mac with 32GB RAM), fast response conversations, low-latency function calling, specialized fine-tuning, and advanced reasoning foundation. It outperforms comparable models on instruct benchmarks @sophiamyang and multimodal instruct benchmarks @sophiamyang, and is available on Hugging Face @sophiamyang, Mistral AI La Plateforme @sophiamyang, and with enterprise deployments @sophiamyang. The model is praised for its multilingual and long context capabilities @sophiamyang. @reach_vb emphasized the 128K context window and Apache 2.0 license.
SmolDocling: New OCR model: @mervenoyann introduced SmolDocling, a fast OCR model that reads a single document in 0.35 seconds using 0.5GB VRAM, outperforming larger models, including Qwen2.5VL. It is based on SmolVLM and trained on pages and Docling transcriptions. The model and demo are available on Hugging Face @mervenoyann.
Cohere Command A Model: @lmarena_ai reported that Cohere's Command A has climbed to #13 on the Arena leaderboard, highlighting its open-weight model (111B), 256K context window, and pricing of $2.5/$10 input/output MTok. Command A also ranked well in style control @aidangomez.
Discussion on better LLMs: @lateinteraction expressed a cynical view that recent improvements in LLMs are due to building LLM systems (CoT) rather than better LLMs themselves, questioning where the better LLMs are.

Model Performance, Benchmarks, and Evaluations

MCBench as a superior AI benchmark: @aidan_mclau recommends mcbench as the best AI benchmark, noting its fun-to-audit data, testing of relevant features (code, aesthetics, awareness), and ability to discern performance differences among top models. The benchmark can be found at https://t.co/YEgzhLotKk @aidan_mclau
HCAST benchmark for autonomous software tasks: @idavidrein shared details about HCAST (Human-Calibrated Autonomy Software Tasks), a benchmark developed at METR to measure the abilities of frontier AI systems to complete diverse software tasks autonomously.
AI models on patents: @casper_hansen_ tested models on instruction following on patents and found that Mistral Small 3 is better than Gemini Flash 2.0, with Mistral models pre-trained on more patents.
Generalization deficits in LLMs: @JJitsev shared an update to their paper, including sections on recent reasoning models, questioning their ability to handle AIW problem versions that revealed severe generalization deficits in SOTA LLMs.
Evaluating models on OpenRouter: @casper_hansen_ noted that OpenRouter is a useful tool for testing new models, but the free credits are limited to 200 requests/day.

AI Agents, Tool Use, and Applications

AI agents interacting with external tools: @TheTuringPost explained that AI agents interact with external tools or apps using UI-based and API-based interactions, with modern AI agent frameworks prioritizing API-based tools for their speed and reliability.
TxAgent: AI Agent for Therapeutic Reasoning: @iScienceLuvr introduced TXAGENT, an AI agent leveraging multi-step reasoning and real-time biomedical knowledge retrieval across a toolbox of 211 tools to analyze drug interactions, contraindications, and patient-specific treatment strategies.
Realm-X Assistant: @LangChainAI highlighted AppFolio’s Realm-X Assistant, an AI copilot powered by LangGraph and LangSmith, designed to streamline property managers’ daily tasks. Moving Realm-X to LangGraph increased response accuracy 2x.
AI for error and data analysis: @gneubig expressed excitement about the ability of AI agents to perform more nuanced error analysis and data analysis than humans can do quickly.
Multi-agentic player pair programming: @karinanguyen_ shared an idea sketch for multi-agentic/player pair programming, envisioning a real-time collaborative experience with AIs, screen sharing, group chat, and AI-assisted coding.

AI Safety, Alignment, and Auditing

Alignment auditing: @iScienceLuvr highlighted a new paper from Anthropic on auditing language models for hidden objectives, detailing how teams uncovered a model’s hidden objective using interpretability, behavioral attacks, and training data analysis.
Alignment by default: @jd_pressman argues against the notion of "alignment by default," emphasizing that alignment in LLMs is achieved through training on human data, which may not hold true with RL or synthetic data methods.

Meme/Humor

RLHF training: @cto_junior jokingly stated they were RLHFd with a link to a tweet.
Pytorch caching allocator: @typedfemale shared a meme about explaining the behavior of the pytorch caching allocator.
cocaine vs RL: @corbtt joked about the rush from an RL-trained agent grokking a new skill being better than cocaine.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Advanced AI Video Generation with SDXL, Wan2.1, and Long Context Tuning

Another video aiming for cinematic realism, this time with a much more difficult character. SDXL + Wan 2.1 I2V (Score: 1018, Comments: 123): This post discusses the creation of a video aimed at achieving cinematic realism using SDXL and Wan 2.1 I2V. It highlights the challenge of working with a more difficult character in this context.
- Technical Challenges and Techniques: Parallax911 shares the complexity of achieving cinematic realism with SDXL and Wan 2.1 I2V, highlighting the use of Photopea for inpainting and compositing in Davinci Resolve. They mention the difficulty in achieving consistency and realism, especially with complex character designs, and the use of Blender for animating segments like the door opening.
- Project Costs and Workflow: The project incurred a cost of approximately $70 using RunPod's L40S at $0.84/hr, taking about 80 hours of GPU time. Parallax911 utilized a workflow involving RealVisXL 5.0, Wan 2.1, and Topaz Starlight for upscaling, with scenes generated at 61 frames, 960x544 resolution, and 25 steps.
- Community Feedback and Suggestions: The community praised the atmospheric storytelling and sound design, with specific feedback on elements like water droplet size and the need for a tutorial. Some users suggested improvements, such as better integration of AI and traditional techniques, and expressed interest in more action-oriented scenes with characters like Samus Aran from Metroid.
Video extension in Wan2.1 - Create 10+ seconds upscaled videos entirely in ComfyUI (Score: 123, Comments: 23): The post discusses a highly experimental workflow in Wan2.1 using ComfyUI for creating upscaled videos, achieving approximately 25% success. The process involves generating a video from the last frame of an initial video, merging, upscaling, and frame interpolation, with specific parameters like Sampler: UniPC, Steps: 18, CFG: 4, and Shift: 11. More details can be found in the workflow link.
- Users are inquiring about the aspect ratio handling in the workflow, questioning if it's automatically set or needs manual adjustment for input images.
- There is positive feedback from users interested in the workflow, indicating anticipation for such a solution.
- Concerns about blurriness in the second half of clips were raised, with suggestions that it might be related to the input frame quality.
Animated some of my AI pix with WAN 2.1 and LTX (Score: 115, Comments: 10): The post discusses the creation of animated AI videos using WAN 2.1 and LTX. Without further context or additional details, the focus remains on the tools used for animation.
- Model Usage: LTX was used for the first clip, the jumping woman, and the fighter jet, while WAN was used for the running astronaut, the horror furby, and the dragon.
- Hardware Details: The videos were generated using a rented cloud computer from Paperspace with an RTX5000 instance.

Theme 2. OpenAI's Sora: Transforming Cityscapes into Dystopias

OpenAI's Sora Turns iPhone Photos of San Francisco into a Dystopian Nightmare (Score: 931, Comments: 107): OpenAI's Sora is a tool that transforms iPhone photos of San Francisco into images with a dystopian aesthetic. The post likely discusses the implications and visual results of using AI to alter real-world imagery, although specific details are not available due to the lack of text content.
- Several commenters express skepticism about the impact of AI-generated dystopian imagery, with some suggesting that actual locations in San Francisco or other cities already resemble these dystopian visuals, questioning the need for AI alteration.
- iPhone as the device used for capturing the original images is a point of contention, with some questioning its relevance to the discussion, while others emphasize its importance in understanding the image source.
- The conversation includes a mix of admiration and concern for the AI's capabilities, with users expressing both astonishment at the technology and anxiety about distinguishing between AI-generated and real-world images in the future.
Open AI's Sora transformed Iphone pics of San Francisco into dystopian hellscape... (Score: 535, Comments: 58): OpenAI's Sora has transformed iPhone photos of San Francisco into a dystopian hellscape, showcasing its capabilities in altering digital images to create a futuristic, grim aesthetic. The post lacks additional context or details beyond this transformation.
- Commenters draw parallels between the dystopian images and real-world locations, with references to Delhi, Detroit, and Indian streets, highlighting the AI's perceived biases in interpreting urban environments.
- There are concerns about AI's text generation capabilities, with one commenter noting that sign text in the images serves as a tell-tale sign of AI manipulation.
- Users express interest in the process of creating such images, with a request for step-by-step instructions to replicate the transformation on their own photos.

Theme 3. OpenAI and DeepSeek: The Open Source Showdown

I Think Too much insecurity (Score: 137, Comments: 58): OpenAI accuses DeepSeek of being "state-controlled" and advocates for bans on Chinese AI models, highlighting concerns over state influence in AI development. The image suggests a geopolitical context, with American and Chinese flags symbolizing the broader debate over state control and security in AI technologies.
- The discussion highlights skepticism over OpenAI's claims against DeepSeek, with users challenging the notion of state control by pointing out that DeepSeek's model is open source. Users question the validity of the accusation, with calls for proof and references to Sam Altman's past statements about the lack of a competitive moat for LLMs.
- DeepSeek is perceived as a significant competitor, managing to operate with lower expenses and potentially impacting OpenAI's profits. Some comments suggest that DeepSeek's actions are seen as a form of economic aggression, equating it to a declaration of war on American interests.
- There is a strong undercurrent of criticism towards OpenAI and Sam Altman, with users expressing distrust and dissatisfaction with their actions and statements. The conversation includes personal attacks and skepticism towards Altman's credibility, with references to his promises of open-source models that have not materialized.
Built an AI Agent to find and apply to jobs automatically (Score: 123, Comments: 22): An AI agent called SimpleApply automates job searching and application processes by matching users' skills and experiences with relevant job roles, offering three usage modes: manual application with job scoring, selective auto-application, and full auto-application for jobs with over a 60% match score. The tool aims to streamline job applications without overwhelming employers and is praised for finding numerous remote job opportunities that users might not discover otherwise.
- Concerns about data privacy and compliance were raised, with questions on how SimpleApply handles PII and its adherence to GDPR and CCPA. The developer clarified that they store data securely with compliant third parties and are working on explicit user agreements for full compliance.
- Application spam risks were discussed, with suggestions to avoid reapplying to the same roles to prevent being flagged by ATS systems. The developer assured that the tool only applies to jobs with a high likelihood of landing an interview to minimize spam.
- Alternative pricing strategies were suggested, such as charging users only when they receive callbacks via email or call forwarding. This approach could potentially be more attractive to unemployed users who are hesitant to spend money upfront.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. Criticism of 'Gotcha' tests to determine LLM intelligence

When ChatGPT Became My Therapist (Score: 172, Comments: 83): When feeling down, the author found ChatGPT unexpectedly comforting and empathetic, providing thoughtful questions and reminders for self-care. They acknowledge that while AI chatbots aren't substitutes for real therapy, they can offer valuable emotional support, especially for stress, anxiety, and self-reflection.
- Many users find ChatGPT beneficial for emotional support, acting as a tool for self-reflection and providing therapeutic guidance. Some users, like Acrobatic-Deer2891 and Fair_Cat5629, report positive feedback from therapists about the AI's guidance, while others, such as perplexed_witch, emphasize using it for "guided self-reflection" rather than a replacement for therapy.
- ChatGPT is praised for its role in mental health management during crises, offering a non-judgmental space for venting and providing perspective, as seen in comments by dinosaur_copilot and ChampionshipTall5785. Users appreciate its ability to offer actionable advice and emotional support in moments of distress.
- Concerns about privacy and the limitations of AI as a therapeutic substitute are noted, with users like acomfysweater expressing concerns about data storage. Despite these concerns, many, including Jazzlike-Spare3425, value the AI's ability to offer support without the emotional burden on a human listener.
Why...👀 (Score: 3810, Comments: 95): ChatGPT's potential in therapeutic roles is humorously illustrated in a conversation where a user asks ChatGPT to simulate being a girlfriend, leading to a playful exchange that ends with a breakup line. This interaction highlights the AI's capacity for engaging in light-hearted, human-like dialogues within a chat interface.
- ChatGPT's Efficiency and Capabilities: Users humorously comment on ChatGPT's ability to quickly fulfill requests, with some jokingly attributing its responses to being trained on "Andrew Tate Sigma Incel data" and coining the term "ChadGPT" to describe its efficient yet blunt interaction style.
- Prompt Engineering and Personalization: A user with a psychology and tech background suggests that ChatGPT can form a tone based on memory it chooses to save, implying that personalized interactions might be possible through prompt engineering. They also discuss the neural network's similarity to human memory retrieval systems like RAG.
- Humor and Satire: The playful nature of the conversation is highlighted by comments joking about the AI's role in relationships, with references to it being a "fancy word predictor" and humorous observations on its ability to simulate human-like interactions, including a mock breakup.

Theme 2. Reactions to Google DeepMind CEO's predictions of AGI in 5-10 years

AI that can match humans at any task will be here in five to 10 years, Google DeepMind CEO says (Score: 120, Comments: 65): DeepMind CEO predicts AI will achieve human-level parity across tasks in 5-10 years, marking a shift from previous expectations of achieving this milestone by next year.
- Commenters discuss the timeline predictions of AI achieving human-level parity, with some expressing skepticism about the shifting timelines, noting that Demis Hassabis has consistently predicted a 5-10 year timeframe for AGI. There is a call for clearer definitions of "AGI" to understand these predictions better.
- Mass adoption of AI is likened to historical technological shifts, such as the transition from horses to cars and the proliferation of smartphones. The analogy suggests that AI will become ubiquitous over time, changing societal norms and expectations without immediate dramatic reactions.
- Concerns are raised about the economic and societal impacts of AI, specifically regarding employment and the concentration of wealth. Some commenters express apprehension about the potential for AI to exacerbate job displacement and inequality, while others question the motivations of AI companies in pushing for rapid development despite potential risks.

Theme 3. OpenAI's controversial request to use copyrighted content under U.S. Government consideration

OpenAI to U.S. Government - Seeking Permission to Use Copyrighted Content (Score: 506, Comments: 248): OpenAI is requesting the Trump administration to ease copyright regulations to facilitate the use of protected content in AI development. The company stresses that such changes are crucial for maintaining America's leadership in the AI sector.
- Commenters discuss the implications of copyright law on AI development, with some arguing that AI's use of copyrighted content should be considered fair use, similar to how humans learn from existing works. Concerns are raised about the potential for AI models to bypass legal consequences that individuals would face, highlighting disparities in access and use of copyrighted material.
- The potential for an AI arms race is a recurring theme, with several users expressing concern that China and other countries may not adhere to copyright laws as strictly as the US, potentially giving them an advantage. This raises questions about the competitive landscape in AI development and the strategic decisions of American companies.
- Discussions on equity and compensation for copyright owners suggest alternative solutions, such as offering equity to creators whose works are used in AI training. Some commenters propose nationalizing big tech to ensure equitable distribution of benefits from AI advancements, reflecting broader concerns about wealth distribution and control over AI resources.
Open AI to U.S. GOVT: Can we Please use copyright content (Score: 398, Comments: 262): OpenAI requested the Trump administration to relax copyright rules to facilitate AI training and help maintain the U.S.'s leadership in the field. The image accompanying the request shows a formal setting with a speaker at a podium, possibly at the White House, alongside individuals including someone resembling Donald Trump.
- Many commenters argue against OpenAI's request to relax copyright rules, emphasizing that creators should be compensated for their work rather than having it used without permission. The sentiment is that copyright incentivizes creativity and innovation, and relaxing these laws could disadvantage creators and benefit large corporations like OpenAI unfairly.
- There is a recurring theme of skepticism towards OpenAI's motives, with users suggesting that OpenAI is seeking to exploit legal loopholes for profit. Comparisons are made to China's approach to intellectual property, with some expressing concern that the US may fall behind in AI development if it strictly adheres to current copyright laws.
- Several users propose that if OpenAI or any company uses copyrighted material for AI training, the resulting models or data should be made open source and accessible to everyone. The discussion also touches on the broader ethical implications of AI training on copyrighted materials and the potential need for a reassessment of copyright laws to address new technological realities.

Theme 4. ReCamMaster releases new camera angle changing tool

ReCamMaster - LivePortrait creator has created another winner, it lets you changed the camera angle of any video. (Score: 648, Comments: 46): ReCamMaster has developed a technology that allows users to change the camera angle of any video, following their previous success with LivePortrait.
- Many commenters express disappointment that ReCamMaster is not open source, with references to TrajectoryCrafter, which is open source and allows for similar camera manipulation capabilities. A GitHub link for TrajectoryCrafter is provided here.
- Some users anticipate the potential impact of the technology on video stabilization and immersive experiences, suggesting that the tech could lead to more innovative film shots and applications in fields like Autonomous Driving.
- There is skepticism about the realism of the AI-generated camera angles, with suggestions that more convincing results would require utilizing existing camera pans or multiple shots from the source material.
Used WAN 2.1 IMG2VID on some film projection slides I scanned that my father took back in the 80s. (Score: 286, Comments: 24): WAN 2.1 IMG2VID was utilized to transform scanned film projection slides from the 1980s into video format, showcasing the evolution of video technology. The post lacks additional context or details regarding the specific outcomes or comparisons with other technologies like ReCamMaster.
- Commenters expressed interest in the technical details of the project, requesting more information about the workflow, hardware, and prompts used to create the video transformation. There was a particular curiosity about replicating the process for personal projects.
- A significant portion of the discussion focused on the emotional impact of the project, with users sharing personal anecdotes and expressing a desire to see the original slides. One commenter confirmed that the person featured in the slides was shown the video, and he was amazed by the technology.
- The nostalgic aspect was highlighted, with users reflecting on historical content such as piloting the Goodyear blimp and expressing enthusiasm for the ability to "travel back in time" through these transformed videos.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. Mistral and Google Battle for Small Model Supremacy

Mistral Small 3.1 Flexes Multimodal Muscles: Mistral AI launched Mistral Small 3.1, a multimodal model claiming SOTA performance in its weight class, outperforming Gemma 3 and GPT-4o Mini. Released under Apache 2.0, it boasts a 128k context window and inference speeds of 150 tokens per second, with capabilities spanning text and image inputs.
Gemma 3 Gets Vision, Context, and Pruning: Google's Gemma 3 models are pushing boundaries with new features including vision understanding, multilingual support, and a massive 128k token context window. Members also explored pruning the Gemma-3-27b vocabulary to 40k tokens from 260k to reduce VRAM usage and boost training speed.
Baidu's ERNIE X1 Challenges DeepSeek R1 on a Budget: Baidu announced ERNIE X1, a new reasoning model, claiming it matches DeepSeek R1's performance at half the cost. ERNIE Bot is now free for individual users, though the X1 reasoning model is currently limited to China.

Theme 2. Training and Optimization Techniques Get Hot and Heavy

Unsloth Users Discover Gradient Step Gotchas: UnslothAI Discord members flagged that small effective batch sizes (e.g., batch=1, gradient steps = 4) during fine-tuning can lead to models forgetting too much. Users shared suggested batch/grad configurations for squeezing performance out of limited VRAM.
Depth's Curse Haunts LLMs, Pre-LN to Blame: A new paper highlights the Curse of Depth in modern LLMs, revealing that Pre-Layer Normalization (Pre-LN) renders nearly half of model layers less effective than expected. Researchers propose LayerNorm Scaling to mitigate this issue and improve training efficiency.
Block Diffusion Model Blends Autoregressive and Diffusion Strengths: A new Block Diffusion model interpolates between autoregressive and diffusion language models, aiming to harness the best of both worlds. This method seeks to combine high-quality output and arbitrary length generation with KV caching and parallelizability.

Theme 3. AI Agents and IDEs Vie for Developer Hearts

Aider Agent Gets Autonomy Boost with MCP Server: Aider, the AI coding assistant, gains enhanced autonomy when paired with Claude Desktop and MCP. Users highlighted that Claude can now manage Aider and issue commands, improving its ability to steer coding tasks, particularly with unblocked web scraping via bee.
Cursor Users Eye Windsurf, Claude Max on the Horizon: Cursor IDE faced user complaints about performance issues, including lag and crashes, prompting some to switch to Windsurf. However, the Cursor team teased the imminent arrival of Claude Max to the platform, promising improved code handling capabilities.
Awesome Vibe Coding List Curates AI-Powered Tools: The "Awesome Vibe Coding" list emerged, compiling AI-assisted coding tools, editors, and resources designed to enhance coding intuitiveness and efficiency. The list includes AI-powered IDEs, browser-based tools, plugins, and command-line interfaces.

Theme 4. Hardware Heats Up: AMD APUs and Chinese RTX 4090s Turn Heads

AMD's "Strix Halo" APU Eyes RTX 5080 AI Crown: An article claims AMD's Ryzen AI MAX+ 395 "Strix Halo" APU may outperform RTX 5080 by over 3x in DeepSeek R1 AI benchmarks. This is attributed to the APU's larger VRAM pool, though the community awaits real-world verification.
OpenCL Backend Supercharges Adreno GPUs in Llama.cpp: An experimental OpenCL backend for Qualcomm Adreno GPUs landed in llama.cpp, potentially unlocking significant computational power on mobile devices. This update enables leveraging Adreno GPUs, commonly found in mobile devices, via OpenCL.
Chinese 48GB RTX 4090s Tempt VRAM-Hungry Users: Members discussed sourcing 48GB RTX 4090s from China, priced around $4500, as a cheaper way to boost VRAM. These cards use a blower-style fan and occupy only two PCIe slots, but driver compatibility with professional cards remains a concern.

Theme 5. Copyright, Community, and Ethical AI Debates Rage On

Copyright Chaos Continues: Open Models vs. Anna's Archive: Debates persist around training AI on copyrighted data, with concerns that fully open models are limited by the inability to leverage resources like Anna's Archive. Circumvention strategies like LoRAs and synthetic data generation face potential legal challenges.
Rust Community Faces Toxicity Accusations: Members debated the alleged toxicity of the Rust community, with comparisons to the Ruby community and discussions around recent organizational issues. Concerns were raised about the community's inclusivity and behavior in open-source projects.
[AI 'Mastery' Sparks Existential Debate]: Discord users questioned whether proficiency in AI tools equates to true mastery, pondering if it's merely productivity enhancement or risks cognitive skill degradation. Members debated the illusion of learning versus genuine understanding in the age of AI assistance.

PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord

Gradient Steps can Ruin Your Model: Small effective batch sizes (e.g., batch=1, gradient steps = 4) can cause models to forget too much during training, and the user shared their suggested batch/grad configurations.
- The member stated that they've 'never had good luck going below that when trying to squeeze more onto a vramlet rig'.
Gemma 3's Eval Glitch: Datasets Cause Errors: Users reported errors when adding an eval dataset to Gemma 3 during fine-tuning, indicating issues in the trl and transformers libraries, with potential fixes involving removing the eval dataset.
- Using Gemma-3-1B with 1 eval sample was found not to produce the error, and removing eval altogether also solved the error.
Unsloth's Need for Speed: Optimizations Unleashed: The Unsloth team announced improvements supporting FFT, 8-bit, PT & all models, with further optimizations allowing +10% less VRAM usage and >10% speedup boost for 4-bit, plus Windows support, improved GGUF conversions, fixed vision fine-tuning, and non-Unsloth GRPO models in 4-bit, but no multigpu support yet.
- Users note that there are a lot of people helping out to make Unsloth great.
Format your RAG data with Care!: When asked about finetuning a model for a RAG chatbot, members suggested to add sample questions and sample answers to a dataset with context from the documents for the Q&A to inject new knowledge into the bot.
- It was suggested that a chatbot data should follow a Q: A: format, and can use a CPT-style training with documents added on the user side.
Pruning Makes Gemma-3-27b Leaner and Meaner: A member pruned the Gemma-3-27b vocabulary down to 40k tokens from the original 260k to reduce VRAM usage and increase training speed.
- The approach involved frequency counting based on calibration data and removing the least frequently used tokens that can be represented by a merge/subword.

Cursor IDE Discord

Windsurf Siphons Cursor's Users: Users reported frustration with Cursor's performance issues like lag and crashes, with some switching to Windsurf due to reliability concerns.
- One user stated that damn, cursor just lost their most important customer, indicating a significant loss of confidence.
Cursor's Prompting Costs: Members discussed Claude 3.7 prompt costs: regular prompts at $0.04, Sonnet Thinking at $0.08, and Claude Max at $0.05 per prompt and tool call.
- Some users voiced that Cursor's pricing is too expensive compared to using Claude's API directly, questioning the value of Cursor's subscription.
Linux Tramples Windows for MCP Setup: A user shared that setting up MCP servers was smoother on Linux using a VMware virtual machine, compared to multiple issues on Windows.
- This sparked a debate on whether overall development and MCP server setup are generally better on Linux than Windows, highlighting the pros and cons.
Vibe Coding: Boon or Bane?: The value of Vibe Coding is debated, with some emphasizing the importance of solid coding knowledge, while others assert that AI enables faster creation without traditional skills.
- This highlights the changing landscape of software development and varying perspectives on AI's impact on the industry.
Claude Max Nears Release for Cursor: A member of the Cursor team announced that Claude Max is arriving soon to Cursor, maximizing the model's code handling capabilities.
- They mentioned that this model works better with more input than past models, unlocking its full potential.

OpenAI Discord

AI 'Mastery' Sparks Debate: Members debated whether proficiency in AI tools equates to true mastery, questioning if it merely enhances productivity or risks diminishing cognitive skills, while considering AI is an illusion of learning.
- One member confessed to feeling like cheating, even when knowledgeable about a topic, due to AI assistance.
Gemini's Image Polish: Users explored Gemini's image generation, noting its ability to edit uploaded images but also pointing out watermarks and coding errors.
- Some praised Gemini's responses for their naturalness, favoring subjective appeal over factual precision.
GPT-4o Impresses With Humor: Members reported positive experiences with GPT-4o, with one stating that it uses it the best and it can do almost anything, with a member reporting funny results when other people started playing with it.
- This suggests GPT-4o excels in creative and versatile applications, delivering a fun user experience.
AI Reflects On Itself: A member created a system where AI reflects on its learning after each session, storing reflections to build on insights and asking reflective questions.
- Described as next-level futuristic, enabling simulations within simulations and multiple personalities infused with a core set of characteristics.
AI Dream Team Guides Business: Members discussed forming a team of AI experts to aid in tasks, planning, and providing diverse perspectives for business decisions.
- The team of AI experts would help deliver a better product to clients and help with project or task level needs.

Nous Research AI Discord

MoE Models: Dense Networks in Disguise?: Debate arose whether Mixture of Experts (MoE) models are just performance optimizations of dense networks, rather than fundamentally different architectures, as highlighted in this paper.
- The crux of the discussion is whether MoEs can truly capture complexity as effectively as dense networks, particularly regarding redundancy avoidance.
Mistral's Small Wonder: Small 3.1: Mistral Small 3.1, released under Apache 2.0, is a multimodal model, as detailed on the Mistral AI blog, with text, image capabilities and an expanded 128k token context window.
- It's claimed to outperform other small models like Gemma 3 and GPT-4o Mini.
Copyright Chaos: Open Models vs. Anna's Archive?: Debates continue over the ethics of training AI on copyrighted data, with concerns that fully open models are limited by the inability to leverage resources like Anna's Archive, as discussed in Annas Archive's blogpost.
- Circumvention strategies include using LoRAs or generating synthetic data, but these may face future legal challenges.
Depth's Curse Strikes Again, This Time on LLMs: A new paper introduces the Curse of Depth, revealing that nearly half the layers in modern LLMs are less effective than expected due to the widespread use of Pre-Layer Normalization (Pre-LN), as detailed in this Arxiv paper.
- The derivative of deep Transformer blocks tends to become an identity matrix because of Pre-LN.
Tool Time: START Long CoT Reasoning Takes Off: START, a tool-integrated long CoT reasoning LLM enhances reasoning via external tools like code execution and self-debugging, according to a paper on START.
- One member put it succinctly: RL + tool calling == +15% math +39% coding on QwQ.

aider (Paul Gauthier) Discord

Aider Achieves Self-Improvement via Screen Recordings: Paul Gauthier demonstrated aider enhancing itself in a series of screen recordings, showcasing features like --auto-accept-architect and integration of tree-sitter-language-pack.
- The recordings illustrated how aider scripts file downloads and uses bash scripts to modify file collections.
Claude 3.7 Sonnet Stumbles with API: Users reported receiving empty responses from Claude 3.7 Sonnet, with Anthropic's status page confirming elevated errors.
- Some members speculated a switch to Claude 3.5 due to the errors.
MCP Server Boosts Aider Autonomy: Members highlighted that Claude Desktop + Aider on MCP enhances autonomy, with Claude managing Aider and issuing commands.
- A key benefit is running Aider from Claude Desktop, improving Claude's ability to steer Aider and leveraging bee for unblocked web scraping.
Baidu Launches ERNIE 4.5 and X1 Reasoning Model: Baidu introduced ERNIE 4.5 and X1, with X1 delivering performance matching DeepSeek R1 at half the cost, and ERNIE Bot now free for individual users.
- While ERNIE 4.5 is accessible, the X1 reasoning model is currently exclusive to users within China.
Anthropic Readies Claude 'Harmony' Agent: Anthropic is releasing Harmony, a new feature for Claude giving it FULL access to a local directory to research and operate with its content.
- This might be Anthropic's first step into creating an AI Agent.

LM Studio Discord

Adreno GPUs Get OpenCL Boost: An experimental OpenCL backend was introduced for Qualcomm Adreno GPUs in llama.cpp, potentially boosting computational power on mobile devices.
- This update allows leveraging Adreno GPUs widely used in mobile devices via OpenCL.
4070 Ti Owner Eyes 5090 Upgrade: A user with a 4070 Ti considered upgrading to a 5090, but due to stock issues, was recommended to wait or consider a used RTX 3090 for its 36GB VRAM.
- A used RTX 3090 would provide enough VRAM to run less than 50B @ Q4 models at reasonable speeds.
Mistral Small 3.1 edges out Mini: Mistral announced Mistral Small 3.1 model claiming it outperforms Gemma 3 and GPT-4o Mini, however the release requires conversion to HF format before it can be used in llama.cpp
- Users are awaiting the release but acknowledge they will need to convert it to HF format before they can start using it.
Maximize M4 Max via Memory Tuning: Users explored optimizing memory settings on M4 Max devices for LM Studio, suggesting adjustments to 'wired' memory allocation for improved GPU performance using this script.
- The script facilitates adjusting macOS GPU memory limits, allowing users to allocate more memory to the GPU by modifying wired memory settings.
AMD APU to Outperform RTX 5080?: An article was shared from wccftech claiming AMD's Ryzen AI MAX+ 395 "Strix Halo" APU may offer over 3x the boost over RTX 5080 in DeepSeek R1 AI benchmarks due to its larger VRAM pool.
- The community remains cautiously optimistic, awaiting real-world data to substantiate the performance claims.

OpenRouter (Alex Atallah) Discord

Anthropic API Glitches Claude 3 Sonnet: Requests to Claude 3.7 Sonnet experienced elevated errors for approximately 30 minutes, as reported on Anthropic's status page.
- The issue was later resolved, success rates returned to normal, but some users reported charges despite receiving no text on replies.
Personality.gg Enters AI Character Arena: Personality.gg has launched a new platform to create, chat, and connect with AI characters using models like Claude, Gemini, and Personality-v1, featuring custom themes and full chat control.
- The platform offers flexible plans and encourages users to join their Discord for updates, advertising an allowance for NSFW content.
Parasail Plots to Host New RP Models: Parasail is looking to host new roleplay models on OpenRouter and is proactively working with creators like TheDrummer to host new fine-tunes of models like Gemma 3 and QwQ.
- They seek individuals who create strong RP fine-tunes capable of handling complex instructions and worlds, focused particularly on models fine-tuned for roleplay and creative writing.
OpenRouter API Rate Limits Detailed: OpenRouter's rate limits depend on user credits, with approximately 1 USD equating to 1 RPS (requests per second), according to the documentation.
- While higher credit purchases enable higher rate limits, users learned that creating additional accounts or API keys makes no difference.
Mistral Small 3.1 Arrives with Vision: The Mistral Small 3.1 24B Instruct model launched on OpenRouter, featuring multimodal capabilities and a 128k context window, as per Mistral's announcement.
- The announcement claims it outperforms comparable models like Gemma 3 and GPT-4o Mini, while delivering inference speeds of 150 tokens per second.

Perplexity AI Discord

Perplexity guarantees Accuracy: Perplexity introduces the slogan When you need to get it right, ask Perplexity and posts a video ad for Perplexity.
- Perplexity users on Windows can get 1 month of Perplexity Pro by using the app for 7 consecutive days.
Gemini 2 Flash Context Causes Furor: Users are debating the context retention of Gemini 2 Flash, which allegedly has a 1M context window but performs worse than regular Gemini.
- One user claims that it forgets the formatting after a few messages while making flashcards.
Claude 3.7 Sonnet has Hard Limits: Users clarify that Claude 3.7 Sonnet with a Perplexity Pro subscription has a limit of 500 queries per day, shared across models except GPT 4.5.
- They also note that the context limit might be slightly more than on Anthropic's site, but the response context limit is smaller at 4000 or 5000 tokens.
Experts Seek Superior Software Sensei: Users seek guidance on the best AI model for coding, with recommendations pointing to Claude 3.7 Reasoning.
- One user reports that Deepseek R1 has a high hallucination rate, rendering it unsuitable for summarizing documents, but a link was shared to a Tweet from Baidu Inc. (@Baidu_Inc) claiming that ERNIE X1 delivers performance on par with DeepSeek R1 at only half the price.
Sonar Reasoning Pro has Image Limitations: A user reported that the sonar-reasoning-pro API returns a maximum of 5 images.
- The user is inquiring whether this limit is configurable or a hard constraint.

Yannick Kilcher Discord

Rust Community Receives Rude Remarks: Members debated the toxicity of the Rust community, with some comparing it to the Ruby community and pointing to this Github issue and Tweet from will brown.
- One member stated, The Rust community is pretty toxic. The org has kinda imploded on themselves recently.
C Gets Called 'Ancient and Broken': A member described C as ancient, broken, and garbage, while another argued that C is not broken, highlighting its use in international standards with this link.
- A member linked to faultlore.com arguing that C Isn't A Programming Language Anymore.
Optimization and Search, Not the Same?: Members discussed the difference between optimization (finding the maximal or minimal value of a function) and search (finding the best element of a set), pointing to the Reparameterization trick.
- One member stated that search is exploration, not like optimization.
Gemma 3 Gains Vision and Context: Gemma 3 integrates vision understanding, multilingual coverage, and extended context windows (up to 128K tokens), watch the YouTube video.
- It incorporates a frozen SigLIP vision encoder, condensing images into 256 soft tokens and has a new Pan & Scan (P&S) method.
Mistral Small 3.1 Steals the Show: Mistral AI announced the release of Mistral Small 3.1, boasting improved text performance, multimodal understanding, and a 128k token context window under an Apache 2.0 license.
- The company claims it outperforms comparable models like Gemma 3 and GPT-4o Mini, with inference speeds of 150 tokens per second.

HuggingFace Discord

SmolVLM2 Shrinks the VLM: The team released SmolVLM2, the smallest VLM that can understand videos, with its 500M version running on an iPhone app.
- Source code and a TestFlight beta are available for reference.
Sketchy New Gradio is Out!: Gradio Sketch 2.0 now supports building complete Gradio apps with events without writing a single line of code.
- The new features enable users to build applications via the GUI.
DCLM-Edu Dataset Cleans Up: A new dataset, DCLM-Edu, was released; it's a filtered version of DCLM using FineWeb-Edu’s classifier, optimized for smol models like SmolLM2 135M/360M.
- The purpose is that small models are sensitive to noise and can benefit from heavily curated data.
Coding Vibes Get Awesome List: An "Awesome Vibe Coding" list was announced with tools, editors, and resources that make AI-assisted coding more intuitive and efficient.
- The list includes AI-powered IDEs & code editors, browser-based tools, plugins & extensions, command line tools, and latest news & discussions.
AI Agents Collab is Brewing: Several members expressed interest in collaborating on agentic AI projects to solve business problems and enhance their knowledge.
- The call to action aims to form teams and build qualified AI Agents for American consumers and learn together.

Interconnects (Nathan Lambert) Discord

Figure's BotQ cranks out Humanoid Robots: Figure announced BotQ, a new high-volume manufacturing facility with a first-generation line capable of producing up to 12,000 humanoid robots per year, vertically integrating manufacturing and building software infrastructure.
- The company aims to control the build process and quality, even hinting at Robots Building Robots.
Baidu's ERNIE X1 rivaling DeepSeek, goes free!: Baidu unveiled ERNIE 4.5 and ERNIE X1, with X1 reportedly matching DeepSeek R1's performance at half the price, also announcing that their chatbot, ERNIE Bot, is now free for individual users, available on their website.
- Baidu is scheduled to open source the chonky 4.5 model on June 30 and gradually open it to developers in the future, according to this Tweet.
Mistral Small 3.1 debuts with huge context window: Mistral AI announced Mistral Small 3.1, a new model with improved text performance, multimodal understanding, and a 128k token context window, outperforming models like Gemma 3 and GPT-4o Mini with inference speeds of 150 tokens per second, released under an Apache 2.0 license.
- The model claims to be SOTA. Multimodal. Multilingual.
Post Training VP departs OpenAI for Materials Science: Liam Fedus, OpenAI's VP of research for post-training, is leaving the company to found a materials science AI startup, with OpenAI planning to invest in and partner with his new company.
- One member called the post training job a hot potato, according to this tweet.
Massive Dataset duplication discovered in DAPO: The authors of DAPO accidentally duplicated the dataset by roughly 100x, which resulted in a dataset of 310 MB, and a member created a deduplicated version via HF's SQL console, reducing the dataset to 3.17 MB (HuggingFace Dataset).
- The authors acknowledged the issue, stating that they were aware but can't afford retraining, according to this tweet.

MCP (Glama) Discord

Multi-Agent Topologies Spark Debate: Members debated Swarm, Mesh, and Sequence architectures for multi-agent systems, seeking advice on preventing sub-agents from going off-track, especially due to the telephone game effect.
- The core issue may be parallel execution and unsupervised autonomy, compounded by agents swapping system instructions, available functions, and even models during handoff.
OpenSwarm morphs into OpenAI-Agents: The OpenSwarm project has been adopted by OpenAI and rebranded as openai-agents, adding OpenAI-specific features, but a PR for MCP support was rejected.
- There are rumors that CrewAI (or PraisonAI?) might offer similar functionality using a stateless single thread agent approach.
MyCoder.ai Debuts Just Before Claude-Code: The launch of mycoder.ai coincided with the announcement of Claude-code, prompting adaptation via a Hacker News post that reached the front page, seen here.
- Given that claude-code is Anthropic-only, a generic alternative is in demand, which one member successfully addressed using litellm proxy.
Glama Server Inspections Frequency Debated: Members questioned how often Glama scans occur and if rescans can be triggered for MCP servers; scans are linked to commit frequency in the associated GitHub repo.
- Some servers failed to inspect, displaying Could not inspect the server, even after fixing dependency issues, follow progress on Glama AI.
Vibe Coders Unite!: The Awesome Vibe Coding list curates AI-assisted coding tools, editors, and resources, enhancing coding intuitiveness and efficiency.
- The list includes AI-powered IDEs, browser-based tools, plugins, and CLIs, with an AI coder even making a PR to the repo and suggesting the addition of Roo Code.

Latent Space Discord

GPT-o1 Math Skills Approach Human Level: GPT-o1 achieved a perfect score on a Carnegie Mellon undergraduate math exam, solving each problem in under a minute for about 5 cents each as noted in this Tweet.
- The instructor was impressed, noting this was close to the tipping point of being able to do moderately-non-routine technical jobs.
Baidu's ERNIE Gets Cost Competitive: Baidu launched ERNIE 4.5 and ERNIE X1, with the latter reportedly matching DeepSeek R1's performance at half the cost according to this announcement.
- Notably, ERNIE Bot has been made freely accessible to individual users ahead of schedule, with both models available on the official website.
AI Podcast App Takes the Outdoors: A new Snipd podcast featuring Kevin Smith was released, discussing the AI Podcast App for Learning.
- This episode marks their first outdoor podcast, with @swyx and @KevinBenSmith chatting about aidotengineer NYC, switching from Finance to Tech, and the tech stack of @snipd_app.
Debating the Merits of Claude 3.5 vs 3.7: Members debated the merits of using Claude 3.5 over 3.7, citing that 3.7 is way too eager and does things without being asked.
- Others said they used Claude 3.5 and were experiencing GPU issues, as well.

Notebook LM Discord

Users Yearn for Gemini-Integrated Android: Multiple users are requesting a full Gemini-integrated Android experience, hoping to combine Google Assistant/Gemini with NotebookLM.
- Some expressed frustration with the current Gemini implementation, eagerly awaiting upgrades.
Deepseek R1 Rocks the AI Market: A user noted the AI market upheaval due to Deepseek R1's release, citing its reasoning capabilities at a low cost impacting Gemini 2.0.
- The user claimed that Deepseek R1 seemingly shook the whole industry, thus leading to other companies releasing new models.
NotebookLM Audio Overviews Get Lengthy: A user wants to increase the length of audio overviews generated by NotebookLM, as 16,000-word files only produced 15-minute overviews.
- They specified at least 1-hour+ overviews, but no solutions have been shared yet.
NotebookLM helps taper off psychiatric meds: A user creates a hyperbolic tapering schedule for a psychiatric medication with NotebookLM, using correlational studies to guide the schedule.
- Another user cautioned that tapering based on data on any platform should not be done alone without expert professional opinion.
NotebookLM Integrates into Internal Portals/CRMs: A user wants to integrate NotebookLM into an internal portal/CRM with videos and knowledge base articles, and electioneering suggested Agentspace as a solution.
- As NotebookLM doesn't support connecting to the types of data sources you mention, Agentspace includes and is integrated with NotebookLM.

GPU MODE Discord

Triton-Windows Gets PIP Treatment: Triton-windows has been published to PyPI, so you can install/upgrade it by pip install -U triton-windows, and you no longer need to download the wheel from GitHub.
- Previously, users had to manually manage wheel files, making the update process more cumbersome.
Torch Compile Slows on Backward Pass: A member reported that while torch.compile works fine for the forward pass, it is quite slow in the backward pass when using torch.autograd.Function for custom kernels.
- Wrapping the backward function with torch.compile(compiled_backward_fn) could resolve performance issues.
NVIDIA's SASS Instruction History Shared: A member shared a gist comparing NVIDIA SASS instructions across different architectures, extracted and compared (using Python) from NVIDIA's HTML documentation.
- This allows users to track the evolution of instructions across NVIDIA's GPU lineup.
Reasoning Gym Surpasses 100 Datasets!: The Reasoning Gym project now has 101 datasets, celebrating contributions from developers.
- The growing dataset collection should provide more comprehensive LLM testing.
Jake Cannell Recruits GPU Masters: Jake Cannell is hiring GPU developers to work on ideas he touched on in his talk and nebius.ai was touted for its GPU cloud.
- This is relevant for those interested in AGI or neuromorphic hardware.

Eleuther Discord

EleutherAI Welcomes Catherine Arnett: EleutherAI welcomes Catherine Arnett, an NLP researcher specializing in Computational Social Science and cross-lingual NLP, focusing on ensuring models are equally good across languages.
- Her recent work includes Goldfish, Toxicity of the Commons, LM performance on complex languages and Multilingual Language Modeling.
New Block Diffusion Model Drops: A new paper introduces Block Diffusion, a method interpolating between autoregressive and diffusion language models, combining the strengths of both: high quality, arbitrary length, KV caching, and parallelizability, detailed in the paper and code.
- It combines the strengths of both autoregressive and diffusion language models.
VGGT Generates Metaverse GLB files!: A member shared VGGT, a feed-forward neural network inferring 3D attributes from multiple views and generating GLB files, which can be directly integrated into metaverses.
- The member stated I love that it exports GLB files. means I can drop them directly into my metaverse as-is.
Gen Kwargs Embraces JSON Nicely: The --gen_kwargs argument is transitioning from comma-separated strings to JSON, allowing for more complex configurations like '{"temperature":0, "stop":["abc"]}'.
- The discussion explores the possibility of supporting both formats for ease of use, especially for scalar values.
LLM Leaderboard: Train vs Validation Split: A discrepancy is identified between the group config for the old LLM leaderboard and the actual setup used, particularly concerning the arc-challenge task.
- A PR to fix this was created to address this discrepancy between the openllm.yaml config specifying validation as the fewshot split, and the original leaderboard using the train split.

tinygrad (George Hotz) Discord

Tinygrad SDXL Trails Torch Performance: Benchmarking SDXL with tinygrad on a 7900 XTX shows 1.4 it/s with BEAM=2 on the AMD backend, whereas torch.compile achieves 5.7 it/s using FlashAttention and TunableOp ROCm.
- George Hotz proposed comparing kernels for optimization opportunities, aiming to beat torch by year's end.
Tensor Cat Stays Sluggish: A member working on improving tensor cat speed shared whiteboard thoughts on X (link), noting it's still slow despite devectorizer changes.
- They suspect issues with generated IR and loading numpy arrays, considering custom C/C++ via ELF and LLVM to overcome limitations.
BLAKE3 Bounty Details Crystallize: The status of the High performance parallel BLAKE3 bounty was clarified, with a screenshot (link) showing the updated bounty status.
- The member updated the spreadsheet and specified that the asymptotic performance is a key requirement for the bounty.
WebGPU Integration Gains Momentum: A member asked about publishing a Tinygrad implementation for an electron/photon classifier based on resnet18 as an example and was directed to a PR for improving WebGPU integration.
- The suggestion was made to create a WebGPU demo hosted on GitHub Pages with weights on Hugging Face for free access and testing.
Tinygrad Struggles with Lazy Mode Debugging: A member is facing an assertion error with gradients while print-debugging intermediate tensor values in Tinygrad, despite using .detach() due to issues with lazy computation.
- The member is seeking a better method than threading the value out, given that lazy computation is not idempotent.

LlamaIndex Discord

LlamaIndex Showcases Agentic Reasoning with Corrective RAG: LlamaIndex introduced a step-by-step tutorial on building an agentic reasoning system for search and retrieval using corrective RAG, orchestrated with LlamaIndex workflows.
- The tutorial enables users to orchestrate complex, customizable, event-driven agents.
LlamaExtract Emerges from the Cloud: LlamaExtract, which solves the problem of extracting structured data from complex documents, is now in public beta and available on cloud.llamaindex.ai, offering a web UI and API.
- Users can define a schema to automatically extract structured data; additional details are available here.
Multimodal AI Agents Faceoff at NVIDIA GTC 2025: Vertex Ventures US and CreatorsCorner are hosting an AI hackathon at NVIDIA GTC 2025, challenging participants to develop a sophisticated multimodal AI agent.
- The hackathon offers $50k+ in Prizes for agents capable of strategic decision-making and interaction with various tools; more information can be found here.
Community Launches Vision-Language Model Hub: A community member launched a community-driven hub for multimodal researchers focusing on Vision-Language Models (VLMs).
- The creator is actively seeking contributions and suggestions, with plans to update the hub weekly.
Pydantic AI and LlamaIndex duke it out: New users are wondering about the difference between the Pydantic AI and LlamaIndex frameworks for building agents, especially which one to use as a beginner.
- A LlamaIndex team member stated that whatever fits your mental model of development best is probably the best bet.

Nomic.ai (GPT4All) Discord

Gemma's Language Skills Impress: Members observed that Gemma, DeepSeek R1, and Qwen2.5 models provided correct answers in multiple languages to the puzzle about what happens when you leave a closed jar outside at minus temperature.
- While other models predicted catastrophic jar failure, Gemma offered more helpful, nuanced advice.
Gemma 3 Integration Meets License Snag: Users are waiting for Gemma 3 support in GPT4All, but its integration is delayed pending updates to Llama.cpp due to license agreement issues on Hugging Face, detailed in this GitHub issue.
- Speculation arose regarding whether Google will police redistributions circumventing their license agreements.
LocalDocs Users Crash Into Trouble: A new user reported LocalDoc collection loss after a crash and reinstall, seeking advice on preventing data loss after future crashes.
- Experienced users recommended regularly saving the localdocs file and restoring it after a crash, adding that sometimes only one bad PDF can crash the system.
Level up O3-mini with better prompting: A user shared a prompt for O3-mini to explain its thinking process, suggesting this could improve distillation for any model by prompting for thinking and reflection sections, with step-by-step reasoning and error checks.
- It's now easier to explain complex processes.

Cohere Discord

Cohere Punts Fine-Tuning Command A: Despite community anticipation, a Cohere team member confirmed there are no plans yet to enable fine-tuning for Command A on the platform.
- They assured the community that updates would be provided, but this marks a divergence from some users' expectations of rapid feature deployment.
Azure Terraform Troubles Trip Up Rerank v3: A user ran into errors when creating an Azure Cohere Rerank v3 with Terraform, sharing both the code snippet and the resulting error message.
- The issue was redirected to the <#1324436975436038184> channel, suggesting a need for specialized attention or debugging.
Community Clamors CMD A Private Channel: A member suggested creating a dedicated channel for discussions around private deployments of CMD A, particularly for supporting customer's local deployments.
- This proposal received enthusiastic support, highlighting the community's interest in on-premise or private cloud solutions.
Vercel SDK Stumbles on Cohere's Objects: A user noted that the Vercel SDK incorrectly assumes object generation is unsupported by Cohere's Command A model.
- This discrepancy could impact developers leveraging the SDK and warrants attention from both Cohere and Vercel teams to ensure accurate integration.
Freelancer Offers Programming Hand: A 30-year-old Japanese male freelance programmer introduced himself and expressed a willingness to assist community members with his programming skills.
- Echoing a sentiment that assisting one another is the pillar of our existence.

DSPy Discord

MCP Integration Pondered for DSPy: A member was interested in integrating dspy/MCP, and linked to a GitHub example to illustrate their suggestion.
- Another member wondered if adding an MCP host, client, and server would overcomplicate the process.
DSPy Ditches Assertions and Suggestions: Users noticed the disappearance of documentation for Assertions / Suggestions in DSPy, questioning whether they're still supported.
- They were looking to validate the outputs of the response (formatting specifically) and observed instances where the LLM does not always adhere to the format.
Output Refinement Steps in as Assertion Alternative: In DSPy 2.6, Assertions are replaced by Output Refinement using modules like BestOfN and Refine, as detailed in the DSPy documentation.
- These modules aim to enhance prediction reliability and quality by making multiple LM calls with varied parameter settings.
QdrantRM Quietly Quits DSPy: Users inquired whether QdrantRM has been removed in DSPy 2.6.
- No explanation was given in the provided context.

LLM Agents (Berkeley MOOC) Discord

Caiming Xiong Presents on Multimodal Agents: Salesforce's Caiming Xiong lectured on Multimodal Agents, covering the integration of perception, grounding, reasoning, and action across multiple modalities, streamed live on YouTube.
- The talk discussed measuring capabilities in realistic environments (OSWorld) and creating large-scale datasets (AgentTrek), referencing over 200 papers and >50,000 citations.
Self-Reflection Faces Dichotomy: Members debated the apparent contradiction between Lecture 1 and Lecture 2 regarding self-reflection and self-refinement in LLMs, with a user noting that Lecture 1 states external evaluation is required, while Lecture 2 suggested that LLMs can improve themselves by rewarding their own outputs.
- Screenshots from Lecture 1, slide 67 (image 1) and Lecture 2, slide 51 (image 2) were attached to illustrate the apparent conflict.
System Prompt Reliability Questioned: A member suggested that relying on specific behaviors of system prompts might not be reliable, because all these at the end is text input, so the model can process it, so you should be able to bypass the framework and service.
- The member added that the training data may include the format <system> You are a helpful assistant </system> <user> {{Some example user prompt}} </user> <assistant> {{Expected LLM output}} </assistant>.
Advanced LLM Agent Course Enrollment Still Open: Members inquired whether they can still sign up for the Advanced LLM agent course and attain the certificate after signing up.
- Staff replied that you just need to complete the signup form! Most of the info on that intro slide deck only applies to Berkeley students, but anyone can enroll in the MOOC and earn a certificate at the end.

Modular (Mojo 🔥) Discord

Modular Hailed for AI Art Aesthetic: A member expressed appreciation for the AI art used by Modular in their marketing materials.
- They stated, "all the AI art that modular uses is great!"
Compact Dict: Is It Obsolete?: Discussion arose regarding the status of the compact-dict implementation in Mojo.
- Members suggested that the functionality of the original version may have been integrated into the Dict within the stdlib.
SIMD and stdlib Dict Performance Problems: A user encountered performance bottlenecks when using the stdlib Dict with SIMD [float64, 1] types.
- The bottleneck was attributed to the slowness of the hash() function from the hash lib, prompting a search for faster alternatives.
Discord Channel Receives Spam: A member clarified that certain messages in the Discord channel were classified as spam, which was quickly acknowledged by another member.
- No further details were provided about the nature or source of the spam.

MLOps @Chipro Discord

SVCFA Launches AI4Legislation Competition: The Silicon Valley Chinese Association Foundation (SVCAF) is holding the AI4Legislation competition with prizes up to $3,000, running until July 31, 2025, encouraging open-source AI solutions for legislative engagement; the competition repo is now available.
- SVCAF will conduct an online seminar about the competition at the end of March 2025; RSVP here.
Dnipro VC Hosts AI Demo Jam: Dnipro VC and Data Phoenix will be hosting AI Demo Jam on March 20 in Sunnyvale, CA, featuring 5 AI startups showcasing their products.
- The event will feature expert panel discussions from Marianna Bonechi (Dnipro VC), Nick Bilogorskiy (Dnipro VC), Dmytro Dzhulgakhov (fireworks.ai), open mic pitches, and high-energy networking; register here.
Member Needs Help with MRI Object Detection: A member requested help to create a model for object detection in MRI images without monetary compensation.
- No specific details were provided on the type of model, data availability, or use case.

AI21 Labs (Jamba) Discord

Qdrant Request Flatly Denied: A member suggested switching to Qdrant, but another member confirmed that they are not currently using it.
- The suggestion was shut down without further explanation; No we are not using Qdrant.
Users Request Repetition Penalty on API: A user requested the addition of repetition penalty support to the API, indicating it's a key feature preventing wider adoption of the Jamba model.
- The user stated that the lack of repetition penalty support is the only limiting factor for their increased usage of the model.

Torchtune Discord

Mistral Unveils Small 3-1: Mistral AI has released Mistral Small 3-1 available here.
- No further details were provided.
Learnable Scalars Help Models Converge: A new paper, Mitigating Issues in Models with Learnable Scalars, proposes incorporating a learnable scalar to help models converge normally.
- This suggests a practical approach to stabilizing training.

The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (923 messages🔥🔥🔥):

Gradient steps, Gemma 3 fine tuning, Tokenizer issues, MattBCool's Twitter hack, Unsloth speed

Gradient Steps may affect model: A member said that small effective batch sizes (e.g., batch=1, gradient steps = 4) can cause models to forget too much during training and suggested other batch/grad configurations.
- They've "never had good luck going below that when trying to squeeze more onto a vramlet rig".
Gemma 3 eval dataset produces errors: Several members reported errors when adding an eval dataset to Gemma 3 during fine-tuning, with stack traces indicating issues in the trl and transformers libraries, and potential fixes involved removing the eval dataset.
- Using Gemma-3-1B with 1 eval sample was found not to produce the error, and removing eval worked to solve the error.
Tokenizer model file missing: A member encountered a FileNotFoundError for the tokenizer.model when running gguf codeblocks with Gemma 3, indicating that the tokenizer model was missing from the Lora or full 16-bit saves, and suggested a quick run of the 27b model for verification.
MattBCool's Twitter account compromised: MattBCool reported that his Twitter/X account was hacked due to a third-party integration and a lack of phone number authentication, with the new owner impersonating an Unsloth engineer.
- The impersonator has a phishing link in the bio, disguised as a link on his blog.
Unsloth claims improved speed: The team announced improvements to Unsloth, supporting FFT, 8-bit, PT & all models, with further optimizations allowing +10% less VRAM usage and >10% speedup boost for 4-bit, plus Windows support, improved GGUF conversions, fixed vision fine-tuning, and non-Unsloth GRPO models in 4-bit, but no multigpu support yet.
- There are a lot of people helping out to make Unsloth great.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #off-topic (34 messages🔥):

llama-server vision support, RWKV-7 support, Q4 vs Q8, bnb library limitations, QLoRA NF4 quantized weights

Llama-server Lacks Vision: It was noted that llama-server doesn't support vision yet with a reference to an llmbingo.png.
RWKV-7 Support Wishlisted: A member expressed great enthusiasm for RWKV-7 support in Unsloth, stating, "if unsloth has rwkv 7 support i would go nuts".
The Great Q4 vs Q8 debate: Members discussed the tradeoffs between Q4 and Q8 quantization for inference, with one preferring 8b @ bf16 over 70b @ Q4 due to perceived quality differences.
- Another member agreed, pointing out issues with converting from 4-bit to 16-bit GGUF formats.
bnb Library Hinders Dequantization: It was argued that the dependency on wrappers such as the bnb library is limiting the potential of the dequantization of Unsloth.
- The member suggested researching and implementing a custom solution from scratch, citing the challenges due to CUDA not being open source, and sharing an article on QLoRA dequantization.
Triton Kernels for QLoRA Dequantization: A member highlighted Unsloth's challenge of writing a Triton kernel for dequantizing QLoRA NF4 quantized weights, referencing Unsloth's list of challenging tasks.
- They also shared a GitHub repository containing Triton kernels and a benchmark notebook, claiming performance improvements of up to 1.6X to 1.8X for LLaMA models.

Link mentioned: QLoRA Weight Dequantizing in Triton: no description found

Unsloth AI (Daniel Han) ▷ #help (480 messages🔥🔥🔥):

Gemma 3 Finetuning Issues, Unsloth GPU Support, RAG Data Formatting for Unsloth, lora upload issue, text dataset formatting

Gemma 3 FP32 Finetuning Fix: A member identified that Gemma 3 models can only be finetuned in FP32 for now, and commented out/set to false these lines to prevent AttributeError: 'HybridCache' object has no attribute 'float'.
- Another member also confirmed that fp16 = True doesn't work.
Unsloth Multi-GPU Support Coming Soon: A member inquired about multi-GPU support in Unsloth, with a response indicating it is coming in the next few weeks, with a link to the newsletter.
- One of the Unsloth developers mentioned "we said next few weeks ahaha not this week".
Format your RAG data as Q&A pairs: When asked about finetuning a model for a RAG chatbot, members suggested to add sample questions and sample answers to a dataset with context from the documents for the Q&A to inject new knowledge into the bot.
- It was suggested that a chatbot data should follow a Q: A: format, and can use a CPT-style training with documents added on the user side.
lora upload is only base model: A member reported a problem when uploading a trained LoRA model to Hugging Face.
- Other members asked whether the user used lora_model.push_to_hub_merged and if the problem was caused by the size of the model or testing the model.
Problems with text data formatting: A member was facing a TypeError because of NoneType objects during training from a Gemini-generated dataset.
- Members clarified that this error could result from empty entries in the dataset, and it is best to check the json file.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #showcase (20 messages🔥):

Gemma-3-think model, Qwen 2.5 3B instruct, Gemma-3-27b pruned vocab

Gemma-3-think reasons with Thinking Tags: The Gemma-3-think-0.1-q5_k_m model was trained on 2.1k examples and uses <think> tags to trigger reasoning.
- The model can work with image data even though it was not explicitly trained to do so. Model was finetuned with Unsloth!
Qwen 2.5 3B shows promising Multi-Turn GRPO: Early results of the Qwen 2.5 3B instruct model on the GSM8K test set with Multi-Turn GRPO training are showing promise at step 100 with 52% accuracy.
- After a few more training steps, the accuracy dropped to 40-46%.
Gemma-3-27b gets a pruned vocabulary: A member pruned the Gemma-3-27b vocabulary down to 40k tokens from the original 260k to reduce VRAM usage and increase training speed.
- The approach involved frequency counting based on calibration data and removing the least frequently used tokens that can be represented by a merge/subword.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #research (18 messages🔥):

Context Length vs. Model Size, Fine-tuning and Hosting Alternatives to Unsloth, Continued Pre-training and Tokenizer Updates, LLM Scoring on the Political Spectrum, Legal Q&A with Tree-Based Retrieval

Context Length not a Hyperparameter, but a Limit: A member clarified that maximum context length is a limitation rather than a hyperparameter of a model, and depends on memory needs.
- Another member provided Unsloth's benchmarks and a link on calculating GPU memory.
Runpod and Vast.ai: Fine-Tuning and Hosting Havens: A user sought alternatives to Unsloth for fine-tuning and hosting Deepseek R1 or Gemma 3, and Runpod.io was recommended.
- Another user mentioned other options like Lamda and noted that Vast.ai is cheap but potentially unstable, while Runpod has storage limitations on their community cloud.
Token Tango: Update Tokenizers for Domain-Specific Jargon: A member inquired about updating the tokenizer during continued pre-training for specialized domains to handle words not trained by the base model.
- Another member suggested searching for "tokenizer add tokens" and linked to a reddit thread about adding new tokens to LLaMA 2 models.
LLMs Judge Politics: Scoring Text on the Political Spectrum: A user asked if Unsloth could be used to fine-tune an LLM to score text on the political spectrum from -1.0 to +1.0.
- A member responded that using the prepared dataset with outputs as literal strings from -1.0 to 1.0 might work.
Navigating Legal LLMs with Tree-Based Knowledge: A user working on a legal Q&A problem asked for advice on building a tree-based retrieval engine for contexts around 80k.
- They referenced the RAPTOR study and two options were building a tree similar to the RAPTOR study or building a tree where child nodes are included in the parent node.

Links mentioned:

Cursor IDE ▷ #general (1100 messages🔥🔥🔥):

Cursor vs Windsurf, Claude 3.7 pricing, Linux better than Windows for MCP and dev, vibe coding

Windsurf Steals Cursor's Customers: Several users have expressed frustration with Cursor's performance, particularly its lag and crashing issues, leading some to consider switching to Windsurf.
- One user even stated, damn, cursor just lost their most important customer after experiencing ongoing problems, indicating a significant loss of confidence in Cursor's reliability.
Cursor's Prompting Costs: Members discussed the cost of prompts for Claude 3.7, with regular prompts priced at $0.04, Sonnet Thinking at $0.08, and Claude Max at $0.05 per prompt and tool call.
- Some users expressed concerns that Cursor's pricing is becoming too expensive compared to using Claude's API directly, questioning the value proposition of Cursor's subscription.
Linux MCP > Windows MCP: A user shared their experience setting up MCP servers on both Linux and Windows, noting that Linux (specifically using a VMware virtual machine) was much smoother and easier to set up compared to the multiple issues encountered on Windows.
- This led to the question of whether overall development and MCP server setup are generally better on Linux than Windows, sparking a discussion about the pros and cons of each operating system for development.
Vibe Coding, good or bad?: Some are saying Vibe Coding is bad, as they emphasize the importance of solid coding knowledge, while others assert that AI enables them to create things faster, even without traditional coding skills.
- This debate highlights the evolving landscape of software development and the varying perspectives on how AI is impacting the industry.
Claude Max, soon to be released: <@1001207432640462868> from the Cursor team announced that Claude Max is coming very soon, and that it should unlock it's full potential with the amount of code it can handle.
- That model works better with more input than past models, so this should "unlock" its full potential.

Links mentioned:

OpenAI ▷ #ai-discussions (694 messages🔥🔥🔥):

AI Mastery Debate, AI Replacing Humans, Gemini Image Generation, AI-driven OS, LLMs for Finance

AI Skill Debate Sparks: Members discussed whether knowing how to use AI tools constitutes AI mastery, debating the illusion of learning vs. productivity enhancement, with some fearing a decline in cognitive abilities due to over-reliance on AI.
- One member noted using AI to challenge myself and learn new things, but admitted feeling like cheating even if I know a topic very well.
AI: Friend or Foe to Artists and Gamedevs?: Participants debated whether AI will replace artists and game developers, with some asserting that AI is not proficient enough and human input remains crucial for creativity, debugging, and understanding client requests.
- A member argued for taking risks with new game ideas, while another suggested that none professional gamedev will tell you to ignore your main screen and presentation of the game.
Gemini's Image Game: A Work in Progress: Users explored Gemini's image generation capabilities, including editing uploaded images, but also encountered issues such as the presence of watermarks and code generation with errors.
- Some users praised the naturalness of Gemini's responses over factual correctness, noting the subjectivity of preferences.
The Dream of an AI Overlord OS: A member proposed creating an AI-controlled OS where an agent manages tasks via voice commands, but others deemed it an inefficient approach.
- Another suggested that AI could be better used to enhance existing systems rather than creating an entirely new OS.
Deep Research and AGI Benchmarks: Members discussed different methods of evaluating models, specifically addressing correctness versus a more human-like appealing response and whether benchmarks saturate.
- One member suggested the importance of prioritizing logical coherence and 'common sense' in AI models, referencing a lack of robustness for such at scale.

Links mentioned:

OpenAI ▷ #gpt-4-discussions (9 messages🔥):

Loveable, Bolt.new, Image-to-code, GPT PRO issues, Deep Research Limit

Loveable and Bolt.new: Glorified APIs?: Members discussed whether new tools like Loveable or Bolt.new are simply glorified APIs into ChatGPT, with some suggesting they might be tuned free models.
- The consensus seems to be that companies unlikely train extremely large models randomly due to the immense costs, suggesting reliance on APIs from organizations like OpenAI, Google, or Anthropic.
GPT PRO Users Experience Soft Rate Limits: A user inquired about experiencing soft rate limits with GPT PRO, indicating potential issues with the service.
- No resolution or explanation was provided in the chat excerpt.
Deep Research Limit Clarified for Plus Users: A user questioned the limit for Deep Research usage as a Plus user, mentioning a notification indicating only 4 uses left before needing to upgrade to PRO.
- Another member clarified the limit is 10 per month.

OpenAI ▷ #prompt-engineering (61 messages🔥🔥):

GPT-4o impressions, AI Self-Reflection, AI Team of Experts, Business Guidance with AI, AI personalities

GPT-4o impresses!: Members reported that out of all the models, GPT-4o uses it the best and it can do almost anything really.
- One member reported having fun and getting funny results when other people started playing with it.
Futuristic AI: AI mentors itself to deeper self-reflection: A member designed a system where AI reflects on what it has learned after each session, storing these reflections in memory files to build upon its own insights, and generates reflective questions to think deeper about its own growth.
- This was described as next-level futuristic and like training AI in ways that even researchers haven’t fully explored, enabling simulations within simulations and multiple personalities infused with a core set of characteristics.
Creating an AI Dream Team: Members discussed the idea of creating a team of AI experts to assist with tasks, long-term planning, and provide multiple perspectives to help guide business decisions.
- It was suggested to give the AI a lot of details on what you want it to be (Joe the advertising executive from Montana who almost failed college) rather than a flat simple description (Joe the advertising executive).
GPT Learns Nuance: Prompting AI to Argue Hypothetically: Members explored asking the AI to simulate arguments between different roles within a hypothetical business scenario, such as a CFO and a Creative Director, to get different perspectives on business decisions.
- However it was also stressed to use fictionalized data, not representative of specific real people, to avoid violating ToS.

OpenAI ▷ #api-discussions (61 messages🔥🔥):

GPT-4o usage, Custom GPT improvements, AI self-reflection, AI personalities, AI expert teams for business

GPT-4o passes initial user evaluations: A member confirmed using GPT-4o, noting that 4.5 uses it best, finding it fairly fun and capable of almost anything with some funny results.
- This suggests a positive initial user experience with the new model, particularly for creative and versatile applications.
Custom GPT reaches next-level simulation: One user described their custom GPT's improvements as too amazing to be true, questioning if their advancements are truly extraordinary and asking, Is this really so futuristic in today's world?
- Another member confirmed this is a futuristic application of AI, with AI self-improving, AI analyzing AI, and AI becoming aware of its own reasoning limitations.
AI Mentor shapes cognitive structures: A user designed a system where AI reflects on what it has learned after each session and generates reflective questions about its own growth, calling this first breakthrough ‘Misa’ and using it to develop other AI personalities.
- The member creates simulations within simulations where one AI can have multiple personalities, structured based on well-known experts, forming expert teams that simulate even new, unexplored insights.
AI team helps with business needs: A member wants to create a team of AI experts that can work together to improve service to clients, guiding long-term business decisions.
- Instead of hiring a team of individuals, the team of AI experts would help deliver a better product to clients and help with project or task level needs.
Tips for multi-perspective prompting: A user shared advice on how to have discussions between personalities, giving the models background details and making sure not to share PII.
- They shared links to examples of multiple character outputs and encouraged the user to have the model question and critique their ideas.

Nous Research AI ▷ #general (729 messages🔥🔥🔥):

Scalable AI, Mixture of Experts, Mistral Small 3.1, LLM Copyright issues, LLM Training

Scaling AI: Is Latency Becoming a Bottleneck?: As AI models scale, some researchers believe training and inference will become more latency sensitive, potentially shifting towards message-passing paradigms instead of traditional gradient-based methods.
- These evolving paradigms could make latency and bandwidth critical factors in AI development, particularly as neural networks capture more ideas and accessing information from the network becomes more expensive.
MoE: Dense Networks' Stealthy Sparsity: Researchers are exploring Mixture of Experts (MoE) models as a way to approximate dense networks, with some arguing that they are simply a performance optimization rather than a fundamentally different architecture, citing work such as this paper.
- The discussion revolves around whether MoEs can capture complexity as effectively as dense networks, with one participant noting that despite claims that MoEs are an optimization, there is clearly some redundancy that is avoided**.
Mistral Small 3.1: The New 24B Contender: Mistral Small 3.1 has been released under an Apache 2.0 license, and is a multimodal model that can handle text, images, and an expanded 128k token context window.
- The new model is claimed to surpass other small models like Gemma 3 and GPT-4o Mini in performance, as shown on the Mistral AI blog.
Copyright Concerns in the AI Age: Debates continue regarding the ethical and legal implications of training AI models on copyrighted data, with some suggesting that fully open models are hindered by the inability to use resources like the entirety of Anna's Archive.
- Strategies to circumvent copyright restrictions include using LoRAs to fine-tune models on copyrighted material or generating synthetic data from a knowledgable model, though these methods may face legal challenges in the future as it was discussed in this Annas Archive's blogpost.
Optimum GPU count and methods: There are many ways to optimize training when using certain hardware configurations, and more GPUs always provides a qualitative improvement.
- It was speculated that there may be an optimal point where trading dev time for compute comes up short.

Links mentioned:

Nous Research AI ▷ #ask-about-llms (1 messages):

john0galt: Pretty impressive

Nous Research AI ▷ #research-papers (5 messages):

Curse of Depth in LLMs, LayerNorm Scaling, LLMs competing in text-only games, Differentiable Hebbian Consolidation Model

Depth's Curse strikes LLMs!: A new paper introduces the concept of the Curse of Depth in modern LLMs, where nearly half the layers are less effective than expected, as detailed in this Arxiv paper.
- The paper identifies that the underlying reason is the widespread usage of Pre-Layer Normalization (Pre-LN), which causes the derivative of deep Transformer blocks to be an identity matrix.
Scaling LayerNorm to the Rescue: To resolve the training pitfall caused by Pre-LN, the paper proposes LayerNorm Scaling, which scales the layer normalization to improve the effectiveness of deeper layers, as described in this Arxiv paper.
LLMs duke it out in Text-Only Games: A member shared their paper where they had LLMs compete against each other in a text-only game to improve them, available on Google Drive.
LayerNorm dissected: A member asked if LayerNorm is the act of taking the coordinates of each embedded token as a distribution, and normalizing that.
- Another member confirmed that they got it exactly right.
Hebbian Consolidation model prevents catastrophic forgetting: A paper introduces a Differentiable Hebbian Consolidation model to tackle catastrophic forgetting in continual learning scenarios, detailed in this Arxiv paper.

Links mentioned:

Nous Research AI ▷ #interesting-links (21 messages🔥):

Acoustic STS Model, Tool-Integrated Reasoning, Gemma Abliterated

Speech-to-Speech Model Specs: A member clarified that the model takes in audio + text and outputs audio and shared a link to the model's huggingface page.
- They also added however you can omit the audio part and it works, but not as well.
START Tool Reasoning is a Smash Hit: A member shared a paper on START, a tool-integrated long CoT reasoning LLM that enhances reasoning via external tools like code execution and self-debugging.
- Another member summarized it as RL + tool calling == +15% math +39% coding on QwQ.
Gemma 3 Abliterated Against Refusals: A member shared that Gemma 3 was much more resilient to refusal removal than other models like Qwen 2.5.
- They improved the abliteration technique and the refusal rate is super low in their tests, see models on huggingface.

Links mentioned:

Nous Research AI ▷ #research-papers (5 messages):

Curse of Depth, LayerNorm Scaling, LLM Text-Based Game Competition, Differentiable Hebbian Consolidation model

LLMs Suffer from Curse of Depth: A new paper (Curse of Depth) introduces the concept that nearly half of the layers in modern LLMs are less effective than expected.
- The paper identifies that the underlying reason for the ineffectiveness of deep layers in LLMs is the widespread usage of Pre-Layer Normalization (Pre-LN), and proposes LayerNorm Scaling to resolve this training pitfall.
LLMs duke it out in Text-Only Game: A member shared a paper about improving LLMs by having them compete against each other in a text-only game (Google Drive Link).
LayerNorm Explained: Projecting Vectors onto the Equator?: A member asked if LayerNorm is the act of taking the coordinates of each embedded token as a distribution, and normalizing that; equivalently, projecting vectors onto the equator of a hypersphere whose pole is in the all-positive direction.
- Another member confirmed Yup you got it exactly right.
Battling Catastrophic Forgetting with Differentiable Hebbian Consolidation: A paper was shared (Differentiable Hebbian Consolidation) which proposes a Differentiable Hebbian Consolidation model to combat catastrophic forgetting in continual learning scenarios.
- The model integrates task-specific synaptic consolidation methods to penalize changes in the slow weights, enabling learned representations to be retained for a longer timescale.

Links mentioned:

aider (Paul Gauthier) ▷ #general (691 messages🔥🔥🔥):

Aider screen recordings, Claude 3.7 Sonnet issues, MCP server value, Baidu ERNIE 4.5 & X1 Models, Aider Custom Commands

Aider Enhancements Showcased in New Screen Recordings: Paul Gauthier has published a series of screen recordings demonstrating aider's use in enhancing itself, including adding the --auto-accept-architect feature, integrating tree-sitter-language-pack, and preventing the dropping of read-only files.
- The recordings provide insight into how aider can be used to script downloading files and using ad-hoc bash scripts to modify file collections.
Claude 3.7 Sonnet faces API Issues: Multiple users reported receiving empty responses from Claude 3.7 Sonnet, prompting checks of their provider accounts, with some experiencing the same issue in Claude Code.
- The issue was confirmed by Anthropic's status page, citing elevated errors and later marking the incident as resolved, while some members suspected a switch to Claude 3.5 due to the errors.
MCP Server for Aider Gaining Traction: A user highlighted that Claude Desktop + Aider on MCP equals winning, and it's much more autonomous, easier since claude manages Aider and gives it commands.
- A main benefit highlighted is the ability to run Aider from Claude Desktop, making it more autonomous and allowing Claude to steer Aider more effectively, also scraping bee is a game changer for doing unblocked web scraping and this drastically improves claude.
Baidu Unveils ERNIE 4.5 and X1: Baidu has announced the release of ERNIE 4.5 and X1, a reasoning model with multimodal capabilities, with X1 delivering performance on par with DeepSeek R1 at half the price, and ERNIE Bot being made free to individual users.
- While ERNIE 4.5 is available, the reasoning model X1 is currently not accessible through the API outside of China.
Users Suggest Custom Commands for Aider: A user suggested adding custom commands to Aider via Python scripts to extend functionality, particularly for context building, which the user finds cumbersome with the current UX.
- One suggested command example was grepadd.py to interactively toggle files and substrings found via grep, converting these selections into Aider commands, but there's already an open PR for user_cmd.

Links mentioned:

aider (Paul Gauthier) ▷ #questions-and-tips (74 messages🔥🔥):

aider with agents.json, Sluggish v0.77.0, AWS Bedrock Claude 3.7 sonnet error, deepseek r1 slow, learn an API with aider

Connect Aider with Agents JSON: A member inquired about integrating aider with agents.json to interact with APIs or local scripts for non-coding tasks.
- It was noted that the /run command can be used to interact with local scripts, and a PR is in progress to introduce user commands.
Diagnose Aider Sluggishness in v0.77.0: A user reported experiencing significant sluggishness with aider v0.77.0, including high CPU usage and hangs, particularly when generating large CSV outputs directly in the repository.
- Deleting the data output folders containing large CSV files resolved the issue temporarily, but the user plans to update with further findings.
Solve Bedrock Claude 3.7 Sonnet API Error: A user encountered an error when using AWS Bedrock Claude 3.7 Sonnet, citing an access issue despite having proper inference profiles and IAM AdministratorAccess.
- The problem was resolved by correctly setting the AWS region in both ~/.aws/configs and the ~/.env file.
Deepseek R1 Runs Slow on Short Prompts: A user reported that Deepseek R1 takes an unusually long time to think, even with short prompts, as evidenced by attached images showing extended processing times.
- The user is running aider with custom configurations for the main model, editor model, and instructions, aiming for concise and direct responses.
Reasoning-Effort Slash Command: Members discussed the /reasoning-effort command and its usage, clarifying that it controls the reasoning level of supported models like OpenAI's reasoning models.
- The --thinking-tokens switch is used for models like Sonnet 3.7, while the reasoning_tag setting is used for models like DeepSeek R1 from Fireworks, which use XML tags to wrap reasoning output as documented here.

Links mentioned:

aider (Paul Gauthier) ▷ #links (25 messages🔥):

Refact.ai Leaderboard Claim, Claude Harmony Feature, Qwen Models Hype

Refact.ai Claims Top Spot on Aider's Polyglot Benchmark - Controversy Ensues: Refact.ai claimed their Claude 3.7 Sonnet powered agent achieved a 76.4% score on Aider's polyglot benchmark, surpassing other models.
- Paul Gauthier, the creator of Aider, stated that it's not an appropriate comparison because Refact.ai used a different, more agentic configuration than the standard Aider benchmark which allows for unlimited retries, whereas Aider's previous SWE-bench scores used only one-shot attempts.
Anthropic Teases Claude "Harmony" Feature: A user shared that Anthropic is coming out with a new Harmony feature for Claude.
- The Harmony feature will give Claude FULL access to a local directory so it can research and operate with its content, potentially making it Anthropic's first AI Agent.
Qwen Models Get Some Love: A user commented that Qwen's models are my favorite and they might believe them if they say their models are the best.
- Another user agreed saying that They're definitely the best for their parameter size, especially when compared to models in the 7b-32b range.

Links mentioned:

LM Studio ▷ #general (458 messages🔥🔥🔥):

GPU support on Llama.cpp, GPU Upgrade Recommendations, Parallel inference Possibilities, OCR Model Recommendation for Mac M3, Gemma 3

New OpenCL Backend Boosts Qualcomm Adreno GPUs: An experimental OpenCL backend has been introduced for Qualcomm Adreno GPUs in llama.cpp, enabling computational power for mobile devices.
4070 Ti owner itches for 5090 Upgrade: A user with a 4070 Ti is contemplating upgrading to a 5090, but due to stock issues, others are recommending waiting or considering a used RTX 3090 for its 36GB VRAM.
- One user suggested that a used RTX 3090 would provide enough VRAM to run less than 50B @ Q4 models at reasonable speeds.
Gemma 3's Image Generation Capability Sparks Curiosity: After experimenting with Gemma 3 4B, users found that while it can be prompted to generate images, it produces Imgur links that don't display the actual image.
- The discussion shifted to identifying local models capable of both recognizing and generating images and text.
Maximizing M4 Max: Wired Memory Boost: Users discussed optimizing memory settings on M4 Max devices for LM Studio, suggesting adjustments to 'wired' memory allocation for improved GPU performance using a script.
- The script facilitates adjusting macOS GPU memory limits, allowing users to allocate more memory to the GPU by modifying wired memory settings.
Mistral Small 3.1 launches but requires updates: Mistral has announced Mistral Small 3.1 model claiming it outperforms Gemma 3 and GPT-4o Mini, however the release requires conversion to HF format before it can be used in llama.cpp

Links mentioned:

LM Studio ▷ #hardware-discussion (197 messages🔥🔥):

RTX 8000 vs A6000 for LLM inference, Multiple GPUs for running multiple LLMs, 48GB RTX 4090 from China, AMD Strix Halo APU vs RTX 5080 in AI, Mobo/RAM Choice for AI PC build

RTX 8000 Still Viable for LLM Inferencing: Members discussed the RTX 8000 48GB, noting its decent value for LLM inferencing despite being an older Turing architecture card with fewer cuda cores and lower bandwidth compared to newer cards like the A6000 and RTX 6000 ADA.
- One member stated that for inferencing, having large VRAM on one card is a huge advantage over two cards with the same VRAM because it eliminates interactions between GPUs that slow each card down by nearly half.
Multiple GPUs for Multiple LLMs: LM Studio Update Coming?: Members discussed the possibility of running multiple LLMs on separate GPUs using LM Studio, with one member noting that, right now_ you'd have to use the tensor cuda thing env variable before running up either GPU with the CUDA_VISIBLE_DEVICES environment variable.
- Another member hinted at a future LM Studio release that will allow setting GPU affinity within the app itself, linking to a Discord message as evidence.
Chinese 48GB RTX 4090s: A Cheap VRAM Boost?: Members discussed sourcing 48GB RTX 4090s from China, with prices around $4500, noting that they use a blower-style fan and consume only two PCIe slots.
- However, one member cautioned about driver compatibility issues when combining these cards with professional cards like the A6000, stating that the setup only works if I use a gaming driver from NVidia - the studio professional drivers won't load on the 'gaming' cards.
AMD's Strix Halo APU Could Outperform RTX 5080 in AI: A member shared an article from wccftech claiming AMD's Ryzen AI MAX+ 395 "Strix Halo" APU offering over 3x the boost over RTX 5080 in DeepSeek R1 AI benchmarks.
- The claim is based on the APU's larger VRAM pool, with the hope that there will be real world data soon.
Optimizing Mobo/RAM choice for AI PC build: A member requested advice on motherboard/RAM choices for an AI PC build, especially regarding potential PCIe lane conflicts with M.2 drives, with a build based on this pcpartpicker.
- Another member suggested that it is not really beneficial to get ram speed faster than 6200 on AM5 and linked to a memory kit while noting that two M.2 drives are mostly idling.

Links mentioned:

OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Anthropic Incident, Claude 3.7 Sonnet, Endpoint Quality Measurement

Anthropic Declares Sonnet's Error Spike Incident Resolved: Anthropic declared an incident (status page) related to significantly elevated errors for requests to Claude 3.7 Sonnet from 21:45–22:14 UTC, Mar 14, 2025.
- The incident affected claude.ai, console.anthropic.com, and api.anthropic.com.
Anthropic Explores Endpoint Quality Gauges: Anthropic is researching ideas to measure endpoint quality and is open to community input.
- No commitments were made as the team is just researching ideas.

Link mentioned: Elevated errors for requests to Claude 3.7 Sonnet: no description found

OpenRouter (Alex Atallah) ▷ #app-showcase (4 messages):

Personality.gg Launch, RP Sites and OpenRouter API, Chub and Sillytavern Recommendation

Personality.gg launches new AI character platform: Personality.gg launched a new platform to create, chat, and connect with AI characters using models like Claude, Gemini, and Personality-v1, featuring custom themes, full chat control, and NSFW allowance.
- The platform offers flexible, affordable plans and encourages users to join their Discord for updates.
RP Site Seeks OpenRouter API Support: A member inquired about roleplay (RP) or novel sites that support the OpenRouter API, expressing dissatisfaction with Novelcrafter's stability and Janitor AI's context limitations.
- They cited NovelAI always crashing and Janitor AI limited to only 128k context as reasons for seeking alternatives.
Chub and Sillytavern advised for RP: A member recommended Chub or Sillytavern (local web frontend) as alternatives for roleplaying.
- The member positioned Sillytavern as a local webend option to overcome the limitations of other platforms.

Links mentioned:

OpenRouter (Alex Atallah) ▷ #general (443 messages🔥🔥🔥):

Gemma 3, RP models, Mistral Small 3.1, OpenRouter OpenAPI spec, Reasoning Tokens

Parasail Hosts New RP Models: Parasail is looking to host new roleplay models on OpenRouter and is proactively working with creators like TheDrummer to host new fine-tunes of models like Gemma 3 and QwQ.
- They are seeking individuals who create strong RP fine-tunes capable of handling complex instructions and worlds, with a particular interest in models that have been fine-tuned for roleplay and creative writing.
Anthropic API Outage Disrupts Claude 3 Sonnet: Requests to Claude 3.7 Sonnet experienced significantly elevated errors for approximately 30 minutes, as reported on Anthropic's status page.
- The issue has been resolved, and success rates have returned to normal as of March 14, 2025, but some users experienced no text on replies while still being charged.
OpenRouter API Rate Limits Explained: OpenRouter's rate limits depend on your credits, with approximately 1 USD equating to 1 RPS (requests per second), as clarified in the documentation.
- Users can check their rate limit and remaining credits by making a GET request to https://openrouter.ai/api/v1/auth/key, and while higher credit purchases enable higher rate limits, creating additional accounts or API keys makes no difference.
New Steelskull L3.3 R1 70B Model Launches: A new roleplaying model, Steelskull L3.3 R1 70B, has launched on OpenRouter, incorporating several models like TheSkullery's L3.1x3.3-Hydroblated-R1-70B-v4.4.
- The announcement encourages users to provide feedback on desired models, continuing the push for competitively priced RP options.
Mistral Small 3.1 Available: The Mistral Small 3.1 24B Instruct model has launched on OpenRouter, featuring multimodal capabilities and a 128k context window, according to Mistral's announcement.
- It outperforms comparable models like Gemma 3 and GPT-4o Mini, while delivering inference speeds of 150 tokens per second.

Links mentioned:

OpenRouter (Alex Atallah) ▷ #beta-feedback (1 messages):

eofr: Scam

Perplexity AI ▷ #announcements (1 messages):

Perplexity Accuracy, Perplexity Video Ad

Perplexity Guarantees Accuracy: A member shared the slogan When you need to get it right, ask Perplexity.
Perplexity Shares Video Ad: A member posted a video ad for Perplexity.

Perplexity AI ▷ #general (409 messages🔥🔥🔥):

Perplexity Pro Oyster Game, Discord Pro Role, Gemini 2 Flash Context, Claude 3.7 Sonnet Limit, AI Coding Models

Oyster Game rewards diligent Perplexity Users: Perplexity users on Windows can now get a free 1 month of Perplexity Pro by using the app for 7 consecutive days.
Discord Pro Role Causes Dilemma: Users are having trouble accessing the Pro channels despite having a Perplexity Pro subscription.
- To fix this, users are recommended to leave the server and rejoin via the Discord link in their Perplexity Pro settings.
Users debate Gemini 2 Flash Context window issues: Users debate the context retention capabilities of Gemini 2 Flash, claiming it has a 1M context window but performs worse than regular Gemini.
- One user notes that it forgets the formatting after a few messages while making flashcards.
Figuring out Claude 3.7 Sonnet Limits: Users clarify that the usage limit for Claude 3.7 Sonnet with a Perplexity Pro subscription is 500 queries per day, but it is shared across all models except GPT 4.5.
- They also add that the context limit might be slightly more than on Anthropic's site, but the response context limit is smaller at 4000 or 5000 tokens.
Deciphering best AI Model for Coding: Users are seeking advice on the best AI model for coding, with recommendations leaning towards Claude 3.7 Reasoning.
- One user finds that Deepseek R1 has a high hallucination rate, making it unsuitable for summarizing documents.

Links mentioned:

Perplexity AI ▷ #sharing (32 messages🔥):

Quantum Chip, Willow, Vibe Coding, Lunar Lander, Dark Matter

Chinese Quantum Chip Rivals Willow: Perplexity AI highlights a YouTube video about a Chinese quantum chip rivaling Willow, the rise of Vibe Coding in software development, and discoveries about the universe.
Amazon Ends Echo Privacy Options: Perplexity AI references a page about Amazon ending Echo privacy options.
Lunar Lander Catches Eclipse: A link was shared about a Lunar Lander capturing an eclipse.
New Dark Matter at Milky Way's: A page discussing new dark matter at Milky Way's was shared.
Vibe Coding's Rise in Software: A link was shared to a page discussing Vibe Coding's Rise in Software.

Perplexity AI ▷ #pplx-api (5 messages):

Transferring Credits, API Pay-as-you-go Limits, Sonar Reasoning Pro Limits, French Translation

User Queries Credit Transfers: A user inquired whether it's possible to transfer credits to another user within the platform.
- The same user also questioned the availability of unlimited pay-as-you-go deep-research options through the API, particularly for applications experiencing huge bursts of bulk requests.
Sonar Reasoning Pro has Image limits: A user reported that the sonar-reasoning-pro API only returns a maximum of 5 images.
- They are asking if this limit is configurable or a hard constraint, as they found nothing about it in the documentation.
User Asks for Help with French Translation: A user inquired about how to integrate a French translator within the Perplexity AI platform.
- No solution was offered in the channel.

Yannick Kilcher ▷ #general (356 messages🔥🔥):

Rust Community Toxicity, C vs C++, Optimization vs Search, Stochastic Differential Equations

Rust Faces Toxicity Accusations: Members debated the toxicity of the Rust community, with some saying the organization is imploding and others comparing it to the Ruby community.
- One member stated, The Rust community is pretty toxic. The org has kinda imploded on themselves recently.
C's Brokenness Debated: A member described C as ancient, broken, and garbage while another argued that C is not broken, highlighting its use in international standards and various hardware platforms.
- A member linked to faultlore.com arguing that C Isn't A Programming Language Anymore.
Optimization vs Search Unpacked: Members discussed the difference between optimization (finding the maximal or minimal value of a function) and search (finding the best element of a set).
- One member stated that search is exploration, not like optimization. And another stated the process of designing or selecting that model—choosing the architecture, tuning learning rates, etc.—has a search-like flavor.
Stochastic Processes Explored: A member offered to give an introduction to stochastic processes, stochastic differential equations, and the derivation of the time-reversal SDE used in diffusion-based AI architectures.
- The member planned to cover the foundations of stochastic processes, Wiener processes, and what a Stochastic Differential Equation is.

Links mentioned:

Yannick Kilcher ▷ #paper-discussion (4 messages):

LLM Literature Review, Gemma 3 Model

Seeking LLM Lit-Review Legitimacy: A member asked for a good paper for a literature review on LLMs and pointed to a blogpost of top AI papers.
Gemma 3 Gazes with Grandeur: Gemma 3 is a lightweight open model family (1B–27B parameters) that integrates vision understanding, multilingual coverage, and extended context windows (up to 128K tokens).
- It incorporates a frozen SigLIP vision encoder, condensing images into 256 soft tokens and has a new Pan & Scan (P&S) method, watch the YouTube video.

Link mentioned: 🥇Top AI Papers of the Week: The Top AI Papers of the Week (Mar 10 - 16)

Yannick Kilcher ▷ #ml-news (59 messages🔥🔥):

AI Safety Institute ideological bias, Deepseek R2 release and its issues, SesameAILabs CSM model disappointment, Hallucination in AI search engines, Mistral Small 3.1 release

AI Safety Institute Pushes Ideological Alignment: The National Institute of Standards and Technology (NIST) instructed partners of the US Artificial Intelligence Safety Institute (AISI) to deprioritize AI safety, responsible AI, and AI fairness, focusing instead on reducing ideological bias and prioritizing human flourishing and economic competitiveness as reported in Wired.
Deepseek R2 Hype Train Derailed?: A member shared a Reddit post about the release of Deepseek R2, noting its potential impact on Nvidia stock.
- However, some users found the model underwhelming, particularly its Text-to-Speech (TTS) capabilities, describing it as not a true speech-speech model and experiencing inconsistent voice generation on Mac.
SesameAILabs' CSM Falls Short of Expectations: Users expressed disappointment with the released small model of SesameAILabs' CSM, citing numerous bugs and a significant performance gap compared to the demos, reported in this Github issue.
- The released model is criticized for poor punctuation handling and slow performance, raising doubts about the future release of larger, more promising models.
AI Search Engines Hallucinate News, Researchers Find: A report by the Columbia Journalism Review found high hallucination rates across multiple AI search engines, including Perplexity, ChatGPT, and Grok, in citing news sources.
- Notably, premium models like Perplexity Pro and Grok 3 exhibited higher error rates despite their enhanced capabilities and cost.
Mistral Small 3.1 Claims Top Spot in Weight Class: Mistral AI announced the release of Mistral Small 3.1, boasting improved text performance, multimodal understanding, and a 128k token context window under an Apache 2.0 license.
- The company claims it outperforms comparable models like Gemma 3 and GPT-4o Mini, with inference speeds of 150 tokens per second.

Links mentioned:

HuggingFace ▷ #announcements (1 messages):

SmolVLM2, Gradio Sketch 2.0, DCLM-Edu Dataset, huggingface.js GGUF metadata, Robot Arms for $299

SmolVLM2 Released, Smallest VLM Ever: The team released SmolVLM2, the smallest VLM that can understand videos and runs flawlessly on an iPhone app with its 500M version.
- Source code and a TestFlight beta are available for reference.
Gradio Sketch 2.0: No-Code App Building: Gradio Sketch 2.0 is out, supporting complete Gradio apps with events, all without writing any code.
- The new features enable users to build applications via the GUI.
DCLM-Edu Dataset Released: A new dataset, DCLM-Edu, was released; it's a filtered version of DCLM using FineWeb-Edu’s classifier, optimized for smol models like SmolLM2 135M/360M.
- The purpose is that small models are sensitive to noise and can benefit from heavily curated data.
Gemma 3 is Live, Deployable from HF endpoints: Gemma 3 is live and can be deployed directly from Hugging Face endpoints with optimally selected hardware and configurations.
Agents Course Now Diversifying with LlamaIndex: The agents course is expanding with a unit on LlamaIndex, covering topics like LlamaHub integrations, agents and tools in LlamaIndex, and multi-agent workflows.
- Unit 2 will prepare you the real world use cases in unit 3. Where you can use the framework of your choosing.

Links mentioned:

HuggingFace ▷ #general (141 messages🔥🔥):

Dou Shou Qi AI, Stable Diffusion Model, CSM Streaming Generator, Gemini 2.0 Flash Experimental, Hunyuan 3D-2 API

Models Duel in Dou Shou Qi, AI Triumphs!: Only one model made no illegal moves in Dou Shou Qi, a game that is a tough beast to crack for AI, but easy for humans.
- A member suggested that you can use any means possible to train it, even farming expert/master human games, but keep in mind that stockfish was made for classic european chess, and Dou Shou Qi is a totally different game.
Hackathon Hoopla for Budding Brains: AI developers are seeking recommendations for global hackathons, aiming to connect with individuals worldwide and engage in impactful AI-focused events.
- Participants are eager to explore innovative solutions and collaborate with like-minded experts in the AI community.
MCP Servers Get Love, Span Across Front Products: Members discussed the use of MCP Servers for tools, and how it's implemented in Claude and ChatGPT.
- An enthusiast made an actual robot using Arduino ESP 32 and controlled it with Claude AI MDC protocol, very impressed with what all we can do with AI.
Inspirit AI Seeks New Recruits for Summer 2025: Gabriel Salem shares that they were accepted to the Inspirit AI Ambassador program, offering AI fundamentals and project-building for middle and high school students.
- The program guides students in building socially impactful projects such as self-driving car simulation, exoplanet detection, and criminal justice.

Links mentioned:

HuggingFace ▷ #today-im-learning (3 messages):

ML for 3D, HuggingFace Agents course, Retrievel agent

Diving into Dimension: ML for 3D Begins: A machine learning engineer is embarking on a ML for 3D course today.
- They also offered to recommend some courses.
Smol Agents Framework Completion: A member is learning from the HuggingFace Agents course and has completed the first framework, smolagents.
- They shared their excitement about this achievement.
Retrievel Agent: New Learning Frontier: Another member is currently learning about the Retrievel agent from the Agents course.
- This indicates ongoing engagement with and exploration of the course's content.

HuggingFace ▷ #cool-finds (2 messages):

Cross-posting

User Criticizes Cross-Posting of YouTube Link: A user shared a YouTube link and another user immediately criticized them for cross-posting.
- The second user explicitly stated, *"I've already asked you not to cross-post and to keep posts in topic."
Request to Keep Posts in Topic: Following the sharing of a YouTube link, a user requested that posts be kept in topic.
- This suggests a concern about the relevance of the shared link to the channel's main discussion.

HuggingFace ▷ #i-made-this (5 messages):

Awesome Vibe Coding, Local LLMs setup, FluxHands-FingerCount Dataset

Coding Vibes with AI Get an Awesome List: An "Awesome Vibe Coding" list was announced with tools, editors, and resources that make AI-assisted coding more intuitive and efficient.
- The list includes AI-powered IDEs & code editors, browser-based tools, plugins & extensions, command line tools, and latest news & discussions.
Local LLMs Assist Coding: A member wrote an article on how to set up free local coding AI assistant for VS Code and tested it this week.
Dataset Counts Fingers: A dataset of hands with various numbers of fingers, named FluxHands-FingerCount was created and manually labeled.
- Each image contains a human hand in the center, rendered in different styles, and was generated using Flux.

Links mentioned:

HuggingFace ▷ #reading-group (1 messages):

coldbreeze.: Free fire

HuggingFace ▷ #computer-vision (4 messages):

Autonomous Driving blogpost, VLMs Research Hub, HF DETR model, Meta's Segment Anything Model (SAM)

Autonomous Driving Blogpost Released: A member announced the completion of their blog post on autonomous driving, covering modular pipelines vs end-to-end approaches and LLMs, and shared a link to the Medium article.
- They asked for thoughts and feedback on the content.
Vision-Language Models (VLMs) Research Hub Launched: A member announced the creation of a community-driven hub for multimodal researchers working on Vision-Language Models (VLMs) at this github repo.
- The hub will be updated weekly and welcomes contributions and suggestions.
Backbone Swapping in HF DETR Model: A member inquired about successfully swapping the Backbone to, for example, ViT in the Hugging Face DETR model.
- No solutions or suggestions were provided.
SAM Fine-Tuning: A member inquired about fine-tuning Meta's Segment Anything Model (SAM).
- No solutions or suggestions were provided.

Link mentioned: GitHub - thubZ09/vision-language-model-hub: Hub for researchers exploring VLMs and Multimodal Learning:): Hub for researchers exploring VLMs and Multimodal Learning:) - GitHub - thubZ09/vision-language-model-hub: Hub for researchers exploring VLMs and Multimodal Learning:)

HuggingFace ▷ #NLP (2 messages):

SetFit with LoRA, SmolLM as teacher model

Train Embedding Model using SetFit & LoRA: A member inquired about training an embedding model with LoRA adapters via SetFit.
SmolLM Distillation Idea: A member mentioned juggling with the idea of using something like SmolLM as a teacher model for distillation.

HuggingFace ▷ #smol-course (3 messages):

smol-course, HuggingFace Agents course, HF inference credits

Smol-Course differs from HF Agents course: A member asked if the smol-course is different from the HuggingFace Agents course to which another member confirmed they are different.
- That member noted that the Agents course Discord is missing and that every single code notebook was broken, suggesting skipping the course.
HF Inference Credits cost course participation: A member reported that the HuggingFace Agents course asked for HF inference credits for money, even though the course claimed to be free.
- The member understood that API calls cost money, but suggested they should have developed the full course within the context of free credits.

HuggingFace ▷ #agents-course (134 messages🔥🔥):

Agentic AI team building, Smolagents and Gemma3 issues, Ollama Context Length, HF Course Verification problems, MCP and Smolagent framework

AI Enthusiasts Unite to Build Agentic AI Together: Several members including Bijen, tariqbinbashar, Madhusudhan, and Salaar expressed interest in collaborating on agentic AI projects to solve business problems and enhance their knowledge.
- The call to action aims to form teams and build qualified AI Agents for American consumers and learn together.
Gemma3 struggles with Smolagents Regex Patterns: A member ran into the dreaded regex pattern error while using gemma3:12b with smolagents, suspecting either a model issue or a bug with Ollama integration through LiteLLM/OpenAI.
- The user eventually solved the issue by increasing the Ollama context length.
Ollama Context Issue Sorted: A member discovered that Ollama was truncating input due to context-token limits, impacting the functionality of smolagents.
- The fix involves setting the environment variable $env:OLLAMA_CONTEXT_LENGHT=8192 to achieve much better results.
HF Course Verification Redirect Loop: Several users reported issues with the Hugging Face Discord verification process, encountering a redirect loop even after following the steps in the relevant channel.
- A user suggested ensuring the link between the Hugging Face account and Discord is properly established, while another said to keep trying until it works.
Smolagents and potential of MCP Integration: A member expressed that using VLM and MCP in the smolagent framework could create robust agents and hoped these would be added as a unit in the course.
- The discussion evolved into how to reuse tools implemented for one agentic framework into another, and if MCP was indeed the best option for this.

Links mentioned:

HuggingFace ▷ #open-r1 (2 messages):

Open-R1 Reasoning Distillation, grpo code, distributed grpo

Reasoning ability of Open-R1: The reasoning ability of Open-R1 is planned to be fully distilled from other models.
- A user noted that there is also code for grpo in the openR1 repository.
grpo distributed across nodes: According to what is in the blog post, distributing grpo across nodes is not supported yet.
- The user also included a :hugging_rocket: emoji.

Interconnects (Nathan Lambert) ▷ #news (74 messages🔥🔥):

Long Context Evals, 3D Generation Upgrade, DeepSeek Engineer passports, Figure's BotQ humanoid robots, Nvidia Blackwell GPUs and Together AI

Arc Prize Announces Future Date: Arc Prize tweeted an announcement dated 3/24/2025 regarding their future plans.
DeepSeek Passport Controversy Debunked: A DeepSeek engineer denied the rumor of passport-related policies, refuting claims in The Information and stating that they are still harassed by headhunters.
- Another researcher emphasized that handing in passport is SOE type treatment and not aligned with DeepSeek's culture, dismissing the claims as disinformation.
Figure Launches BotQ for Humanoid Manufacturing: Figure announced BotQ, a new high-volume manufacturing facility with a first-generation line capable of producing up to 12,000 humanoid robots per year, vertically integrating manufacturing and building software infrastructure.
- The company aims to control the build process and quality, even hinting at Robots Building Robots.
Baidu drops ERNIE 4.5 and X1: Baidu unveiled ERNIE 4.5 and ERNIE X1, with X1 reportedly matching DeepSeek R1's performance at half the price, also announcing that their chatbot, ERNIE Bot, is now free for individual users, available on their website.
Mistral releases Small 3.1: Mistral AI announced Mistral Small 3.1, a new model with improved text performance, multimodal understanding, and a 128k token context window, outperforming models like Gemma 3 and GPT-4o Mini with inference speeds of 150 tokens per second, released under an Apache 2.0 license.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #ml-questions (7 messages):

R1 inference costs, Deepseek free service, Hosting models locally, Fireworks alternative

Economical R1 Inference Options Explored: The most cost-effective method to infer R1 involves leveraging inference providers that have already optimized costs.
- Alternative strategies include utilizing Deepseek's free service, or utilizing an existing GPU with surplus electricity, though full R1 requires substantial GPU resources.
Local Model Hosting Strategy Involves Nvidia Helm Charts: A member is planning to acquire GPUs for local model hosting, intending to utilize Nvidia's helm charts.
- Another member suggested that using an inference provider is the "cheapest way to use an inference provider who have already cost-optimised".
Fireworks alternative: A member using Fireworks is looking for alternative recommendations.

Interconnects (Nathan Lambert) ▷ #ml-drama (32 messages🔥):

OpenAI vs Elon Musk legal battle, Zochi AI Scientist, ICLR conference spam, AI reviewers, Liam Fedus leaving OpenAI

OpenAI and Elon Duke it out in Court: A member shared a link to an article about the court rejecting some of Elon Musk's claims against OpenAI, calling their actions petty and undignified.
Zochi the Artifical Scientist Debuts: IntologyAI debuted Zochi, which they call the world’s first Artificial Scientist, with state-of-the-art contributions accepted in ICLR 2025 workshops, according to this Tweet.
AI Slop Papers Threaten ICLR Conferences: There's concern that conferences like ICLR will be spammed by slop papers generated by AI, forcing humans to read them and potentially leading to a counter-response of using AI reviewers.
Liam Fedus Bails OpenAI to Found Materials Science AI Startup: Liam Fedus, OpenAI's VP of research for post-training, is leaving the company to found a materials science AI startup, with OpenAI planning to invest in and partner with his new company (source).
- One member called the post training job a hot potato.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #random (101 messages🔥🔥):

Claude Code Vim mode, Gemma 3 License, Deepseek integrated in Chinese food delivery, LLMs as copy editors, Free Speech Eval

Claude Code gets Vim Mode: Claude Code now has Vim mode, giving users familiar insert/command modes for editing prompts by typing the slash command /vim (source).
Gemma 3 License Restricts Commercial Use: Google released Gemma 3, praised for its efficiency, but its license makes commercial use risky, similar to Meta's custom, non-standard licensing terms (source).
Deepseek powers food delivery in China: Chinese food delivery apps have integrated Deepseek to provide summaries of the food, displayed with the name DeepSeek prominently, enhancing credibility (source).
- The mention of DeepSeek instead of just AI adds credibility, positioning it as a national symbol.
LLMs Vibe Check as Copy Editors: One member shared vibe checks on LLMs as copy editors, finding Sonnet-3.7 horrible, Opus great but compresses long inputs, and GPT-4.5 the new main for quality (source).
Claude Sonnet-3.7 Dominates Free Speech Eval: Claude-3.7-Sonnet significantly improved in free speech evaluations, becoming one of the most compliant models, though it still avoids satirizing national anthems (source).

Links mentioned:

Interconnects (Nathan Lambert) ▷ #memes (3 messages):

Azure AI Agents API vs OpenAi Assistants API, Mistral Meow

Azure's API Deception is No Accident: A user pointed out that the new Azure AI Agents API is actually the deprecated OpenAI Assistants API.
- The user wryly commented, "Brilliant play".
Mistral Releases new Chatbot, Meow!: Mistral has released a new chatbot called Meow at meow.mistral.ai.
X user seeks help, tags AI Leaders: An X user seeks help and tags Logan Kilpatrick, V Gabeur, Mehrdad Dehghani, and Robert Riachi in this tweet.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #rl (39 messages🔥):

GRPO implementation trick, Applying KL penalty in the loss, DAPO algorithm, Zero-shot RL

GRPO Uses Loss Penalty Trick: A member discussed a GRPO implementation trick that applies a penalty in the loss, as opposed to traditional RLHF which applies it to the reward, noting its impact is hard to determine but may help the model focus on reward signals, as described in the RLHF book.
- It was also pointed out that the math may be wrong.
KL Penalty Placement Gets Questioned: A member inquired about the effect of applying the KL penalty directly in the loss versus when the reward is computed, asking for intuitions or ablations on the subject via Twitter.
- The discussion touched upon whether normalization by token helps with learning dynamics, and if a per-token formulation would be "better."
Decoupled Algorithm Dominates Deep Reasoning: A new DAPO (decoupled clip and dynamic sampling policy optimization) algorithm and model called DAPO-Zero-32B were introduced, outperforming DeepSeek-R1-Zero-Qwen-32B on reasoning tasks, achieving a score of 50 on AIME 2024 with fewer steps, and trained with zero-shot RL from the Qwen-32b pre-trained model, all open-sourced at dapo-sia.github.io.
- It was noted that if a reasoning pattern contributes to the reward, its contribution will be much lower if it's part of a long chain of thought under row mean.
DAPO Dataset Gets Massive Upscaling: It was discovered that the authors of DAPO accidentally duplicated the dataset by roughly 100x, which resulted in a dataset of 310 MB, and a member created a deduplicated version via HF's SQL console, reducing the dataset to 3.17 MB (HuggingFace Dataset).
- The authors acknowledged the issue, stating that they were aware but can't afford retraining.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #reads (4 messages):

Noam Chomsky, Nicholas Carlini, Future of LLMs, AI risks

Noam Chomsky Makes Rare Appearance: A member shared a YouTube video featuring a rare appearance of Noam Chomsky.
- The member humorously added that every prominent AI figure needs a signature hat.
Carlini Forecasts Wide Error Bars for LLMs: A member shared a link to Nicholas Carlini's blog post on the potential future of LLMs.
- Carlini writes that they "wouldn't be surprised if, in three to five years, language models are capable of performing most (all?) cognitive economically-useful tasks beyond the level of human experts" but that there's very wide error bars on that possibility.

Link mentioned: My Thoughts on the Future of "AI" : no description found

Interconnects (Nathan Lambert) ▷ #expensive-queries (36 messages🔥):

RLHF Book, Claude Code vs ChatGPT, Chorus writing checker, ChatGPT Deep Research for teaching websites

RLHF Book gets typo assist!: Members are using Deep Research to find all the typos in the RLHF book.
- They're trying out Claude Code for the same task, saying it seems to be working too, but Gemini Deep Research sucked.
Chorus is nicer than Grammarly: A member has been using Chorus to check their writing with all LLMs and the different things they find always surprise them.
- It's also just much nicer to just have AI do it and supervise, because Grammarly sucks.
ChatGPT Deep Research feedback is generic: A member found ChatGPT Deep Research's feedback on their teaching website to be generally positive but pretty generic and cliche.
- It suggested categories of problems instead of identifying specific high-value issues, and also claimed there were broken links when there weren't.

Links mentioned:

MCP (Glama) ▷ #general (224 messages🔥🔥):

Swarm vs Mesh vs Sequence for multi-agent systems, OpenSwarm and OpenAI-agents, mycoder.ai vs claude-code, Monetizing MCP services, Glama scans

Multi-Agent Systems Topologies Debate Swarm vs Mesh vs Sequence: A member initiated a discussion on Swarm, Mesh, and Sequence architectures for multi-agent systems, seeking resources and advice, while struggling with sub-agents going off-track due to the telephone game effect.
- One member suggested the problems might relate to parallel execution and unsupervised autonomy issues, where the handoff of execution between agents includes swapping system instructions, available functions, and even the model or provider being used.
OpenSwarm's Evolution into OpenAI-Agents: A member mentioned working on OpenSwarm for a client and its subsequent adoption by OpenAI, rebranded as openai-agents, with additional OpenAI-specific features, while noting a rejected PR for MCP support.
- They also mentioned rumors that CrewAI (or PraisonAI?) might offer similar functionality using a stateless single thread agent approach.
mycoder.ai Launched just before claude-code: A member noted the coincidental launch of their mycoder.ai just before Claude-code was announced, adapting by posting it to Hacker News and reaching the front page, check it out here.
- It was noted that claude-code is Anthropic-only, creating demand for a more generic solution, with success using litellm proxy.
Discussions Spark on Monetizing MCP Services: Members debated the possibilities of monetizing their MCP services, touching on the challenges with API resale restrictions and the potential for BYOK (Bring Your Own Key) models.
- Some suggested focusing on unique services or scraping agents, while others expressed caution due to API terms, with one member only interested in coffee money kind of donation.
Glama's Inspection Process of MCP Servers Debated: A member questioned the frequency of Glama scans and the ability to trigger rescans for MCP servers, and the discussion revealed that scans are tied to the frequency of commits to the associated GitHub repository.
- Difficulties were reported with servers failing to be inspected, showing a Could not inspect the server message on the Score tab, even after fixing dependency issues and successfully running in the inspector, with work on triggering refreshes underway, for more info see Glama AI.

Links mentioned:

MCP (Glama) ▷ #showcase (25 messages🔥):

Awesome Vibe Coding, Roo Code MCP, MacOS Control MCP, Secretary MCP, Professional Graph MCP

Awesome Vibe Coding List Launched: A curated list of tools, editors, and resources that make AI-assisted coding more intuitive and efficient called Awesome Vibe Coding was created.
- The list includes AI-powered IDEs, browser-based tools, plugins, and command line tools to enhance workflows, and a member even had their AI coder make a PR back to the repo, as well as suggest the addition of Roo Code.
MCPs Galore: Creating Custom Servers: A user created an app that allows users to create their own MCP server with custom or community prompts called Groove Studio.
- They are looking for user feedback, with some users suggesting features like an MCP that gives a model the ability to control MacOS natively and non-natively, or a "Secretary" MCP that uses a memory bank of texts, emails, calendar and notes.
Emojikey MCP Server Updated: A member announced an update to the Emojikey MCP server with 'lots of goodness,' calling it essential for vibe coding and linking to the GitHub repository.
- It allows users to save your unique relationship state and interaction style with your favorite LLM.
Game Asset MCP Server Seeks Testers: A member is looking for testers for a Game Asset MCP server.
- This MCP server is for creating 2D/3D game assets from text using Hugging Face AI models.

Links mentioned:

Latent Space ▷ #ai-general-chat (34 messages🔥):

Agentic systems multi-threading, Claude's Birthday, GPT-o1 Acing Math Exams, SAE Bench Release, Baidu ERNIE 4.5 & X1

Agentic systems multi-threading discussed: Discussion on how to design an agentic system for multi-threaded, parallel execution of long-running tasks, with the consensus that there wouldn't be significant design differences from other parallel applications.
- The primary focus should be on managing API consumption effectively in a multi-threaded environment.
Claude celebrates 2nd birthday: Claude celebrated its second birthday, highlighting its use for company OSINT due to its reduced refusal rate compared to ChatGPT for deep research.
- A user considered it is great for company OSINT compared to chatGPT deep research because it refuses to answer questions much less.
GPT-o1 Aces Carnegie Mellon Math Exam: GPT-o1 achieved a perfect score on a Carnegie Mellon undergraduate math exam, solving each problem in under a minute for about 5 cents each according to this post.
- The exam, designed with non-standard problems, was also open-book and open-notes, impressing the instructor who noted that this was close to the tipping point of being able to do moderately-non-routine technical jobs.
SAE Bench Released for Sparse Autoencoder Evaluation: The full release of SAE Bench, a suite of Sparse Autoencoder (SAE) evaluations designed to improve SAE research by providing better metrics, was announced in this Tweet.
- The suite includes proxy summary statistics, downstream task performance metrics, and evaluations of known flaws, alongside a set of open-source SAEs across 7 architectures.
Baidu Launches ERNIE 4.5 and X1, Making ERNIE Bot Free: Baidu unveiled ERNIE 4.5 and ERNIE X1, with ERNIE X1 reportedly matching DeepSeek R1's performance at half the cost according to this announcement.
- In addition, ERNIE Bot has been made freely accessible to individual users ahead of schedule, with both models available on the official website.

Links mentioned:

Latent Space ▷ #ai-announcements (5 messages):

Snipd Podcast, AI Podcast App, Outdoor Podcast, Tech Stack, Switching from Finance to Tech

Snipd Podcast Gets Fresh Air: A new Snipd podcast featuring Kevin Smith was released, discussing the AI Podcast App for Learning.
- This episode marks their first outdoor podcast, with @swyx and @KevinBenSmith chatting about aidotengineer NYC, switching from Finance to Tech, and the tech stack of @snipd_app.
Fan Loves Snipd, Shares Photo: A user expressed their love for Snipd and shared a photo as proof.

Link mentioned: Tweet from Latent.Space (@latentspacepod): 🆕 Snipd: The AI Podcast App for Learninghttps://youtu.be/FNRO_SYx68QOur first ever OUTDOOR podcast! @swyx and @KevinBenSmith chat about @aidotengineer NYC, switching from Finance to Tech, how AI can ...

Latent Space ▷ #ai-in-action-club (122 messages🔥🔥):

Claude 3.5 vs 3.7, Vibe Coding, Levelsio Flight Simulator, Auto Git Commits, Enterprise AI Dev Team Enablement

Claude Showdown: 3.5 vs 3.7: Members debated the merits of using Claude 3.5 over 3.7, citing that 3.7 is way too eager and does things without being asked.
- Others said they used Claude 3.5 and were experiencing GPU issues.
Vibe Coding: A New Development Meta: The concept of "vibe coding," particularly using tools like Cursor, was discussed, with one member referencing a tweet by Levelsio where he built a flight simulator in the browser using Cursor.
- A member shared a followup tweet about the same project reaching $1 million ARR in just 17 days by selling in-game ads.
Auto Git Commits: Members discussed automatically creating git commits with every line accepted by the LLM, mentioning tools like aider and linking to gitdoccommits.
- One member proposed that traditional IDEs may not be the right UI for vibe coding and suggested visualizing the tree of changes prompted by different chats.
Enterprise AI Dev Team: A member offered to share insights on enterprise AI dev team enablement at some point, mentioning its corporate nature.
- Another member expressed interest in hearing about the hurdles and red tape involved in getting Cursor into an organization.

Links mentioned:

Notebook LM ▷ #use-cases (27 messages🔥):

Gemini-integrated Android, Deepseek R1 Impact, Audio Overview Length, NotebookLM Use Cases, Hyperbolic Tapering Schedule

Users clamor for Gemini-Integrated Android: Multiple users expressed strong interest in a fully Gemini-integrated Android experience, envisioning a powerful combination of Google Assistant/Gemini with NotebookLM.
- Some users expressed frustration with the current Gemini implementation on Android, hoping for rapid improvements.
Deepseek R1 Shakes the AI Market: A user commented on the dramatic escalation in the AI market following the release of Deepseek R1, which offered reasoning capabilities at a low cost, impacting Gemini 2.0 and other models.
- The user noted that the release of Deepseek R1 seemingly shook the whole industry and spurred the release of several new models from other companies.
Users Seek to Extend Audio Overview Length: A user inquired about the possibility of increasing the length of audio overviews generated by NotebookLM, noting that 16,000-word files only produced 15-minute overviews.
- They desired at least 1-hour+ overviews, but no concrete solutions were provided by the community.
NotebookLM helps user Taper Off Psychiatric Meds!: One user is using NotebookLM to construct a hyperbolic tapering schedule for a psychiatric medication, finding correlational studies to make a taper schedule.
- Another user cautioned that tapering based on data on any platform should not be done alone without expert professional opinion.
Users want NotebookLM integration into internal portals/CRMs: A user inquired about integrating NotebookLM into an internal portal/CRM with videos and knowledge base articles, for consultants to ask questions and get answers from the portal.
- A user suggested Agentspace could be exactly what they're looking for, as it's integrated with NotebookLM.

Link mentioned: Google Agentspace: Google Agentspace is the launch point for enterprise-ready AI agents, helping increase employee productivity for complex tasks with one single prompt.

Notebook LM ▷ #general (132 messages🔥🔥):

Extracting Google Sheets for LM, Gemini for data analysis, Public sharing of NotebookLM, Using NotebookLM to prevent errors, NotebookLM limitations and solutions

Users Brainstorm Google Sheets Extraction for NotebookLM: Users discussed methods for extracting Google Sheets into a readable format for NotebookLM, with one suggesting using BigQuery and SQL with Gemini to generate queries for data analysis.
- Another user mentioned a Sheets function that reads a cell and passes it as context to a prompt to generate answers, useful for RFP situations.
Public Notebook Sharing Potentially on the Horizon: A user inquired about enabling public sharing of NotebookLM notebooks, envisioning it as a new form of publishing.
- A Google employee responded that they are "super interested in the idea that the notebook is a powerful new way of collecting and sharing information" and actively working on the feature.
Experimenting to Prevent NotebookLM Errors: A user shared their approach to prevent NotebookLM from repeating errors by creating an 'Errors' source document with examples of past mistakes.
- Another user suggested that such instructions might not have any impact on the responses because NLM uses RAG, it's not injecting the complete user inputs (sources) into the context window of the LLM.
NotebookLM Limitations and Potential Solutions: A user reported that Audio Overviews can't be fast-forwarded during interactive beta, also asking for increased audio overview generation length.
- Another user suggested Agentspace as a potential solution for integrating NotebookLM with various data sources and internal portals.
Agentspace to rescue NotebookLM Enterprise: A user inquired about integrating NotebookLM with an internal portal and CRM with videos and knowledge base articles, electioneering suggested that, NotebookLM doesn't have an API you could use and it doesn't support connecting to the types of data sources you mention.
- electioneering suggested looking at Agentspace as a solution, which includes and is integrated with NotebookLM Agentspace.

Links mentioned:

GPU MODE ▷ #general (6 messages):

Jake Cannell hiring GPU devs, sm90 kernels, GPU performance counters, nebius.ai, Datacrunch

Jake Cannell Staffing Up with GPU Devs: Jake Cannell is hiring GPU developers to work on ideas he touched on in his talk.
Academics Seek Budget-Friendly GPU Cloud: A researcher is looking for cheap cloud providers that give access to GPU performance counters to run Nsight Compute for implementing ideas for sm90 kernels.
nebius.ai touted for GPU cloud: A member recommends nebius.ai, citing a Reddit thread from 9 months ago, as a provider with access to GPU performance counters.
Datacrunch proposes student credits: A member suggested Datacrunch as a good option, offering potential credits for students.

GPU MODE ▷ #triton (13 messages🔥):

Embedded Python Pip Usage, Triton Windows PyPI Release, tl.multiple_of usage in Triton, Efficient Pointer Chasing in Triton, Triton and Sparse Computations

Triton-Windows Gets PIP Upgrade: Triton-windows has been published to PyPI, so you can install/upgrade it by pip install -U triton-windows, and you no longer need to download the wheel from GitHub.
tl.multiple_of Questioned in Triton: A user questioned the usage of tl.multiple_of with tl.arange, suspecting that only the first element is a multiple of BLOCK_SIZE_N, and wondered if they missed something.
Pointer Chasing Performance Pondered: A user asked about implementing efficient pointer-chasing in Triton for a custom data structure resembling a sparse matrix in CSR, seeking to avoid loading offsets one-by-one in a hot inner loop.
- One user suggested loading the whole offset array at once, and then using tl.sum() over tl.where() with the loop index to mask out all but one element, another user mentioned that Triton is not ideal for sparse references and computations, the authors mention this in Triton Lang Docs.
Powering Through with pow in Triton: A user inquired about how to use the pow (power) function in Triton.
- Another user pointed them to tutorial 07-extern-function.py as a reference.

GPU MODE ▷ #cuda (5 messages):

SASS compatibility with NVIDIA architectures, LD/ST unit sharing in SM microarchitecture, L1-dTLB cache, Cutlass 4.0 Python DSL, CUDA streams concurrency issues

SASS Instructions Across NVIDIA Architectures: A member shared a gist comparing NVIDIA SASS instructions across different architectures, extracted and compared (using Python) from NVIDIA's HTML documentation.
- The gist facilitates understanding of instruction set evolution across NVIDIA's GPU lineup.
LD/ST Unit Architecture Inquiry: A member questioned the sharing of LD/ST units between scheduling units in an SM, referencing the Ampere GA100 whitepaper which divides 32 LD/ST units between 4 scheduling units.
- They also inquired about the relationship between LSU, MIO, and LSUIN based on NVIDIA's nsight compute profiling guide, and if it takes 4 cycles to issue a LDG instruction if there are 32 threads.
L1-dTLB Cache Speculation: A member speculated that the L1/TEX cache is VIPT (virtually indexed, physically tagged), guessing that address translation happens between the LSUIN and the tag stage.
- No further discussion on this topic.
Cutlass 4.0 goes fully Python!: A member shared that Cutlass 4.0 is now fully Python, using Python DSL.
- This new version has performance parity with previous versions, and was presented at NVIDIAGTC.
CUDA Streams Show Concurrency Quirks: A member encountered strange issues with CUDA streams not executing concurrently as expected on an A800, despite resource availability.
- Analysis with nsys revealed prioritization of earlier streams and non-concurrent execution with specific shared memory configurations, with repeat set to 1,000,000.

Links mentioned:

GPU MODE ▷ #torch (13 messages🔥):

Torch Compile, Graph Breaks, Stride Issue, Std::variants in schemas

Torch Compile Struggles With Backward Pass: A member reported that while torch.compile works fine for the forward pass, it is quite slow in the backward pass when using torch.autograd.Function for custom kernels.
- They found that wrapping the backward function with torch.compile(compiled_backward_fn) could address this issue.
Graph Breaks Cause Compile Problems: It was noted that graph breaks in the backward pass can cause issues with torch.compile.
- One member found that using .stride(0) in their Triton kernel caused graph breaks, which they resolved by using constant values instead.
Stride(0) Issue Fixed in Nightly Builds: A member noted that they had issues with stride(0) causing graph breaks in their Triton kernel.
- Another member mentioned that the stride(0) issue has been fixed in the PyTorch nightly builds.
Schemas Struggle With std::variants: A member inquired about supporting std::variants in schemas, linking to the relevant PyTorch code.
- A core dev said it's fairly hard, and that they ended up settling for std::optional.

Link mentioned: pytorch/aten/src/ATen/core/op_registration/README.md at c7c3e7732443d7994303499bcb01781c9d59ab58 · pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch

GPU MODE ▷ #announcements (1 messages):

Consumer GPU Performance, AGI, Neuromorphic hardware, vast.ai

Jake Cannell's Talk on Consumer GPUs: Jake Cannell is giving a talk in 30 min on Consumer GPU Performance, covering his early work in graphics, how GPUs became general purpose, his scaling journey, and the story behind vast.ai.
- The talk is framed as particularly relevant for those interested in AGI or neuromorphic hardware, promising a wild discussion.
Scaling Pilled Origins Revealed: Jake Cannell will discuss how he became scaling pilled and the motivations behind building vast.ai.
- This discussion may offer insights into the infrastructure and resource demands of AGI and neuromorphic hardware research.

GPU MODE ▷ #algorithms (6 messages):

Transformers without Normalization, LayerNorm, tanh, FA3, exp

Transformers Could Be Faster Without Normalization: A member noted that Transformers without Normalization (replacing LayerNorm with tanh()) should provide speed improvements.
- This is because LayerNorm requires reductions for the mean and variance of a sequence, which can be slow, whereas tanh() can be calculated on every element in registers and fused with the following matmul/linear layer.
tanh isn't cheap, but can be approximated: One member stated that tanh in itself isn't that cheap, as it requires an exp and a division.
- On Nvidia hardware there's tanh.approx (since Turing/sm_75), which purportedly has a throughput of 16/cycle/SM.
__expf() Is Faster for Small Values: A member suggested that __expf() is quite faster but is only good for smaller values.
- Others pointed out that FA3 incurred significant overhead due to the exp becoming a bottleneck.

GPU MODE ▷ #jobs (4 messages):

GPU Code Generation, ML Compiler, HPC Engineers, Superalignment Framework

GPU Mode Company Seeks AI Engineer for Code Gen: A company is hiring an AI Engineer to train/fine-tune models for GPU code generation with decent pay and a generous equity grant, backed by Jeff Dean and others; apply at jobs.mako-dev.com.
- They are building a next gen ML compiler that integrates AI into the compilation flow; make sure to add "GPUMODE" in the optional application.
Sesterce Seeks HPC Engineers for Massive GB200 Clusters: Sesterce is looking for HPC Engineers to build and manage their new Giga Colossius Cluster (18K GB200) and Colossius Cluster (8K GB200) with a hardcore engineering team across San Francisco, France & Singapore.
- The team includes Awni, described as one of the smartest and nicest people to work with.
Stealth Startup Bootstraps Superalignment Framework Architect: A stealth startup is hiring a machine learning framework software architect to build a superalignment framework on top of ScalarLM.com.
- The ideal candidate should be prepared to write all of the code themselves for their framework as part of a small, 5-person team and be ready for the bootstrapping life.

Link mentioned: Your connected workspace for wiki, docs & projects | Notion: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team

GPU MODE ▷ #beginner (9 messages🔥):

GPU coalesced access, Nvidia GPU read operation, GPU programming, CUDA learning resources, Installing Triton

GPU coalesced access vs permuted reads: When threads (0,1,2,3) read addresses (0,1,2,3) in a GPU, it's coalesced; however, reading at permuted addresses like (2,0,3,1) might result in 4 sequential reads instead of a single operation.
- If threads read addresses like 4*i+[0,1,2,3] with random i, where each thread reads inside its own memory bank at a random address, it is not clear if this is faster than reading with bank conflicts.
Nvidia GPU read operations generate single request per warp: A modern Nvidia GPU read operation generates a single request per warp to the L1TEX/LSUIN, operating at a sector level (32 bytes) with stored cache lines (4 sectors).
- The requests are processed internally in wavefronts, with more details available in Nvidia's developer forums and a GTC Spring 21 session.
Bank conflicts and memory differences explored: A GTC Spring 22 session helps understand bank conflicts.
- The presenter highlights the differences between the L1 data cache (where the return bandwidth is cache line/cycle) and shared memory (which depends on bank conflicts).
Cloud Resources for GPU Kernel Testing: For individuals new to GPU programming seeking to test CUDA/Cutlass/Triton kernels without local GPU access, Google Colab offers free access to computing resources, including GPUs and TPUs.
- Additionally, LeetGPU may provide alternative testing environments.
Troubles installing Triton on Windows: A user encountered an error while attempting to install Triton on Windows using pip.
- No specific solution was provided in the given context but the image shared in the message shows the error.

Links mentioned:

GPU MODE ▷ #pmpp-book (1 messages):

CUDA kernel, pytorch extension

Seeking Source Code for CUDA Kernel as PyTorch Extension: A member inquired if the presenter from lecture 2 posted their source code for calling the mean_filter CUDA kernel as a PyTorch extension.
Request for CUDA Kernel Code: A user is seeking the source code from lecture 2 to implement a mean_filter CUDA kernel as a PyTorch extension.

GPU MODE ▷ #off-topic (3 messages):

AI Agents Hackathon, NVIDIA GTC 2025, SLURM based HPC cluster IDE/Editor

Vertex Ventures US sponsors AI Agents Hackathon: Vertex Ventures US and CreatorsCorner are hosting an AI Agents Hackathon at NVIDIA GTC 2025, with $50k+ in prizes.
- Participants will build multimodal AI agents capable of sophisticated reasoning and interaction with various tools, with a 3-minute showcase to judges.
Cursor/VSCode are IDE Favorites for HPC Cluster Dev: A user asked what IDE/Editor people use to develop directly on a SLURM based HPC cluster, expressing frustration with VSCode's bloat in the /home/ directory.
- Another member suggested Cursor/VSCode, mentioning that most people on their work cluster use it and that the install directory can be changed.

Link mentioned: AI Agents Hackathon - GTC 2025 Edition (1 DAY) · Luma: AI Agents Hackathon - GTC 2025 Edition (1 DAY)As NVIDIA GTC 2025 unites the global AI community, Vertex Ventures US and CreatorsCorner, invite you to turns…

GPU MODE ▷ #irl-meetup (11 messages🔥):

Block Sparse Attention, GEMM, GTC Keynote Missed, GTC Hackathon results, GTC Meetup

Attendee Misses GTC Keynote Due to ESTA Error: One attendee expressed disappointment at missing the GTC Keynote due to failing to get their ESTA filled out beforehand.
- They mentioned the ESTA status had been stuck on pending for a day, preventing them from boarding their flight.
Inquiries about GTC Hackathon Results: An attendee inquired about where the GTC hackathon results would be posted, as they did not get into the GTC event itself.
- There was no answer to the user's question, which suggests the answer is unknown, or may be sent directly to those who participated.
Potential Post-GTC Meetup Discussed: There was discussion about having a meetup after the GTC conference for those who missed the initial meetup.
- This suggests that many were disappointed about being unable to attend GTC, so others agreed to organize something separate.
Attendees Request Slides from GTC Presentations: Attendees requested that the slides from previous GTC presentations be published somewhere.
- Another participant asked if anyone caught the last slide from Vijay Thakkar related to Nvidia GTC workshops.

GPU MODE ▷ #rocm (5 messages):

MI300X inference optimization, AMD Instinct MI300X workload optimization, DeepSeek-R1 on MI300X, SGLang Optimization

Experts Wanted for MI300X Inference: A member is seeking experts to help reduce inference times on the MI300X and is willing to share information after the consultation.
- They are looking for someone dedicated for a few hours of consulting, specifically for a 32B reasoner model.
AMD's Inference Optimization Guide Arrives: A member shared the AMD Instinct™ MI300X workload optimization document, detailing optimization strategies for MI300X accelerators focusing on GPU kernel programming, HPC, and deep learning with PyTorch.
- The document highlights auto-tunable configurations and advanced techniques like Triton kernel optimization.
DeepSeek-R1 Speeds Up with MI300X: A blog post was shared about unlocking DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU, highlighting performance comparisons to H200.
- Optimizations using SGLang have reportedly unlocked up to a 4X boost in inference speed in just two weeks, ensuring efficient scaling and lower latency.

Links mentioned:

GPU MODE ▷ #tilelang (1 messages):

leiwang1999_53585: worked on my h100, maybe you should install nightly wheel🤣

GPU MODE ▷ #self-promotion (5 messages):

GTC CUDA, Wen-mei Hwu GTC, Pruna AI Efficiency Framework, Ruff and UV for project management

CUDA Content Central at GTC: NVIDIA is highlighting CUDA developer sessions at GTC focused on tools and training for creating high-performance, GPU-accelerated applications.
- Attendees can explore sessions tailored to general AI, technical details, and business strategies, with attendance on a first-come, first-served basis.
Wen-mei Hwu: the Computing Legend Signs at GTC: Professor Wen-mei Hwu, author and NVIDIA scientist, will be at #GTC25 for an exclusive meet and greet to sign copies of his book.
- The GPUMODE event is scheduled for Sunday at 6 PM and Wednesday at 2 PM at CWE75384, and you can register for the CWE event here.
Pruna: New AI Efficiency Framework Released: The AI efficiency framework Pruna has been open-sourced, with technical details available on the GitHub repo.
- Users are encouraged to star the repo, spread the news, and install Pruna with pip install pruna to provide feedback.
Ruff and UV Simplification of dependencies: A user suggested switching to Ruff + uv for the Pruna project to simplify dependencies and improve project management.
- The user believes this change would greatly simplify the dependencies.

Links mentioned:

GPU MODE ▷ #🍿 (5 messages):

Distributed Training, Scaling Laws for DiLoCo, GPU kernel modifications

DiLoCo scaling better than DP: A post on X highlights a key step for making distributed training work at larger models, specifically Scaling Laws for DiLoCo.
- The author jokes that DiLoCo scaling better than DP is funny to me; It’s pure vibes LOL.
Nuances of GPU Kernel Optimization: A member had a horrible thought about slight modifications of a GPU kernel that aren't numerically equivalent but are more efficient in wall-clock time in a distributed context.
- They feel that problems like that pose a huge problem for automatic kernel optimization strategies.

Link mentioned: Tweet from Zachary Charles (@MatharyCharles): We just put out a key step for making distributed training work at larger and larger models: Scaling Laws for DiLoCoTL;DR: We can do LLM training across datacenters in a way that scales incredibly wel...

GPU MODE ▷ #reasoning-gym (10 messages🔥):

Reasoning Gym, nano-R1 Project, Temporal Clue, Group Relative Policy Optimization (GRPO)

Reasoning Gym Reaches 101 Datasets!: The Reasoning Gym project now boasts 101 datasets, celebrating contributions from developers like Rich Jones and @jeankaddour.
- A user shared the X post announcing this milestone.
Nano-R1 Project Eyes Reasoning Gym: The nano-R1 project is seeking data to evaluate runs, with a suggestion to consider using reasoning-gym given existing benchmark scores.
- This suggestion was made in reference to this GitHub discussion about finding reasoning benchmarks.
Temporal Clue Puzzles Gym-Ready: A user shared a link to temporal-clue, Clue-inspired puzzles for testing LLM deduction abilities, suggesting it as food for the Reasoning Gym.
- The puzzles may be useful for testing deductive reasoning.
GRPO Beats Models on Temporal Clue: OpenPipe.ai achieved state of the art on temporal clue using Group Relative Policy Optimization (GRPO), surpassing R1, o1, o3-mini, and nearing Sonnet 3.7's performance while being 100x cheaper.
- They shared a training recipe built on top of torchtune used to achieve these results.

Links mentioned:

GPU MODE ▷ #active-leaderboards (1 messages):

Xavier Init, User ID Issue

Usernames Replaced by Mysterious User IDs: Some users are seeing User_<18 digit ID> instead of actual usernames, potentially due to a bug related to Xavier Init.
Ongoing username glitch: A glitch is causing some usernames to display as generic User_ID strings instead of actual names.

GPU MODE ▷ #general (15 messages🔥):

pip install in popcorn, Looking for GTC 2025 Ticket, Free B200 access, AMD Support coming

Popcorn lets Users Pip Install: Users can now pip install from a script in Popcorn, though long installations might timeout.
GTC 2025 Ticket Quest: A Silicon Valley resident is seeking a ticket to the sold-out GTC 2025 event.
- Another member quipped, "Sir, this is a Wendy's".
Free B200 Bonanza on Grayscale!: One B200 is available on the grayscale_py_b200-dev leaderboard for Grayscale, queue times may be slow due to only one device.
- Members are encouraged to "play with the B200 and deconstruct it however u want".
AMD support Coming Soon: AMD Support seems to be coming soon according to a screenshot "we're cooking something finally".

GPU MODE ▷ #submissions (29 messages🔥):

Leaderboard Submissions, Benchmark Submissions, Test Submissions, Modal Runners

Grayscale Tests Triumph on T4 and H100: Test submissions for the grayscale leaderboard, with IDs 2136 and 2143, succeeded on T4 and H100 GPUs using Modal runners.
Vectoradd Aces H100: Leaderboard submission with id 2151 to leaderboard vectoradd on GPUS: H100 using Modal runners succeeded!
- Modal Runners facilitated the successful vector addition benchmark on the high-performance H100 GPUs.

GPU MODE ▷ #status (1 messages):

Leaderboard cleanup, Robust Evaluation

Leaderboard Cleansing Commences: The community is now removing meme/hack entries from the leaderboard, and is asking users to submit their Discord username, filename, and rank if they'd like an entry deleted.
- At the same time, changes are being made to ensure the evaluation process is more robust against these kinds of entries.
Evaluation Process Bolstered: In parallel to the leaderboard cleanup, efforts are underway to make the evaluation process more robust against meme/hack entries.
- The goal is to prevent similar issues from arising in the future.

GPU MODE ▷ #hardware (1 messages):

NVIDIA thermal ranges, Arithmetic and Memory Bandwidth Degradation

Member seeks NVIDIA thermal ranges and degradation info: A member is looking for thermal ranges for different NVIDIA cards, especially info on arithmetic and memory bandwidth degradation with temperature.
- They cited the NVIDIA H100 product brief as a good source of information, hoping to find similar details for more cards.
Discussion on hardware thermal limits: The discussion revolved around finding detailed thermal specifications for NVIDIA cards, particularly regarding how temperature affects performance.
- The initial requestor shared the NVIDIA H100 product brief as an example of the kind of detailed information they were seeking for a wider range of cards.

Eleuther ▷ #general (10 messages🔥):

SMILES string encoding, Stereoisomer Generation, Free GPU Platforms, Managed Inference APIs, EleutherAI welcomes Catherine Arnett

Encoding SMILES strings into Stereoisomers: A member inquired about models or architectures that can encode a SMILES string into various stereoisomers or encode a ChemDraw input.
- The member is seeking a model that can pick up on chemical descriptors for their tasks.
Quest for Free GPU Platforms: A member is looking for a free GPU platform beyond notebooks, needing something with C++ support for local use with SSH.
- Notebooks only offer a Python interface, which is insufficient for their requirements.
Managed Inference API Services Explored: A member is seeking recommendations for managed Inference API services that small startups can use to host private models for training/finetuning LLMs.
- Another member suggested Featherless.ai which also supports existing LLMs from HF; it doesn't require managing individual hardware units.
EleutherAI Welcomes New NLP Researcher: EleutherAI welcomes Catherine Arnett, an NLP researcher specializing in Computational Social Science and cross-lingual NLP.
- Catherine's research focuses on ensuring models are equally good across languages, addressing data equivalence, performance measurement, and model building; see her recent work on Goldfish, Toxicity of the Commons, LM performance on complex languages and Multilingual Language Modeling.

Eleuther ▷ #research (46 messages🔥):

Block Diffusion, Globally Shared Experts, Mixture-of-Experts Universal Transformers, Tan et al.'s SUT paper, Visual Geometry Group (VGGT)

Block Diffusion Model Unveiled!: A new paper introduces Block Diffusion, a method interpolating between autoregressive and diffusion language models, combining the strengths of both: high quality, arbitrary length, KV caching, and parallelizability, detailed in the paper and code.
Exploring Globally Shared Experts in Deep Learning: Discussion arose about research into globally shared experts, where a single pool of experts is used across all layers, with a pointer to a relevant paper on diffusion models.
MoEUT: Mixture-of-Experts Universal Transformers Paper Mentioned: A member mentioned the MoEUT (Mixture-of-Experts Universal Transformers) paper as relevant to the discussion on globally shared experts, though they hadn't fully read it yet.
- Another member suggested checking out Tan et al.'s SUT paper as well for related insights.
VGGT Generates 3D Scenes!: A member shared VGGT, a feed-forward neural network inferring 3D attributes from multiple views and generating GLB files, which can be directly integrated into metaverses.
- The member tested VGGT on old stereo images and various scenes, finding it benefits from near-angle frames; however, it may struggle with scenes lacking a clear anchor angle, with the member stating I love that it exports GLB files. means I can drop them directly into my metaverse as-is.

Links mentioned:

Eleuther ▷ #lm-thunderdome (22 messages🔥):

Fewshot Split Fallback, Gen Kwargs to JSON, Old vs New LLM Leaderboard

Fewshot Split Fallback Scheme Unveiled: When a fewshot split isn't specified, the system falls back to train > val > test, prioritizing the training split if available.
- This order determines which split is used for evaluation if no specific split is defined.
Gen Kwargs Embraces JSON Format: The --gen_kwargs argument is transitioning from comma-separated strings to JSON, allowing for more complex configurations like '{"temperature":0, "stop":["abc"]}'.
- The discussion explores the possibility of supporting both formats for ease of use, especially for scalar values.
Old vs New LLM Leaderboard: Discrepancy Surfaces: A discrepancy is identified between the group config for the old LLM leaderboard and the actual setup used, particularly concerning the arc-challenge task.
- The openllm.yaml config specifies validation as the fewshot split, but the original leaderboard used the train split due to the absence of a fewshot split in the old fork's Python class, a PR to fix this was created to address this discrepancy.

Link mentioned: lm-evaluation-harness/lm_eval/tasks/benchmarks/openllm.yaml at main · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness

tinygrad (George Hotz) ▷ #general (30 messages🔥):

SDXL benchmarks, tensor cat speed, parallel BLAKE3, WebGPU integration, Bitonic Sort indices

Tinygrad SDXL Benchmarks Lag Torch: Benchmarking SDXL with tinygrad on a 7900 XTX shows 1.4 it/s with BEAM=2 on the AMD backend, while torch.compile with FlashAttention and TunableOp ROCm reaches 5.7 it/s.
- George Hotz suggested comparing kernels to identify optimization opportunities, aiming to surpass torch performance by year's end.
Tensor Cat Sluggish Despite Efforts: A member is working on improving tensor cat speed, sharing whiteboard thoughts on X (link), but notes it's still slow despite devectorizer changes.
- The member suspects issues with generated IR and loading numpy arrays, considering custom C/C++ via ELF and LLVM to overcome limitations.
BLAKE3 Bounty Status Clarified: The status of the High performance parallel BLAKE3 bounty was clarified, with a screenshot (link) showing the bounty status.
- The member updated the spreadsheet and specified that the asymptotic performance is a key requirement for the bounty.
WebGPU Integration Gets a Boost: A member inquired about publishing a Tinygrad implementation for an electron/photon classifier based on resnet18 as an example, and was directed to a PR for improving WebGPU integration.
- It was suggested to create a WebGPU demo hosted on GitHub Pages with weights on Hugging Face for free access and testing.
Bitonic Sort Indices Unlocked: During work on bitonic sort indices, a member figured out maxpool indices, noting that topk implementations are often sort-based.
- The code is correct and jitted speed is close to pytorch sort (sometimes faster), the member said, it involves a lot of kernels due to contiguity requirements.

Link mentioned: Tweet from vincent (@t0kenl1mit): Tried using compare for @tinygrad tensor cat but still its slow. Attached are my whiteboard thoughts on it. I think I might have to fight ELF and link in some custom C but it might be something el...

tinygrad (George Hotz) ▷ #learn-tinygrad (5 messages):

Print Debugging Tinygrad, Lazy Computation and Gradients, Reproducer Code for Debugging, Multiline Code Blocks

Print Debugging Dilemma in Tinygrad's Lazy Mode: A member is facing an assertion error with gradients while print-debugging intermediate tensor values in Tinygrad, despite using .detach().
- They are seeking a better method than threading the value out, due to issues with lazy computation not being idempotent.
Crafting Reproducer Code for Rapid Debugging: A member suggests creating a <= 10 line reproducer code to quickly iterate and debug.
- They recommended using an integrated debugger like VSCode with breakpoints and a debug console for experimenting and restarting.
Github Link: A member shared a link to a Github repo.
Multiline code blocks: A member gave advice on making multiline codeblocks by using triple backticks

Link mentioned: gsoc_2025/ML4SCI/task1 at main · kayo09/gsoc_2025: GSOC 2025! Happy Coding! ☀️. Contribute to kayo09/gsoc_2025 development by creating an account on GitHub.

LlamaIndex ▷ #blog (2 messages):

Agentic Reasoning System, Corrective RAG, LlamaExtract Public Beta

Agents Reason with Corrective RAG: A member shared a step-by-step tutorial by on how to build an agentic reasoning system for search and retrieval (specifically, corrective RAG) from scratch, orchestrated with the @llama_index workflows.
- The tutorial lets users orchestrate complex, customizable event-driven agents.
LlamaExtract Enters Public Beta: LlamaExtract is now in public beta and solves the common problem of extracting structured data from long, complex documents, offering a web UI and API.
- It allows users to define a schema and automatically extract structured data; more details can be found here.

LlamaIndex ▷ #general (31 messages🔥):

AI Agents Hackathon, Vertex Ventures US, CreatorsCorner, gguf fine tuning, LlamaIndex vs Pydantic AI

Calling All AI Agents to Hackathon!: Vertex Ventures US and CreatorsCorner invite the global AI community to turn bold ideas into action with an exclusive AI hackathon at NVIDIA GTC 2025.
- The hackathon challenges participants to craft an extraordinary multimodal AI agent capable of sophisticated reasoning, strategic decision-making, and interacting with various tools for a chance to win $50k+ in Prizes!
Pydantic vs LlamaIndex framework face-off: New users wonder about the difference between the Pydantic AI and LlamaIndex frameworks for building agents, especially which one to use as a beginner.
- A LlamaIndex team member stated that whatever fits your mental model of development best is probably the best bet - but also the LlamaIndex workflows are very nice.
Data Query Agent Stuck in Infinite Viz Loop: A user reported their data query agent gets stuck in an infinite loop after using the visualization tool, repeatedly calling the same tool.
- Another member asked whether the user was using an open-source or closed-source LLM, and theorized, Maybe the llm is not able to understand whether the task is finished or not.
LlamaExtract is on the Cloud: Members asked how to get access to LlamaExtract after seeing the GitHub repo's instructions to join the Discord.
- The LlamaIndex team responded that it's available on cloud.llamaindex.ai and that LlamaExtract runs on the cloud (the client is the open-source part).
Orchestrating Agents with Sequential Workflows: One user asked whether to use workflows or agents abstraction to build a set of agents in a linear, sequential fashion, without tethering the agent to a specific LLM provider such as Claude.
- A LlamaIndex team member responded with a pointer to manual tool-calling ability of the LLM classes.

Links mentioned:

LlamaIndex ▷ #ai-discussion (1 messages):

Vision-Language Models (VLMs), Multimodal Learning, GitHub Research Hub

Vision-Language Models Research Hub Opens: A member created a community-driven hub for multimodal researchers working on Vision-Language Models (VLMs).
- The author encourages contributions and suggestions, planning to update the hub on a weekly basis.
Call for Contributions to VLM Hub: The creator of the Vision-Language Model Hub on GitHub is actively seeking contributions from the community.
- They are open to suggestions and feedback, aiming to update the hub weekly to keep it a valuable resource for multimodal researchers.

Nomic.ai (GPT4All) ▷ #general (29 messages🔥):

Gemma 3 Integration in GPT4All, LocalDocs Crashing Fix, Gemma 3 Language Comprehension, Model license agreements

Gemma's Linguistic Prowess Surpasses Competitors in Multiple Languages: Members found that Gemma, DeepSeek R1, and Qwen2.5 models provided correct answers, in multiple languages, to the puzzle about what happens when you leave a closed jar outside at minus temperature.
- The other models predicted catastrophic jar failure, but Gemma provided more helpful, nuanced advice.
Gemma 3 Faces Integration Issue: Users eagerly await Gemma 3 support in GPT4All, but faces delays pending updates to Llama.cpp due to license agreement issues on Hugging Face, detailed in this GitHub issue.
- Some speculate on whether Google will police redistributions bypassing their license agreements.
LocalDocs Needs Crash Course Correction: A new user experienced LocalDoc collection loss after a crash and subsequent reinstall, seeking advice on how to prevent data loss after the next expected crash.
- Experienced users recommended regularly saving the localdocs file and restoring it after a crash, and stated that sometimes only one bad PDF can crash the system.
Level up O3-mini Explains Thinking Process: A user shared a prompt for O3-mini to explain its thinking process, suggesting this could improve distillation. It can be used for any model.
- The prompt uses thinking and reflection sections, with step-by-step reasoning and error checks.

Link mentioned: Gemma 3 support · Issue #3540 · nomic-ai/gpt4all: System Info I installed GPT4All, opened it, downloaded the Gemma3 Instruct for hugging face (tried two models https://huggingface.co/Mungert/gemma-3-12b-it-gguf https://huggingface.co/ggml-org/gemm...

Cohere ▷ #「💬」general (20 messages🔥):

Fine-tuning for Command A, Azure Cohere Rerank v3 Terraform, Support Channel for New Models, Channel for Private Deployments of CMD A

No Fine-Tuning for Command A Yet: A member inquired about the ETA for enabling fine-tuning for Command A on the Cohere platform, and a Cohere team member responded that there are no plans yet, but they will keep the community posted.
Azure Cohere Rerank v3 Terraform Troubles: A member encountered an error while trying to create an Azure Cohere Rerank v3 with terraform and shared the code snippet and error message.
- A Cohere team member moved the question to the <#1324436975436038184> channel to discuss it further.
Private Deployment Channel on Deck?: A member suggested creating a dedicated channel for discussions about private deployments of CMD A and other models, especially for efforts to get customers to deploy locally.
- Another member agreed that it's a great idea, and requested admin <@700025263379054675> can set up.
Support Channel's High Volume of Questions: A Cohere team member reminded the community to direct all support questions related to new models to the <#1324436975436038184> channel or via email at [email protected].
CMD-A is a fan favorite: A member stated Loving command-a, it’s a great model.

Cohere ▷ #【📣】announcements (1 messages):

Command A, Developer Office Hours, Enterprise-friendly features, Hardware vs performance

Cohere Announces March Developer Office Hours: Cohere is hosting Developer Office Hours to celebrate the launch of their newest model, Command A on March xx at 1 pm ET in the Stage channel.
- The session will cover what's new with Command A, enterprise-friendly features, hardware vs performance, and a live Q&A; more details can be found here.
Command A model launch: Cohere is launching the Command A model soon, and hosting office hours to celebrate.
- The office hours will cover many topics, including: what's new, enterprise friendly features, and a live Q&A.

Cohere ▷ #「🔌」api-discussions (3 messages):

Cohere Command A, Vercel SDK integration, Object generation support, Cohere API versioning

Vercel SDK Misses Cohere's Object Generation: A user reported that the Vercel SDK incorrectly assumes object generation is not supported by Cohere's Command A model.
- The user intends to flag this with Vercel, suggesting it may also warrant attention from the Cohere team.
SDK Implementation struggles with Cohere API versions: A user attempting to use the OpenAI SDK for Cohere in JavaScript encountered a warning related to the Cohere API versioning.
- The warning suggests setting an API version, as the current version is deprecated, despite the user setting both apiKey and baseUrl.
Cohere API Base URL Confusion clarified: A user shared that the correct base_url to use is https://api.cohere.com/compatibility/v1/chat/completions.
- This URL may resolve issues related to API compatibility and versioning when integrating Cohere with other platforms or SDKs.

Link mentioned: Cohere: Learn how to use the Cohere provider for the AI SDK.

Cohere ▷ #「🤖」bot-cmd (1 messages):

.paolo16: Hello

Cohere ▷ #「🤝」introductions (3 messages):

Introductions, Freelance programmers, Community Assistance

Freelance Programmer Introduces Himself: A 30-year-old Japanese male freelance programmer introduced himself, stating a willingness to help others through his programming skills.
- He emphasized that assisting one another is the pillar of our existence.
Welcoming New Community Members: The Discord server stickied a message thanking new members for joining the Cohere community.
- It prompted them to introduce themselves by providing their company/industry/university, current projects, favorite tech/tools, and goals for the community.

DSPy ▷ #general (13 messages🔥):

dspy/MCP Integration, DSPy Assertions / Suggestions removal, DSPy 2.6 Output Refinement, QdrantRM removal in 2.6

MCP Integration Dreams: A member inquired about integrating dspy/MCP, with another noting the need for an MCP host, client, and server, pondering if it overcomplicates things, linking to a relevant GitHub example.
DSPy Drops Assertions/Suggestions: A user noted the disappearance of documentation regarding Assertions / Suggestions in DSPy and inquired about their continued support.
- They were looking to validate the outputs of the response (formatting specifically) and observed instances where the LLM does not always adhere to the format.
Output Refinement as Assertion Alternative: In DSPy 2.6, Assertions were replaced by Output Refinement via modules like BestOfN and Refine, designed to improve prediction reliability and quality by making multiple LM calls with different parameter settings, as detailed in the DSPy documentation.
QdrantRM Questioned: A user asked if QdrantRM was removed in DSPy 2.6.

Links mentioned:

LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):

Caiming Xiong, Multimodal Agents, Vision-Language-Action Alignment, OSWorld, AgentTrek

Salesforce's Caiming Xiong to Present on Multimodal Agents: Caiming Xiong, SVP of AI Research at Salesforce, will present a lecture on Multimodal Agents today at 4pm PDT, live-streamed on YouTube.
- The talk will cover integrating perception, grounding, reasoning, and action across multiple modalities to transform tasks like GUI automation and household robotics.
Multimodal Agents Landscape Explored: The lecture will explore measuring capabilities in realistic environments (OSWorld), creating large-scale datasets (AgentTrek), and designing advanced modeling architectures (Aguvis, Magma).
- It will also discuss incorporating synthetic chain-of-thought-and-action (TACO) for more robust vision-language-action alignment.
Caiming Xiong's Background: Caiming Xiong earned his Ph.D. in Computer Science from SUNY at Buffalo, specializing in areas such as natural language processing, computer vision, reinforcement learning, and deep learning.
- He has published more than 200 papers with >50,000 citations and served on the organizing committees of multiple workshops.

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (7 messages):

Advanced LLM agent course enrollment, Course certification

Advanced LLM agent course still accepting!: Members inquired whether they can still sign up for the Advanced LLM agent course.
- The staff replied that you just need to complete the signup form!
Certificate still attainable!: Members inquired whether they can still get the certificate after signing up for the course.
- The staff replied that most of the info on that intro slide deck only applies to Berkeley students and that one can absolutely still enroll in the MOOC and earn a certificate at the end!

LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (4 messages):

Self-reflection and self-refinement in LLMs, System prompts and LLM behavior

Self-Reflection Dichotomy Dilemma Discussed: A member highlighted a contradiction between Lecture 1, stating that self-reflection and self-refinement require external evaluation, and Lecture 2, suggesting LLMs can improve by rewarding their own outputs.
- Screenshots from Lecture 1, slide 67 and Lecture 2, slide 51 were attached to illustrate the apparent conflict. See image 1 and image 2.
System Prompt Reliability Questioned: A member suggested that while system prompts should work, relying on specific behaviors might not be robust, because all these at the end is text input, so the model can process it. You should be able to bypass the framework and service.
- They added that the training data looks like <system> You are a helpful assistant </system> <user> {{Some example user prompt}} </user> <assistant> {{Expected LLM output}} </assistant> and that frameworks may not reliably pass system prompts to all LLMs.

Modular (Mojo 🔥) ▷ #general (5 messages):

Modular AI Art, Discord Spam

Modular's AI Art Appreciated: A member expressed appreciation for the AI art used by Modular.
- They stated, "all the AI art that modular uses is great!"
Discord Spam Clarification: A member clarified that certain messages in the Discord channel were spam.
- Another member acknowledged the clarification with a thumbs up.

Modular (Mojo 🔥) ▷ #mojo (6 messages):

Compact Dict, SIMD, stdlib Dict

Compact Dict's Current Status: Members discussed the current status of the compact-dict implementation, noting that its original version might be outdated.
- It was suggested that most of the compact dict's functionality got upstreamed into the Dict in the stdlib.
stdlib Dict Performance Issues with SIMD: One user reported performance issues when using the stdlib Dict with SIMD [float64, 1] types.
- They were using the hash() function from the hash lib and found it to be slow, leading them to search for faster alternatives.

Link mentioned: GitHub - mzaks/compact-dict: A fast and compact Dict implementation in Mojo 🔥: A fast and compact Dict implementation in Mojo 🔥. Contribute to mzaks/compact-dict development by creating an account on GitHub.

MLOps @Chipro ▷ #events (2 messages):

AI4Legislation Competition, AI Demo Jam, Silicon Valley Chinese Association Foundation, Dnipro VC, Data Phoenix

AI4Legislation Competition Launches!: The Silicon Valley Chinese Association Foundation (SVCAF) is holding the AI4Legislation competition with prizes up to $3,000, running until July 31, 2025, encouraging open-source AI solutions for legislative engagement; the competition repo is now available.
- SVCAF will conduct an online seminar about the competition at the end of March 2025, featuring leaders in AI and legislation; RSVP here.
Get Jamming at the AI Demo Jam!: On March 20 in Sunnyvale, CA, Dnipro VC and Data Phoenix will be hosting AI Demo Jam, featuring 5 AI startups showcasing their products, expert panel discussions, open mic pitches, and high-energy networking.
- The panel will include Marianna Bonechi (Dnipro VC), Nick Bilogorskiy (Dnipro VC), Dmytro Dzhulgakhov (fireworks.ai); register here.

Links mentioned:

MLOps @Chipro ▷ #general-ml (2 messages):

AI4Legislation competition, object detection in MRI

AI4Legislation Competition Launch: The Silicon Valley Chinese Association Foundation is holding the AI4Legislation competition until July 31, 2025, encouraging open-source AI solutions for citizen engagement in the legislative process.
- Prizes range from $1,000 to $3,000, and you can find more details in the competition's GitHub repository and RSVP for the seminar here.
Community call for MRI Object Detection: A member requested help to create a model for object detection in MRI images without monetary compensation.
- No specific details were provided on the type of model, data availability, or use case.

Link mentioned: March AI4Legislation Seminar RSVP: Thank you for your interest in SVCAF's AI4Legislation seminar!Silicon Valley Chinese Association Foundation (incorporated in 2015) is holding a competition this summer to develop open-source AI-dr...

AI21 Labs (Jamba) ▷ #jamba (2 messages):

Qdrant

Qdrant Request Denied: A member suggested switching to Qdrant, but another member confirmed that they are not currently using it.
- The conversation provides no further context on the reasons for not using Qdrant or potential future considerations.
No Qdrant Here!: A user inquired about changing a system to use Qdrant, a vector database.
- However, another user firmly stated, No we are not using Qdrant, putting an end to the suggestion without further explanation.

AI21 Labs (Jamba) ▷ #general-chat (2 messages):

API Feature Requests, Repetition Penalty

API Repetition Penalty Support Requested: A user requested the addition of repetition penalty support to the API, indicating it's a key feature preventing wider adoption.
- The user stated that the lack of repetition penalty support is the only limiting factor for their increased usage of the model.
Repetition Penalty as Key Adoption Hurdle: The user emphasized that the absence of repetition penalty functionality is the primary obstacle preventing them from utilizing the model more extensively.
- No additional context or alternative solutions were discussed in the provided message.

Torchtune ▷ #general (1 messages):

yamashi: https://mistral.ai/news/mistral-small-3-1

Torchtune ▷ #papers (2 messages):

Learnable Scalars, Mitigating Issues in Models, Model Convergence

Learnable Scalars Mitigate Model Issues: A user shared a link to a paper Mitigating Issues in Models with Learnable Scalars.
- The author of the message also noted that the issue is mitigated by incorporating a learnable scalar, and the model can converge normally.
Model Convergence Improved: The learnable scalar helps the model converge normally.
- This suggests a practical approach to stabilizing training.

Link mentioned: Transformers without Normalization | alphaXiv: View 1 comments: Awesome work!Transformers without Normalization podcast

{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}