AI News for 3/14/2025-3/15/2025. We checked 7 subreddits, 433 Twitters and 28 Discords (222 channels, and 2399 messages) for you. Estimated reading time saved (at 200wpm): 240 minutes. You can now tag @smol_ai for AINews discussions!

Happy 2nd birthday to GPT4 and Claude 1. Few would have guessed the tremendous market share shifts that have happened in the past year.

SPECIAL NOTE: We are launching the 2025 State of AI Engineering Survey today in preparation for the AI Eng World's Fair on Jun 3-5. Please fill it out to have your voice heard!

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

Language Models and Model Updates

Google's Gemini 2.0 Updates and Features: @jack_w_rae announced improvements to Google Deep Research, driven by product development and an upgrade of the underlying model from 1.5 Pro to 2.0 Flash Thinking. The Gemini app is launching improvements, including an upgraded Flash Thinking model with stronger reasoning, deeper app integration, Deep Research, and personalization @jack_w_rae. Additionally, @jack_w_rae noted the team's progress in creating native image generation for Gemini 2, highlighting how it differs from text-to-image models.
Cohere's Command A Model: @ArtificialAnlys reported that Cohere launched Command A, a 111B parameter dense model with an Artificial Analysis Intelligence Index of 40, close to OpenAI’s latest GPT-4o. The model has a 256K context window, a speed of 185 tokens/s, and is priced at $2.5/$10 per million input/output tokens. It is available on Hugging Face for research and commercially with a license from Cohere.
Meta's Dynamic Tanh (DyT): @TheTuringPost reported that Meta AI proposed Dynamic Tanh (DyT) as a replacement for normalization layers in Transformers, which works just as well or better without needing extra calculations or tuning and works for images, language, supervised learning, and self-supervised learning. Yann LeCun also announced the same thing on Twitter.
Alibaba's QwQ-32B: @DeepLearningAI highlighted Alibaba's QwQ-32B, a 32.5-billion-parameter language model excelling in math, coding, and problem-solving. Fine-tuned with reinforcement learning, it rivals larger models like DeepSeek-R1 and outperforms OpenAI’s o1-mini on benchmarks. The model is freely available under the Apache 2.0 license.
Google's Gemma 3 Models: @GoogleDeepMind announced the release of Gemma 3, available in sizes from 1B to 27B, featuring a 128K token context window and supporting over 140 languages @GoogleDeepMind. It also announced ShieldGemma 2, a 4B image safety checker built on the Gemma 3 foundation @GoogleDeepMind. @ArtificialAnlys benchmarked Gemma 3 27B at an Artificial Analysis Intelligence Index of 38, noting that its strengths include a permissive commercial license, vision capability, and memory efficiency, although it is not competitive with larger models like Llama 3.3 70B or DeepSeek V3 (671B). @sirbayes noted that Gemma 3 is best in class for a VLM that runs on 1 GPU.
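For readers curious what the Dynamic Tanh (DyT) item above looks like in practice, here is a minimal sketch, assuming the formulation as we understand it from the DyT paper: an elementwise `gamma * tanh(alpha * x) + beta`, with a learnable scalar `alpha` and per-channel `gamma` and `beta`. The function name, the plain-list representation, and the starting value `alpha=0.5` are illustrative assumptions, not Meta's implementation.

```python
import math


def dyt(x, alpha, gamma, beta):
    """Dynamic Tanh sketch: gamma * tanh(alpha * x) + beta, per element.

    x, gamma, beta are equal-length lists (one entry per channel);
    alpha is a single scalar. In a real model all three would be learned.
    """
    return [g * math.tanh(alpha * v) + b for v, g, b in zip(x, gamma, beta)]


# Hypothetical activations with a few large outliers.
features = [-10.0, -1.0, 0.0, 1.0, 10.0]
gamma = [1.0] * len(features)  # scale, assumed initialised to ones
beta = [0.0] * len(features)   # shift, assumed initialised to zeros

out = dyt(features, alpha=0.5, gamma=gamma, beta=beta)
print(out)
```

Unlike a normalization layer, nothing here computes a mean or variance over the input: the tanh squashes extreme values and leaves small values roughly linear.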

Model Performance and Benchmarking

Leaderboard Lore and History: @_lewtun shared the origin story of the Hugging Face LLM leaderboard, highlighting contributions from @edwardbeeching, @AiEleuther, @Thom_Wolf, @ThomasSimonini, @natolambert, @abidlabs, and @clefourrier. The post emphasizes the impact of small teams, early releases, and community involvement. @clefourrier added to this, noting that @nathanhabib1011 and they were working on an internal evaluation suite when the leaderboard went public, leading to industrializing the code.
GPU Benchmarks and CPU Overhead: @dylan522p expressed their appreciation for GPU benchmarks that measure CPU overhead, such as vLLM and KernelBench.
Tic-Tac-Toe as a Benchmark: @scaling01 stated they are an LLM bear until GPT-5 is released, citing that GPT-4.5 and o1 can't even play tic-tac-toe consistently. @scaling01 argued that if LLMs can't play tic-tac-toe despite seeing millions of games, they shouldn't be trusted for research or business tasks.
Evaluation Scripts for Reasoning Models: @Alibaba_Qwen announced a GitHub repo providing evaluation scripts for testing the benchmark performance of reasoning models and reproducing reported results for QwQ.

AI Applications and Tools

AI-Assisted Coding and Prototyping: @NandoDF supports the idea that it's a great time to learn to code, as coding is more accessible due to AI copilots, potentially leading to a wave of entrepreneurship. This sentiment was echoed by @AndrewYNg, noting that AI and AI-assisted coding have reduced the cost of prototyping.
Agentic AI in IDEs: @TheTuringPost introduced Qodo Gen 1.0, an IDE plugin by @QodoAI that embeds agentic AI into JetBrains and VS Code, using LangGraph by LangChain and MCP by Anthropic.
Integration of Gemini 2.0 with OpenAI Agents SDK: @_philschmid announced a one-line code change to use Gemini 2.0 with the OpenAI Agents SDK.
LangChain's Long-Term Agentic Memory Course: @LangChainAI and @DeepLearningAI announced a new DeepLearningAI course on Long-term Agentic Memory with LangGraph, taught by @hwchase17 and @AndrewYNg, focusing on building agents with semantic, episodic, and procedural memory to create a personal email assistant.
UnslothAI Updates: @danielhanchen shared updates for UnslothAI, including support for full fine-tuning and 8-bit, support for nearly any model (such as Mixtral, Cohere, Granite, and Gemma 3), no more OOMs for vision finetuning, further VRAM usage reduction, a speedup for 4-bit, Windows support, and more.
Perplexity AI on Windows: @AravSrinivas announced the Perplexity App is now available in the Windows and Microsoft App Store, with voice-to-voice mode coming soon.
HuggingSnap on TestFlight: @mervenoyann announced that HuggingSnap, an offline vision LM for phones built by @pcuenq and @cyrilzakka, is available on TestFlight, seeking feedback for further development.
New Trends in Machine Translation: @_akhaliq highlighted a paper on New Trends for Modern Machine Translation with Large Reasoning Models.
Microsoft and Shopify: @MParakhin announced Shopify has acquired VantageAI.

AI and Hardware

AMD's Radeon GPU Support on Windows: @dylan522p reported on AMD's @AnushElangovan discussing making Radeon GPUs first-class citizens on Windows at the ROCm User meetup, with support for multiple GPU architectures and a focus on CI and constant shipping.
MLX LM New Home: @awnihannun announced that MLX LM has a new home.

AI Conferences and Events

AI Dev 25 Conference: @AndrewYNg kicked off AI Dev 25 in San Francisco, noting that agents are the most exciting topic for AI developers. The conference included talks from Google's Bill Jia @AndrewYNg, Meta’s Chaya Nayak @AndrewYNg, and a panel discussion on building AI applications in 2025 @AndrewYNg. @DeepLearningAI shared a takeaway from Nebius' Roman Chernin emphasizing solving real-world problems, and @AndrewYNg highlighted a tip from Replit’s @mattppal on debugging by understanding the LLM's context.
GTC Fireside Chat: @ylecun announced they would be doing a fireside chat at GTC with Nvidia chief scientist Bill Dally on Tuesday next week.
Interrupt Conference: @LangChainAI promoted the Interrupt conference, listing its sponsors, including CiscoCX, TryArcade, Box, and others @LangChainAI.
Khipu AI in Santiago, Chile: @sirbayes shared their talk on Sequential decision making using online variational bayes at @Khipu_AI in Santiago, Chile. @sarahookr mentioned that the museum was really curious why their top item to see was the khipu.

Other

The value of open-source models: @Teknium1 expressed concern that barring Americans from using Chinese models won't slow China's progress, and that not having access to the full range of models will make the US fall behind.
AI and Film Making: @c_valenzuelab discussed the divergent qualities of AI video generation, allowing for creative impulses and exploration of unexpected moments, unconstrained by physical limitations.
The future of software: @c_valenzuelab speculates on the future of major public software companies, suggesting that companies focused on features and complex interfaces are at risk because the new software stack is intention-driven.
Team Size: @scottastevenson made the argument that small teams are winning, and that clinging to old big team culture may be damaging for your career.

Humor/Memes

"Everything is Transformer": @AravSrinivas simply stated, “everything is transformer” with a picture of a transformer.
"Our top technology at Midjourney is Domain Not Resolving": @DavidSHolz joked that Midjourney’s top technology is "Domain Not Resolving", seeking someone with at least 6 years of experience in the domain.
"One million startups must perish": @andersonbcdefg said "one million startups must perish".
"I will vibe edit human genome on a PlayStation 2": @fabianstelzer posted "I will vibe edit human genome on a PlayStation 2".

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Gemma 3 Fine-Tuning Revolution: Performance and Efficiency in Unsloth

Gemma 3 Fine-tuning now in Unsloth - 1.6x faster with 60% less VRAM (Score: 172, Comments: 36): Unsloth now allows fine-tuning of Gemma 3 (12B) with 1.6x faster performance and 60% less VRAM usage compared to Hugging Face + FA2, fitting models like the 27B in a 24GB GPU. The platform fixes issues such as infinite exploding gradients on older GPUs and double BOS tokens, and supports a broad range of models and algorithms, including full fine-tuning and Dynamic 4-bit quantization. For more details, visit their blog and access their Colab notebook for free fine-tuning.
- Users express enthusiasm for Unsloth's advancements, particularly the support for full fine-tuning and the potential for 8-bit fine-tuning. Danielhanchen confirms that all methods, including 4-bit, 8-bit, and full fine-tuning, will be prioritized, and mentions the possibility of adding torchao for float8 support.
- There is interest in a more user-friendly interface, with requests for a webUI for local running to simplify usage. Few_Painter_5588 predicts that Unsloth will become the primary toolset for LLM fine-tuning.
- FullDeer9001 shares positive feedback on running Gemma3 on Radeon XTX with 8k context, highlighting VRAM usage and prompt statistics, and compares it favorably to Deepseek R1. Users discuss the idea of optimizing the 12B model for 16GB RAM to enhance performance.

Theme 2. Sesame CSM 1B Voice Cloning: Expectations vs. Reality

Sesame CSM 1B Voice Cloning (Score: 216, Comments: 30): Sesame CSM 1B is a newly released voice cloning model. No additional details were provided in the post.
- Voice Cloning Model Licensing and Usage: There is a discussion about the licensing differences between Sesame (Apache licensed) and F5 (Creative Commons Attribution Non Commercial 4.0), highlighting that Sesame can be used for commercial purposes. Users also mention the integration of voice cloning into a conversational speech model (CSM) as a potential advancement.
- Performance and Compatibility Issues: Users report slow performance of the voice cloning model, taking up to 50 seconds for a full paragraph on a GPU, and note that it may not be optimized for Windows. There are suggestions that it might work better on Linux and that running it on a mini PC without a dedicated GPU could be challenging due to the "experimental" triton backend for CPU.
- Technical Adjustments and API Access: Chromix_ shares steps to get the model working on Windows by upgrading to torch 2.6 and other packages, and mentions bypassing the need for a Hugging Face account by downloading files from a mirror repo. They also provide a link to the API endpoint for voice cloning.
Conclusion: Sesame has shown us a CSM. Then Sesame announced that it would publish... something. Sesame then released a TTS, which they obviously misleadingly and falsely called a CSM. Do I see that correctly? (Score: 154, Comments: 51): The controversy centers on Sesame's misleading marketing: the company announced a CSM but released a TTS model instead, falsely labeling it a CSM. The issue could have been mitigated if Sesame had clearly communicated that it wouldn't be open source.
- Misleading Marketing Strategy: Many users express disappointment with Sesame's marketing tactics, noting that the company created significant hype by suggesting an open-source release, only to deliver a less impressive product. VC-backed companies often use such strategies to gauge product-market fit and generate investor interest, as seen with Sesame's lead investor a16z.
- Technical Challenges and Model Performance: There's a consensus that the released 1B model is underwhelming in performance, particularly in real-time applications. Users discuss technical aspects, such as the Mimi tokenizer and the model's architecture, which contribute to its slow speed, and suggest optimizations like using CUDA graphs or alternative models like exllamav2 for better performance.
- Incomplete Product Release: Discussions highlight that Sesame's release lacks crucial components of the demo pipeline, such as the LLM, STT, and VAD, forcing users to build these themselves. The demo's impressive performance contrasts with the actual release, raising questions about the demo's setup possibly using larger models or more powerful hardware like 8xH100 nodes.

Theme 3. QwQ's Rise: Dominating Benchmarks and Surpassing Expectations

QwQ on LiveBench (update) - is better than DeepSeek R1! (Score: 256, Comments: 117): QwQ-32b from Alibaba surpasses DeepSeek R1 on LiveBench, achieving a global average score of 71.96, compared to DeepSeek R1's 71.57. QwQ-32b consistently outperforms in subcategories like Reasoning, Coding, Mathematics, Data Analysis, Language, and IF Average, as highlighted in the comparison table.
- There is skepticism about the QwQ-32b's performance compared to DeepSeek R1, with some users noting that Alibaba tends to optimize models for benchmarks rather than real-world scenarios. QwQ-32b is highlighted as a strong model, but there are doubts about its stability and real-world knowledge compared to R1.
- Coding performance is a contentious point, with users questioning how QwQ-32b approaches Claude 3.7 in coding capabilities. Discussions mention that LiveBench primarily tests in Python and JavaScript, while Aider tests over 30 languages, suggesting potential discrepancies in testing environments.
- Some users express excitement about the potential of QwQ-max, with anticipation that it might surpass R1 in both size and performance. There are also discussions on the impact of settings changes on the model's performance, with links provided for further insights (Bindu Reddy's tweet).
Qwq-32b just got updated Livebench. (Score: 130, Comments: 73): QwQ 32B has been updated on LiveBench, providing new insights into its performance. The full results can be accessed through the Livebench link.
- The QwQ 32B model is praised for its local coding capabilities, with some users noting it surpasses larger models like R1 in certain tasks. Users have discussed adjusting the model's thinking time by tweaking settings such as the logit bias for the ending tag, and some have experimented with recent updates to resolve issues like infinite looping.
- Discussions highlight the evolving power of smaller models like QwQ 32B, with users noting their increasing potential for local applications compared to larger flagship models. Some users express surprise at the model's creative capabilities and its ability to perform well on benchmarks, leading to decisions like dropping OpenAI subscriptions.
- There is a debate on the implications of open-sourcing models, with some users suggesting that China's strategy of open-sourcing models accelerates development, contrasting with the U.S. approach focused on corporate profit. Concerns are raised about the future of open-source availability, especially if competitive advantages shift.
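The "logit bias for the ending tag" trick from the QwQ 32B discussion above can be illustrated with a toy next-token distribution. This is a minimal sketch, assuming the ending tag is a single `</think>` token and using made-up logits; it is not tied to any particular inference server's API.

```python
import math


def softmax(logits):
    """Convert a {token: logit} mapping into probabilities."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_logit_bias(logits, bias):
    """Add a per-token bias to the logits before sampling."""
    return {tok: val + bias.get(tok, 0.0) for tok, val in logits.items()}


# Hypothetical logits at a step where the model might keep reasoning or stop.
logits = {"Wait": 2.0, "So": 1.0, "</think>": 0.0}

before = softmax(logits)
# A positive bias on the (assumed) ending tag makes stopping more likely,
# which shortens thinking; a negative bias would lengthen it.
after = softmax(apply_logit_bias(logits, {"</think>": 2.0}))

print(f"P(</think>) before: {before['</think>']:.3f}")
print(f"P(</think>) after:  {after['</think>']:.3f}")
```

The bias only shifts probabilities at each step, so the effect on total thinking length is statistical rather than a hard cap.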
Meme i made (Score: 982, Comments: 55): The post titled "Meme i made" lacks detailed content as it only mentions a meme creation related to the QwQ Model's Thinking Process. No additional information or context about the video or the meme is provided, making it difficult to extract further technical insights.
- Discussions highlight the QwQ Model's tendency to doubt itself, leading to inefficient token usage and prolonged response times. This behavior is likened to "fact-checking" itself excessively, which some users find inefficient compared to traditional LLMs.
- There's a consensus that current reasoning models, like QwQ, are in their early stages, akin to GPT-3's initial release, with expectations for significant improvements in their reasoning capabilities over the next year. Users anticipate a shift towards reasoning in latent space, which could enhance efficiency by a factor of 10x.
- Humorous and critical commentary highlights the model's repetitive questioning and self-doubt, drawing parallels to outdated technology and sparking discussions about the potential for these models to improve in handling complex reasoning tasks without excessive self-questioning.

Theme 4. Decentralized LLM Deployment: Akash, IPFS & Pocket Network Challenges

HowTo: Decentralized LLM on Akash, IPFS & Pocket Network, could this run LLaMA? (Score: 229, Comments: 20): The post titled "HowTo: Decentralized LLM on Akash, IPFS & Pocket Network, could this run LLaMA?" suggests deploying a decentralized Large Language Model (LLM) using Akash, IPFS, and Pocket Network. It questions the feasibility of running LLaMA, a specific LLM, on this decentralized infrastructure, implying a focus on leveraging decentralized technologies for AI model deployment.
- Concerns about Security and Privacy: Users question the cryptographic verification process of Pocket Network, expressing doubts about ensuring the correct model is served and the privacy of prompts. There are concerns about whether user data, such as IP addresses, might be logged and how the network handles latency for anonymity.
- Challenges of Decentralized Infrastructure: Commenters highlight the technical challenges of running LLMs in a decentralized manner, especially the need for high bandwidth and low latency between nodes, which currently limits the feasibility of distributed LLM deployment compared to single-machine setups.
- Decentralization vs. Centralization: The discussion contrasts Pocket Network's API relay role with centralized AI hosting, noting that while Pocket does not run the model itself, the use of Akash for model hosting offers benefits like resilience and potential cost savings, despite adding complexity with a crypto layer.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. Advanced AI Video Generation with SDXL, Wan2.1, and Long Context Tuning

Another video aiming for cinematic realism, this time with a much more difficult character. SDXL + Wan 2.1 I2V (Score: 1018, Comments: 123): This post discusses the creation of a video aimed at achieving cinematic realism using SDXL and Wan 2.1 I2V. It highlights the challenge of working with a more difficult character in this context.
- Technical Challenges and Techniques: Parallax911 shares the complexity of achieving cinematic realism with SDXL and Wan 2.1 I2V, highlighting the use of Photopea for inpainting and compositing in Davinci Resolve. They mention the difficulty in achieving consistency and realism, especially with complex character designs, and the use of Blender for animating segments like the door opening.
- Project Costs and Workflow: The project incurred a cost of approximately $70 using RunPod's L40S at $0.84/hr, taking about 80 hours of GPU time. Parallax911 utilized a workflow involving RealVisXL 5.0, Wan 2.1, and Topaz Starlight for upscaling, with scenes generated at 61 frames, 960x544 resolution, and 25 steps.
- Community Feedback and Suggestions: The community praised the atmospheric storytelling and sound design, with specific feedback on elements like water droplet size and the need for a tutorial. Some users suggested improvements, such as better integration of AI and traditional techniques, and expressed interest in more action-oriented scenes with characters like Samus Aran from Metroid.
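As a quick sanity check on the project cost quoted above, the reported hourly rate and GPU time can be multiplied out. The variable names are ours; the two inputs come straight from the post summary.

```python
# Figures reported in the post summary.
hourly_rate_usd = 0.84  # RunPod L40S price per hour
gpu_hours = 80          # approximate GPU time used

total_cost_usd = hourly_rate_usd * gpu_hours
print(f"Estimated GPU cost: ${total_cost_usd:.2f}")
```

The product is a little under the quoted figure of approximately $70, which is consistent with both inputs being rounded.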
Video extension in Wan2.1 - Create 10+ seconds upscaled videos entirely in ComfyUI (Score: 123, Comments: 23): The post discusses a highly experimental workflow in Wan2.1 using ComfyUI for creating upscaled videos, achieving approximately 25% success. The process involves generating a video from the last frame of an initial video, merging, upscaling, and frame interpolation, with specific parameters like Sampler: UniPC, Steps: 18, CFG: 4, and Shift: 11. More details can be found in the workflow link.
- Users are inquiring about the aspect ratio handling in the workflow, questioning if it's automatically set or needs manual adjustment for input images.
- There is positive feedback from users interested in the workflow, indicating anticipation for such a solution.
- Concerns about blurriness in the second half of clips were raised, with suggestions that it might be related to the input frame quality.
Animated some of my AI pix with WAN 2.1 and LTX (Score: 115, Comments: 10): The post discusses the creation of animated AI videos using WAN 2.1 and LTX. Without further context or additional details, the focus remains on the tools used for animation.
- Model Usage: LTX was used for the first clip, the jumping woman, and the fighter jet, while WAN was used for the running astronaut, the horror furby, and the dragon.
- Hardware Details: The videos were generated using a rented cloud computer from Paperspace with an RTX5000 instance.

Theme 2. OpenAI's Sora: Transforming Cityscapes into Dystopias

OpenAI's Sora Turns iPhone Photos of San Francisco into a Dystopian Nightmare (Score: 931, Comments: 107): OpenAI's Sora is a tool that transforms iPhone photos of San Francisco into images with a dystopian aesthetic. The post likely discusses the implications and visual results of using AI to alter real-world imagery, although specific details are not available due to the lack of text content.
- Several commenters express skepticism about the impact of AI-generated dystopian imagery, with some suggesting that actual locations in San Francisco or other cities already resemble these dystopian visuals, questioning the need for AI alteration.
- iPhone as the device used for capturing the original images is a point of contention, with some questioning its relevance to the discussion, while others emphasize its importance in understanding the image source.
- The conversation includes a mix of admiration and concern for the AI's capabilities, with users expressing both astonishment at the technology and anxiety about distinguishing between AI-generated and real-world images in the future.
Open AI's Sora transformed Iphone pics of San Francisco into dystopian hellscape... (Score: 535, Comments: 58): OpenAI's Sora has transformed iPhone photos of San Francisco into a dystopian hellscape, showcasing its capabilities in altering digital images to create a futuristic, grim aesthetic. The post lacks additional context or details beyond this transformation.
- Commenters draw parallels between the dystopian images and real-world locations, with references to Delhi, Detroit, and Indian streets, highlighting the AI's perceived biases in interpreting urban environments.
- There are concerns about AI's text generation capabilities, with one commenter noting that sign text in the images serves as a tell-tale sign of AI manipulation.
- Users express interest in the process of creating such images, with a request for step-by-step instructions to replicate the transformation on their own photos.

Theme 3. OpenAI and DeepSeek: The Open Source Showdown

I Think Too much insecurity (Score: 137, Comments: 58): OpenAI accuses DeepSeek of being "state-controlled" and advocates for bans on Chinese AI models, highlighting concerns over state influence in AI development. The image suggests a geopolitical context, with American and Chinese flags symbolizing the broader debate over state control and security in AI technologies.
- The discussion highlights skepticism over OpenAI's claims against DeepSeek, with users challenging the notion of state control by pointing out that DeepSeek's model is open source. Users question the validity of the accusation, with calls for proof and references to Sam Altman's past statements about the lack of a competitive moat for LLMs.
- DeepSeek is perceived as a significant competitor, managing to operate with lower expenses and potentially impacting OpenAI's profits. Some comments suggest that DeepSeek's actions are seen as a form of economic aggression, equating it to a declaration of war on American interests.
- There is a strong undercurrent of criticism towards OpenAI and Sam Altman, with users expressing distrust and dissatisfaction with their actions and statements. The conversation includes personal attacks and skepticism towards Altman's credibility, with references to his promises of open-source models that have not materialized.
Built an AI Agent to find and apply to jobs automatically (Score: 123, Comments: 22): An AI agent called SimpleApply automates job searching and application processes by matching users' skills and experiences with relevant job roles, offering three usage modes: manual application with job scoring, selective auto-application, and full auto-application for jobs with over a 60% match score. The tool aims to streamline job applications without overwhelming employers and is praised for finding numerous remote job opportunities that users might not discover otherwise.
- Concerns about data privacy and compliance were raised, with questions on how SimpleApply handles PII and its adherence to GDPR and CCPA. The developer clarified that they store data securely with compliant third parties and are working on explicit user agreements for full compliance.
- Application spam risks were discussed, with suggestions to avoid reapplying to the same roles to prevent being flagged by ATS systems. The developer assured that the tool only applies to jobs with a high likelihood of landing an interview to minimize spam.
- Alternative pricing strategies were suggested, such as charging users only when they receive callbacks via email or call forwarding. This approach could potentially be more attractive to unemployed users who are hesitant to spend money upfront.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. Google's Gemma 3 Takes Center Stage Across Tools

Unsloth Supercharges Gemma 3 Finetuning, Vision Too: Unsloth AI now boasts full support for Gemma 3, enhancing finetuning speeds by 1.6x, slashing VRAM usage by 60%, and expanding context length 6x compared to standard Flash Attention 2 setups on 48GB GPUs. Optimized versions for full finetuning, 8-bit, and pretraining are available on Hugging Face, and initial support for Gemma 3 vision is also implemented, though Ollama users might face compatibility issues for now.
Gemma 3 12B Outsmarts Qwen, Needs GPT4All Update: Users reported Gemma 3 12B outperforming Qwen 14B and 32B in personal tests and excelling in multilingual question answering, yet GPT4All requires updates for full Gemma 3 12B support due to architectural shifts and the need for an mmproj file. In a basic physics test, Gemma-3-12b correctly predicted jar shattering when water freezes, unlike DeepSeek-R1.
vLLM and LigerKernel Gear Up for Gemma 3 Integration: vLLM is actively working on Gemma 3 support, tracked in this GitHub issue, while a draft implementation of Gemma 3 into LigerKernel is underway, noting high architectural similarity to Gemma 2 with minor RMSNorm call differences; however, some users are reporting context window size issues with Gemma3 and TGI.

Theme 2. New Models Emerge: OLMo 2, Command A, Jamba 1.6, PaliGemma 2 Mix

AI2's OLMo 2 32B Shines as Open-Source GPT-3.5 Killer: AI2 launched OLMo 2 32B, a fully open-source model trained on 6T tokens using Tulu 3.1, which it claims outperforms GPT3.5-Turbo and GPT-4o mini on academic benchmarks, while costing only one-third of Qwen 2.5 32B training; available in 7B, 13B, and 32B sizes, it is now available on OpenRouter and sparking discussion in Yannic Kilcher's community about its open nature and performance.
Cohere's Command A and AI21's Jamba 1.6 Models Arrive with Massive Context: Cohere unveiled Command A, a 111B parameter open-weights model with a 256k context window, designed for agentic, multilingual, and coding tasks, while AI21 released Jamba 1.6 Large (94B active parameters, 256K context) and Jamba 1.6 Mini (12B active parameters), both now featuring structured JSON output and tool-use, all models available on OpenRouter. However, Command A is exhibiting a peculiar bug with prime number queries, and local API performance is reportedly suboptimal without specific patches.
Google's PaliGemma 2 Mix Family Unleashes Vision-Language Versatility: Google released PaliGemma 2 Mix, a vision language model family in 3B, 10B, and 28B sizes, with 224 and 448 resolutions, capable of open-ended vision language tasks and document understanding, while Sebastian Raschka reviewed multimodal models including Meta AI's Llama 3.2 in a blog post; users in HuggingFace are also seeking open-source alternatives to Gemini 2.0 Flash with similar image editing capabilities.

Theme 3. Coding Tools and IDEs Evolve with AI Integration

Cursor IDE Users Cry Foul Over Performance and Claude 3.7 Downgrade: Cursor IDE faces user backlash for lag and freezing on Linux and Windows after updates like 0.47.4, with Claude 3.7 deemed dumb as bricks and rule-ignoring, costing double credits, and the Cursor agent criticized for spawning excessive terminals; despite issues, v0 remains praised for rapid UI prototyping, contrasting with Cursor's credit system and limited creative freedom compared to v0.
Aider and Claude Team Up, Users Debate Rust Port and MCP Server Setup: Users laud the powerful combination of Claude with Aider, augmented with web search and bash scripting, while discussions on porting Aider to Rust for speedier file processing are met with skepticism, citing LLM API bottlenecks; a user-improved readme for the Aider MCP Server emerged, yet setup complexities persist, and Linux users are finding workarounds to run Claude Desktop.
'Vibe Coding' Gains Momentum, Showcased in Game Dev and Resource Lists: The concept of "vibe coding" (AI-assisted collaborative coding) is gaining traction, exemplified by a developer creating a multiplayer 3D game 100% with AI in 20 hours for 20 euros using Cursor. Awesome Vibe Coding, a curated list of AI coding tools and resources, has been released on GitHub, and a GitDoc VS Code extension for auto-committing changes is gaining popularity, sparking UI design ideas for "vibe coding" IDEs with visualized change trees.

Theme 4. Training and Optimization Techniques Advance

Unsloth Pioneers GRPO for Reasoning Models, Dynamic Quantization for Speed: Unsloth introduces GRPO (Group Relative Policy Optimization), enabling 10x longer context with 90% less VRAM for reasoning models, and highlights dynamic quantization outperforming GGUF in quality, especially for Phi-4, showcased on the Hugging Face leaderboard, while Triton bitpacking achieves massive speedups of up to 98x over PyTorch, reducing Llama3-8B repacking time from 49 sec to 1.6 sec.
DeepSeek's Search-R1 Leverages RL for Autonomous Query Generation, IMM Promises Faster Sampling: DeepSeek's Search-R1 extends DeepSeek-R1 with reinforcement learning (RL) to generate search queries during reasoning, using retrieved token masking for stable training and enhanced LLM rollouts, while Inductive Moment Matching (IMM) emerges as a novel generative model class promising faster inference via one- or few-step sampling, surpassing diffusion models without pre-training or dual-network optimization.
Reasoning-Gym Explores GRPO, veRL, and Composite Datasets for Enhanced Reasoning: Group Relative Policy Optimization (GRPO) gains popularity for RL in LLMs, with reasoning-gym confirming veRL training success for chain_sum and exploring composite datasets for improved reasoning capabilities, moving towards a refactor for enhanced "all-around" model performance. The project nears 500 stars, with version 0.1.16 uploaded to PyPI.

Theme 5. Infrastructure and Access: H100s, VRAM, and API Pricing

SF Compute Disrupts H100 Market with Low Prices, Vultr Enters Inference API Space: SF Compute offers surprisingly low H100 rental prices, especially for short-term use, advertising 128 H100s available hourly and launching an additional 2,000 H100s soon, while Vultr announces inference API pricing at $10 for 50 million output tokens initially, then 2 cents per million, accessible via an OpenAI-compatible endpoint, stemming from a large GH200 purchase.
LM Studio Users Dive into Runtime Retrieval and Snapdragon Compatibility: LM Studio users are reverse-engineering the application to find download URLs for offline runtimes, after claims it runs offline, discovering CDN 'APIs' like Runtime Vendor, while Snapdragon X Plus GPU support in LM Studio requires direct llama.cpp execution, and users report Gemini Vision limitations potentially due to geo-restrictions in Germany/EU.
VRAM Consumption Concerns Rise: Gemma 3 and SFT Discussed: Users report increased VRAM usage for Gemma 3 post-vision update, speculating CLIP integration as a cause, and Gemma's SFT VRAM needs are debated, suggesting potentially higher requirements than Qwen 2.5 in similar conditions, while resources for estimating memory usage for LLMs are shared, like Substratus AI blog and Hugging Face space.
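
As a rough illustration of the kind of estimate those memory resources provide, the sketch below computes only the memory needed to hold model weights at a given precision. It is a back-of-the-envelope lower bound under stated assumptions: the function name is hypothetical, 1 GB is taken as 10^9 bytes, and KV cache, activations, optimizer state, and any vision encoder are deliberately ignored, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone: parameters x bits per parameter, in GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# A 12B-parameter model at common precisions (weights only, no runtime overhead).
for bits in (16, 8, 4):
    print(f"12B parameters at {bits}-bit: about {weight_memory_gb(12e9, bits):.0f} GB of weights")
```

Fine-tuning typically needs considerably more than this figure, because gradients and optimizer state are stored on top of the weights.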

PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord

Unsloth's Gemma 3 Support Gains Steam: Unsloth now supports Gemma 3, including full fine-tuning and 8-bit. It makes Gemma 3 (12B) fine-tuning 1.6x faster, reduces VRAM usage by 60%, and extends context length by 6x compared to environments using Flash Attention 2 on a 48GB GPU.
- All Gemma 3 model uploads are available on Hugging Face, including versions optimized for full finetuning, 8-bit, and pretraining.
Dynamic Quants Face Off GGUF Quality: Discussion compares dynamic quantization with GGUF models, especially regarding the trade-offs between size and quality, with Unsloth's dynamic quants for Phi-4 on the Hugging Face leaderboard.
- A direct comparison with GGUF benchmarks is anticipated to clarify performance at different bit widths; the likely holdup is that llama-server does not yet support vision.
GRPO to Grant Reasoning Greatness: GRPO (Group Relative Policy Optimization) is coming next week along with new notebooks, and now supports 10x longer context with 90% less VRAM, as detailed in a blog post.
- The team stated, "only if you let it reason about the rules first a la GRPO"; GRPO is specifically designed for reasoning models, offering significant memory savings and expanded context windows.
Vision Models Get Unsloth's Vision: Unsloth has implemented the train-on-completions feature and image resizing for Vision Language Models; models now auto-resize images, which prevents OOMs and also allows truncating sequence lengths.
- A Qwen2_VL Colab notebook was also shared for images.
QwQ-32B Bugfixes Bolster Model: Bugfixes have been implemented for the QwQ-32B model, as highlighted in a blog post with corresponding model uploads.
- These fixes improve the model's stability and performance, ensuring a smoother user experience.

Cursor IDE Discord

Cursor Experiences Performance Hiccups on Linux and Windows: Users have reported Cursor experiencing lag and freezing on Linux and Windows, particularly after updates such as 0.47.4 (download link).
- One user detailed that the UI freezes for seconds after just 20-30 messages on Linux; another noted constant lags on Windows even with a high performance laptop running version 3.7.
Claude 3.7 Judged as Underperforming and Disobedient: Users find Claude 3.7 dumb as bricks following the update to 0.47.4, and using it now costs double the credits.
- Members mentioned that Sonnet 3.7 ignores global rules, even when prompted to output them, with one user jokingly suggesting to put 'be a good boy' in your prompt and it will fix anything, according to a tweet.
Cursor Agent Uncorks Terminal Barrage: Multiple users are finding the Cursor agent is spawning an excessive amount of terminals, causing frustration, especially when it restarts servers that are already running.
- One member suggested that this functionality should either be built-in or users should just write the terminal commands themselves.
V0 Praised for Prototyping Speed: Some users advocate using v0 for front-end prototyping due to its UI design capabilities with subframes, which is similar to Figma, before transferring designs to Cursor.
- One user stated it's much better to build prototype and layout (better front end) imo then import locally to cursor, although others favor Cursor because of v0's credit system and limited creative autonomy.

Eleuther Discord

LM Studio Users Seek Support: A member suggested users facing issues with LM Studio seek assistance in the dedicated LM Studio Discord.
- This aims to provide more focused help for LM Studio related problems.
SMILES String Encoders Sought for Stereoisomers: A member inquired about models or architectures that can encode a SMILES string into various stereoisomers or a ChemDraw input.
- The goal is to enable chemical descriptor extraction from these encodings.
Diffusion Models Excel at Generative Tasks: A Nature article was shared, highlighting the proficiency of diffusion models (DMs) in modeling complex data distributions and generating realistic samples for diverse media.
- These models are now state-of-the-art for generating images, videos, audio, and 3D scenes.
Search-R1 Autonomously Searches with RL: The Search-R1 paper was introduced, detailing an extension of the DeepSeek-R1 model that employs reinforcement learning (RL) to generate search queries during reasoning (see paper).
- The model uses retrieved token masking for stable RL training, enhancing LLM rollouts through multi-turn search interactions.
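
The idea behind retrieved token masking can be sketched as follows: tokens that were copied in from search results are excluded from the training loss, so the policy is only updated on tokens it generated itself. This is a minimal illustration, not the paper's actual objective; the function name, the per-token loss values, and the mask layout are all hypothetical.

```python
def masked_mean_loss(token_losses, is_retrieved):
    """Average per-token losses over model-generated tokens only."""
    if len(token_losses) != len(is_retrieved):
        raise ValueError("token_losses and is_retrieved must have the same length")
    # Drop every position that holds retrieved (non-generated) text.
    kept = [loss for loss, retrieved in zip(token_losses, is_retrieved) if not retrieved]
    return sum(kept) / len(kept) if kept else 0.0

# Hypothetical rollout: the model writes a query, the search engine returns two
# tokens of retrieved text, then the model continues reasoning.
token_losses = [0.9, 0.7, 2.5, 2.1, 0.4]
is_retrieved = [False, False, True, True, False]
loss = masked_mean_loss(token_losses, is_retrieved)
print(f"loss over generated tokens only: {loss:.3f}")
```
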
IMM Claims Faster Sampling Times: A paper on Inductive Moment Matching (IMM) was shared, noting it as a novel class of generative models that promise faster inference through one- or few-step sampling, surpassing diffusion models.
- Notably, IMM does not require pre-training initialization or the optimization of two networks, unlike distillation methods.

HuggingFace Discord

LLM Faceoff: Ministral 8B vs. Exaone 8B: Members suggested using Ministral 8B or Exaone 8B at 4-bit quantization for LLM tasks.
- A user running an M4 Mac mini with 24 GB RAM was trying to figure out tokens per second.
SmolAgents Has Troubles With Gemma3: A user reported errors running Gemma3 with SmolAgents, stemming from code parsing and regex issues, pointing to a potential fix on GitHub.
- The user resolved the problem by increasing the Ollama context length.
Awesome Vibe Coding Curates Resources: A curated list of tools, editors, and resources for AI-assisted coding has been released, called Awesome Vibe Coding.
- The list includes AI-powered IDEs, browser-based tools, plugins, command line tools, and the latest news on vibe coding.
PaliGemma 2 Models Drop: Google released PaliGemma 2 Mix, a family of vision language models with three sizes (3B, 10B, and 28B) and resolutions of 224 and 448 that can do vision language tasks with open-ended prompts.
- Check out the blog post for more.
Chess Championship Models Make Illegal Moves?: A user shared a YouTube playlist titled Chatbot Chess Championship 2025, showcasing language models or chess engines playing chess.
- Participants speculated whether the models were true language models or merely calling chess engines, and one person noted a language model made illegal moves.

Perplexity AI Discord

Complexity Extension Goes Full Maintenance: The Complexity extension for Perplexity AI is now in full maintenance mode due to a layout update breaking the extension.
- The developer thanked users for their patience during this maintenance period.
Locking Kernel a Pipe Dream for Security: Users debated whether locking down the kernel would improve security, in the general channel.
- Others argued that this is not feasible due to the open-source nature of Linux, with one user joking about using Windows instead.
Perplexity Users Beg for More Context: Users are requesting larger context windows in Perplexity AI and are willing to pay extra to avoid using ChatGPT.
- A user cited Perplexity's features like unlimited research on 50 files at a time, spaces for custom instructions, and the ability to choose reasoning models as reasons to stay.
Grok 3 Riddled with Bugs Upon Release: The newly released Grok AI is reportedly buggy.
- Users reported that the chat suddenly stops working or breaks mid-conversation.
Gemini Deep Research Not So Deep: Users testing the new Gemini Deep Research feature found it weaker than OpenAI's offerings.
- One user found it retained less context than the regular Gemini, even with search disabled.

aider (Paul Gauthier) Discord

Claude + Aider = Coding SuperPower: Members discussed using Claude with Aider, which augments it with web search, URL scraping, and bash script execution, resulting in more powerful prompting capabilities.
- One user highlighted that each unique tool added to Claude unlocks a lot more than the sum of its parts, especially when the model searches the internet for bugs.
Does Rust Rocket Aider Speed?: One user inquired about porting Aider to C++ or Rust for faster file processing, particularly when loading large context files for Gemini models.
- Others expressed skepticism, suggesting that the bottleneck remains with the LLM API and any improvements might not be quantifiable.
Linux Lovers Launch Claude Desktop: Users shared instructions for getting the Claude Desktop app to work on Linux, as there isn't an official version.
- One user referenced a GitHub repo providing Debian-based installation steps while another shared their edits to an Arch Linux PKGBUILD.
Aider MCP Server Readme Rescued: Users discussed the Aider MCP Server, with one mentioning that another user's readme was 100x better, referring to this repo.
- However, another user humorously stated that they still can't setup ur mcp despite the readme's existence.
DeepSeek Models Speak Too Much: A user reported that DeepSeek models are generating excessive output, around 20-30 lines of phrases, and inquired about setting a thinking-tokens value in the configuration.
- It was noted that 20 lines is pretty standard for the R1 model, and one user shared that they once waited 2 minutes for the model to think on a 5 word prompt.

Latent Space Discord

OLMo 2 32B Dominates GPT 3.5: AI2 released OLMo 2 32B, a fully open-source model trained up to 6T tokens using Tulu 3.1, outperforming GPT3.5-Turbo and GPT-4o mini on academic benchmarks.
- It is claimed to require only one third of the cost of training Qwen 2.5 32B while reaching similar performance and is available in 7B, 13B, and 32B parameter sizes.
Vibe Coding Creates Games with AI: A developer created a multiplayer 3D game 100% with AI, spending 20 hours and 20 euros, calling the concept vibe coding, and sharing the guide.
- The game features realistic elements like hit impacts, smoke when damaged, and explosions on death, all generated via prompting in Cursor with no manual code edits.
Levels.io's AI Flight Sim Soars to $1M ARR: A member referenced the success of Levels.io's flight simulator, built with Cursor, which quickly reached $1 million ARR by selling ads in the game.
- Levelsio noted, AI really is a creativity and speed maximizer for me, making me just way more creative and more fast.
GitDoc Extension Auto-Commits Changes: Members shared the GitDoc VS Code extension that allows you to edit a Git repo and auto commit on every change.
- One user suggested adding branching, restarting, and other features, reasoning that storage is cheap enough to auto-commit on every change and visualize the tree of changes.
Latent Space Podcast Dives into Snipd AI App: The Latent Space podcast released a new Snipd podcast with Kevin Ben Smith about the AI Podcast App for Learning, and released their first ever OUTDOOR podcast on YouTube.
- The podcast features a discussion of @aidotengineer NYC, switching from finance to tech, and how AI can help listeners get a lot more out of their podcast time, and dishes the details on the tech stack of the Snipd app.

LM Studio Discord

LM Studio's Runtimes Retrieved via Reverse Engineering: A user decompiled LM Studio to locate download URLs for offline use, discovering the backends master list and CDN 'APIs' like the Runtime Vendor.
- This was after another user claimed LM Studio doesn't need an internet connection to run, showing a demand for offline runtime access.
Snapdragon Support Requires Direct llama.cpp Execution: A user reported that LM Studio did not detect their Snapdragon X Plus GPU, and another member replied that GPU support requires running llama.cpp directly.
- They directed the user to this github.com/ggml-org/llama.cpp/pull/10693 pull request for more information.
Gemini Vision Hampered by Geo-Restrictions: Users reported issues testing Gemini 2.0 Flash Experimental's image processing abilities, potentially due to regional restrictions in Germany/EU.
- One user in Germany suspected that the limitations were due to local laws while a user in the US reported that Gemini in AI Studio failed to perform the image manipulation.
AI Chess Tournament Highlights Model Accuracy: An AI chess tournament featuring 15 models was held, with results available at dubesor.de/chess/tournament.
- Although DeepSeek-R1 achieved 92% accuracy, the organizer clarified that accuracy varies with game length and opponent moves, and that the standard o1 model was too expensive to run in the tournament.
VRAM Consumption for Gemma 3 Jumps After Vision Update: Following a vision speed increase update, a user reported a significant increase in VRAM usage for Gemma 3.
- Speculation arose that the download size increase may be due to CLIP being used for vision, potentially being called from a separate file, increasing the overall memory footprint.

Nous Research AI Discord

DeepHermes 3 Converts to MLX: The model mlx-community/DeepHermes-3-Mistral-24B-Preview-4bit was converted to MLX format from NousResearch/DeepHermes-3-Mistral-24B-Preview using mlx-lm version 0.21.1.
- This conversion allows for efficient use on Apple Silicon and other MLX-compatible devices.
Deep Dive into VLLM Arguments for Hermes 3: Members are sharing different configurations to get vllm working correctly with Hermes-3-Llama-3.1-70B-FP8, including suggestions like adding --enable-auto-tool-choice and --tool-call-parser for Hermes 3 70B.
- One member noted the need for <tool_call> and </tool_call> tags in the tokenizer, which are present in Hermes 3 models but not necessarily in DeepHermes.
Vultr Announces Inference Pricing: A member from Vultr shared the official pricing for their inference API, which includes $10 for 50 million output tokens initially, then 2 cents per million output tokens after, accessible via an OpenAI-compatible endpoint at https://api.vultrinference.com/
- This pricing is a result of purchasing an absurd amount of gh200s and needing to do something with them, according to a member.
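
To make the quoted pricing concrete, here is a small cost calculator. It assumes the $10 is a flat charge covering the first 50 million output tokens and that every additional million costs $0.02; the function name is hypothetical, and input-token pricing, rounding, and billing granularity are not covered by the announcement as summarized here.

```python
INCLUDED_TOKENS = 50_000_000   # output tokens covered by the initial charge (from the summary)
BASE_FEE_USD = 10.00           # assumed to be a flat charge
OVERAGE_PER_MILLION_USD = 0.02

def vultr_output_cost_usd(output_tokens):
    """Estimated cost in USD for a given number of output tokens."""
    extra_tokens = max(0, output_tokens - INCLUDED_TOKENS)
    return BASE_FEE_USD + extra_tokens / 1_000_000 * OVERAGE_PER_MILLION_USD

for tokens in (10_000_000, 50_000_000, 150_000_000):
    print(f"{tokens:>11,} output tokens: ${vultr_output_cost_usd(tokens):.2f}")
```
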
Dynamic LoRAs Docking into vLLM: Members discussed the possibility of hosting dynamic LoRAs with vLLM for various use cases, like up-to-date coding styles, referencing the vLLM documentation.
- It was suggested to let users pass in their Hugging Face repo IDs for the LoRAs and supply them as CLI arguments to the vLLM serve command.

MCP (Glama) Discord

Astro Clients gear up for MCP Integration: A member plans to use MCP for their Astro clients, using AWS API Gateway with each MCP server as a Lambda function, leveraging the MCP bridge with the SSE gateway.
- The goal is to enable MCP usage specifically for customers and explore adding MCP servers to a single project for client visibility.
Decoding MCP Server Architecture: A member inquired how clients like Cursor and Cline, which keep MCP servers on the client side, communicate with the backend.
- The discussion involved the architecture and communication methods used by these clients but was redirected to a more specific channel for detailed information.
Smart Proxy Server Converts to Agentic MCP: A smart proxy MCP server converts standard MCP servers with many tools into one with a single tool via its own LLM, effectively a sub-agent approach using vector tool calling.
- The OpenAI Swarm framework follows a similar process of assigning a subset of tools to individual agents, now rebranded as openai-agents by OpenAI.
Debugger Uses MCP Server to Debug Webpages: A member shared a debugger project, chrome-debug-mcp (https://github.com/robertheadley/chrome-debug-mcp), that uses MCP to debug webpages with LLMs, originally built with Puppeteer.
- The project has been ported to Playwright, with the updated GitHub repository pending after further testing.
MCP Hub Concept Simplifies Server Management: To enhance enterprise adoption of MCP, a member created an MCP Hub concept featuring a dashboard for simplified server connections, access control, and visibility across MCP servers, as demoed in this video.
- The hub aims to address concerns about managing multiple MCP servers and permissions in enterprise settings.

Interconnects (Nathan Lambert) Discord

DeepSeek Confiscates Employee Passports: The owner of DeepSeek reportedly asked R&D staff to surrender their passports to prevent foreign travel, according to Amir on Twitter.
- Members debated whether this would lead to more open source work from DeepSeek, or if the US might adopt similar measures.
SF Compute H100 Prices Shock the Market: A member pointed out that SF Compute offers surprisingly low prices for H100s, especially for short-term rentals, advertising 128 H100s available for hourly use.
- San Francisco Compute Company is launching soon an additional 2,000 H100s and runs a market for large-scale, vetted H100 clusters, while also sporting a simple but powerful CLI.
Gemma 3 License Raises Red Flags: A recent TechCrunch article highlighted concerns over model licenses, particularly Google's Gemma 3.
- The article noted that while Gemma 3 itself is praised for its efficiency, the license's restrictive and inconsistent terms could pose risks for commercial applications.
User Data Privacy is Under Siege: A member reported their frustration with individuals discovering their phone number online and making unsolicited requests, such as "hey nato, can you post-train my llama2 model? ty".
- They speculate that extensions or paid services are the source and are seeking methods to remove their data from sites like Xeophon.
Math-500 Sampling Validated: In response to a question about seemingly random sampling in Qwen's github repo evaluation scripts, it was confirmed that apparently sampling is random.
- Members cited Lightman et al. (2023), adding that long-context evals and answer extraction are a headache and that Math 500 is very well correlated.

OpenRouter (Alex Atallah) Discord

Cohere Commands Attention with 111B Model: Cohere launched Command A, a new open-weights 111B parameter model boasting a 256k context window, with a focus on agentic, multilingual, and coding applications.
- This model is designed to deliver high performance in various use cases.
AI21 Jamba Jams with New Models: AI21 released Jamba 1.6 Large with 94 billion active parameters and a 256K token context window, alongside Jamba 1.6 Mini, featuring 12 billion active parameters.
- Both models now support structured JSON output and tool-use.
Gemma Gems Gleam for Free: All variations of Gemma 3 are available for free. Gemma 3 12B introduces multimodality, supporting vision-language input and text output, and handles context windows up to 128k tokens.
- The model understands over 140 languages; Gemma 3 4B and Gemma 3 1B models are also available.
Anthropic API Anomaly Averted: Anthropic reported an incident of elevated errors for requests to Claude 3.7 Sonnet, with updates posted on their status page.
- The incident has been resolved.
Chess Tournament Pits AI Models Against Each Other: An AI chess tournament, accessible here, pits 15 models against each other using standard chess notation.
- The models are fed information about the board state, the game history, and a list of legal moves.
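
A prompt of this shape might be assembled as in the sketch below, which also shows why supplying the legal moves is useful: the reply can be checked against the list before it is played. The tournament's real prompt format is not given here, so the FEN and SAN notations, the wording, and both function names are assumptions for illustration.

```python
def build_move_prompt(board_fen, history, legal_moves):
    """Combine board state, game history, and legal moves into one text prompt."""
    return "\n".join([
        f"Board (FEN): {board_fen}",
        f"Moves so far: {' '.join(history) if history else '(none)'}",
        f"Legal moves: {', '.join(legal_moves)}",
        "Reply with exactly one move from the legal list.",
    ])

def is_legal(reply, legal_moves):
    """Accept a model reply only if it matches a listed legal move exactly."""
    return reply.strip() in legal_moves

# Position after 1. e4, with a shortened (illustrative) list of Black's legal moves.
fen = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
legal = ["e5", "c5", "Nf6", "e6"]
prompt = build_move_prompt(fen, ["e4"], legal)
print(prompt)
print(is_legal(" e5\n", legal), is_legal("Ke2", legal))
```
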

Yannic Kilcher Discord

Go Wins at Practical Porting: Members debated the utility of Go vs Rust for porting, explaining that porting to Go function-by-function allows for exact behavior parity, avoiding rewriting code for years.
- While Rust is faster and more efficient, a member pointed out that Golang is really ergonomic to develop in, particularly for distributed, async, or networked applications.
DeepSeek Hype Suspicions Sparked: Some members argued that the hype around DeepSeek is engineered and that their models are simplified, likening comparisons with frontier AI models to comparing Ananas with Apple.
- Others defended DeepSeek, claiming their crazy engineers developed a filesystem faster than life.
OLMo 2 32B Fully Open: OLMo 2 32B launched as the first fully-open model to outperform GPT3.5-Turbo and GPT-4o mini on academic benchmarks.
- It is claimed to be comparable to leading open-weight models while only costing one third of the training cost of Qwen 2.5 32B.
ChatGPT is Overrated, Claude Preferred: One member expressed that ChatGPT is overrated because it actually doesn't solve problems that I need solved, preferring Mistral Small 24B, QwQ 32B, and Claude 3.7 Sonnet.
- Another user shared, I've had better luck getting what I want from Claude, and that it seems better at understanding intention and motivation for whatever reason.
Grok 3 Writes Professional Code: Members debated code generation qualities, highlighting that OpenAI models often generate legacy code, while Mistral can refactor it into more modern code.
- It was also noted that Grok 3 generates code that looks like a professional programmer wrote it, while in VSCode, one member prefers using Amazon Q over Copilot.

GPU MODE Discord

Speech-to-Speech Models Spark Quest: A member is actively seeking speech-to-speech generation models that focus on conversational speech, distinguishing them from multimodal models like OpenAI Realtime API or Sesame AI.
- Two potential models were identified: Moshi from Kyutai Labs and Hertz-dev from Standard-Intelligence.
Block Diffusion Bridges Autoregressive and Diffusion: The Block Diffusion model, detailed in an ICLR 2025 Oral presentation, combines autoregressive and diffusion language model benefits, offering high quality, arbitrary-length generation, KV caching, and parallelizable processing.
- Code can be found on GitHub and HuggingFace.
Triton bitpacking gets Huge Boost: Bitpacking in Triton achieved significant speed-ups over the PyTorch implementation on the 4090, reaching a 98x speedup for 32-bit packing and 26x for 8-bit packing.
- The time to re-pack a Llama3-8B model dropped from 49 sec to 1.6 sec using the new bitpacking implementation, with code available on GitHub.
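
For readers unfamiliar with the term, bitpacking stores several low-bit quantized values inside one wider integer. The pure-Python reference below illustrates the idea only; it is not the Triton kernel discussed above, and it assumes "32-bit packing" and "8-bit packing" refer to the width of the destination word. Function names and the little-endian slot order are illustrative choices.

```python
def pack_bits(values, nbits, word_bits=32):
    """Pack unsigned nbits-wide integers into word_bits-wide words, lowest slot first."""
    per_word = word_bits // nbits
    mask = (1 << nbits) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for slot, value in enumerate(values[start:start + per_word]):
            if not 0 <= value <= mask:
                raise ValueError(f"{value} does not fit in {nbits} bits")
            word |= value << (slot * nbits)
        words.append(word)
    return words

def unpack_bits(words, nbits, count, word_bits=32):
    """Inverse of pack_bits; count trims the padding in the last word."""
    per_word = word_bits // nbits
    mask = (1 << nbits) - 1
    out = [(word >> (slot * nbits)) & mask for word in words for slot in range(per_word)]
    return out[:count]

# Ten 4-bit weights fit into two 32-bit words (eight values per word).
weights = [1, 15, 0, 7, 3, 9, 12, 4, 8, 2]
packed = pack_bits(weights, nbits=4)
assert unpack_bits(packed, nbits=4, count=len(weights)) == weights
print(len(weights), "values stored in", len(packed), "words")
```
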
Gemma3 Gains Traction in vLLM and LigerKernel: Members discussed adding Gemma 3 support to vLLM, referencing this GitHub issue, while a member has started drafting an implementation of Gemma3 into LigerKernel, and shared a link to the pull request.
- According to the pull request, Gemma3 has high similarities to Gemma2 with some differences in RMSNorm Calls.
GRPO Gains Popularity for LLM Training: Members discussed how Group Relative Policy Optimization (GRPO) has become popular for Reinforcement Learning in Large Language Models, referencing the DeepSeek-R1 paper.
- A blog post from oxen.ai on GRPO VRAM requirements was shared, noting its effectiveness in training.
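
The "group relative" part of GRPO can be illustrated with a short sketch: several completions are sampled for the same prompt, each is scored, and each completion's advantage is its reward standardized against the group's own mean and standard deviation, which removes the need for a separate value network. The function name, the use of the population standard deviation, the epsilon, and the reward values are illustrative assumptions rather than details from the discussion.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against the group it was sampled with."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for one prompt, scored by a hypothetical reward function.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat their group's average receive a positive advantage and the rest a negative one; when all rewards in a group are equal, every advantage is zero and the group contributes no learning signal.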

OpenAI Discord

Intelligence Declines Spark Debate: A Financial Times article arguing that average intelligence is dropping in developed countries sparked discussion, citing increased reports of cognitive challenges and declining performance in reasoning and problem-solving.
- One member theorized this could be due to technology, especially smartphones and social media, leading to the outsourcing of thinking; however, the graphics only covered the years before ChatGPT became a thing.
Is Tech the Culprit for Cognitive Decline?: Members debated potential causes for cognitive decline, including technology's influence, immigration, and fluoridated water.
- One member pointed out that the rates of cognitive challenges have been steadily increasing since the 1990s, with a sudden acceleration from around 2012.
DeepSeek V3 Distilled from OpenAI Models: Members discussed the likelihood that DeepSeek V3 (the instruct version) was distilled from OpenAI models.
- One member noted that even OpenAI unofficially supports distilling their models; they just don't seem to like it when DeepSeek does it.
Claude Sonnet 3.7 Dominates in Coding Tasks: A member now uses Claude Sonnet 3.7 exclusively for coding, finding ChatGPT lagging behind.
- In related news, a member stated that the o3-mini-high model is better than o1.
Food Additives Fuel Mental Slowdown: Members discussed how the availability and consumption of ultra-processed foods (UPFs) has increased worldwide, now representing 50–60% of daily energy intake in some high-income countries, and is linked to cognitive decline.
- Another member noted that because multinational corporations such as Nestlé produce and distribute worldwide, additives or changes made to one corporation's products can have a worldwide impact.

Notebook LM Discord

Gemini 2.0 Deep Research Joins NotebookLM?: Members are exploring pairing Gemini 2.0 Deep Research with NotebookLM for enhanced documentation handling.
- The community questioned if Deep Research might eventually supersede NotebookLM in functionality.
NotebookLM Inspires African Project Ecokham: A member from Africa reported using NotebookLM to connect thoughts, edit roadmaps, and generate audio for his project, Ecokham.
- He expressed gratitude for NotebookLM's contribution to inspiring his team.
NotebookLM Prototyping PhytoIntelligence Framework: A member is leveraging NotebookLM to organize notes and prototype the PhytoIntelligence framework for autonomous nutraceutical design, with the aim of mitigating diagnostic challenges.
- The user acknowledged Google for the tool's capabilities.
Users Demand Image & Table Savvy in NotebookLM: Users are requesting image and table recognition in NotebookLM, complaining that the current state feels incomplete because of the need to constantly reopen source files and dig through Google Sheets; one user even shared a relevant cat GIF.
- The community emphasized that images are worth a "thousand words" and the clearest data is often found in tables.
Mobile App still not here for NotebookLM: Users are actively requesting a mobile app version of NotebookLM for improved accessibility.
- The community feels a mobile version is "still not yet coming up".

LlamaIndex Discord

Google Gemini and Vertex AI Merge in LlamaIndex!: The @googleai integration unifies Google Gemini and Google Vertex AI with streaming, async, multi-modal, and structured prediction support, even supporting images, detailed in this Tweet.
- This integration simplifies building applications leveraging Google's latest models.
LlamaIndex Perks Debated: A member sought clarity on LlamaIndex's advantages over Langchain for building applications.
- The inquiry did not lead to a conclusive discussion within the provided context.
OpenAI's Delta Event Absence Probed: A member questioned why OpenAI models do not emit delta events for tool calling, observing the events are emitted but empty.
- The consensus was that tool calling cannot be a stream because the LLM needs the full tool response to generate the subsequent response, advocating for a DIY approach.
API for Agentic RAG Apps Questioned: There was a question on whether any API exists specifically for building Agentic RAG applications to streamline development and management.
- The conversation mentioned that multiple constructs are available in LlamaIndex but lacked a clear, definitive guide.

Nomic.ai (GPT4All) Discord

Gemma 3 12B Edges Out Qwen in IQ Test: A user reported that the Gemma 3 12B model outperformed Qwen 14B and Qwen 32B in terms of intelligence on their personal computer.
- This was tested by asking questions in multiple languages; Gemma 3 and DeepSeek R1 consistently provided correct answers in the same language as the question.
Gemma 3 Needs New GPT4All Support: Users noted that GPT4All may require updates to fully support Gemma 3 12B because its architecture differs from Gemma 2.
- Specifically, Gemma 3 needs an mmproj file to work with GPT4All, highlighting the challenges of quickly adapting to new AI model developments.
Freezing Water Tests AI Knowledge: When queried about freezing water, DeepSeek-R1 incorrectly predicted that jars would break, while Gemma-3-12b accurately described the shattering effect due to water expansion.
- This demonstrates the models' varying levels of understanding of basic physics, indicating the diverse reasoning capabilities across different architectures.

DSPy Discord

Explicit Feedback Flows into Refine: A member requested the reintroduction of explicit feedback into dspy.Refine, similar to dspy.Suggest, to enhance debugging and understanding.
- The member emphasized the value of explicit feedback for identifying areas needing improvement.
Manual Feedback Makes its Mark on Refine: The team announced the addition of manual feedback into Refine.
- The implementation involves including feedback in the return value of the reward function as a dspy.Prediction object, containing both a score and feedback.
Reward Function Returns Feedback: A team member inquired about the feasibility of integrating feedback as part of the return value of the reward_fn.
- The user responded affirmatively, expressing their gratitude.
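
The shape of such a reward function might look like the sketch below. To keep it runnable without DSPy installed, a small dataclass stands in for dspy.Prediction; the score and feedback fields follow the description above, while the (args, pred) signature, the dictionary inputs, and the one-word rule are assumptions made for illustration and should be checked against the DSPy documentation.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    """Stand-in for dspy.Prediction so the sketch runs without DSPy."""
    score: float
    feedback: str

def one_word_reward(args, pred):
    """Hypothetical reward: full score only when the answer is a single word."""
    n_words = len(pred["answer"].split())
    if n_words == 1:
        return Prediction(score=1.0, feedback="Answer is a single word.")
    return Prediction(
        score=0.0,
        feedback=f"Answer has {n_words} words; reply with exactly one.",
    )

result = one_word_reward({"question": "Capital of France?"}, {"answer": "It is Paris"})
print(result.score, result.feedback)
```

Returning the feedback string alongside the score gives the next refinement attempt a concrete hint about what to fix, rather than a bare number.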

Cohere Discord

Command A Debuts on OpenRouter: Cohere's Command A, a 111B parameter open-weights model boasting a 256k context window, is now accessible on OpenRouter.
- The model aims to deliver high performance across agentic, multilingual, and coding applications, setting a new standard for open-weight models.
Command A Flunks Prime Number Test: Users discovered a peculiar bug in Command A: when queried about prime numbers whose digits sum to 15, the model either provides an incorrect response or gets stuck in an infinite loop.
- This unexpected behavior highlights potential gaps in the model's mathematical reasoning capabilities.
Local API struggles with Command A: Users are encountering performance bottlenecks when running Command A locally, reporting that even with sufficient VRAM, the model doesn't achieve acceptable speeds without patching the modeling in ITOr or using the APIs.
- This suggests that optimizing Command A for local deployment may require further work to enhance its efficiency.
Cohere unveils Compatibility base_url: A member suggested using the Cohere Compatibility API.
- They recommend utilizing base_url for integration.

Modular (Mojo 🔥) Discord

Discord Account Impersonation Alert: A member reported being messaged by a scam account impersonating another user, and the impersonated user confirmed that "That is not me. This is my only account".
- The fake account caroline_frascaa was reported to Discord and banned from the server after a user posted a screenshot of the fake account.
Mojo stdlib Uses Discussed: The use of some feature in the Mojo stdlib was mentioned by soracc in #mojo.
- The user mentioned it is used in base64.

LLM Agents (Berkeley MOOC) Discord

Self-Reflection Needs External Evaluation: A member inquired about statements on self-evaluation from the first lecture in contrast to the second, suggesting a contradiction regarding the role of external feedback.
- The first lecture emphasized that self-reflection and self-refinement benefit from good external evaluation, while self-correction without oracle feedback can degrade reasoning performance.
Clarification on Self-Evaluation in Lectures 1 & 2 Sought: A user is seeking clarification on the apparent contradiction between the lectures regarding self-evaluation.
- They noted the emphasis on self-evaluation and improvement in the second lecture, while the first lecture highlighted the importance of external evaluation and the potential harm of self-correction without oracle feedback.

AI21 Labs (Jamba) Discord

Vertex Preps for Version 1.6: Version 1.6 is not yet available on Vertex, but is slated to arrive in the near future.
- It will also be available on other platforms like AWS for broader access.
AWS Soon to Host 1.6: Version 1.6 will be available on platforms like AWS in the near future, expanding its reach.
- This development aims to allow AWS customers access to the new features.

The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Unsloth AI (Daniel Han) ▷ #general (301 messages🔥🔥):

Gemma 3 Support in Unsloth, Multi-GPU Training, Dynamic Quantization vs GGUF, GRPO and Reasoning, Vision Models

Unsloth Unleashes Gemma 3 Support: Unsloth now supports Gemma 3, and offers full fine-tuning and 8-bit training for almost any model, such as Mixtral, Cohere, and Granite.
- According to a tweet, optimizations in Unsloth yield 10% less VRAM usage and a 10% speedup for 4-bit, plus fixes and improvements for Windows support and GGUF conversions.
Multi-GPU Support Still on the Horizon: Despite user interest, Unsloth currently doesn't natively support multi-GPU training in the free version.
- However, some speculate that it is possible by decomposing the components of your training code (FastLanguageModel), and an AGPL3 multi-GPU release and an Enterprise version are imminent.
Dynamic Quants Duel GGUF in Quality: There's ongoing discussion about comparing dynamic quantization with GGUF models, especially regarding the trade-offs between size and quality.
- Unsloth's dynamic quants for Phi-4 are on the Hugging Face leaderboard, but a direct comparison with GGUF benchmarks is anticipated to clarify performance at different bit widths.
GRPO Powers Reasoning Prowess: The team mentioned that GRPO (Group Relative Policy Optimization) is coming next week along with new notebooks.
- A GRPO notebook is planned, with the team adding the caveat "only if you let it reason about the rules first a la GRPO."
Vision Models Get the Unsloth Treatment: Unsloth has implemented the train on completions feature and also resizing of images for Vision Language Models, a highly demanded feature, to reduce OOM.
- The models now auto resize images which stops OOMs and also allows truncating sequence lengths.

Unsloth AI (Daniel Han) ▷ #announcements (1 messages):

Gemma 3 models, Unsloth support for models, GRPO for reasoning models, QwQ-32B bugfixes, New model uploads

Google's Gemma 3 Integrated with Unsloth: Google's new Gemma 3 model is now supported in Unsloth with a blog post and a Colab notebook provided.
- All Gemma 3 model uploads are available on Hugging Face, including versions optimized for full finetuning, 8-bit, and pretraining.
Unsloth Boosts Gemma 3 Finetuning Speed: Unsloth accelerates Gemma 3 (12B) finetuning by 1.6x, reduces VRAM usage by 60%, and extends context length by 6x compared to environments using Flash Attention 2 on a 48GB GPU.
- The team fixed issues with training Gemma 3 and uploaded all versions including 2-8 bit GGUFs, dynamic 4-bit, and 16-bit versions.
GRPO Enables Extended Context with Reduced VRAM: Unsloth now supports 10x longer context with 90% less VRAM using GRPO (Group Relative Policy Optimization), detailed in a blog post and tutorial.
- This enhancement is designed for reasoning models, offering significant memory savings and expanded context windows.
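
For readers unfamiliar with GRPO, the core idea is to sample a group of completions for the same prompt and score each one relative to the rest of the group, rather than relying on a learned value model. The sketch below shows only that normalization step; it is not Unsloth's implementation, and the use of the population standard deviation and the `eps` constant are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards for four sampled completions of one prompt (made-up values).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat the group average get a positive advantage and are reinforced; those below it get a negative one.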
QwQ-32B Model Gets a Makeover: Bugfixes have been implemented for the QwQ-32B model, as highlighted in a blog post with corresponding model uploads.
- These fixes improve the model's stability and performance, ensuring a smoother user experience.
Fresh Model Uploads Hit Hugging Face: New model uploads include Gemma 3 GGUF variants (1B, 4B, 12B, 27B), Gemma 3 Dynamic 4-bit versions, QwQ-32B variants, and Phi-4-mini versions, all available on Hugging Face.
- These models cater to various hardware configurations and performance needs, expanding the accessibility of state-of-the-art models.

Unsloth AI (Daniel Han) ▷ #off-topic (5 messages):

Gemma 3, Ollama, Phi Vision, GGUFs vision

Gemma 3 Image Compatibility with Ollama Questioned: A member inquired whether Gemma 3 works with images via Ollama, similar to Phi Vision.
- Another user clarified that the vision component of their Gemma 3 GGUFs works on all engines, including LM Studio and llama.cpp, except Ollama; this is likely because llama-server does not yet support vision.
Ollama lacks vision support: The vision component of the Gemma 3 GGUFs works on every engine, including LM Studio and llama.cpp, except Ollama.
- This is likely because llama-server does not yet support vision.

Unsloth AI (Daniel Han) ▷ #help (51 messages🔥):

Gemma-3 GGUF and Ollama, Llama 3.2 inference cancellation, Phi-4-mini support, Gemma finetuning error, TurboML Continual Pre-Training

Gemma-3 GGUF vision fails with Ollama: Support for Gemma-3 GGUF models with vision components in Ollama is currently non-functional due to a widespread issue affecting all uploaders.
- The recommendation is to use Ollama's original Gemma model for text, until the issue can be debugged.
Llama 3.2 inference needs cancellation method: A user inquired about the possibility of cancelling long-running inferences with Llama 3.2 models in Unsloth without unloading the model from memory.
- The user asked if they could stop it after a certain amount of time in a timeout loop.
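
One common pattern for this is a deadline checked between decoding steps, so the loop exits while the model itself stays loaded. The sketch below shows the idea with a hypothetical stand-in, `fake_next_token`, in place of a real per-step decode call; it is not Unsloth's API. Hugging Face's `generate` also accepts a `max_time` argument, which may be worth trying with Unsloth models.

```python
import time
from typing import Callable, List


def generate_with_deadline(next_token: Callable[[List[str]], str],
                           max_seconds: float,
                           max_tokens: int = 1000) -> List[str]:
    """Decode token by token, stopping once the time budget is spent."""
    deadline = time.monotonic() + max_seconds
    tokens: List[str] = []
    while len(tokens) < max_tokens:
        if time.monotonic() >= deadline:
            break  # stop decoding; nothing is unloaded from memory
        tokens.append(next_token(tokens))
    return tokens


def fake_next_token(tokens: List[str]) -> str:
    """Hypothetical stand-in for one model decode step."""
    time.sleep(0.01)
    return f"tok{len(tokens)}"


out = generate_with_deadline(fake_next_token, max_seconds=0.05)
print(len(out))
```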
Phi-4-mini receives new Unsloth update: After upgrading to the latest Unsloth version, users should now be able to use Phi-4-mini models, which previously caused a RuntimeError.
- The user reported that Phi-4-mini works fine, but unsloth/Phi-4-mini-instruct gives a rope_scaling error.
Gemma finetuning error requires unsloth update: Users reported encountering an error (only Tensors of floating point dtype can require gradients) when fine-tuning Gemma models after adding new tokens and training new embeddings, particularly during the evaluation phase.
- Upgrading to the latest version of Unsloth is recommended, though some users have reported that the issue persists even after the update.
TurboML seeks guidance on Continual Pre-Training dataset format: A member with a new framework - TurboML - sought advice on the correct dataset format for Continual Pre-Training (CPT), specifically for Sentence Completion and SFT tasks.
- They referenced an Unsloth notebook and inquired about the placement of the EOS token.
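
The discussion did not record an answer, but a common convention is to append the tokenizer's EOS token to the end of each training text so the model learns where a sample stops. The sketch below assumes that convention; `EOS_TOKEN` is a placeholder value and `format_examples` is a hypothetical helper.

```python
EOS_TOKEN = "</s>"  # placeholder; normally read from tokenizer.eos_token


def format_examples(batch: dict) -> dict:
    """Append EOS to every text so each sample has an explicit end marker."""
    texts = [text.rstrip() + EOS_TOKEN for text in batch["text"]]
    return {"text": texts}


batch = {"text": ["The quick brown fox", "jumps over the lazy dog. "]}
formatted = format_examples(batch)
print(formatted["text"])
```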

Unsloth AI (Daniel Han) ▷ #research (9 messages🔥):

Gemma SFT, Maximum Context Length, Memory Usage Calculation

Gemma's SFT VRAM Needs Debated: Members discussed Gemma's VRAM usage in SFT, suggesting it may require more VRAM than Qwen 2.5 under similar training conditions, though specific numbers were not given.
- One shared a Qwen2_VL Colab notebook for images.
Max Context Length is a Limitation, Not a Hyperparameter: Members clarified that the maximum context length is not a hyperparameter of a model but rather a limitation based on available memory.
- The longer the context, the more memory is needed to process it, but exact calculation is not solely based on model size.
Estimating Memory Usage for LLMs: It was mentioned that the amount of memory needed to process context depends on the model architecture, with different layers requiring varying amounts of memory.
- Links were shared to approximate memory requirements (Substratus AI blog and Hugging Face space).
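
As a back-of-the-envelope illustration of why context length drives memory, the sketch below estimates the KV cache alone for a standard attention stack: two tensors (keys and values) per layer, each of size KV heads times head dimension per token. The model dimensions are hypothetical, and weights, activations, and framework overhead are not counted.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size; the leading 2 covers keys and values."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model dimensions, 16-bit cache entries.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)
for seq_len in (4096, 32768):
    gib = kv_cache_bytes(seq_len=seq_len, **cfg) / 2**30
    print(f"{seq_len:>6} tokens -> {gib:.2f} GiB")
```

The cache grows linearly with sequence length, and the architecture (layer count, KV heads, head size) sets the slope, which is why model size alone does not determine the usable context.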

Cursor IDE ▷ #general (263 messages🔥🔥):

Cursor performance issues on Linux and Windows, Issues with Claude 3.7, Custom modes in Cursor, Gemini API key issues, Cursor agent spawning terminals

Cursor Lags on Linux and Windows: Users report Cursor lagging and freezing on both Linux and Windows, even with powerful hardware, especially after recent updates like 0.47.4 (download link).
- One user mentioned that on Linux, after 20-30 messages, the UI freezes for seconds, while another cited that they have "lags all the time" on Windows with a beefed up laptop and version 3.7.
Claude 3.7 Underperforms and Ignores Global Rules: Users are experiencing issues with Claude 3.7, reporting it is "as dumb as bricks" after upgrading to 0.47.4 and noting that it costs double the credits to use.
- Some members have reported that Sonnet 3.7 ignores global rules, even when explicitly asked to output the rules being used, with one suggesting to put 'be a good boy' in your prompt and it will fix anything.
Cursor Agent Triggers Terminal Tsunami: Multiple users find the Cursor agent excessively spawning terminals, which is seen as annoying, especially when it restarts already running servers.
- A member suggested that this behavior should be built-in or that users should just write the terminal commands themselves if they don't like it.
V0 Sparks Joy for Quick UI Prototyping: Some users find v0 better for front-end focused prototyping, allowing for UI design with subframes, similar to Figma, before importing to Cursor.
- One user noted, "it's much better to build prototype and layout (better front end) imo then import locally to cursor"; however, some prefer Cursor due to v0's credit-based system and more limited creative control.

Eleuther ▷ #general (2 messages):

LM Studio, SMILES string encoding, ChemDraw

Advocating for LM Studio Support: A member suggested that users with LM Studio related problems might find more targeted assistance in the LM Studio Discord.
Seeking SMILES String Encoders for Stereoisomers: A member inquired about existing models or architectures capable of encoding a SMILES string into various stereoisomers or encoding a ChemDraw input, with the aim of enabling chemical descriptor extraction.

Eleuther ▷ #research (255 messages🔥🔥):

Diffusion Models for Generative Tasks, Search-R1: RL for Autonomous Search Query Generation, Spectral Analysis of Latent Spaces, Noise Sensitivity in Diffusion Models, Inductive Moment Matching (IMM) for Fast Sampling

Diffusion Models Emerge for Generative Tasks: A member shared a Nature article highlighting the advancements in diffusion models (DMs), noting their capability in modeling complex data distributions and generating realistic samples for images, videos, audio, and 3D scenes.
- The article describes how diffusion models have become state-of-the-art in generative tasks.
Search-R1 Learns Autonomous Search via RL: A member shared a paper introducing Search-R1, an extension of the DeepSeek-R1 model that learns to generate search queries during reasoning using reinforcement learning (RL).
- The model optimizes LLM rollouts with multi-turn search interactions, using retrieved token masking for stable RL training.
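
The idea behind retrieved token masking can be sketched as a loss mask: tokens inserted by the retriever are excluded from the training objective, so only tokens the model generated itself contribute. The per-token losses and the mask below are made-up inputs, and this is not the paper's code.

```python
from typing import List


def masked_mean_loss(token_losses: List[float], is_retrieved: List[bool]) -> float:
    """Average the loss over model-generated tokens only."""
    kept = [loss for loss, retrieved in zip(token_losses, is_retrieved)
            if not retrieved]
    if not kept:
        return 0.0
    return sum(kept) / len(kept)


# Positions 2 and 3 hold retrieved text and are ignored.
token_losses = [0.5, 0.25, 4.0, 4.0, 0.75]
is_retrieved = [False, False, True, True, False]
loss = masked_mean_loss(token_losses, is_retrieved)
print(loss)
```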
Latent Space's Spectral Analysis Probed: Members discussed spectral analysis on latent spaces of diffusion models, with one member noting that the model is resilient to perturbations in the initial noise, even at t=0 (max noise).
- Another member noted that there was nothing actionable in comparing the radially averaged power spectral density of images encoded with flux VAE vs SDXL VAE.
Noise Sensitivity Analysis Reveals Variance: Members discussed how small changes in initial noise affect the output of diffusion models, with one member sharing a plot showing points where small noise turns into a big difference.
- They observed brighter pixels indicating sensitive areas in the initial noise that cause significant changes in the output.
Inductive Moment Matching (IMM) Promises Speedy Sampling: A member shared a paper on Inductive Moment Matching (IMM), a new class of generative models for one- or few-step sampling with a single-stage training procedure, which is faster than diffusion models at inference.
- Unlike distillation, IMM does not require pre-training initialization and optimization of two networks.

HuggingFace ▷ #general (195 messages🔥🔥):

Ministral 8B, Exaone 8B, Jungle Chess AI, Stable Diffusion, Gemini 2.0

Ministral 8B and Exaone 8B Recommended: For users seeking LLMs, members recommended using Ministral 8B or Exaone 8B at 4-bit quantization.
- A user with an M4 Mac mini with 24 GB RAM inquired about expected tokens per second, but the exact performance remains speculative depending on hardware specs.
User Tries to Train Jungle Chess AI: A user attempted to create an AI for Jungle Chess using o3-mini, but the AI failed to understand or comply with a bug report related to its alpha-beta algorithm as discussed in this thread.
- The user noted it could reach depth 6 but couldn't avoid a simple opening trap, whereas a depth of 3 or 4 should suffice, suggesting the Chinese Engine was better at that.
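
For context, alpha-beta search is minimax with pruning: a branch is abandoned as soon as it cannot change the result. The sketch below runs it on a toy tree of nested lists with numeric leaves; it is not the user's Jungle Chess engine, and returning 0 at the depth cutoff is a placeholder heuristic assumed for the example.

```python
import math


def alphabeta(node, depth, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning over nested lists of leaf scores."""
    if not isinstance(node, list):
        return node  # leaf: exact score
    if depth == 0:
        return 0     # depth cutoff: placeholder heuristic value
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizer would never allow this branch
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizer already has a better option
    return best


tree = [[3, 5], [2, 9], [0, 7]]
value = alphabeta(tree, 2, -math.inf, math.inf, True)
print(value)
```

When a search reaches a reasonable depth but still walks into a shallow trap, the usual suspects are the evaluation at the cutoff or a sign error between maximizing and minimizing levels, not the depth itself.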
Chatbot Chess Championship Emerges!: A user shared a YouTube playlist titled Chatbot Chess Championship 2025, showcasing language models or chess engines playing chess.
- Participants speculated whether the models were true language models or merely calling chess engines, and one person noted a language model made illegal moves.
User Searches for Stable Diffusion Model: A user requested help finding a Stable Diffusion v1.x model fine-tuned on datasets other than LAION.
Looking for Gemini 2.0 Flash Open Source Model: A member inquired about the existence of an open-source model similar to Gemini 2.0 Flash with text-plus-image-to-image capabilities for image editing.

HuggingFace ▷ #today-im-learning (1 messages):

ilyachua: Hi all. I am starting on the CV course from hugging face

HuggingFace ▷ #i-made-this (2 messages):

Awesome Vibe Coding, mahimairaja/awesome-csm-1b

Awesome Vibe Coding List Arrives: A curated list of tools, editors, and resources for AI-assisted coding has been released, called Awesome Vibe Coding.
- The list includes AI-powered IDEs, browser-based tools, plugins, command line tools, and the latest news on vibe coding.
CSM 1B Use Cases Curated: A curated list of use cases built using Sesame's CSM 1B has been released, called awesome-csm-1b.

HuggingFace ▷ #reading-group (1 messages):

generate_without_kv_cache function

Typo Spotted in Function Name: A user pointed out a discrepancy where the function name in the article is generate_without_kv_cache but the function call used is generate_with_kv_cache.
- No further discussion was provided.
Function Call Discrepancy: The article mentions a function named generate_without_kv_cache, but the actual code uses generate_with_kv_cache, indicating a possible error.
- This discrepancy could lead to confusion or incorrect usage if users copy the function call directly from the article.

HuggingFace ▷ #computer-vision (2 messages):

PaliGemma 2 Mix, smolVLM2, QwenVL, Llama 3.2 Multimodal

Google Drops PaliGemma 2 Mix Models: Google released PaliGemma 2 Mix, a new family of versatile vision language models with three sizes (3B, 10B, and 28B) and resolutions of 224 and 448 that can do vision language tasks with open-ended prompts and understand documents (blog post).
Raschka Roots into Multimodal LLMs: Sebastian Raschka explains how multimodal LLMs function, and reviews recent multimodal papers and models, including Meta AI's Llama 3.2 (blog post).
SmolVLM2 Architecture Deep Dive: Users interested in understanding SOTA architectures from the root should start with CLIP and BLIP2 before moving to smolVLM2 and QwenVL.
- Alternatively, one can learn about the recent ones first and traverse the same tree backward.

HuggingFace ▷ #agents-course (25 messages🔥):

SerpAPI Key Errors, Deep RL Course, Interactive IDEs for Agent Code, Image to Video Loops, Gemma3 Issues with SmolAgents

SerpAPI Keys Causing Headaches: A member reported a potential error with their SerpAPI key and wondered if others were experiencing the same issue.
- Another member clarified that users are required to supply their own SerpAPI key, which wasn't immediately clear from the course example.
Deep RL Course Sparks Questions: Several members inquired about the Discord server's dedication to the Deep RL course and reported that the Deep Reinforcement Learning Leaderboard was not working.
- One member mentioned they were also facing the same problem and had recently started the Deep RL course.
Interactive IDE Recommendations Sought: A member asked for recommendations for an interactive IDE that provides suggestions for writing agents code.
- Another member recommended VS Code as a capable and free option.
Image Looping Sparks Model Search: A member requested recommendations for a model capable of turning a 1920x1080 image into a 3-5 second video loop, running on an H100.
- They noted difficulty finding options beyond 720p.
Gemma3's Troubles with SmolAgents: A member encountered errors while running Gemma3 with SmolAgents, specifically related to code parsing and regex patterns, and linked to a potential fix on GitHub.
- They solved the issue by increasing the Ollama context length.

Perplexity AI ▷ #general (213 messages🔥🔥):

Complexity Extension issues, Kernel locking, Perplexity context window sizes, Grok 3 bugs, Gemini's deep research

Complexity Extension in Full Maintenance Mode: Due to a new Perplexity layout breaking the Complexity extension, it is now in full maintenance mode.
- The developer thanked users for their patience.
Debate over Kernel Locking for Security: Some users suggested locking down the kernel for security, but others argued that this is impossible due to the open-source nature of Linux.
- One user sarcastically quipped, "if the anticheat decides to make and enforce a custom Linux kernel build... at that point you might as well just use windows lol."
Context Window Size still a Pain Point for Perplexity: Users are pleading for a larger context window so they can stop paying for ChatGPT.
- One user stated they were willing to pay extra for a larger context window because of features that Perplexity has and others lack: "ability to do unlimited research on 50 files at a time, best thing is spaces, where we can give custom instructions along with 50 knowledge uploadable files, and option to choose reasoning models."
Grok 3 Plagued by Bugs on Launch: Users reported that the newly launched Grok 3 has many bugs.
- Reported bugs include the chat suddenly stopping or breaking mid-conversation.
Gemini's new deep research is lacking: Some users have tested the new Gemini Deep Research and found it weak compared to what OpenAI offers.
- One user found it retained less context than the regular Gemini, even with search off.

Perplexity AI ▷ #sharing (3 messages):

OpenAI custom agent, Airpods Live Translation, Anthropic CEO AI quit-button

OpenAI debuts Custom Agents: Perplexity links to OpenAI releases new custom agent.
- No discussion, but a potentially interesting link for someone to check out.
Airpods to introduce Live Translation: Perplexity links to Airpods to introduce live translation.
- No discussion, but a potentially interesting link for someone to check out.
Anthropic CEO Floats AI Quit-Button: Perplexity links to Anthropic CEO floats AI quit-button.
- No discussion, but a potentially interesting link for someone to check out.

aider (Paul Gauthier) ▷ #general (133 messages🔥🔥):

Claude with Aider, Rust for Aider, Claude Desktop on Linux, Aider MCP Server, Anthropic Status

Claude and Aider Combine Forces for Coding!: Members discussed using Claude with Aider, which augments it with web search, URL scraping, and bash script execution, resulting in more powerful prompting capabilities.
- One user highlighted that each unique tool added to Claude unlocks a lot more than the sum of its parts, especially when the model searches the internet for bugs.
Does Rust make Aider run faster?: One user inquired about porting Aider to C++ or Rust for faster file processing, particularly when loading large context files for Gemini models.
- Others expressed skepticism, suggesting that the bottleneck remains with the LLM API and any improvements might not be quantifiable.
Linux users turn to GitHub for Claude Desktop app: Users shared instructions for getting the Claude Desktop app to work on Linux, as there isn't an official version.
- One user referenced a GitHub repo providing Debian-based installation steps while another shared their edits to an Arch Linux PKGBUILD.
Aider MCP Server Readme Needs Help!: Users discussed the Aider MCP Server, with one mentioning that another user's readme was 100x better, referring to this repo.
- However, another user humorously stated that they "still can't setup ur mcp" despite the readme's existence.
Anthropic 3.7 Sonnet Encounters hiccups: Users reported empty responses from the Claude 3.7 Sonnet model, prompting checks of their Anthropic accounts.
- The Anthropic Status page confirmed elevated errors, indicating an issue was under investigation and a fix was being implemented.

aider (Paul Gauthier) ▷ #questions-and-tips (44 messages🔥):

DeepSeek models configuration, Aider's architect mode behavior, Modifying Aider's completion endpoint, Aider configuration files

DeepSeek Models Speak Too Much: A user reported that DeepSeek models are generating excessive output, around 20-30 lines of phrases, and inquired about setting a thinking-tokens value in the configuration.
- It was noted that 20 lines is pretty standard for the R1 model, and one user shared that they once waited 2 minutes for the model to think on a 5 word prompt.
Architect Mode Plans Infinitely: A user experienced issues with Aider's architect mode, where the model would continuously plan without passing code changes to the editor, even after being prompted to make the code change.
- It was suggested that the user may need to explicitly add files beforehand and/or to list down what files and functions may be affected.
Modify Aider API Calls For OpenWebUI: A user inquired about modifying how Aider calls the completions endpoint to integrate with OpenWebUI's knowledge collections, which requires a files parameter with a collection ID, referencing OpenWebUI API documentation.
- It was suggested to use the extra_params or extra_body configuration options to add the necessary parameters.
Global vs Local ai-instructions.md Files: A user asked whether ai-instructions.md files should be placed in each project or if a single global file can be configured.
- The response clarified that these files, like conventions.md, can be handled as per user preference, with a recommendation to use a global file for personal use and local files for project-specific conventions.
Configure OpenRouter API Key: A user needed help to configure Aider with the OpenRouter API key.
- A member showed the correct config format api-key: - openrouter=sk-or-v1-...

Latent Space ▷ #ai-general-chat (17 messages🔥):

OLMo 2 32B, AI Engineer Singapore 2025, AI Game Generation, Gemini DeepResearch with 2.0, Claude's Birthday

OLMo 2 32B beats GPT 3.5 and GPT 4o mini!: AI2 released OLMo 2 32B, a fully open-source model trained on up to 6T tokens and post-trained with Tulu 3.1, outperforming GPT-3.5 Turbo and GPT-4o mini on academic benchmarks.
- It is claimed to require only one third of the cost of training Qwen 2.5 32B while reaching similar performance and is available in 7B, 13B, and 32B parameter sizes.
AI Engineer Singapore 2025 event announced: The AI Engineer Singapore 2025 event was announced, aiming to bridge the gap between cutting-edge AI research and practical engineering applications.
- The event is organized by the team behind AI Eng Summit, World's Fair, JSConf Asia, and GovTech Singapore.
Vibe Coding: Creating Games Entirely with AI: A developer created a multiplayer 3D game 100% with AI, spending 20 hours and 20 euros, calling the concept vibe coding, and sharing the guide.
- The game features realistic elements like hit impacts, smoke when damaged, and explosions on death, all generated via prompting in Cursor with no manual code edits.
Gemini DeepResearch 2.0 is getting love: Members are reporting very good results with the new Gemini DeepResearch with 2.0 model.
- One member noted that it is great for company OSINT compared to ChatGPT deep research because it refuses to answer questions far less often.
Happy Birthday, Claude!: Members celebrated the second birthday of Claude.
- This coincides with the birthday of GPT-4 as well.

Latent Space ▷ #ai-announcements (3 messages):

Snipd Podcast, AI Podcast App, Latent Space Podcast

Snipd Podcast Launch Outdoor!: The Latent Space podcast released a new Snipd podcast with Kevin Ben Smith about the AI Podcast App for Learning.
- The podcast features a discussion about aidotengineer NYC, switching from Finance to Tech, how AI can help us get a lot more out of our podcast time, and the details of the Snipd app's tech stack.
Latent Space Podcast releases a YouTube Video: Latent Space Podcast released their first ever OUTDOOR podcast on YouTube.
- The podcast features a discussion about @aidotengineer NYC, switching from Finance to Tech, how AI can help us get a lot more out of our podcast time, and the details of the @snipd_app tech stack.

Link mentioned: Tweet from Latent.Space (@latentspacepod): 🆕 Snipd: The AI Podcast App for Learning https://youtu.be/FNRO_SYx68Q Our first ever OUTDOOR podcast! @swyx and @KevinBenSmith chat about @aidotengineer NYC, switching from Finance to Tech, how AI can ...

Latent Space ▷ #ai-in-action-club (120 messages🔥🔥):

Cursor vs Claude, Levelsio flight sim, GitDoc VS Code extension, Vibe Coding IDE UI, Auto-git commit

Claude 3.5 or 3.7: Which Vibe Wins?: Members discussed the differences between Claude 3.5 and 3.7, with some finding 3.7 too eager and prone to "doing 20 more things I didn't ask for."
- Others intentionally use the two different models and workflows for coding and debugging - with one finding vibe debugging difficult.
Levels.io Flight Simulator Goes Viral: A member referenced the success of Levels.io's flight simulator, built with Cursor, which quickly reached $1 million ARR by selling ads in the game.
- Levelsio noted, "AI really is a creativity and speed maximizer for me, making me just way more creative and more fast."
GitDoc extension commits on save: Members shared the GitDoc VS Code extension that allows you to edit a Git repo and auto commit on every change.
- One user said "storage is cheap, like auto commit on every change and visualize the tree of changes," and suggested branching, restarting, and other features.
UI Innovation needed for Vibe Coding IDEs: Members discussed that traditional IDEs may not be the right UI for vibe coding, suggesting the need for a visualization of the tree of changes as prompted by different chats.
- This would allow users to easily revert to previous states and experiment with branching.
Enterprise AI Dev Team Enablement: A member offered to discuss enterprise AI dev team enablement, focusing on the hurdles and red tape involved with adopting tools like Cursor in large organizations.
- Some expressed interest in learning about the challenges of integrating AI into corporate development workflows.

LM Studio ▷ #general (92 messages🔥🔥):

Download LM Studio runtimes, Snapdragon X Plus support, Gemini Vision Capabilities, AI Chess Tournament, VRAM usage for Gemma 3

Rummaging for Runtimes: User Decompiles LM Studio to Find Download URLs: A user sought to download LM Studio runtimes for offline use, asking where the program downloads them from, after initially being told by another user that LM Studio doesn't need an internet connection to run.
- The user decompiled the app and located the runtime URLs, including the backends master list and CDN "APIs" like the Runtime Vendor.
Snapdragon Snags: LM Studio's Compatibility Conundrums: A user inquired about LM Studio's support for Snapdragon X Plus, reporting that LM Studio did not detect their GPU. Another member replied that GPU support requires running llama.cpp directly with github.com/ggml-org/llama.cpp/pull/10693.
Gemini's Geography: Location Locks Limit Vision Functionality: A user requested assistance in testing Gemini 2.0 Flash Experimental's ability to process images, noting potential regional restrictions in Germany/EU: "it doesnt seem to work in germany (maybe EU, because of our laws here i guess?)."
- A user in the US tested Gemini in AI Studio and the Gemini app, finding that it failed to perform the requested image manipulation.
Checkmate Craze: AI Chess Tournament Showcases Model Accuracy: An AI chess tournament was conducted, featuring 15 models competing against each other, with results and details available at dubesor.de/chess/tournament.
- One user noted that DeepSeek-R1 had a 92% accuracy, but the organizer clarified that accuracy varies based on game length and opponent moves, and normal O1 was too expensive to run in the tournament.
VRAM Vampires: Gemma 3's Appetite Increases Post-Update: A user reported that their VRAM usage for Gemma 3 had significantly increased following the vision speed increase update.
- Another user speculated the download size increase may be due to CLIP used for vision being in a separate file, and they thought it might not be embedded in the downloads but called from a separate file when uploaded to LM Studio.

LM Studio ▷ #hardware-discussion (44 messages🔥):

memtest_vulkan, H100 rental t/s, Corsair product quality, 4090 vs A6000, RTX8000

Memtest Vulkan for VRAM stability: A member suggested using memtest_vulkan, a Vulkan compute tool, for testing video memory stability.
Renting H100 for token speed tests: A member rented an H100 to measure the tokens per second achieved with models like gemma3 27B.
Corsair's diminishing product quality reported: Users reported that Corsair's product quality has declined in recent years.
- One user reported that a Corsair RAM kit was DOA (Dead On Arrival) when switching to a 9800X3D.
A6000 over modded 4090 for reliability: Members debated buying a local used A6000 for $3500 versus a Chinese-modded 4090 48GB for $4100 on eBay.
- The consensus favored the A6000 for its manufacturer guarantee and known reliability, calling the 4090 a gamble.
RTX8000 a viable alternative: A member noted that two RTX8000 48GB could be acquired at the same price as the A6000, if just memory capacity is needed.
- However, another warned that the RTX8000 uses an older Turing architecture, potentially causing issues with newer image generation models and training, but likely ok for pure LLM inference.

Link mentioned: GitHub - GpuZelenograd/memtest_vulkan: Vulkan compute tool for testing video memory stability

Nous Research AI ▷ #general (120 messages🔥🔥):

ElizaOs API framework, Helius API key pricing, Quicknode API key pricing, DeepHermes-3-Mistral-24B-Preview-4bit MLX, Hermes-3-Llama-3.1-70B-FP8 vllm args

DeepHermes 3 Gets MLX Conversion: The model mlx-community/DeepHermes-3-Mistral-24B-Preview-4bit was converted to MLX format from NousResearch/DeepHermes-3-Mistral-24B-Preview using mlx-lm version 0.21.1.
Deep Dive into VLLM Arguments for Hermes 3: Members are sharing different configurations to get vllm working correctly with Hermes-3-Llama-3.1-70B-FP8, including suggestions like adding --enable-auto-tool-choice and --tool-call-parser for Hermes 3 70B.
- One member noted the need for <tool_call> and </tool_call> tags in the tokenizer, which are present in Hermes 3 models but not necessarily in DeepHermes.
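As a rough illustration of why those tags matter, here is a minimal sketch of pulling tool calls out of raw model output. It assumes the Hermes-style convention in which each call is a JSON object wrapped in <tool_call> and </tool_call>; the function name `extract_tool_calls` and the sample payload are hypothetical, and this is not vLLM's actual parser.

```python
import json
import re

# Matches the body between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list:
    """Return every well-formed JSON tool call found in the model output."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(body))
        except json.JSONDecodeError:
            # Skip malformed payloads instead of failing the whole response.
            continue
    return calls


# Hypothetical model output, for illustration only.
sample = (
    "Let me check that.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
)
print(extract_tool_calls(sample))
```

If the tokenizer does not emit these tags as distinct markers, a parser of this kind has nothing reliable to anchor on, which is consistent with the member's remark about DeepHermes.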
Vultr's Very Alpha Inference Pricing: A member from Vultr shared the official pricing for their inference API, which includes $10 for 50 million output tokens initially, then 2 cents per million output tokens after.
- It was further explained that this stems from purchasing an absurd amount of gh200s and needing to do something with them, offering an OpenAI-compatible endpoint at https://api.vultrinference.com/.
Dynamic LoRAs Docking into VLLM: Members discussed the possibility of hosting dynamic LoRAs with vllm for various use cases, like up-to-date coding styles.
- It was suggested to let users pass in their huggingface repo IDs for the LoRAs and supply them into the VLLM serve command cli args, and there is a link to the vLLM documentation.

MCP (Glama) ▷ #general (90 messages🔥🔥):

MCP for Astro clients, MCP Servers & Architecture, Gitlab MCP server on Windows 11, Agentic Coder Conversion to MCP, Multi-Agent Systems (Swarm vs Mesh vs Sequence)

Astro Clients gear up for MCP: A member is planning to use MCP for their Astro clients, allowing MCP usage specifically for customers and exploring the possibility of adding MCP servers to a single project for client visibility.
- They are considering using AWS API Gateway with each MCP server as a Lambda function, leveraging the MCP bridge with the SSE gateway.
MCP Server Architecture Questioned: A member inquired about MCP server architecture, noting that clients like Cursor and Cline keep the MCP servers on the client side and asking how these communicate with the backend.
- The member was directed to a specific channel for more information.
Smart Proxy Server: MCP's Agentic Sub-Routing: Members discussed creating a "smart proxy" MCP server that simplifies standard MCP servers with many tools into one with a single tool using natural language, converting it into specific tool calls via its own LLM.
- It's a sub-agent approach that uses vector tool calling to give individual agents a subset of tools; the OpenAI Swarm framework follows a similar process.
Debugging Webpages via MCP Server: A member shared their project, a debugger that uses MCP to debug webpages with LLMs, which was originally built with Puppeteer and later ported to Playwright: chrome-debug-mcp.
- The member is still testing the Playwright version and plans to update the GitHub repository after.
Swarm vs Mesh Multi-Agent System: The discussion included alternative methods to the hierarchical agent system, and the user was pointed to resources like Swarm vs Mesh vs Sequence architectures, emphasizing how the Swarm framework hands off the single thread of execution between agents.
- It was noted that OpenAI now supports and maintains the swarm concept, rebranding it as openai-agents.

MCP (Glama) ▷ #showcase (3 messages):

MCP server management, Awesome Vibe Coding

MCP Hub Concept Born!: To address concerns about enterprise adoption of MCP, a member built an MCP Hub concept with a dashboard to simplify server connections, control access permissions, and provide visibility across MCP servers, as shown in this video.
Awesome Vibe Coding List Launched!: A member announced Awesome Vibe Coding, a curated list of tools, editors, and resources that enhance AI-assisted coding, available on GitHub.

Interconnects (Nathan Lambert) ▷ #news (57 messages🔥🔥):

ZIRP Era Regret, AI Startup Valuations, DeepSeek Passport Confiscation, Long Context Evaluation Challenges, Xet Data Chunking Technology

OpenAI and Anthropic Shed Tears for ZIRP Era: Members joked that OpenAI and Anthropic are regretting not being in the ZIRP era, referencing a Gigachad meme.
- Another member quipped that all AI startups raise at valuations that make ZIRP feel like kindergarten.
NYT peers into AGI Future: A New York Times article (March 14, 2025) suggests that AI systems have started surpassing humans in domains like math, coding, and medical diagnosis.
- The article anticipates that one or more AI companies will achieve general superhuman intelligence in 2026 or 2027, or possibly as soon as this year.
DeepSeek Asks Employees To Fork Over Passports!: It was reported that DeepSeek's owner asked R&D staff to hand in passports so they can't travel abroad, according to Amir on Twitter.
- Some members speculated whether this would lead to DeepSeek work remaining open source, or whether the US would ever take similar measures with frontier company employees.
Xet Uses Content-Defined Chunking (CDC): Xet uses Content-Defined Chunking (CDC) technology to intelligently break files into unique chunks, as mentioned in their HuggingFace Join page.
- A member asked how it differs from fast transfer; another member replied that they are different technologies and that fast transfer still uses Git LFS.
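To make the idea of content-defined chunking concrete, here is a minimal sketch using a gear-style rolling hash. It is not Xet's implementation: the hash, the `mask`, and the `min_size`/`max_size` values are illustrative assumptions, and the helper names are hypothetical. The point is only that cut points depend on nearby bytes rather than on absolute offsets.

```python
import hashlib
import random

# Deterministic table of 256 pseudo-random 32-bit values, one per byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big") for i in range(256)]


def cdc_chunks(data: bytes, mask: int = 0x3F, min_size: int = 16, max_size: int = 256) -> list:
    """Split data wherever a rolling hash of recent bytes hits a bit pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes drift toward the high bits, so the low bits tested by
        # the mask depend only on the most recent bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_ids(data: bytes) -> set:
    """Content hashes of the chunks, usable as deduplication keys."""
    return {hashlib.sha256(c).hexdigest() for c in cdc_chunks(data)}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
edited = b"a few inserted bytes" + original  # shifts every later offset

shared = chunk_ids(original) & chunk_ids(edited)
print(f"{len(shared)} of {len(chunk_ids(original))} chunk ids also appear after the insert")
```

With fixed-size blocks, an insertion at the front would shift every block boundary; with content-defined boundaries, chunks downstream of the edit can line up again and be reused.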
Math-500 Sampling Questioned and Validated: A member asked why the math-500 sampling was random in Qwen's github repo for evaluation scripts.
- Another member replied that it was apparently random, quoting Lightman et al. 2023; they added that long-context evals and answer extraction are a headache and that Math 500 is very well correlated.

Interconnects (Nathan Lambert) ▷ #random (16 messages🔥):

Invasion of Privacy, Claude's Birthday, Claude Code Vim mode, Gemma 3 licensing issues

Phone Number Fiasco Frustrates: A member expressed frustration about people finding their phone number online and requesting favors such as "hey nato, can you post-train my llama2 model? ty".
- They attribute this to extensions or paid services and are seeking ways to remove their information from sites like Xeophon.
Claude Celebrates Another Year: Members celebrated the second birthday of Claude, referencing the original announcement two years ago.
- Another member highlighted new features for Claude Code, including Vim mode activated by typing the slash command /vim.
Gemma 3's License Provokes Concerns: A TechCrunch article mentioned a member's work regarding model licenses, specifically in relation to Google's Gemma 3.
- The article discusses how Gemma 3's license, while praised for efficiency, poses commercial use risks due to restrictive and inconsistent terms.

Interconnects (Nathan Lambert) ▷ #memes (5 messages):

Mid-Training Analysis, SF Compute H100s, SF Compute CLI

Deep Dive into Mid-Training Analysis: A member shared a link on fxtwitter prompting discussion about the nuances of mid-training analysis.
- The discussion was brief, with a single follow-up question.
SF Compute's Surprisingly Low H100 Prices: A member highlighted that SF Compute offers surprisingly low prices for H100s, particularly for short-term rentals, noting the availability of 128 H100s for just an hour.
- They had previously encountered a confusing placeholder page, before the domain was properly configured.
SF Compute Launches 2,000 Additional H100s Soon: San Francisco Compute Company is launching soon an additional 2,000 H100s and runs a market for large-scale, vetted H100 clusters.
- SF Compute supports users like a traditional cloud and also has a simple but powerful CLI available.

Interconnects (Nathan Lambert) ▷ #rl (11 messages🔥):

GRPO implementation, KL penalty, RLHF algorithms

GRPO Implementation Trick: KL Penalty Applied in Loss: A member discussed a GRPO implementation trick where the KL penalty is applied directly in the loss rather than when the reward is computed, noting that its impact is not well-understood and linking to the RLHF Book.
- The member also shared a link to their question on X/Twitter asking for intuitions or ablations on this approach.
RLHF Algorithms Popularity Over Time: The most popular algorithms used for RLHF have evolved over time; initially, a variant of PPO was used with ChatGPT, but research has shown promise in REINFORCE-style algorithms, such as Ahmadian et al. 2024 and Wang et al. 2024.
Focusing on Reward Signals via KL Penalty: A member suggested applying the KL penalty in the loss might help the model focus on reward signals, but the member also noted that it should end up being equivalent to applying it when the reward is computed.
- Another member guessed the PG term will maximize the reward, so the loss minimum should be the same for both versions, but the dynamics could still be different.
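The two placements of the KL penalty discussed above can be written side by side. This is a simplified sketch, not the exact objective of any particular implementation: clipping and importance ratios are omitted, and the symbols ($\beta$ for the penalty weight, $\pi_{\mathrm{ref}}$ for the reference policy, $\hat{A}_t$ for the advantage estimate) are notation chosen here for illustration.

```latex
% Shared regularized objective (simplified; clipping and baselines omitted)
J(\theta) = \mathbb{E}_{o \sim \pi_\theta}\big[ r(q, o) \big]
          - \beta \, D_{\mathrm{KL}}\big[ \pi_\theta \,\|\, \pi_{\mathrm{ref}} \big]

% (a) KL penalty folded into the reward before advantages are computed
\tilde{r}_t = r_t - \beta \log \frac{\pi_\theta(o_t \mid q, o_{<t})}{\pi_{\mathrm{ref}}(o_t \mid q, o_{<t})},
\qquad
\mathcal{L}_a(\theta) = -\,\mathbb{E}_t\big[ \hat{A}_t(\tilde{r}) \, \log \pi_\theta(o_t \mid q, o_{<t}) \big]

% (b) KL penalty added directly to the loss; advantages use the raw reward
\mathcal{L}_b(\theta) = -\,\mathbb{E}_t\big[ \hat{A}_t(r) \, \log \pi_\theta(o_t \mid q, o_{<t}) \big]
                      + \beta \, D_{\mathrm{KL}}\big[ \pi_\theta \,\|\, \pi_{\mathrm{ref}} \big]
```

One plausible source of different dynamics: in (a) the penalty passes through whatever advantage estimation is used (for example, group normalization of rewards), while in (b) it bypasses that step and acts on the gradient directly.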

OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Cohere Command A, Jamba 1.6 Large, Jamba 1.6 Mini, Gemma 3 models, Anthropic incident

Cohere Commands Attention: A new open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases, Cohere Command A is now available.
AI21 Jamba Jams with New Models: AI21 released Jamba 1.6 Large featuring 94 billion active parameters and a 256K token context window, while also launching Jamba 1.6 Mini, with 12 billion active parameters, both supporting structured JSON output and tool-use capabilities.
Gemma Gems Gleam for Free: All variations of Gemma 3 are now available for free: Gemma 3 12B introduces multimodality, supporting vision-language input and text outputs, handling context windows up to 128k tokens and understands over 140 languages, as well as Gemma 3 4B and Gemma 3 1B.
Anthropic API Anomaly Averted: Anthropic declared an incident of elevated errors for requests to Claude 3.7 Sonnet, with updates posted on their status page.

OpenRouter (Alex Atallah) ▷ #general (67 messages🔥🔥):

OR ChatGPT model, OpenRouter model icons, Deepseek v3 issues, OLMO-2, Cohere repetition penalties

ChatGPT-4o-latest Price Higher Than Expected: The chatgpt-4o-latest model is up to date but is slightly more expensive than the normal 4o model.
OpenRouter Model Icons Not Available via API: The icons for the models are not available in the /api/v1/models response; website favicons are used instead.
Deepseek v3 Model Gives Zero Token Issues: Sometimes the inference stack just returns zero completion tokens, and OpenRouter still gets charged by the upstream provider.
OLMO-2 Model Hosted on OpenRouter: OLMo-2 is coming online through Parasail; someone will spin it up and notify OpenRouter to add it.
AI Chess Tournament Hosted with OpenRouter: An AI chess tournament featuring 15 models was created; the models are fed the board state, the game history, and a list of legal moves in standard chess notation, then compete against each other.
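A minimal sketch of how such an input could be assembled and the reply validated. The prompt wording, the choice of FEN and SAN as the "standard chess notations", and the function names are assumptions for illustration; the tournament's actual format is not described in the source.

```python
from typing import List, Optional


def build_move_prompt(fen: str, history: List[str], legal_moves: List[str]) -> str:
    """Assemble a text prompt describing the position for a language model."""
    moves_so_far = " ".join(history) if history else "(none)"
    return (
        f"Board (FEN): {fen}\n"
        f"Moves so far (SAN): {moves_so_far}\n"
        f"Legal moves: {', '.join(legal_moves)}\n"
        "Reply with exactly one move from the legal list."
    )


def pick_move(reply: str, legal_moves: List[str]) -> Optional[str]:
    """Accept the model's reply only if it is one of the legal moves."""
    candidate = reply.strip()
    return candidate if candidate in legal_moves else None


# Starting position: 16 pawn moves plus 4 knight moves for White.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
START_MOVES = [f"{f}{r}" for f in "abcdefgh" for r in "34"] + ["Na3", "Nc3", "Nf3", "Nh3"]

print(build_move_prompt(START_FEN, [], START_MOVES))
```

Supplying the legal-move list and rejecting anything outside it keeps an illegal reply from corrupting the game state, whatever the harness then does about it (retry, forfeit, and so on).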

Yannick Kilcher ▷ #general (59 messages🔥🔥):

Rust vs. Go for porting, DeepSeek Hype, OLMo 2 32B, ChatGPT Overrated, Code generation quality: Grok 3 vs Mistral vs OpenAI

Go beats Rust for practical porting: Members discussed why a port wasn't being done in Rust, and one member explained that a function-by-function port to Go allows for exact behavior parity, avoiding years of rewriting and dealing with Rust's lack of a GC and its lifetime annotations.
- It was added that Rust is faster and more efficient, but it's really not as big a difference as people make it out to be in practice, and Golang is really ergonomic to develop in, particularly for distributed, async, or networked applications.
DeepSeek hype is engineered: Some members suggested that the hype around DeepSeek is engineered, arguing that their models are simplified and not on par with frontier AI models, and likening the comparison to comparing Ananas with Apple.
- Another member countered that the hype was driven by the crazy engineers at DeepSeek who developed a filesystem faster than life.
OLMo 2 32B is a fully open model: OLMo 2 32B was released, and described as the first fully-open model to outperform GPT3.5-Turbo and GPT-4o mini on academic benchmarks.
- It is claimed to be comparable to leading open-weight models while requiring only a fraction of training compute, costing only one third of the training cost of Qwen 2.5 32B.
ChatGPT is overrated, use Claude: One member said that ChatGPT is overrated because "it actually doesn't solve problems that I need solved", preferring Mistral Small 24B, QwQ 32B, and Claude 3.7 Sonnet.
- Another user shared, "I've had better luck getting what I want from Claude, and seems better at understanding intention and motivation for whatever reason."
Grok 3 for code generation: Members debated code generation qualities, mentioning that OpenAI models often generate legacy code, while Mistral can refactor it into more modern code, and Grok 3 generates code that looks like a professional programmer wrote it.
- In VSCode, one member prefers using Amazon Q over Copilot.

Yannick Kilcher ▷ #ml-news (1 messages):

@erkinalp:

.ogeneral: I would say neither

GPU MODE ▷ #general (3 messages):

Speech-to-Speech Generation, Moshi by Kyutai Labs, Hertz-dev by Standard-Intelligence

Speech-to-Speech Model Quest Begins: A member is looking for speech-to-speech generation models, focusing on conversational speech without multimodal input, distinguishing it from models like OpenAI Realtime API or Sesame AI.
- The member seeks a standalone model rather than a multimodal one that accepts both text and audio.
Moshi Model Sounds Off: Moshi from Kyutai Labs is a speech-text foundation model and full-duplex spoken dialogue framework.
- It utilizes Mimi, a state-of-the-art streaming neural audio codec.
Hertz-dev Model is Developed: Hertz-dev from Standard-Intelligence is the first base model for full-duplex conversational audio.

Links mentioned:

GPU MODE ▷ #triton (3 messages):

tl.int1 masks in Triton, tl.advance negative offsets, Triton Windows upgrade to 3.2

Optimize Triton masks with tl.int1?: A member asked if there's any performance or functional benefit to explicitly converting masks to tl.int1 when using tl.load in Triton.
- No response was provided.
tl.advance accepts negative offsets?: A member inquired whether tl.advance in Triton accepts negative offsets for pointer arithmetic.
- No response was provided.
Windows Triton Upgrade Woes?: A member seeks validation on steps to upgrade Triton from 3.1 to 3.2 on Windows, particularly concerning PyTorch and cache clearing. They linked to this repo.
- They were using Python 3.10 + CUDA 12.5 and ComfyUI’s python_embedded: Python 3.12 + PyTorch 2.5.1+cu124 + Triton 3.1

GPU MODE ▷ #cuda (6 messages):

cuda::memcpy_async, A100, global vs shared memory

cuda::memcpy_async Experiments on A100: A member experimented with cuda::memcpy_async on A100s using examples from the CUDA documentation.
- They observed that the kernel with memcpy_async took slightly longer to run and inquired about the reason for this unexpected behavior.
Async Copies: Global vs Shared Memory: A member clarified that async copies can only transfer data between global and shared memory.
- They explained that the advantage of async copies lies in overlapping memory loading with other computations, requiring values to be loaded from shared memory to be utilized.

GPU MODE ▷ #off-topic (1 messages):

Block Diffusion, Autoregressive Models, Diffusion Models, ICLR 2025

Block Diffusion Model Interpolates Autoregressive and Diffusion LMs: The new Block Diffusion model aims to combine the benefits of both autoregressive and diffusion language models, detailed in a paper accepted as an ICLR 2025 Oral presentation.
- It achieves high quality, arbitrary-length generation, KV caching, and parallelizable processing, addressing the limitations of existing models; code available on GitHub and HuggingFace.
Autoregressive Models: Autoregressive models boast high quality output and arbitrary-length generation along with Key-Value (KV) caching.
- However, autoregressive models suffer from being non-parallelizable.

GPU MODE ▷ #tilelang (2 messages):

Dynamic Shapes, Segmentation Fault

Dynamic Shapes spark Segmentation Fault: A member opened an issue about a segmentation fault with dynamic shapes.
- They added, "maybe my understanding is wrong."
Tilelang faces Dynamic Shape challenge: An issue was raised concerning segmentation faults encountered when working with dynamic shapes in Tilelang.
- The reporter of the issue expressed uncertainty, stating that their understanding of the problem might be incorrect.

Link mentioned: segmentation fault with dynamic shapes · Issue #215 · tile-ai/tilelang

GPU MODE ▷ #liger-kernel (1 messages):

Gemma3, LigerKernel, RMSNorm

Gemma3 Implementation Drafted into LigerKernel: A member has started a new challenge for themself, drafting an implementation of Gemma3 into LigerKernel, and shared a link to the pull request.
- The member, being fairly new to programming, is asking for help and feedback on the draft.
Gemma3 and Gemma2 share similarities: According to the pull request, Gemma3 has high similarities to Gemma2 with some differences in RMSNorm Calls.
- The changes enable patching the Text Parts of Gemma3 with Liger kernels.

Link mentioned: Adding Support for Gemma3 by DRXD1000 · Pull Request #606 · linkedin/Liger-Kernel: Summary: Gemma3 has high similarities to Gemma2 with some differences in RMSNorm Calls. This change enables patching the Text Parts of Gemma3 with Liger kernels. Testing Done: Hardware Type: AMD ...

GPU MODE ▷ #self-promotion (2 messages):

Triton bitpacking, Gemlite, GTC CUDA content

Triton bitpacking goes Vroom Vroom: Bitpacking in Triton achieved significant speed-ups over the PyTorch implementation on the 4090: 98x for 32-bit packing and 26x for 8-bit packing.
- The time to re-pack a Llama3-8B model dropped from 49 sec to 1.6 sec with the new bitpacking implementation.
Gemlite's bitpack Implementation Unleashes Performance: A member spotlighted their work on optimizing bitpacking using Triton within the Gemlite project, with a link to the relevant code on GitHub.
- The optimization allows for fast low-bit matmul kernels.
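For readers unfamiliar with bitpacking, here is a minimal pure-Python sketch of the data layout: several low-bit values stored in one wider word. It is not the Gemlite or Triton code and makes no performance claims; the function names and the lowest-bits-first ordering are assumptions chosen for illustration.

```python
def pack_bits(values: list, nbits: int, word_bits: int = 32) -> list:
    """Pack small unsigned integers into word_bits-wide words, lowest bits first."""
    per_word = word_bits // nbits
    words = []
    for i in range(0, len(values), per_word):
        word = 0
        for slot, v in enumerate(values[i:i + per_word]):
            if not 0 <= v < (1 << nbits):
                raise ValueError(f"{v} does not fit in {nbits} bits")
            word |= v << (slot * nbits)
        words.append(word)
    return words


def unpack_bits(words: list, nbits: int, count: int, word_bits: int = 32) -> list:
    """Inverse of pack_bits; count trims the padding in the last word."""
    per_word = word_bits // nbits
    mask = (1 << nbits) - 1
    out = [(w >> (slot * nbits)) & mask for w in words for slot in range(per_word)]
    return out[:count]


# Nine 4-bit weights fit in two 32-bit words (eight per word).
weights = [1, 15, 0, 7, 3, 8, 12, 5, 9]
packed = pack_bits(weights, nbits=4)
print([hex(w) for w in packed])
print(unpack_bits(packed, nbits=4, count=len(weights)) == weights)
```

A GPU kernel performs the same shifts and masks across many elements in parallel, which is where an optimized implementation gets its speed.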
GTC CUDA content Incoming: A member shared information regarding CUDA content at GTC, to create high-performance, GPU-accelerated applications with NVIDIA CUDA.
- They shared some images of Plask and CERN which were featured at the conference.

GPU MODE ▷ #reasoning-gym (31 messages🔥):

Gemma 3 support in vLLM, Group Relative Policy Optimization (GRPO), veRL Training for reasoning-gym, composite configurations in reasoning-gym, curriculum training

vLLM Gets Go-Ahead for Gemma 3: Members discussed adding Gemma 3 support to vLLM, referencing this GitHub issue.
- One member reported experiencing issues with context window size while using Gemma3 with TGI, suspecting a problem in the underlying transformers implementation.
Group Relative Policy Optimization Grows Rapidly: Members discussed how Group Relative Policy Optimization (GRPO) has become popular for Reinforcement Learning in Large Language Models.
- A blog post from oxen.ai on GRPO VRAM requirements was shared, noting its effectiveness in training, with a link to the DeepSeek-R1 paper included.
veRL Voyages Victorious: A member confirmed that veRL training is working for chain_sum with the newest veRL using the changes in this branch.
- The change was merged into the main branch.
reasoning-gym Readies Refactor for Reasoning: Members discussed training models using a single RG dataset generator versus a composite of multiple, leaning towards the latter for improving "all-around" reasoning capabilities.
- They plan to test with a small composite of 3-5 datasets, referencing the composite dataset code and the curriculum status for datasets in GALLERY.md.
reasoning-gym Rockets Past 500 Stars: It was mentioned that reasoning-gym v.0.1.16 was uploaded to PyPI and the project is nearing 500 stars.
- A member posted a picture celebrating.

GPU MODE ▷ #general (7 messages):

verl session, tilelang submission, pip install tilelang

Demand for Verl Session: A member inquired about the timeline for a session on Verl, along with the possibility of submitting tilelang.
Installing tilelang via pip: A member stated that users can install any package via pip from a script, providing an example script to install tilelang.
- They cautioned that the installation is fairly long and might lead to timeouts and unnecessary work for the machines.
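The member's example script is not reproduced in the source, so the following is only a sketch of one common way to install a package from inside a Python script. The helper names and the 600-second timeout are assumptions; the install call is left commented out because, as noted above, it can take a long time.

```python
import subprocess
import sys


def pip_install_command(package: str) -> list:
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_installed(package: str, timeout_s: float = 600.0) -> None:
    """Install the package, raising if pip fails or exceeds the timeout."""
    subprocess.run(pip_install_command(package), check=True, timeout=timeout_s)


# Uncomment to actually install; this can be slow.
# ensure_installed("tilelang")
print(" ".join(pip_install_command("tilelang")))
```

Invoking pip through `sys.executable -m pip` ensures the package lands in the same environment that runs the submission, rather than whichever `pip` happens to be first on the PATH.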

GPU MODE ▷ #submissions (3 messages):

Leaderboard Submissions, Grayscale Leaderboard, Conv2d Leaderboard, H100 GPUs, Modal Runners

Grayscale Leaderboard sees New Submissions: Three new leaderboard submissions were made to the grayscale leaderboard.
- The submissions with IDs 2013 and 2015 used H100 GPUs and Modal runners.
Conv2d Leaderboard Populated: A new leaderboard submission was made to the conv2d leaderboard.
- The submission, with ID 2014, also utilized H100 GPUs and Modal runners.

OpenAI ▷ #ai-discussions (46 messages🔥):

Declining intelligence, impact of technology and smartphones, food additives and cognitive decline, Deepseek models distillation, ADHD diagnosis rates

Intelligence Declines Stir Debate!: Discussion sparked from a Financial Times article that average intelligence is dropping in developed countries, citing increased reports of cognitive challenges and declining performance in reasoning and problem-solving.
- One member theorized this could be due to technology, especially smartphones and social media, leading people to outsource their thinking; however, the graphics only covered the years before ChatGPT became popular.
Is Tech to Blame for Brain Drain?: Members debated potential causes for cognitive decline, including technology's influence, immigration, and fluoridated water.
- One member pointed out that the rates of cognitive challenges were steadily increasing since the 1990s, and a sudden acceleration from around 2012.
Food Additives Linked to Mental Slowdown?: Members discussed how the availability and consumption of ultra-processed foods (UPFs) has increased worldwide, now representing 50–60% of daily energy intake in some high-income countries, and is linked to cognitive decline.
- Another member noted that multinational corporations such as Nestlé produce and distribute products in many countries, so additives or changes made to those products by one corporation can have a worldwide impact.
DeepSeek's Model Origin Mystery: The discussion covers that Deepseek V3 (the instruct version) was likely distilled from OpenAI models.
- One member notes that even OpenAI unofficially supports distilling their models, they just don't seem to like it when Deepseek does it.
TikTok Brain Shortens Attention Spans: Members believe platforms like TikTok and Instagram affect our brains by delivering a constant stream of emotional impressions in an extremely short time.
- The result is a kind of addiction where we continuously seek more stimulation.

OpenAI ▷ #gpt-4-discussions (1 messages):

Claude Sonnet 3.7, o3-mini-high vs o1

Claude Sonnet 3.7 New Fav for Coding: A member now uses Claude Sonnet 3.7 exclusively for coding, finding ChatGPT lagging behind.
o3-mini-high Model Beats o1: A member stated that the o3-mini-high model is better than o1.

Notebook LM ▷ #use-cases (4 messages):

Gemini 2.0 Deep Research, NotebookLM, PhytoIntelligence framework

Gemini 2.0 Deep Research Joins NotebookLM: Members considered pairing Gemini 2.0 Deep Research with NotebookLM to create a strong tool for documentation fetching and processing.
- The aim is to containerize material without exceeding provided boundaries, questioning whether Deep Research could eventually replace NotebookLM.
NotebookLM Inspires African Project Ecokham: A member from Africa is using NotebookLM to connect thoughts, edit and customize roadmaps, and generate inspiring audio for his project, Ecokham.
- He expressed gratitude for NotebookLM's assistance in inspiring him and his team.
NotebookLM Prototyping PhytoIntelligence Framework: A member is using NotebookLM to organize notes and prototype the PhytoIntelligence framework for autonomous nutraceutical design.
- This framework aims to mitigate diagnostic challenges, and the user thanked Google for NotebookLM's capabilities.

Notebook LM ▷ #general (41 messages🔥):

Image and table recognition in Notebook LM, Notebook LM Mobile App, Notebook LM Language Settings, Public Notebook Sharing, Google Sheets Integration

NotebookLM Craves Image & Table Savvy: Users are clamoring for image and table recognition in Notebook LM, emphasizing that the current state feels "half-baked" due to the need to constantly reopen source files; one user even shared a relevant cat GIF.
- They believe image recognition is worth "a thousand words" and the clearest data comes from source tables and google sheets.
NotebookLM Mobile App: Many users are requesting a mobile app version of NotebookLM.
- Currently the community feels a mobile version is "still not yet coming up".
User reports recurrent system errors in Notebook LM: A user reported a recurring "The system was unable to answer" error, happening for the second time this week.
- Other users tested the issue and couldn't reproduce it.
URL tricks to change NotebookLM language: A user asked how to change the language of NotebookLM; another user shared the tip to add ?hl=LANGUAGE_CODE at the end of the URL (e.g., ?hl=es for Spanish).
- One user confirmed they are in France.
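The URL tip can be applied programmatically. This is a small sketch using only the standard library; the helper name `with_language` is hypothetical and the NotebookLM address below is an assumed example, not taken from the discussion.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_language(url: str, language_code: str) -> str:
    """Return the URL with its hl query parameter set to language_code."""
    parts = urlsplit(url)
    # Drop any existing hl value so the parameter is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "hl"]
    query.append(("hl", language_code))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_language("https://notebooklm.google.com/", "es"))
# prints: https://notebooklm.google.com/?hl=es
```

Parsing the query instead of appending the string blindly avoids malformed results when the URL already carries other parameters or an earlier hl value.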
Notebook Public Sharing: A user inquired about plans for public sharing of Notebooks.
- A member responded that fully open access is unlikely but sharing with restrictions is currently possible with corp or Gmail accounts.

Link mentioned: Cat Wait GIF - Cat Wait Im - Discover & Share GIFs: Click to view the GIF

LlamaIndex ▷ #blog (1 messages):

Google Gemini, Google Vertex AI, Unified @googleai integration, Streaming, Async

Google Gemini and Vertex AI Merge!: A unified @googleai integration now supports both Google Gemini and Google Vertex AI, according to this Tweet.
- The integration supports streaming, async, multi-modal, and structured prediction, even supporting images.

LlamaIndex ▷ #general (8 messages🔥):

LlamaIndex vs Langchain, OpenAI delta events for tool calling, Agentic RAG applications

Debate on LlamaIndex Perks Over Langchain Erupts: A member inquired about the advantages of LlamaIndex over Langchain.
- However, there was no discussion or answer provided in the given context.
OpenAI's Delta Event Absence for Tool Calling Probed: A member asked why OpenAI models don't emit delta events for tool calling, noting that they are emitted but empty.
- Another member explained that tool calling cannot be a stream because the LLM requires the complete tool response to generate the next response, with the recommendation to build your own stream.
Inquiry on Agentic RAG App APIs Surfaces: A member asked if there is any API focused on building Agentic RAG applications to simplify the process of creating and managing the application.
- The discussion explores the preferred way for building agentic apps in LlamaIndex, pointing out the evolution and multiple constructs available, but lacks a definitive guide.

Nomic.ai (GPT4All) ▷ #general (7 messages):

Gemma 3 12B, Qwen 2.5 Coder, LM Studio, Multimodal Models, Water Freezing Experiment

Gemma 3 12B Surpasses Qwen in Intelligence: A user finds that the Gemma 3 12B model surpasses Qwen 14B and even Qwen 32B in terms of intelligence on their computer.
- The user noted that Gemma 3 12B calls for an additional gguf file, which may be why it won't work with GPT4All.
Gemma 3 Requires GPT4All Updates: A user notes that GPT4All may need updates to support Gemma 3 12B due to its different architecture from Gemma 2 and the need for an mmproj file.
- Another user joked that everyone uses models that are one day old and expects everything to work fine, highlighting the rapid pace of AI model development.
Language Comprehension Test: Gemma 3 Excels: A user tested various models, including Gemma 3 12B, DeepSeek R1, and Qwen 2.5 Coder, by asking a question in multiple languages.
- The user found that Gemma 3 and DeepSeek R1 consistently provided correct and comprehensive answers in the same language as the question.
Water Freezing Experiment Yields Varied Responses: When asked about the outcome of freezing a jar of water, DeepSeek-R1 indicated the jars would break.
- Gemma-3-12b correctly responded that a full jar of water outside in sub-freezing temperatures overnight will almost certainly result in the jar shattering or cracking due to the expansion of water as it freezes.

DSPy ▷ #general (6 messages):

Explicit Feedback in dspy.Refine, Manual Feedback Implementation in Refine, Reward Function Return Value for Feedback

Explicit Feedback beckons Refine's return: A member inquired about integrating explicit feedback into dspy.Refine, akin to dspy.Suggest from earlier versions, to clearly indicate areas for improvement beyond a reward function threshold.
- The member noted that explicit feedback was very helpful for debugging and understanding mistakes.
Manual Feedback makes marvelous move into Refine: A team member confirmed that manual feedback is being added into Refine.
- The proposed implementation involves including feedback in the return value of the reward function as a dspy.Prediction object, containing both a score and feedback.
Reward Function becomes Feedback Fountain: The team member asked if it would be acceptable for feedback to be part of the return value of the reward_fn.
- The user responded that it was perfect, thanking the team member.
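The shape of that proposal, a reward function returning both a score and feedback, can be sketched without DSPy at all. Everything below is a hypothetical stand-in: `RewardResult` merely plays the role the discussion assigns to a dspy.Prediction, and `refine` is a toy loop, not the dspy.Refine implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RewardResult:
    """Hypothetical stand-in for a prediction-like object with score and feedback."""
    score: float
    feedback: str = ""


def reward_fn(answer: str) -> RewardResult:
    """Toy reward: prefer one-word answers and say why when the check fails."""
    if len(answer.split()) == 1:
        return RewardResult(score=1.0)
    return RewardResult(score=0.0, feedback="Answer must be a single word.")


def refine(generate: Callable[[Optional[str]], str], threshold: float, max_attempts: int = 3) -> str:
    """Retry generation, passing along prior feedback, until the score clears the threshold."""
    feedback: Optional[str] = None
    best_answer, best_score = "", float("-inf")
    for _ in range(max_attempts):
        answer = generate(feedback)
        result = reward_fn(answer)
        if result.score > best_score:
            best_answer, best_score = answer, result.score
        if result.score >= threshold:
            break
        feedback = result.feedback
    return best_answer


def toy_generate(feedback: Optional[str]) -> str:
    """Pretend model: gives a shorter answer once it has received feedback."""
    return "Paris" if feedback else "The capital is Paris"


print(refine(toy_generate, threshold=1.0))
```

Carrying the feedback string alongside the score is what gives the next attempt something more specific to act on than "the score was too low".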

Cohere ▷ #「💬」general (5 messages):

Command A, OpenRouter Integration, Prime Number Bug, Local API Performance

Command A goes Live on OpenRouter: Cohere's Command A, an open-weights 111B parameter model with a 256k context window, is now live on OpenRouter.
Prime Number Puzzle Plagues Command A: Users found that Command A either returns the wrong answer or enters an infinite loop when asked about prime numbers with digits summing to 15.
Local API struggles with Command A: Users reported that patching the modeling in ITOr or using the APIs is needed because local setups do not reach proper speeds even with sufficient VRAM.

Link mentioned: Command A - API, Providers, Stats: Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading propri...

Cohere ▷ #「🔌」api-discussions (1 messages):

michael: it does, use the https://api.cohere.com/compatibility/v1/chat/completions base_url

Modular (Mojo 🔥) ▷ #general (3 messages):

Discord Scam Account, Account Impersonation

Discord Account Impersonation!: A member reported being messaged by a scam account impersonating another user.
- The impersonated user responded, "That is not me. This is my only account," and confirmed that the fake account has been reported to Discord and banned from the server.
User Warns of Impersonation Scam: A user alerted the community to a scam account messaging them, impersonating another user on Discord.
- The user clarified that their only legitimate account is "caroline_frasca" and that the imposter account had already been reported to Discord and banned from the server.

Modular (Mojo 🔥) ▷ #announcements (1 messages):

Discord impersonation, Discord account security

Caroline Frasca imposter spamming DMs: A spam account named caroline_frascaa with the nickname Caroline has been DM'ing folks while impersonating a real user.
- The user posted a screenshot of the fake account and updated their profile to help others easily identify the real account.
Discord server bans impersonating account: The impersonating account caroline_frascaa has been reported to Discord and banned from this server.
- Discord admins encourage reporting of any impersonating accounts.

Modular (Mojo 🔥) ▷ #mojo (1 messages):

soracc: Yea, we use it in the stdlib (e.g. in base64) as well.

LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (1 messages):

Self-Evaluation, Self-Reflection, Self-Refinement, Oracle Feedback

Self-Reflection Needs External Evaluation: A member asked how statements about self-evaluation from the first lecture correspond to the second, as they seemed to contradict each other.
- They pointed out that the first lecture mentioned that self-reflection and self-refinement work with good external evaluation, and that self-correction without oracle feedback hurts reasoning performance.
Clarification on Self-Evaluation in Lectures 1 & 2: The user sought clarification on the apparent contradiction between the lectures regarding self-evaluation.
- Specifically, they noted the emphasis on self-evaluation and improvement in the second lecture, while the first lecture highlighted the importance of external evaluation and the potential harm of self-correction without oracle feedback.

AI21 Labs (Jamba) ▷ #general-chat (1 messages):

Vertex, AWS

Vertex Gets Ready for 1.6: Version 1.6 is not yet available on Vertex, but is coming soon.
- It will also be available on other platforms like AWS.
AWS Access to 1.6: Version 1.6 will be available on other platforms like AWS in the near future.
- This should allow customers on AWS to access the new features.

{% else %}

The full channel-by-channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}