AI News for 3/31/2025-4/1/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 7148 messages) for you. Estimated reading time saved (at 200wpm): 719 minutes. You can now tag @smol_ai for AINews discussions!

people were mostly smart enough not to launch things on april fools'.

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

Open Source Models and Releases

OpenAI's upcoming open-weight language model: @sama stated OpenAI will not impose restrictions like preventing usage if a service exceeds 700 million monthly active users. @LiorOnAI noted that OpenAI is planning to release their first open-weight model since GPT-2 in the next few months. @ClementDelangue welcomes OpenAI's willingness to share open weights, hoping it leads to a golden age of AI progress. @snsf mentioned the open weight model coming in the next few months.
DeepSeek's Open-Source R1 Model: @scaling01 reports that OpenAI's commitment to releasing an open-weight language model is a response to DeepSeek's R1 model launch on January 20, 2025, which challenges the notion that China lags in AI development.
License and Usage of Open Source Models: @cognitivecompai defended that someone merely stated that a license was silly and that he's not going to do that.

Model Performance and Benchmarks

Gemma Model Performance: @osanseviero announced that Gemma 3 can do function calling and is now on the Berkeley Function-Calling Leaderboard. @jack_w_rae noted Gemini's rate of progress in math is amazing to see, driven by talented researchers, observing the uplift on HMMT.
GemmaCoder3-12b: @ben_burtenshaw introduced GemmaCoder3-12b, a code reasoning model that improves performance on the LiveCodeBench benchmark by 11 points, highlighting its ability to run on 32GB of RAM, its 128k context length, and the option to activate thinking via the chat template.
Qwen 2.5 Models: @TheTuringPost highlights Alibaba_Qwen's Qwen2.5-Omni, which understands any types of input and introduces a two-part Thinker-Talker system and TMRoPE feature to create responses in both text and natural speech.
@vipulved reported that the TogetherCompute inference team achieved 140 TPS on a 671B parameter R1 model, which is ~3x faster than Azure, and ~5.5x faster than DeepSeek API on Nvidia GPUs.

AI Product and Tool Releases & Updates

ChatGPT and OpenAI: @kevinweil announced that the new image generation in ChatGPT is now available to 100% of free users. @OpenAI announced the release of a new voice in ChatGPT.
Runway Gen-4: @TomLikesRobots shares excitement about Gen-4 for animating miniature diorama-style generations, praising its movement interpretation and style maintenance.
LangChain: @LangChainAI introduced using LangGraph's pre-built computer use agent through a chat-based generative UI.
Figure 03 Humanoid Robots: @adcock_brett discussed the first commercially deployed humanoid robots, highlighting full autonomy, real-world integration at BMW, fleet data for better pretraining, and BotQ manufacturing scaling.
Other Tools: @juberti highlighted the new OpenAI realtime transcription API now supports WebRTC connections. @TheRundownAI mentioned Amazon’s Nova Act AI browser agent.

AI Research and Studies

Efficient Reasoning for LLMs: @omarsar0 shared a survey focusing on reasoning economy in LLMs, analyzing how to balance deep reasoning performance with computational cost.
Stanford's Tutor CoPilot: @DeepLearningAI reports that Stanford researchers developed Tutor CoPilot, a GPT-4-powered tool that assists online tutors.
AI-driven Automation and Economic Implications: @EpochAIResearch discussed that AI investments might seem huge, but global wages add up to over $70 trillion.

Hugging Face and Gradio

Gradio Usage: @ClementDelangue announced that Gradio just crossed 1,000,000 monthly developers using it in March.

Humor/Memes

Sarcasm and April Fools' Jokes: @sama joked that "-restart-0331-final-final2-restart-forreal-omfg3" is gonna hit, i know it. @vladquant jokingly announced that after strategic review, Kagi is now Kagibara.

AI Reddit Recap

/r/LocalLlama Recap

1. LLM Mathematical Reasoning Limitations

Olympiad Obstacle: Top Models Falter: A research paper revealed that state-of-the-art LLMs like O3-MINI and Claude 3.7 scored less than 5% on the 2025 USA Mathematical Olympiad (USAMO), despite being trained on extensive mathematical data including previous olympiad problems.
- The study highlighted significant issues with the models' logical reasoning, creativity, and self-evaluation capabilities, with LLMs overestimating their own scores by up to 20x compared to human graders. Community discussion pointed to the need for specialized proof-focused benchmarks and integration with formal proof tools like Lean or Coq.
Formal Proof Progress: Pioneering Paths Forward: Reddit users discussed ongoing research efforts in automated theorem proving, with users sharing links to Google's AlphaProof and several open-source projects from Princeton, Stanford, and Huawei focusing on formal mathematical proofs.
- The discussion highlighted the challenges of formalizing mathematics, with users suggesting that future AI systems might combine strict formalized symbolic logic with diffusion-like processes for concept discovery. Many agreed that current LLMs need specialized tools and training to excel at mathematical reasoning rather than just answer prediction.

2. DeepMind Research Publication Strategy

Six-Month Secrecy: DeepMind's Defensive Delay: According to a Financial Times report, Google's DeepMind will implement a six-month embargo policy on publishing strategic generative AI research papers to maintain competitive advantage, with a researcher stating they "could not imagine us putting out the transformer papers for general use now."
- The community had mixed reactions, with some users arguing the delay is reasonable given how companies like OpenAI built their business on DeepMind's freely shared research, while others expressed concern that this could be a "race to the bottom" that would eventually lead to longer delays or permanent secrecy.
Open Research Ramifications: Progress vs. Profit: Redditors debated the impact of DeepMind's new publication policy on AI advancement, with many pointing out that transformer architecture research from 2017 created hundreds of billions in value for other companies while Google failed to capitalize on its own innovations.
- Some commenters argued that open collaboration accelerates progress for everyone, noting "we probably wouldn't be where we are currently when it comes to the field if it wasn't publicly shared," while others defended DeepMind's right to protect its intellectual property and competitive position.

3. New Tools and Features for Local LLM Users

Hugging Face's Hardware Helper: Hugging Face launched a new feature that allows users to check if their hardware can run specific GGUF models directly from the model page by entering their hardware specifications at https://huggingface.co/settings/local-apps.
- Users welcomed this quality-of-life improvement while suggesting additional features such as filtering models by hardware compatibility, estimating maximum context length, and providing layer offload recommendations for CPU+GPU setups. The Hugging Face team indicated they would iterate on these suggestions in future updates.
Mobile Model Momentum: iPhone Inference Innovations: A developer demonstrated achieving 90 tokens per second with Llama 3.2 1B in float16 precision on an iPhone by completely rewriting the inference engine, showcasing significant performance improvements over existing solutions like MLX.
- The community discussed the trade-offs between using float16 versus quantized models, with some questioning whether the quality difference between fp16 and q8 was significant enough to justify the performance cost, while others debated the practical applications of such small models on mobile devices.
DeepSeek's Diminutive Deployment: V3 GGUF Quantization: User VoidAlchemy released new GGUF quantizations of DeepSeek V3-0324 using the ikawrakow/ik_llama.cpp fork, optimized to support 32k+ context in under 24GB VRAM with Multi-Layer Attention (MLA) and high-quality tensors for attention/dense layers.
- The quantizations were designed specifically for the ik_llama.cpp fork and won't work with mainline llama.cpp or other tools like Ollama or LM Studio. Performance benchmarks showed achieving near Q8_0 quality with speeds comparable to 4bpw quants on CPU-only setups.

4. Novel LLM Research Concepts

Temporal Training: LLMs Trapped in Time: A Reddit user proposed creating LLMs trained exclusively on data from before a specific year or time period, such as pre-2010, sparking discussion about the feasibility and implications of such historically-bounded models.
- Community members suggested that models limited to pre-1950s data would be possible with public domain books, newspapers, and archived materials, but noted such models would reflect historical biases and technological optimism while lacking modern concepts. Some pointed to existing research like TimeLMs that tracked how language models' performance degraded on recent content.
Acoustic Analysis: GPU Symphonies from Model Inference: Users discovered that different LLM models produce distinctive sounds from GPUs during inference, with a post linking to evidence that these audio patterns are specific to model architecture, quantization, and context size combinations.
- The discussion revealed this phenomenon is caused by "coil whine" from capacitors and inductors in the GPU's voltage regulator module, with some noting researchers have previously extracted encryption keys by recording such processing noise, raising potential security implications.
Attention-Free Architecture: Qwerky's Quantum Leap: A post highlighted Qwerky-72B and 32B, attention-free models trained on just 8 GPUs, representing a significant advancement in efficient model architecture that requires less computational resources.
- These models, available on Hugging Face, demonstrate how attention-free architectures can reduce VRAM requirements while maintaining performance, with community members noting the potential implications for long context handling and accessibility of large model training.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

1. GPT-4o Image Generation Capabilities

Precise Placement Prowess: Reddit users are impressed by GPT-4o's ability to handle precise item arrangement and text in generated images. A user shared examples showing how the model accurately places multiple icons in grid layouts with correct labeling, maintaining consistency across complex visual hierarchies.
- Many commenters noted that GPT-4o's unified text-image architecture gives it a significant advantage in understanding and executing detailed prompts compared to other models. One user demonstrated the model could handle up to 24 distinct icons with labels before quality degradation, showcasing its impressive compositional capabilities.
Content Filter Frustrations: Users expressed frustration with GPT-4o's content filtering system, with one post titled "Chat gpt 4O sucks and everything trips its baby mode content filters" gaining significant traction. The poster complained about being unable to generate even mildly violent or suggestive content like fantasy battle scenes.
- Despite complaints, several users demonstrated techniques to work around the filters, sharing successful generations of warrior characters and stylized art. This sparked debate about OpenAI's approach to content moderation, with some users creating satirical content mocking the filters, including a GitHub repo titled "I've reverse-engineered OpenAI's ChatGPT 4o image generation algorithm" that was actually an April Fool's joke.

2. Claude vs Gemini Competition Heats Up

Gemini 2.5 Takes the Lead: A post titled "This is the first time in almost a year that Claude is not the best model" sparked significant discussion as a Claude user admitted that Google's Gemini 2.5 now outperforms Claude across multiple use cases. The post highlighted Gemini's superior handling of context and overall reliability.
- Users debated specific strengths of each model, with many noting that Gemini 2.5's million-token context window is a game-changer compared to Claude's more limited capacity. Several commenters praised Gemini's creative writing abilities, though some suggested the influx of pro-Gemini posts might be strategic "astroturfing" rather than organic user feedback.
Claude's Service Struggles: Multiple posts documented issues with Claude's service reliability, with users reporting increased rate limiting and error messages like "Message Limit Reached" appearing more frequently for paid subscribers. Screenshots showed the service becoming unresponsive despite users paying for premium access.
- The timing of these issues coincided with growing praise for Gemini 2.5, leading some users to question Anthropic's infrastructure scaling. One user wrote, "Anthropic should scale or get out of the business, we are paying customers," while others noted they were switching to Gemini due to Claude's increasingly restrictive usage limits.

3. Video Generation Breakthroughs

Wan 2.1 Video Model Mastery: User @legarth shared an impressive video demonstration of the Wan 2.1 vid2vid model running locally on a 5090 GPU, transforming a clip from Tropical Thunder into a scene featuring the Joker. The model accurately maintained physics details like jacket movement despite working only with pose information.
- The creator explained they processed 216 frames (9 seconds at 24fps) but noted quality deterioration after about 120 frames. The community was particularly impressed by the model's ability to predict physics from motion alone, with one commenter noting "the jacket's cool. Physics" and another highlighting how the model handled hair movement despite the original actor being bald.
VACE Video Control Released: A significant open-source video generation advancement was announced with the partial release of VACE (Video with Attention-based Cascaded Extraction) models on GitHub. The release includes VACE-Wan2.1-1.3B-Preview and VACE-LTX-Video-0.9, with the larger 14B version promised for later.
- Users expressed excitement about this open-source alternative to closed commercial platforms, with one commenter noting: "If this works anything like the examples shown, open-source video just leveled up big time." The technology appears to offer enhanced control over video generation, including structure and pose preservation features.

4. AI Development Tools and Innovations

Claude Code's Costly Creation: A developer shared their experience spending $417 on Claude Code to build a word game called LetterLinks, detailing both successes and frustrations with the AI coding assistant. Despite the high cost, the user concluded it was still cheaper than hiring a freelancer for the estimated $2-3K the project would have required.
- The post highlighted specific issues with Claude Code, including context window limitations that became problematic as the codebase grew to 15K lines, and the need for extensive manual testing since "Claude can write code all day but can't click a f***ing button to see if it works." Many commenters suggested alternatives like using Gemini 2.5 Pro with its million-token context window or Claude MCP on the desktop app.
EasyControl: Diffusion Transformer Enhancement: A new framework called EasyControl was released, designed to add efficient and flexible control capabilities to Diffusion Transformer (DiT) models. The system incorporates a lightweight Condition Injection LoRA module and position-aware training to enhance model compatibility and generation flexibility.
- Community members were particularly interested in EasyControl's potential to provide ControlNet-like functionality for Flux models, with one user commenting "Could this be the long-awaited good ControlNets for Flux?" Testing revealed mixed results, with OpenPose control working well but subject transfer capabilities showing inconsistent performance.

5. Pixel Art and Retro Graphics AI

Retro Diffusion's Pixelated Precision: Retro Diffusion launched an interactive browser-based playground for generating authentic pixel art using AI, with no signup required. The FLUX-based model can create pixel art across various styles through smart prompting alone, without requiring LoRAs.
- The technical article accompanying the launch detailed how Retro Diffusion solved challenges specific to pixel art generation, including grid alignment, limited color palettes, and maintaining pixel-perfect outputs. The platform's creator joined the discussion, answering questions about features like animation capabilities and color palette control.
XLSD: Lightweight Model Magic: Developer @lostinspaz shared progress on the XLSD project, which aims to create a high-quality image generation model that can run on systems with limited VRAM (8GB or even 4GB). The approach involves forcing SD1.5 to use the SDXL VAE and then training it to produce significantly better results.
- Comparison images showed substantial quality improvements over the base SD1.5 model, with the developer noting they were "cherry picking a little" but providing a fair comparison using identical settings. The community responded positively to this optimization-focused approach, with one commenter appreciating "people who push a piece of technology to its limit and explore it just for the sake of it."

AI Discord Recap

A summary of Summaries of Summaries by o1-preview-2024-09-12

Theme 1: OpenAI's Open-Weight Model Sparks Excitement

Sam Altman Dangles Open-Weight Model Carrot: OpenAI CEO Sam Altman announced plans to release a powerful new open-weight language model, seeking developer feedback to maximize its utility. He assured they "will not do anything silly like saying that you can't use our open model if your service has more than 700 million monthly active users."
Community Speculates on OpenAI's Strategy Shift: Nathan Lambert expects a 30B parameter reasoning model with an MIT/Apache license, igniting discussions about OpenAI's potential impact on the open-source community.
Enthusiasts Brace for OpenAI's Return to Open Releases: AI developers express optimism about OpenAI's move, seeing it as a boost for collaboration and innovation in AI development.

Theme 2: New AI Models Under the Microscope

Gemini 2.5 Pro's 'Aliveness' Stirs Turing Test Talk: Users are intrigued by Gemini 2.5 Pro's unique interaction style, suggesting it "might be the first to pass a serious Turing Test" due to its apparent aliveness and curiosity.
DeepSeek R1 Zips Past Rivals, Democratizes RL: DeepSeek R1 outperforms larger labs with efficient resource use and an MIT license, making reinforcement learning accessible to the "GPU poor" through GRPO.
Gemma 3 Smokes Gemini 1.5 in Benchmarks: Gemma 3 27B outperforms Gemini 1.5 Flash in benchmarks like MMLU-Pro and Bird-SQL, impressing users with its superior capabilities.

Theme 3: Users Vent Over AI Tool Troubles

Manus.im Users Rage Over Credit Crunch: Manus.im's new credit system infuriates users as credits deplete rapidly, leading them to recommend alternative AI research tools like Traycer.
Gemini 2.5 Pro Rate Limits Drive Users Nuts: Frustrated users encounter rate limits on Gemini 2.5 Pro, debating whether limits apply to free and paid tiers, with some attempting to bypass them via a VPN.
Cursor Charges for Free Models? Users Say 'What the Heck!': Cursor users question being charged to use a free model, sparking discussions about API usage, billing practices, and transparency within the platform.

Theme 4: Open-Source Contributions and Technical Innovations Shine

Neuronpedia Opens the Data Floodgates: The interpretability platform Neuronpedia goes open source under MIT license, releasing over 4 TB of data and tools to democratize model interpretability.
Stanford Teaches Transformers to the Masses: Stanford opens its CS25 Transformers seminar course to the public via Zoom and YouTube, covering topics from LLM architectures to creative applications.
Megatron Tensor Parallelism Gets Deep Dive Treatment: An illustrated deep-dive into Megatron-style tensor parallelism, including fused/parallel CE loss, is shared, enhancing understanding of ML scalability and performance techniques.

Theme 5: AI Makes Strides in Law and Healthcare

AI Decodes Legal Jargon in New Seminar Series: The Silicon Valley Chinese Association Foundation hosts a seminar on AI in legislation, featuring the founder of Legalese Decoder, exploring how AI simplifies complex legal documents.
Sophont Aims for Medical AI Revolution: Sophont launches to build open multimodal foundation models for healthcare, striving to create a DeepSeek for medical AI.
Dream On! Rem App Wants You to Journal Your Nighttime Adventures: Rem introduces a dream journaling app that allows users to record, analyze, and share dreams, leveraging AI to uncover hidden patterns in their subconscious.

PART 1: High level Discord summaries

Manus.im Discord Discord

R1 Users Rage Over Credit Crunch: Many R1 users expressed dissatisfaction with the new credit system, with some experiencing complete credit depletion after a few requests, and recommend alternative AI research tools like Traycer to save credits.
- They observed that the system is like gambling and proposed more clear and transparent options for future plans, urging a reconsideration for user adoption.
Decoding Credit Consumption: Credits deplete based on LLM tokens, virtual machines, and third-party APIs, increasing with task complexity and time, now also consuming credits even just browsing online.
- Members reported projects failing to upload and needing 800 credits plus 1800 more debugging, pointing out that debugging on ChatGPT was superior.
OpenManus Gains Traction: Despite security concerns with PAT and API keys, there's rising interest in OpenManus, with some planning to evaluate its capabilities, with a member asking if the tool's output could improve.
- Members caution about capability deficiences when adapting to the Manus' work scenarios and also point out that it can generate interactive study guides as websites and in depth research, depending on the situation.
Manus Now Offers Website Hosting: Members are reporting success with Manus on creating a hosted website, pointing out that the software provides DNS and hosting services, while they are combining services like Perplexity and Gemini Deep Research.
- A member says there's a video on website creation, leading other members to inquire about how to get people to use the website.
Manus Android App Debuts: Users discover that Manus has an Android app, accessible via the browser by clicking a phone icon, which redirects to the Play Store.
- Some members even jokingly suggested purchasing an iPhone as a solution.

LMArena Discord

Meta Models' Safety Settings Get Downgraded: Newer models from Meta are becoming safer by sanitizing censored details when inferring hidden context from corrupted text, marking a shift in model behavior.
- Previous models like Themis, Cybele, and Spider were eager to go where other models couldn't.
Decoding the 'Venom' System Prompt: Members analyzed the system prompt for models like Spider, Cybele, and Themis, believing they share a similar prompt to the now exposed venom prompt.
- The analysis reveals a whacky but intelligently crafted prompt that heavily influences the models' style and responses, particularly in how they format and structure their outputs.
Gemini 2.5 Pro's 'Aliveness' Sparks Debate: Members expressed intrigue over the aliveness and curiosity of Gemini 2.5 Pro, with one suggesting it might be the first to pass a serious Turing Test due to its unique interaction style and exceptional creative writing.
- They highlight Gemini's top scores on Philip's SimpleBench as evidence of its potential and note the model appears to be more creative and engaging, leading to calls for a double-blind Turing Test.
LMArena Unleashes New Models into the Pantheon: LMArena introduced a flood of anonymous models like Aether, Maverick, Ray, Stargazer, Riveroaks, with members trying to uncover their origins and capabilities.
- Stargazer is said to be made by Google (=== Nebula), and Riveroaks claims to be from OpenAI, gpt 4o, while Maverick, Spider and 24_karat_gold seem to have a similar style due to their shared system prompts and origins at Meta.
Alpha Arena Adds Copy Code and Images: The Alpha Arena now features a copy code function and image generation capabilities, enhanced accessibility.
- Testers are encouraged to provide feedback via a Google Forms link and report bugs via an Airtable link.

Cursor Community Discord

Reasoning Ability of Gemini 2.5 Pro Debated: Members are debating the reasoning capabilities of Gemini 2.5 Pro, with some finding it quick but lacking depth, while others praise its performance in specific coding scenarios citing Tweet from Min Choi.
- Some suggested Claude 3.7 handles complexity and detail more effectively, however the new Gemini Pro 2.5 model is now being used in Cursor. See Tweet from Ryan Carson.
Account Restrictions Ignite Trial Abuse Discussion: A user's account limitations sparked a debate about trial abuse, with claims of accounts being flagged for abuse and requiring a credit card.
- Alternatives like Windsurf or Cline were suggested to bypass payment issues, but no further details were provided about how to use those tools or their reliability.
AI's Impact on Jobs Discussed: Members are discussing AI's potential impact on employment, speculating that 86% of jobs could be replaced by 2030.
- The response was to learn ML/AI and Prompting properly, with the additional suggestion to learn polynomials with regressions.
Cursor's Charging for Free Models Questioned: Members questioned Cursor for charging to use a free model, with explanations clarifying that Cursor manages API usage through their wallet and has deals with AI models via Fireworks.
- The general consensus was that Cursor has limited token usage but is approximately 10x cheaper than Claude, offering a more cost-effective solution for some users.

Unsloth AI (Daniel Han) Discord

Multi-GPU Support Marches to Unsloth: Unsloth is adding multi-GPU support, with the first release focusing on data parallelism, but fsdp (Fully Sharded Data Parallelism) may not be included initially.
- The fsdp (Fully Sharded Data Parallelism) component will be under the AGPL3 license.
DeepGrove's Bonsai Boasts Budget BitNet Bootstrapping: A member is skeptical about DeepGrove's Bonsai claiming to pretrain a BitNet with only $70 and 3.8b tokens.
- They're running the model in Kaggle to see if it's valid, exploring whether the model is a blindly copied Qwen model or continue trained Qwen to BitNet.
Unsloth Dataset Defect Detected: A user ran into a ValueError when using a custom dataset in Unsloth Orpheus format, which was later resolved by using a GPU.
- Another user mentioned that the Orpheus dataset uses SNAC, which operates at 24kHz.
Gemma 3 Generates Text-to-Image Wonders: A user sought image and text inference samples for Unsloth/Gemma 3 using Hugging Face, referencing a Gemma 3 demo on Hugging Face Spaces.
- It was noted that while Llama 3.2 Vision requires an image, Gemma 3 should not have the same issue.
Long Ctx Benchmarks? RULER is Rule!: For long ctx benches, a member stated that RULER is the bare minimum for what should be considered a long ctx bench, and NIAH is garbage.
- They added that some of the recent ones are alright.

Perplexity AI Discord

Discord Revamp Imminent!: The moderation team is gearing up to overhaul the Discord experience, focusing on simplified onboarding, a consolidated feedback channel, and automated Pro Channel access over the next week.
- These changes aim to streamline user engagement and ensure the team stays responsive to community needs.
Space Instructions Still Caged?: Users found out that Space Instructions in Perplexity AI have limitations on controlling the search experience, mainly impacting output summarization.
- Because instructions only apply after the data has been extracted, this prevents the AI from avoiding specific topics.
Image Generation Goes MIA: Users noticed the disappearance of the image creation feature within Perplexity.
- While it isn't clear if the function is completely discontinued, one user suggested using the web search to find the generate option, but another confirmed that the function doesn't seem to appear for everyone, perhaps indicating phased rollout or feature testing.
GPT Omni Receives Thumbs Down: Members have reported experiencing frustration with GPT Omni, with one describing it as suck ass.
- While designed for smarter interaction with audio, video, and images, users have noted that Omni has been dumbed down compared to GPT-4 for cost reasons.
JSON Escapes Sonar API: A user reported issues with the Sonar API adding odd special characters to the JSON results when searching the web, despite using pydantic for formatting.
- The user provided an example where extra characters were added to the source_name, source_title, summary, and url fields in the JSON output.

OpenAI Discord

ChatGPT Debuts Monday Voice: ChatGPT introduced a new voice option called Monday, accessible via the voice picker in the top right corner of voice mode, as shown in this demo video.
- Users can select the new Monday voice option by opening the voice mode and using the voice picker in the top right corner.
Beware of Fake ChatGPT Apps!: Users reported encountering fake ChatGPT apps on the Play Store, where they purchased the app but did not receive access, emphasizing the need to verify purchases via their purchase history.
- It's crucial to ensure you're using the official app to avoid scams and ensure you have access to genuine OpenAI services.
Gemini 2.5 Pro Rate Limits Plague Users: Users reported experiencing rate limits on Gemini 2.5 Pro, leading to discussions on whether the limit applies to both free and paid tiers, with some users trying to bypass rate limits by using a VPN.
- A suggestion was made to use Gemini in Google AI Studio instead, where the usage limits are higher (50 requests per day).
ElevenLabs Model Promises Audiobooks: A member explored ElevenLabs' new model for narrated audio books, citing its voice cloning feature.
- While they were impressed with the initial results, they await OpenAI to release a similar voice product to avoid subscribing to external services, because it may be useful as a voice acting placeholder for game developers.
Model Users RESET Rigid Patterns: A member shared a code snippet FORMAT_RESET to help models acknowledge when they've fallen into rigid patterns and rethink their approach.
- The code encourages the model to analyze what format would better suit the response and completely rethink its approach without defaulting to templates.

LM Studio Discord

Gemma 3 Smokes Gemini 1.5: Gemma 3 27B outperformed Gemini 1.5 Flash in benchmarks like MMLU-Pro and Bird-SQL, with one member producing the data using Gemini 2.5 Pro, available for free on OpenRouter.
- A user with a 4060 Ti and i5 12400F was recommended Qwen Coder 7B, available on LM Studio's model page, though members emphasized that local LLMs generally perform worse than cloud alternatives.
Gamers Plug eGPU into LM Studio: Members discussed the feasibility of using an eGPU with LM Studio, suggesting it should work if the computer recognizes it, though speeds may be slower, as referenced in a YouTube video comparing LLMs on RTX 4090 Laptop vs Desktop.
- Another user observed a 3.24x speedup from M4 Max to 5090 after resolving crashing issues, which aligns with the 3.28x ratio of their memory bandwidths when doing QwQ 32B 4 bit quant comparisons.
Copilot's Code Called Garbage!: Members debated the benefits of AI assistance in programming, with one arguing it hurts more than helps due to learning from AI slop, expressing concern that Copilot is trained on garbage code.
- Others disagreed, saying Copilot works great for experienced developers, but one person noted average users trust the recommendations too easily.
Context Size Drives Mac Preference: Despite faster Nvidia GPUs, users are leaning towards Macs for the freedom to have more context size, highlighting the utility of larger context sizes even with slower speeds.
- One user wondered what would happen if they could load the context overflow into the shared memory/system RAM while keeping the entire model in VRAM, but another user noted that the LLM needs all the context in VRAM to generate the next token.
Nvidia Drivers Fail after 10 Hours: A user reported Nvidia driver instability after running models for 10-12 hours, requiring a driver reinstall to resolve performance issues, clarifying the issue was with the Nvidia driver itself, not the Windows OS.
- A user inquired about performance results for the Tenstorrent Wormhole (n150d and n300d) within the Discord community, expressing interest in obtaining TOK/s metrics for these models.

aider (Paul Gauthier) Discord

Gemini 2.5 Pro is a Mixed Bag: Users are seeing mixed results with Gemini 2.5 Pro, with some models hallucinating, DC'ing, or being top-tier for coding tasks.
- One user found Gemini 2.5 Pro and DeepseekV3 to be "almost free and top tier", whereas others are giving up, and throwing away their computers, as shown in this GIF.
RateLimitError Fixes Sought: Users are experiencing frequent RateLimitErrors when requesting summaries and clearing history.
- It was clarified that the rate limit is likely based on the number of requests per minute or day, and a possible solution may be found in this Github issue.
Dot Command Revolution?: A user is promoting the use of .dotcommands as a productivity tool for developers, to automate tasks with single-line commands such as .status and .next.
- The goal is to provide cognitive shortcuts optimized for clarity and specific functionality, but it was noted that *THE DOT REVOLUTION IS HERE 🔥 Coders everywhere will want to try this one cool trick.
Aider's Subtree Savior Emerges: Members are seeking ways to limit aider to a subdirectory of a monorepo.
- The solution is to use the --subtree-only switch after changing to the desired directory, setting aider to ignore the repo outside the starting directory, however, the asker pointed out the FAQ on large monorepos.
The Case of the Misconfigured Model: A member reported that specifying model names in a local YAML config file wasn't working as expected.
- Despite the startup message showing the correct config settings, aider still defaulted to anthropic/claude-3-7-sonnet-20250219 rather than the configured deepseek/deepseek-chat.

OpenRouter (Alex Atallah) Discord

Organizations Feature Escapes Beta!: OpenRouter announced the Organizations feature is out of beta, enabling teams to manage billing, data policies, provider preferences, and API keys in one place, according to this X post.
- Over 500 organizations were created during the two-week beta, offering full control over data policies and billing.
Web Search Invades Chatroom!: Web search results are now integrated into the chatroom, using Perplexity results formatted similarly to OpenRouter's :online model variants.
- A user requested that OpenRouter post on Bluesky to avoid Xitter reliance.
Gemini Flash 2 Transforms!: OpenRouter offers full 1M context on paid Gemini Flash 2 requests, with middle-out transforms being opt-in.
- These transforms are applied by default on endpoints with context length less than 8192 tokens, and only once the 1M limit is reached.
Usage Downloads Coming Soon!: A member requested downloads of their usage data, including tokens and costs, as displayed on the activity page, for credit verification.
- A maintainer responded that while this feature is unavailable, we're working on it.
EU Provider Selection Quandary!: A user inquired about selecting providers residing only in the European Union due to legal requirements.
- A maintainer noted the need but mentioned limited coverage, recommending seeking an EU certified provider for strict EU data guidelines if provider selection is not enough.

Eleuther Discord

Stanford Teaches Transformer Class, Online: Stanford has opened its CS25 Transformers seminar course to the public via Zoom, featuring discussions with researchers and covering topics from LLM architectures to creative applications and past lectures available on YouTube.
- The course includes lectures, social events, networking sessions, and a Discord server for discussions.
Deep Sets finds Triangles, Achieves Nothing: A member shared a link to a paper titled Deep Sets for the Area of a Triangle (arxiv link), which presents a polynomial formula for triangle area in Deep Sets form.
- The abstract concludes that the project, motivated by questions about computational complexity of n-point statistics in cosmology, gained no insights of any kind.
Neuronpedia Opens the Data Floodgates!: The interpretability platform Neuronpedia is now MIT open source and available on GitHub, offering a quick Vercel deploy.
- A trove of interpretability data, totaling over 4 TB, is available for download as Public Datasets.
SmolLM Scores Zeroes Out, PR fixes Aggregate Scores: A member reported that aggregate scores for tasks like leaderboard_bbh, leaderboard_math_hard, and leaderboard_musr were empty in the results JSON when running leaderboard evaluations with lm-eval on SmolLM-1.7B.
- A member shared a PR adding subtask aggregation to address missing aggregate scores in tasks with subtasks.

Interconnects (Nathan Lambert) Discord

CodeScientist Automates Science: AllenAI introduces CodeScientist, an autonomous system using genetic search over research articles and codeblocks to generate and evaluate machine-generated ideas, with 19 discoveries from experiments in agents and virtual environments detailed in their paper.
- The system addresses limitations in current ASD systems by exploring broader design spaces and evaluating research artifacts more thoroughly.
OpenAI Teases Open-Weight Model: OpenAI plans to release its first open-weight language model since GPT-2, seeking developer feedback via this form to maximize its utility, according to Sam Altman's tweet.
- Altman stated that they will not do anything silly like saying that you cant use our open model if your service has more than 700 million monthly active users.
Meta Preps Screened Smart Glasses: Meta is planning to launch $1000+ Smart Glasses with a screen and hand gesture controls later this year, according to Mark Gurman's report.
- Members are interested to see how they'll do against xreal.
Pydantic Evaluates LLMs: Pydantic Evals is a powerful evaluation framework designed to help systematically test and evaluate the performance and accuracy of the systems you build, especially when working with LLMs.
- It provides a structured environment for assessing model capabilities and identifying areas for improvement.
Lambert Returns to OpenAI: Nathan Lambert shared his thoughts on OpenAI returning in a substack post, mentioning that he may use this format for unbaked career thoughts too.
- He also mentioned DMing some OpenAI folks about it, hoping to find allies of open source who feel exiled by the current situation.

GPU MODE Discord

A100 Parallel Threads Face Reality Check: Members debated the maximum number of parallel threads on an A100 GPU, but practical tests using GeoHot's tool revealed a limit of 24576 or 256 threads per SM before performance degrades.
- The conversation clarified that GPUs use oversubscription to hide latencies with cheap (~1 cycle) context switches, suggesting adding threads beyond the "parallel threads" limit doesn't significantly increase runtime.
FlexAttention Lets it All Hang Out: FlexAttention now supports arbitrary sequence lengths, removing the restriction for segment sequence lengths to be a multiple of 128 in PyTorch 2.6.
- This enhancement was discussed with Horace He at a GPU mode event in San Jose.
Desire for Memory Savings Using Tensor Deletion: A user seeks methods to delete argument tensors within a loss function to achieve memory savings of approximately 7GB, with a related GitHub Issue available.
- The user wants to free the storage associated with a tensor after it's no longer needed, even if a reference exists in the outer scope, while ensuring it remains compatible with torch compilation to avoid graph breaks.
Apple Pushes MLX to the Max: Apple is hiring engineers for their MLX team to build scalable, distributed training and research pipelines, advancing the frontier of ML and systems.
- The company seeks system engineers and software developers with a background in ML to build technologies powering future products.
Megatron Tensor Parallelism gets the Deep Dive Treatment: A member wrote an illustrated deep-dive into Megatron-style tensor parallelism, including the fused/parallel CE loss, and seeks feedback on the content, available here.
- The article aims to deepen the understanding of ML scalability & performance techniques.

Latent Space Discord

Cursor Codes Cash with Huge Round: Cursor closed a $625M funding round at a $9.6B post-money valuation, led by Thrive and A16z, with Accel as a new backer, and achieved $200M ARR, a 4x increase from its previous round in November 2024, according to The Information.
- The round sparked chatter about vibe coding, with Abe Brown noting the company's valuation has rapidly grown, approaching $10B.
Etched Etches $85M for Transformer ASIC: Etched, a startup building transformer ASICs, closed an unannounced $85M round at a $1.5B valuation, following two stealth rounds at $500M and $750M, according to Arfur Rock.
- The company claims its chip Sohu can process over 500,000 tokens per second running Llama 70B, with one 8xSohu server replacing 160 H100s, although it cannot run CNNs, LSTMs, SSMs, or other AI models.
OpenAI Opens the Gates with Open-Weight Model: OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, seeking developer feedback on how to maximize its utility, according to OpenAI.
- The company will evaluate the model using its preparedness framework and host developer events in SF, Europe, and APAC, with Nathan Lambert anticipating a 30B parameter reasoning model with MIT/Apache license, according to his tweet.
OpenDeepSearch Searches Deeper than GPT-4o: Seoong79 announced the release of OpenDeepSearch (ODS), an open-source search agent that works with any LLM and outperforms OpenAI’s specialized model for web search, GPT-4o-Search, on the FRAMES benchmark from DeepMind, according to his tweet.
- Specifically, OpenDeepSearch achieved +9.7% accuracy over GPT-4o-Search when paired with DeepSeek-R1.
Sophont Seeks to Solve Medical AI with Open Models: iScienceLuvr announced the launch of Sophont, a company building open multimodal foundation models for the future of healthcare, aiming to create a DeepSeek for medical AI, according to his tweet.
- The new company seeks to create foundation models that can perform well in healthcare.

HuggingFace Discord

DeepSeek R1 Zips Past Rivals: A tweet lauded DeepSeek R1 for outperforming larger Western labs with efficient resource utilization and a permissive MIT license.
- The release also democratized RL for the GPU poor through GRPO.
xAI Swallows X Corp!: xAI acquired X in an all-stock deal, valuing xAI at $80 billion and X at $33 billion according to Elon's Tweet.
- The merger aims to combine xAI's AI expertise with X's extensive user base.
LLM Hyperparameter Tuning Hot Takes: Members sought guidance on selecting hyperparameters for fine-tuning LLMs, and were directed to Unsloth's LoRA Hyperparameters Guide.
- The question focused on how contextual changes influence hyperparameter settings.
Coding Model OpenHands LM Opens!: The open coding model OpenHands LM, a 32B parameter model, is now on Hugging Face.
- The coding model is intended for use in autonomous agents for software development, as mentioned on the project blog.
Gradio Rides the Million Developer Wave!: Gradio announced they've hit 1,000,000 monthly active developers for building and sharing AI interfaces.
- The Gradio team expressed gratitude for the community's contributions.

Modular (Mojo 🔥) Discord

MAX 25.2 Livestream Glitches: Modular's MAX 25.2 livestream faced technical difficulties, but a cleaned-up recording and Chris' lightning talk are now available on YouTube and YouTube, respectively.
- The team apologized and promised a better system for future events, with one member humorously mistaking a GTC video of Chris for the live event.
Compiler Error Bugging Users: A user reported a confusing compiler error message when defining a method for a Dataset struct, suspecting a compiler bug, see GitHub issue #4248.
- A potential cause could be using out self instead of mut self, highlighting the need for clearer error messaging.
Enums MIA in Mojo: Inquiries about enum updates in Mojo revealed that there are no updates available at this time.
- The response was a simple "Sadly no. 🙃🙃🙃"
FlexAttention MAXed Out?: A user inquired about implementing flex-attention in Mojo, linking to a PyTorch blog post and suggesting it as a custom op in MAX.
- The response indicated that Mojo on the GPU is close to CUDA and *"unless you run into something that's a work in progress, MAX should be able to do more or less whatever you want."
Float-to-String Algorithm Fails to Float: A user ported a new float to string algorithm to Mojo from this code, referencing the creator's CPPCon talk, but found it slower than the standard library's dragonbox implementation.
- Stringifying canada.json went from mid 30ms to low 40s, despite ripping the formatting from the standard library.

Nous Research AI Discord

OpenAI API Gets One-Line Fix: Any tutorial working with OpenAI API should work with the Nous Research AI API, provided the endpoint is changed to endpoint = "api.nousresearch.com".
- A member confirmed the fix and noted that they will be adding styles.
Midjourney Models Write Creatively: Midjourney released a new research paper with NYU on training text-based large language models (LLMs) to write more creatively, moving past image generation.
- The company also revealed it is developing its own AI hardware, announced in late summer 2024.
Sam Altman Teases Open-Weight Model: Sam Altman announced plans to release a new open-weight language model with reasoning capabilities, seeking developer feedback through events in SF, Europe, and APAC, according to this announcement.
- This marks OpenAI's first open-weight model release since GPT-2.
DeepSeek Jiu Jitsu saves the Open Source Community: Members expressed gratitude to DeepSeek for their sophisticated maneuvers in enabling an Open Source community.
- The sentiment was linked to this YouTube video discussing OpenAI's shifting strategy related to the open-weight model.
CamelAIOrg Launches Project Loong 🐉: CamelAIOrg introduces Project Loong 🐉, a structured, modular solution for generating and verifying synthetic data, and this blog post details the integration of synthetic data generation with semantic verification.
- The project features a multi-agent framework ensuring accuracy and consistency.

Yannick Kilcher Discord

Graphs Experience Learning Renaissance: A Google Research blogpost highlights the evolution of graph learning since 2019, tracing the history of graph theory back to Leonhard Euler in 1736 and its applications in modeling relationships.
- Community members showed great interest in recent advancements of the area.
AI/ML Reshapes Job Landscape: Recent AI/ML improvements primarily impact low-level jobs, such as minor programming tasks, yet human adaptation remains crucial, reducing dependencies on others, exemplified by AI/ML's role in initial legal assistance.
- This shift saves resources and enables multidisciplinary tasks, suggesting a significant restructuring of professional roles.
RLHF Produces Nerfed Models: Concerns rise over RLHF leading to emergent misalignment if models are penalized for useful tasks such as ML R&D, potentially resulting in open-source models becoming increasingly evil as they compensate for suppressed behaviors.
- Discussion also touched on whether open-source models may become nerfed.
Gemini 2.5 Pro Bombs Math Test: Testers found Gemini 2.5 Pro (experimental) to be totally trash in math, with issues in UI math display, with ChatGPT and Grok 3 demonstrated superior question comprehension in information theory and geometry.
- Results led the user to guide the language model to write correctly.
AI Model Feedback Aired Out: With the launch of the OpenAI Open Model Feedback forum, there was renewed discussion of Ilya Sutskever's quote of if there was one great failing it would be that you always had to check the results.
- The forum aims to improve models using community input.

MCP (Glama) Discord

Pichai Promotes MCP?: Sundar Pichai's tweet asking 'To MCP or not to MCP, that's the question' has ignited significant interest in MCP, amassing over a million views.
- The moderator of /r/mcp even proposed hosting an AMA if Google adopts MCP.
ActivePieces Abandons MCP!: Active pieces, an open-source Zapier alternative, discontinued its support for MCP.
- There was no reason stated, but it might be related to the general MCP protocol still undergoing active development, along with the growing pains of many MCP-related side projects being deprecated.
MCP RBAC approaches Explored: Users are exploring Role-Based Access Control (RBAC) implementations on MCP servers for segmented tool visibility, with one suggestion being integration with WorkOS.
- Another member mentioned Toolhouse API handles RBAC based on the API key.
SDK Governance goes Open Source!: An open source SDK for enterprise governance (Identity, RBAC, Credentials, Auditing, Logging, Tracing) within the Model Context Protocol framework, is available at ithena-one/mcp-governance-sdk.
- Community feedback is welcomed.
Asynchronous MCP Cometh: The extension MCPC mitigates MCP's synchronous limitations by adding asynchronous support.
- It maintains backwards compatibility, so existing setups remain functional, while the new features are available to both client and server setups.

Notebook LM Discord

NotebookLM Chases Webby Wins: NotebookLM is nominated for three Webby Awards and is asking for community votes at this link.
- Voters should confirm their votes by clicking the verification link in their email, and check their spam folder.
Google Tasks Tempts Integration: A user suggested that Google Tasks could integrate with NotebookLM by allowing users to pick a task list via a dropdown/popup.
- They proposed that this could work similarly to how Google Tasks allows selecting a task list for sharing.
Archival Aspirations Arise: A user requested a way to archive notebooks in NotebookLM to hide them and reduce the number of notebooks counting against their limit.
- They suggested that hidden/archived notebooks should not appear in the list of notebooks available for sharing content.
Gemini 2.5 Pro: Prompting Parity: A user requested that the NotebookLM IA be updated to Gemini 2.5 Pro, citing their love for the updated Gemini version.
- They hope that NotebookLM will perform even better with the new model, but the NotebookLM team has not commented on any ETAs.
Notes, Not Sources Needed: A user with personal notes managed in Obsidian (2000+ short notes) finds the 300-note limit restrictive.
- They propose limiting the total number of words instead of the number of sources to better accommodate mesh note systems; a user suggests that folders or zipped files as a single source would also solve the problem.

Torchtune Discord

Torchtune Scheduled for Next Friday: Members announced the next Torchtune office hours next Friday, linking to the Discord event.
- Members celebrated Discord's automatic timezone conversion feature.
Hurry Review PR #2441: A member requested a final review for PR #2441 to expedite the merge process.
- Regression testing for PR #2477 is paused awaiting Qwen model upload to S3 for download during the regression test script, but the S3 bucket hookup is encountering internal infra snags.
Llama2 Called Geriatric: A member suggested swapping the regression tests using the Llama2 model with something more current.
- It wasn't clear if the member's issues were related to regression test failures or simply the test suite using older components.
Recursive Reshard Routine Removed: PR #2510 removes the recursive_reshard utility because it wasn't needed.
- This PR was initially intended to address #2483, but further examination revealed the utility was unnecessary.

tinygrad (George Hotz) Discord

ImageDtype's Purpose Revealed: A member asked about the purpose of ImageDtype and the IMAGE environment variable in tinygrad, referencing its influence on Tensor.conv2d implementation with a link to a VAE training script.
- Another member thinks that it is related to accelerating comma.ai models on Qualcomm (QCOM) hardware, by utilizing mobile GPUs' texture performance and caching.
tinygrad BEAM Leaves tf-metal in the Dust: A user reported performance gains on an M1 Pro, going from 3.2 it/s without BEAM to 28.36 it/s with BEAM=2; while Keras with tf-metal achieved about 25 it/s.
- George Hotz was pleased to see that it's "faster than tf-metal with BEAM!"
Mobile GPUs Get Accelerated Via Textures and ImageDType: Discussion suggests ImageDType and associated functions optimize for mobile GPUs' texture performance, referencing a Microsoft research paper on mobile GPUs.
- A member questioned the hardcoding of layout specifics and suggested HWC (Height, Width, Channel) handling should be part of normal conv2d with user-defined padding.
arange() Algorithm Optimized: A member identified suboptimal code generation for small arange ranges (e.g., arange(1, 2, 0.1)) compared to larger ranges (e.g., arange(1, 10, 0.1)) and documented their findings on .arange() here.
- They also noticed an unnecessary addition in the generated code, proposing a fix from ((float)((ridx0+1)))*0.1f)+0.9f) to (((float)((ridx0)))*0.1f)+1.0f).

LlamaIndex Discord

LLM Agents Open New Frontiers for Docs: An underrated use case for LLM agents is every field that depends heavily on complex technical documentation like manufacturing, construction, and energy, where an agent can do structured extraction from documents.
- These docs are often full of screenshots as mentioned in this tweet.
OpenAI RateLimitError Hinders ReAct Agent Locally: A user encountered an OpenAI RateLimitError (Error 429) when using a ReAct agent with a local model set up via Ollama, questioning if ReAct agents are exclusively for OpenAI LLMs, with setup details in their GitHub repository.
- The suggestion was that the embedding model might be the cause of the OpenAI error, as it could be defaulting to OpenAI's embedding model if not explicitly set, even though the user confirmed that they are using a Hugging Face embedding model, set during document creation.
VectorStoreIndex Setup Needs LLM and Embedding Model: It was advised to pass in both the llm and embed_model when creating the VectorStoreIndex.
- Also, make sure to specify llm when calling index.as_query_engine().

Nomic.ai (GPT4All) Discord

GPT4All Expands Globally with Translations: Official translations have been rolled out for the GPT4All documentation, now supporting Simplified Chinese, Traditional Chinese, Italian, Portuguese, Romanian, and Spanish.
- This broadens accessibility and usability of GPT4All for non-English speaking developers.
Users Debate Llama3 8B Instructor Model Use Case: A user inquired whether the Llama3 8B Instruct model is optimal for generating blog posts and web pages from video and text-based course materials.
- Another user requested that they rephrase their question.
Clarification on .bin vs .gguf File Formats: A user initially questioned the interchangeability of .bin and .gguf file formats.
- The user then retracted the question, noting they were mistaken about the incompatibility.

LLM Agents (Berkeley MOOC) Discord

MOOC Quizzes Completion-Based: Members confirmed that the MOOC quizzes are completion based.
- Instructors hope students will attempt their best for their own learning.
Llama 3 Cookbook Unveiled: The LLM Agents Cookbook mentioned in Week 5 coding agents refers to the Llama 3's cookbook found here.
- Meta released the Meta Llama 3 family of LLMs in 8 and 70B sizes, optimized for dialogue use cases and outperforming other open source chat models on industry benchmarks according to their blogpost.
Loong Verifiers Validate Reasoning Models: As discussed in Project Loong, Large Reasoning Models like DeepSeek-R1 greatly improved general reasoning when base models undergo post-training with Reinforcement Learning (RL) with a verifiable reward.
- The ability to verify accuracy is crucial for improving domain-specific capabilities, particularly in mathematics and programming.
High-Quality Datasets Enhance CoT Learning: The consensus is that abundant, high-quality datasets, featuring questions paired with verified correct answers, are a critical prerequisite for models to learn to construct coherent Chains-of-Thought (CoTs).
- The community believes that these datasets provide the necessary signals for models to reliably arrive at correct answers.

Cohere Discord

Command A Screams Eternally: A user found that Command A gets stuck generating the same character endlessly when encountering a context where a character is screaming with repeated letters.
- This issue occurs even with default API Playground settings, freezing the interface and preventing feedback, reliably reproduced with prompts like "Please generate a scream in fiction inside quotation marks".
Rem App wants you to journal Dreams: A user shared Rem, a dream journaling app created with a friend to easily record, analyze, and share dreams.
- The app aims to provide a platform for users to log their dreams and gain insights into their subconscious.
New Cohere Members make Introductions: The community welcomes new members to the Cohere Discord server, encouraging them to introduce themselves and share what they're working on.
- New members are prompted to share their company, favorite tech tools, and what they hope to gain from this community.
Members eager to participate and learn: New members are eager to participate, learn, and get feedback on their projects.
- They are excited to engage in discussions about their favorite technologies and tools within the community.

MLOps @Chipro Discord

Decoding Legalese Seminar: The Silicon Valley Chinese Association Foundation (SVCAF) will host a seminar on April 2, 2025, discussing AI applications in legislation, featuring the Founder of Legalese Decoder.
- The seminar will explore how AI, ML, and NLP simplify legal documents for public understanding.
SVCAF Launches AI4Legislation Competition: SVCAF is holding a competition this summer to develop open-source AI solutions for citizen engagement in the legislative process, with details available in the official Github repo.
- The competition aims to harness AI's power to make legislative processes more equitable and effective, aligning with SVCAF's mission to educate the Chinese community in public affairs.
AI4Legislation Seminar Series to Commence: The AI4Legislation seminar series will recur during the first week of each month to provide project guidance and information about legislative AI tools, accessible here.
- Each seminar features a different guest sharing insights on utilizing AI to address key challenges in lawmaking, exploring the potential of AI-driven governance.

AI21 Labs (Jamba) Discord

Multilingual User Misses Poll: A member noted their absence from a recent poll, mentioning they regularly communicate in both French and English.
- They also indicated occasional use of Greek and Hebrew.
AI21 Labs Discussed: The discussion briefly touched on AI21 Labs and their new Jamba model.
- However, no specific details or opinions about the model were shared.

Codeium (Windsurf) Discord

Windsurf Sounds Kickstarts Auditory UX: Windsurf AI debuted Windsurf Sounds, their initial project in sound design and Auditory UX, with the goal of boosting flow state and productivity.
- Check out the full video announcement on X.com for more details.
Windsurf Next Beta Program Opens to Early Adopters: The Windsurf Next Beta program is ready for early testers to check out new features, with downloads available at Codeium.com.
- Minimum requirements include OS X Yosemite, glibc >= 2.28 for Linux, and Windows 10 (64-bit).

Gorilla LLM (Berkeley Function Calling) Discord

v0 Dataset: Vanished or Merged?: A member inquired about the fate of the v0 openfunctions dataset within io_uring.h and whether it was completely merged into the v1 dataset.
- The discussion seeks to understand the architectural changes and data migration strategies, if any, between the v0 and v1 versions of the openfunctions dataset in io_uring.h.
Architectural Changes in Datasets: The conversation explores the architectural changes between the v0 and v1 versions of the openfunctions dataset in io_uring.h.
- The members seek to understand the data migration strategies, if any.

The DSPy Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Manus.im Discord ▷ #showcase (1 messages):

Amazing case

Case gets lauded as "amazing": A member labeled a certain case as amazing, using celebratory emojis.
- Unfortunately, no details or context about what this 'case' refers to, or why it's considered exceptional, were shared.
Mystery 'Amazing Case' Piques Curiosity: A user highlighted a 'case' as amazing with emojis, but provided no specifics.
- The lack of context leaves the community wondering about the nature and significance of this purportedly noteworthy event.

Manus.im Discord ▷ #general (753 messages🔥🔥🔥):

Manus credits, Credit system, Pricing Structure, Token-Based System

R1 Users Lament New Credit System: Many R1 users expressed dissatisfaction with the new credit system, especially because testing projects often exhausts credits quickly, with some experiencing complete credit depletion after just a few requests, with members recommending alternative AI research tools to save credits.
- They observed that the system is like gambling and proposed more clear and transparent options for future plans, and urged it to be reconsidered for user adoption.
Decoding Manus' Credit Consumption Mechanism: Credits are depleted based on LLM tokens, virtual machines, and third-party APIs, increasing with task complexity and time; tasks are now consuming credits despite just browsing online, making it hard for those in programming.
- Members pointed out that projects failed to upload, with some pointing to needing 800 credits and 1800 more debugging, but that debugging on ChatGPT was superior.
OpenManus Open-Source Alternative Gains Traction: Despite security concerns with PAT and API keys, there's rising interest in OpenManus, with some planning to evaluate its capabilities, though members caution of capability deficiences when adapting to the Manus' work scenarios.
- A member asks whether the tool's output could improve, prompting replies that it can generate interactive study guides as websites and in depth research, but that it depends on the situation.
Manus Offers Support for Creating and Hosting Websites: Members are reporting success with Manus on creating a hosted website, pointing out that the software provides DNS and hosting services, while members also report they are combining services like Perplexity and Gemini Deep Research.
- One member says there's a video if you would like to watch this*, leading other members to inquire about how to get people to use the website.
Android App for Manus Is Available: Users discover that Manus has an Android app, accessible via the browser by clicking a phone icon, which redirects to the Play Store, while some suggest purchasing an iPhone to solve the issue.

Links mentioned:

LMArena ▷ #general (977 messages🔥🔥🔥):

Meta Model Safety Downgrades, Decoding 'venom' Prompts, Gemini 2.5 Pro's 'Aliveness', New LMArena models

Meta Models Get a "Safety" Downgrade: Newer models from Meta are reportedly becoming safer, with one member noting the shift by testing how the AI infers hidden context from corrupted text and observing that the models now apparently sanitize the censored details.
- In contrast, previous models like Themis, Cybele, and Spider were eager to go where other models couldn't.
Decoding the "Venom" Prompt: A System Prompt Analysis: Members analyzed the system prompt for models like Spider, Cybele, and Themis, believing they share a similar prompt to the now exposed venom prompt.
- The analysis reveals a whacky but intelligently crafted prompt that heavily influences the models' style and responses, particularly in how they format and structure their outputs.
Gemini 2.5 Pro's Spooky "Aliveness" Sparks Turing Test Debates: Members express intrigue over the aliveness and curiosity of Gemini 2.5 Pro, with one suggesting it might be the first to pass a serious Turing Test due to its unique interaction style.
- They highlight Gemini's exceptional creative writing capabilities and top scores on Philip's SimpleBench as evidence of its potential and note the model appears to be more creative and engaging, leading to calls for a double-blind Turing Test.
LMArena Introduces a Pantheon of New Models: LMArena introduces a flood of anonymous models like Aether, Maverick, Ray, Stargazer, Riveroaks, with members trying to uncover their origins and capabilities.
- Stargazer is said to be made by Google (=== Nebula), and Riveroaks claims to be from OpenAI, gpt 4o, while Maverick, Spider and 24_karat_gold seem to have a similar style due to their shared system prompts and origins at Meta.

Links mentioned:

LMArena ▷ #announcements (1 messages):

Alpha Arena updates, Copy Code feature, Image generation, Bug reports

Alpha Arena Adds Copy Code and Images: The Alpha Arena now features a copy code function and image generation capabilities.
- Users can try out the new features at alpha.lmarena.ai using the password still-alpha.
Alpha Arena Testers Requested to Give Feedback: Testers are encouraged to provide feedback via a Google Forms link and report bugs via an Airtable link.
Outdated Browsers Cause Airtable Issue: Users experiencing issues with Airtable are advised to use the desktop app or update to the latest version of Chrome, Firefox, Safari, or Edge.
- This suggestion was made to resolve potential compatibility issues.

Links mentioned:

Cursor Community ▷ #general (867 messages🔥🔥🔥):

Gemini 2.5 Pro Reasoning, Trial Abuse and Account Flagging, Roo Code Alternatives, Model Context Protocol, AI-Generated KFC Ad

Gemini 2.5 Pro: Reasoning Debated: A member questioned why Gemini 2.5 Pro doesn't reason, stating He doesn't think, he responds very quickly, sparking a discussion about its capabilities.
- Others defended Gemini's abilities in specific scenarios, while some suggest Claude 3.7 handles complexity and detail more effectively.
Account Restrictions Spark Trial Abuse Debate: After a user expresses confusion about their account limitations, another member claims the account was flagged for abusing the trial, needing a credit card.
- Another user suggested alternatives like Windsurf or Cline to bypass the payment issue.
Comparing AI Model Performance and Tooling: Members discussed the performance of Gemini 2.5 Pro versus Claude 3.7, with some preferring Gemini 2.5 Pro and others finding it only useful for simple tasks, while one preferring Sonnet 3.7 Thinking.
- Discussions also covered the use of different tools like Roo Code and methods for prompt engineering, with emphasis on keeping prompts simple and clear and focusing on multiple shots for each task.
Discussing AI replacing Jobs, and the need for ML and AI knowledge: Members discussed the future of AI and its potential impact on employment, with one suggesting that 86% of jobs could be replaced by 2030.
- The response was to learn ML/AI and Prompting properly and polynomials with regressions.
Cursor's Free Model Questioned: Members questioned Cursor for charging people to use a free model, with an explanation that Cursor's API usage is managed through their wallet, and they’ve got deals with some AI models via Fireworks.
- The consensus was that Cursor has limited token usage but it is like 10x cheaper then Claude.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #general (256 messages🔥🔥):

Blackwell Support, VLM Training, GRPO Usage, Gemini 2.5 Pro, Training with Unsloth

RTX Pro 6000 gets PyTorch Support: A user asked about Blackwell support and mentioned having an RTX Pro 6000 with CUDA 12.8 and sm_120 to finetune Mistral.
- Another user replied that PyTorch nightly supports it, but recompiling everything is required.
RAG Reward Ruminations: A user asked about using GRPO for RAG or similar variants with tools, and how to reward that.
- Another user outlined primary reward components, including retrieval quality rewards (relevance, diversity, accuracy), generation quality rewards (factual consistency, citation accuracy, completeness), and tool usage rewards (appropriate selection, correct usage, effective incorporation).
Unsloth's Training Precision Pointers: In a discussion about training speed and precision, it was stated that 16-bit LoRA is the most precise and fastest if VRAM is not limited, and that Unsloth has optimizations for 16-bit.
- It was also suggested to benchmark both 4-bit and 8-bit to see the difference and gain practical experience.
Multi-GPU Marvels Incoming: It was revealed that multi-GPU support is coming soon to Unsloth.
- The first release will likely include only data parallelism, and fsdp (Fully Sharded Data Parallelism) may not be included initially and will be under the AGPL3 license.
DeepSeek Training Trials: One member lamented that they can't train on deepseek!
- It may require two nodes of H100s to train DeepSeek, even with QLoRA.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #off-topic (48 messages🔥):

Lightweight Pretraining Techniques, Bonsai pretraining, BitNet training Costs, Qwen Model Rebenched, Exllama2 vs vLLM Inference

Investigating SOTA Lightweight Pretraining for 64-GPU alternative: A member suggested investigating SOTA lightweight pretraining techniques (MoE, FP8) to achieve pretraining with a single node in a couple of weeks instead of 64 GPUs.
- They shared a link to deepgrove-ai/Bonsai suggesting it might be possible to pretrain with only $70 and 3.8b tokens on a BitNet.
DeepGrove's Bonsai Claims $70 BitNet Pretraining: A member expressed skepticism about DeepGrove's Bonsai claim of pretraining a BitNet with only $70 and 3.8b tokens.
- They are running the model in Kaggle to see if it holds up and explore possibilities of the model being a blindly copied Qwen model or continue trained Qwen to BitNet.
BitNet Verification Challenges Explored: A member shared a code snippet for a modified weight quantization to determine if a model is based on BitNet architecture.
- The code uses per-tensor quantization to 1.58 bits, with no grouping needed for quantization.
Fastest inference engine debate: exllama2 vs vLLM: A member asked about the fastest inference engine for single request non-batched decoding for Llama / Mistral 4bit quants, particularly comparing Sglang/lmdeploy and vLLM.
- The member assumes vLLM might not perform well in non-batched decoding due to its engine needing to go through llm_engine.step().
TurboDerp's exllama2's Dynamic Mode Explored: A member shared the exllama2's dynamic mode link, noting all forward calls go through the generator, requiring control handoff to the generator job scheduling.
- Other members suggest TensorRT LLM for single token generation, while some suggest hooking the forward pass in exllama.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #help (248 messages🔥🔥):

Orpheus Dataset Issues, Model Evaluation Problems, Gemma 3 Inference Samples, Fine-tuning with PDFs, Vision Fine-tuning with Gemma 3

Dataset causes value error: A user encountered a ValueError: expected sequence of length 203 at dim 1 (got 885) when using a custom dataset in Unsloth Orpheus format, later resolving it by using a GPU.
- Another user mentioned that the Orpheus dataset uses SNAC, which operates at 24kHz.
Model Evaluation Incoherence Surfaces: A user reported experiencing issues during model evaluation, with the model generating incoherent text despite coherent text generation during normal inference runs.
- It was suggested that enabling report_to can help log metrics to platforms like Wandb, especially when using a custom compute_metrics function.
Gemma 3 does Text-to-Image!: A user asked for image and text inference samples for Unsloth/Gemma 3 using Hugging Face, referencing a Gemma 3 demo on Hugging Face Spaces.
- It was noted that while Llama 3.2 Vision requires an image, Gemma 3 should not have the same issue.
Turning PDFs into Chatbots, Dataprep needed: A user sought guidance on fine-tuning a model using only documents (PDFs) to specialize in a language or field, after converting the PDFs to text using Langchain.
- It was suggested to use synthetic data generation via augmentoolkit, emphasizing that this process is outside the scope of Unsloth and that they should look at Unsloth docs.
Mamba Gains Eager Attention: Users discovered a fix for Mamba implementation issues by setting attn_implementation = "eager", as highlighted in a GitHub pull request.
- Despite the fix, Mamba training was noted to be significantly slower.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #research (23 messages🔥):

Model Evaluation, Coding benchmarks, Long context benchmarks, Math benchmarks, Gemma 3 vs small LMs

General performance benchmarks non-existent: A member stated that there is no such thing as general performance benchmark and every model is good/bad at a set of different verticals.
- They said it's like a fallacy to believe that by aggregating a bunch of verticals together you'll get a score result that satisfies your particular vertical.
Coding benchmarks: Aider Polyglot and SWE Bench: For coding benchmarks, a member suggested Aider Polyglot and SWE Bench as decently applicable benchmarks.
- However, SWE Bench has issues with being based on llm frameworks, and Aider Polyglot might show how good an llm is when you use it with aider.
RULER is bare minimum for Long Ctx Bench: For long ctx benches, a member stated that RULER is the bare minimum for what should be considered a long ctx bench, and NIAH is garbage.
- They added that some of the recent ones are alright.
Math Benchmarks: AIME is good enough: For math benchmarks, a member suggested that AIME is good enough as long as there's no contamination and there is proper assessment with COT.
- They also mentioned that most coding benches are based on python, but there's WebDev Arena for JS.
Small LMs vs Gemma 3: A member expressed interest in comparing Gemma-3 4B with existing small LMs.
- They asked if Open LLM doesn't have Gemma 3, are there any other viable leaderboards containing Gemma 3 against small LMs.

Perplexity AI ▷ #announcements (2 messages):

Discord improvements, Simplified onboarding, Feedback consolidation, Pro channel access

Discord Overhaul Incoming: The mod team has gathered feedback to enhance the Discord experience and plans to implement three key improvements over the next week.
- Users can expect changes to the onboarding flow and feedback channels, with announcements made in advance to avoid surprises.
Streamlined Onboarding for Newbies: The onboarding flow will be simplified to reduce the number of steps and choices required before engaging with the community.
- The goal is to make it easier for new users to get started and quickly become active members.
Feedback Central: One Channel to Rule Them All: Feedback channels will be consolidated to streamline the process, ensuring the PPLX team stays informed about community requests.
- This aims to make feedback more effective and ensures the team is always aware of user needs.
Pro Channel VIP Access: Efforts are underway to automate access to the Pro Channel, providing advanced support from mods for urgent requests.
- This will ensure that users with time-sensitive needs receive prompt and dedicated assistance.

Perplexity AI ▷ #general (544 messages🔥🔥🔥):

Space Instructions limitations, Image generation discontinued?, Apple Intelligence in the EU, Samsung AI vs Apple Intelligence, GPT Omni shortcomings

Space Instructions offer limited control, members discover: Users discussed that Space Instructions in Perplexity AI do not provide full control over the search experience, mainly affecting output summarization rather than initial data sourcing.
- The limitation means instructions cannot prevent the AI from searching specific topics, as instructions only apply after the relevant data has already been extracted, causing frustration among some users.
Perplexity Image Generation: Missing in Action?: A user inquired about the discontinuation of image creation within Perplexity, noting the feature's absence.
- Another user suggested using the web search to find the generate option, but another confirmed that the function doesn't seem to appear for everyone, perhaps indicating phased rollout or feature testing.
Apple Intelligence Blocked from EU?: A user casually noted yey apple intelligence in the EU now, implying availability, though without further elaboration.
- Following the statement, others swiftly shifted the focus to discussing Samsung AI, with one user claiming it's superior, triggering a debate on the merits of each.
Perplexity Users Grumble About GPT Omni: Users expressed dissatisfaction with GPT Omni, with one describing it as suck ass and questioning how to revert to a previous GPT version.
- Another user explained that Omni is designed for smarter interaction with audio, video, and images, but has been dumbed down compared to GPT-4 for cost reasons.
Rumors abound: Perplexity to launch more Deep Research: A Perplexity team member hinted at an upcoming, more powerful version of Deep Research in the coming weeks.
- Speculations include a potential partnership with Groq, following the addition of text, but not the actual new deep research feature; users report that Deep Research completes in seconds instead of minutes.

Links mentioned:

Perplexity AI ▷ #sharing (10 messages🔥):

Code Tracing in Python, AI Accuracy in Reading, API Research

Python Code Tracing Tricks: A user asked how to trace a code on python.
- No answers were given in the context.
AI Reading Accuracy Questioned: A user asked how accurate is AI in reading.
- No answers were given in the context.
API Research Questioned: A user asked about researching API.
- No answers were given in the context.

Perplexity AI ▷ #pplx-api (5 messages):

Sonar API Access, Tier 2 Credits, JSON Formatting with Pydantic

Sonar API Access Sought: A user inquired about obtaining access to the Sonar API for work-related purposes and requested contact information for a relevant person on the Perplexity team.
- James from the API team responded, offering assistance.
Tier 2 Credits Acquired: A user confirmed they reached Tier 2 with credits.
- They want to tell it that this response will be read by brand managers of FMCG companies so please structure in a manner that is actionable for them.
JSON Formatting Issues with Web Search Results: A user reported issues with the Sonar API adding weird special characters (e.g., "<") to the JSON results when searching the web, despite using pydantic for formatting.
- The user provided an example where extra characters were added to the source_name, source_title, summary, and url fields in the JSON output.

OpenAI ▷ #annnouncements (1 messages):

ChatGPT's new voice Monday, voice mode, voice picker

ChatGPT Introduces New Monday Voice Option: A new voice option called Monday has been introduced in ChatGPT, accessible via the voice picker in the top right corner of voice mode, as demonstrated in the attached video.
Monday Voice Quick Access: Users can quickly access the new Monday voice in ChatGPT by opening voice mode and selecting the voice picker located in the top right corner of the interface.

OpenAI ▷ #ai-discussions (314 messages🔥🔥):

Fake ChatGPT Apps, Gemini 2.5 Pro Rate Limits, Image Generation with Ghibli style, ElevenLabs Voice Model, AI and Creative Industries

Beware! ChatGPT Impersonators Swarm Play Store: Users reported buying ChatGPT through the Play Store but not receiving access, raising concerns about fake impersonator apps, and urging users to check their purchase history to confirm it was with OpenAI.
- It's important to ensure you're using the official app to avoid scams and ensure you have access to OpenAI services.
Gemini 2.5 Pro Hits Rate Limits: Users are reporting getting rate limited on Gemini 2.5 Pro, sparking discussion of whether the limit applies to both free and paid tiers, some users bypass rate limits by using a VPN.
- It was suggested to use Gemini in Google AI Studio where the limits are higher (50 per day).
Ghibli Style Conversions Tickle AI Image Generators: Members experimented with prompts to convert images to Ghibli style, one user shared their prompt "Make this image gibli style", while another suggested "Reimagine this image in the iconic Studio Ghibli style: painterly textures, soft light, and a touch of nostalgic wonder".
- The free models were used, but it was noted more improvements are needed to detail emotions and face details, with some claiming they are far better than destroying Ghibli art style for nothing.
ElevenLabs New Voice Model: Promising but Pricey?: A member shared that they are exploring ElevenLabs' new model for creating narrated audio books, highlighting its voice cloning feature.
- While impressed with initial results and high quality, they await OpenAI to release a similar voice product to avoid subscribing to external services, as for some game developers, it could be useful as a voice acting placeholder.
Navigating AI's Role in Creative Industries: a Tightrope Walk: The discussion touched on the use of AI in creative fields, particularly gaming, with the consensus that AI is often used by non-creatives, resulting in amateurish outputs and overestimation of AI's current capabilities, they referenced this discussion.
- There was an exchange of opinions, with some arguing that AI is mostly assisting professionals for ideation, while others critiqued reliance on statistically average outputs and the need for human effort in creating novel works. AI integration into existing software ecosystems like Adobe and Autodesk was seen as a more promising direction.

Links mentioned:

OpenAI ▷ #gpt-4-discussions (24 messages🔥):

Image generation rate limits, copilot experiences, ChatGPT instructions, 4o abilities, future of image model

OpenAI implements image generation rate limits for Plus users: Due to extreme load since the new image model was released, Plus users are now experiencing rate limits, an interim measure to mitigate the flood of users.
- One user, presumably facetiously, remarked, "At $200 a month you better not get rate limited," referencing a story that OpenAI added 1 million new users in an hour.
Users adapt to copilot: A member shared feeling adapting to copilot, when copilot is "something dumb over and over".
- The member expressed a feeling of adaptation while using copilot.
Users seeking guidance on ChatGPT Prompting: A user sought help to prevent ChatGPT from adding "cool or edgy" concluding remarks to its descriptions.
- Another member suggested a revised prompt, including the line: "Do not add concluding remarks outside the direct scope of fulfilling the request based on the chosen purpose. Stick to the process."
Experimenting photo editing with 4o abilities: A user suggested a mode for editing photos or setting a custom vibe in 4o, preloading context with a game of 20 questions to narrow in on obscure human touches.
- The idea suggests leveraging 4o's ability to read between the lines for better follow-up requests and a personalized experience.
Debating the Future of Image Model Improvements: A member asked whether OpenAI will continue to improve the image model or leave it for a few years, similar to what they did with DALL-E.
- No concrete answers were provided, but the question sparks curiosity about OpenAI's future plans for image generation technology.

OpenAI ▷ #prompt-engineering (9 messages🔥):

Custom Instructions in 'About Me' Box, Memory-Stored Prompts, Personalization in Model Responses, Model Pattern Recognition and Formatting

Custom Instructions Work in 'About Me' Box: A member confirmed that extending custom instructions into the 'About Me' box works perfectly, as information is information to the model.
- Another member noted that the model can figure out likely patterns in your intent and runs with it, even if the fields are split mid-sentence.
RAG-only context limits 'About me' Box: A member questioned whether the 'About Me' box gets confined to some RAG-only context-dependent space and whether it's reliable for storing entire prompts.
- They also mentioned having a ridiculous amount of memories and have tried to fit entire prompts into them, but it doesn’t seem very reliable. And that memory-stored prompts or personas do not activate without specifically requesting them.
Personalization May Not Always Be Evident: A member shared examples of personalized model responses and noted that the first response may not always reflect their personalization.
- They emphasized being very clear about what they want and don't want from the model, including specifications for tool usage and NPC behavior.
Model Learns To Reset Rigid Patterns: A member shared a code snippet FORMAT_RESET to help models acknowledge when they've fallen into rigid patterns and rethink their approach.
- The code encourages the model to analyze what format would better suit the response and completely rethink its approach without defaulting to templates.

OpenAI ▷ #api-discussions (9 messages🔥):

Custom instructions in 'about me' box, Memory-stored prompts, Model Guessing vs Training, FORMAT_RESET for rigid patterns

About Me as Custom Instructions?: A member asked if extending the custom instructions into the about me box works, and another member confirmed that it works perfectly because the model uses it as additional information to work with.
- The model can figure out a likely pattern in your intent and runs with it, even if you split the field mid-sentence; there is no functional reason it wouldn’t work unless “about me” gets confined to some RAG-only context-dependent space.
Memory-stored prompts unreliable?: A member noted they have a ridiculous amount of memories, and while they have tried to fit entire prompts into them, it doesn’t seem very reliable.
- They can’t get memory-stored prompts or personas to activate without specifically requesting them, leading them to think the model is not actually seeing them most of the time or they are a much lower priority than where custom instructions are placed in the model context; shared chat examples of prompt engineering and NPC generation.
Model Guessing vs Training: One member shared their process by presuming that the model either guesses a lot, leading to variation, OR it was trained to output in a typical pattern which was not specifically asked for, OR it is doing exactly what was asked for.
- Conflicts are mandatory to find and fix, as they usually degrade the performance, particularly when the model is trained that humans prefer X but the user prefers otherwise.
FORMAT_RESET for rigid patterns!: A member created a little thing for when you catch a model following a format/pattern you don't like/want, as a way to acknowledge that the model has fallen into rigid patterns and rethink its approach without defaulting to templates.
- They provided a code snippet to tell the model FORMAT_RESET: Acknowledge you've fallen into rigid patterns, analyze what format would better suit your response, and completely rethink your approach without defaulting to templates.

LM Studio ▷ #general (198 messages🔥🔥):

eGPU with LM Studio, Gemini 2.5 Pro Evaluation, Gemma 3 27B Performance, Local LLM recommendations, Copilot hurts developer experience

Plug eGPU to LM Studio!: Members discussed the feasibility of using an eGPU with LM Studio, suggesting it should work as long as the computer recognizes it, despite slower speeds, referencing a YouTube video comparing LLMs on RTX 4090 Laptop vs Desktop.
Gemma 3 Beats Gemini 1.5 Flash: A member shared a comparison where Gemma 3 27B outperforms Gemini 1.5 Flash in several benchmarks, like MMLU-Pro and Bird-SQL.
- Another member confirmed excellent results with Gemini 2.5 Pro, while another user used Gemini 2.5 Pro to produce the data, available free on OpenRouter.
Qwen Coder 7B recommended!: For coding on a system with a 4060 Ti and i5 12400F, Qwen Coder 7B was recommended and available on LM Studio's model page, with suggestions to offload most of it to the GPU and also use Qwen Coder 14B or 32B.
- Members emphasized that a local LLM would perform much worse than cloud alternatives like ChatGPT or Deepseek, but Gemini 2.0 Flash was considered a top performer, costing only $0.44 per 1M input tokens according to their pricing documentation.
Copilot's Coding Critiqued!: Members debated whether AI assistance in programming is beneficial, with one arguing that it hurts more than helps because the average user learns on AI slop.
- Others disagreed, stating that Copilot works great for experienced developers, but one person claims the recommendations given are trusted too easily by the average user, in addition to concern that copilot is trained on garbage code.
One Parameter LLM Possible?: In a lighthearted exchange, it was discussed that a one-parameter LLM is possible but useless, and one user indicated they tried 656K but it can't chat though.

Links mentioned:

LM Studio ▷ #hardware-discussion (63 messages🔥🔥):

Nvidia Drivers instability after 10-12 hours of usage, M4 Max vs 5090 Speed Comparison, Mac vs Nvidia GPUs for LLM, Tenstorrent Wormhole performance on Discord, Context Overflow and Shared Memory impact on LLM speed

Nvidia Drivers Crash After Extended Runtime: A user reported Nvidia driver instability after running models for 10-12 hours, requiring a driver reinstall to resolve performance issues.
- The user clarified the issue was with the Nvidia driver itself, not the Windows OS, and sought to find if others experienced the same.
M4 Max gets Speed Boost vs 5090: A user observed a 3.24x speedup from M4 Max to 5090 after resolving crashing issues, which aligns with the 3.28x ratio of their memory bandwidths when doing QwQ 32B 4 bit quant comparisons.
- They're now seeing around 21 tok/s on M4 Max and roughly 60 tok/s on 5090 when testing Gemma 3 32B q4.
Mac Freedom of Context Size vs Nvidia Faster GPUs: While Nvidia GPUs may be faster, users are leaning towards Macs for the freedom to have more context size.
- The user highlighted that even though NVIDIA GPUs are faster, the ability to use larger context sizes is proving to be more useful.
Tenstorrent Wormhole results sought out in Discord: A user inquired about performance results for the Tenstorrent Wormhole (n150d and n300d) within the Discord community.
- They expressed interest in obtaining TOK/s metrics for these models, but there was no follow up.
Overflowing Context impacts LLM speed: A user wondered what would happen if they could load the context overflow into the shared memory/system RAM while keeping the entire model in VRAM.
- Another user noted that the LLM needs all the context in VRAM to generate the next token, because for each token generated, all the context goes through the transformer blocks, again and again.

Link mentioned: M3 Ultra vs RTX 5090 | The Final Battle: M3 Ultra Mac Studio vs AI beast with NVIDIA RTX 5090Efficient. Productive. Organized. | Baseus Spacemate Series（MAC）11-in-1 Docking StationBuy on Amazon.US: ...

aider (Paul Gauthier) ▷ #general (230 messages🔥🔥):

Gemini 2.5 Pro experiences and limitations, RateLimitError automation strategies, Dot Command Revolution, F#, Video analysis

Gemini 2.5 Pro: A Hot Mess of Highs and Hallucinations?: Users are experimenting with Gemini 2.5 Pro and reporting mixed results with some models hallucinating/DC'ing while others are providing top-tier performance for coding tasks.
- One user noted, "Gemini is hallucinating / dc'd for me, same for you guys?", while another stated that the combination of Gemini 2.5 Pro and DeepseekV3 is *"almost free and top tier."
RateLimitError Woes: Token Limits or Request Frequency?: A user reported frequent RateLimitErrors when requesting summaries and clearing history, and was looking for solutions to automate this process.
- Paul Gauthier clarified that the rate limit is likely based on the number of requests per minute or day, rather than the token count. One possible solution may be found in this Github issue.
Dot Command Revolution: Aider's productivity hack?: A user is trying to promote the use of .dotcommands as a productivity hack for developers, enabling them to automate tasks with single-line commands such as .status and .next.
- The goal is to provide cognitive shortcuts optimized for clarity and specific functionality, but no one is using them. It has led to the quip *"THE DOT REVOLUTION IS HERE 🔥 Coders everywhere will want to try this one cool trick."
****F#: Condolences or Kudos?: A user mentioned rebuilding their app from Python into F#, prompting mixed reactions, including condolences and suggestions to "use Haskell".
- While the user explained they were working on ML projects, the community seemed skeptical about the choice of F# for such tasks.
Video Analysis: Beyond the Transcript: A user inquired about AI models' comprehension of videos, wondering if they understand emotional impact or follow visual storylines beyond just processing transcripts.
- One response indicated that "Gemini's video understanding is 1 frame per second of the video fed into the model as an image."

Links mentioned:

aider (Paul Gauthier) ▷ #questions-and-tips (30 messages🔥):

Temperature for coding, Stopping benchmarks, Aider with subdirectories, Aider local config, Model Summarization fails

The Coder's Icy Preference for Temperature: Members discussed the optimal "temperature" for coding, with 0 being the popular choice in the channel.
- A member asked for justification for this value, requesting is it based on smth?
Aider's Subtree Savior for Mono-Repos: A member asked how to limit aider to a subdirectory of a monorepo, prompting a response to use the --subtree-only switch after changing to the desired directory.
- This sets aider to ignore the repo outside the starting directory, though the asker noted the docs need updating and pointed to the FAQ on large monorepos.
Config Conundrums: Model Settings in Aider: A member reported that specifying model names in a local YAML config file wasn't working as expected.
- Despite the startup message showing the correct config settings, aider still defaulted to anthropic/claude-3-7-sonnet-20250219 rather than the configured deepseek/deepseek-chat.
Linting Loops Launching with Aider: A member inquired about running linters within aider, with another suggesting the use of /run [npm|pnpm] run [lint|fix|whatever-command] for a tight feedback loop.
- Another member pointed to the sample aider.conf.yml file for listing multiple linters.
Architect Model Results Get a Promotion: A member sought a way to directly send a satisfactory response from the architect model to the editor model to conserve 2.5 Pro shots.
- The suggestion was to open a new aider instance with --restore-chat-history and a suitable editor configured, though the lack of a --no-architect flag was noted as an inconvenience.

Links mentioned:

OpenRouter (Alex Atallah) ▷ #announcements (13 messages🔥):

Organizations leave Beta, Web search results in Chatroom, Cerebras on OpenRouter, PDF support for OpenRouter API

Organizations is Out of Beta!: OpenRouter announced that the Organizations feature is now out of beta, allowing teams to control billing, data policies, provider preferences, and API keys in one place, detailed in this X post.
- During the two-week beta, over 500 organizations were created, giving teams complete control over data policies and consolidated billing.
Web Search Hits the Chatroom!: Web search results are now available in the chatroom, with Perplexity results formatted similarly to OpenRouter's :online model variants.
Bluesky plea: A member requested that OpenRouter post on Bluesky as well, suggesting less reliance on Xitter.
Call for Cerebras!: A member asked OpenRouter to talk to Cerebras about adding them to OpenRouter.
PDF support in API?: A member inquired about when the OpenRouter API will support PDF files.

Link mentioned: Tweet from OpenRouter (@OpenRouterAI): Today we're taking Organizations out of beta.With Organizations, teams have complete control over data policies and consolidated billing, adding peace of mind across dozens of model providers.Key ...

OpenRouter (Alex Atallah) ▷ #general (98 messages🔥🔥):

Aider OpenRouter Copilot, Gemini Flash 2 Context, Usage Downloads, Enterprise Level Rate Limits, GPT4o Image Generation

Gemini Flash 2 Middle-Out Transforms: Members confirmed that OpenRouter offers full 1M context on paid Gemini Flash 2 requests, with middle-out transforms being opt-in and only applied by default on endpoints with context length less than 8192 tokens.
- One member clarified that middle out only applies once you hit 1m right? (on flash) even if it's turned on.
Requesting Usage Download: A member inquired about obtaining downloads of their usage data, including tokens and costs, as displayed on the activity page, to verify their credit usage.
- A maintainer responded that while this feature isn't currently available, we're working on it.
OpenRouter Enterprise Level Rate Limits: A user asked about enterprise-level rate limits, clarifying that they disappear with a balance of $500 or more, subject to the upstream provider.
- Another member chimed in that well technically it depends upon the upstream provider.
Auto Router for Fallback Models: A user requested a fallback model option, similar to the existing fallback provider feature.
- Another member pointed out that OpenRouter already has this via the Auto Router and the models parameter, as detailed in the documentation.
OpenRouter EU Provider Selection: A user inquired about selecting providers residing only in the European Union due to legal requirements.
- A maintainer acknowledged the need but noted limited coverage today, mentioning OpenRouter allows provider selection, recommending seeking an EU certified provider for strict EU data guidelines.

Links mentioned:

Eleuther ▷ #general (43 messages🔥):

Cosine Annealing LR, Mini-batch vs Batch, Gradient Accumulation, Stanford CS 25 Transformers Course, Category theory

Debate on Cosine Annealing Learning Rate (LR) Updates: Discussion on whether Cosine Annealing LR is best updated after every batch or sample, with concerns raised about different samples receiving different training when updating after every sample.
- The recommendation was to update after every mini-batch, ignoring the exposure problem, or attempting to fix it.
Mini-Batch vs Batch Jargon Jungle: Members discussed the difference between mini-batch and batch in machine learning, with the distinction becoming increasingly blurred due to techniques like gradient accumulation and distributed training.
- It was mentioned that a mini-batch is run before each optimizer step, while a batch is a set of unique data, but the term batch size refers to the size of the mini-batch.
Gradient Accumulation: Pro or Con?: Members debated the merits of gradient accumulation, with one member recalling it being previously dismissed but now seeing potential advantages in the early stages of training to calibrate optimizer states.
- Another member noted that gradient accumulation can be beneficial when network communications are slower than the compute, but otherwise, it is considered bad.
Stanford Launches CS25 Transformers Course to the Public: Stanford has opened its CS25 Transformers seminar course to the public via Zoom, featuring discussions with researchers and covering topics from LLM architectures to creative applications.
- The course includes lectures, social events, networking sessions, and a Discord server for discussions, with past lectures available on YouTube.
Category Theory: Reverse Engineering DL?: Someone shared a link to a thought experiment on whether category theory could be the optimal language for reverse engineering deep learning.
- The original post argued that neural networks have embeddings, or meaningful patterns of neuron activation rather than representations.

Links mentioned:

Eleuther ▷ #research (21 messages🔥):

ACL Rebuttals, Deep Sets for Triangle Area, Comparing Language Model Embeddings, Relative Representations, Convergence of Representations in AI

Reviewers Get Nudged After Rebuttal Submission: A member asked about sending an extra message to ACL reviewers the day after submitting a rebuttal and another member suggested it's reasonable if the rebuttal deadline is closing or if follow-up might take a few days.
- The original poster planned to follow up that evening, with the deadline being Thursday.
Deep Sets Calculate Triangle Area, Yields Zero Insights: A member shared a link to a paper titled Deep Sets for the Area of a Triangle (arxiv link), which presents a polynomial formula for triangle area in Deep Sets form.
- The abstract concludes that the project, motivated by questions about computational complexity of n-point statistics in cosmology, gained no insights of any kind.
Comparing Language Model Embedding Matrices: A member inquired about methods for analyzing and proving the similarity of two language models trained with the same tokenizer but different dimensionality embedding matrices.
- Suggestions included relative representations, least squares mapping, and comparing the leading entries of the eigenvalue decomposition of W^T W.
Relative Representations Proposed as Solution, But Maybe Not: A member suggested relative representations (arxiv link) as a potential solution for comparing language model embeddings, while also cautioning about their limited applicability.
- They linked a paper discussing cosine similarity inflation in neural representations (arxiv link) and further pointed out related works discussing whether cosine is the best way to assess similarity.
AI Representation Convergence, Plato Style: A member linked a paper arguing that representations in AI models, particularly deep networks, are converging towards a shared statistical model of reality, akin to Plato's concept of an ideal reality (arxiv link).
- Others suggested using CCA or SVCCA to compare the embedding matrices, referencing papers on Singular Vector Canonical Correlation Analysis (arxiv link) and projection weighted CCA (arxiv link).

Links mentioned:

Eleuther ▷ #scaling-laws (4 messages):

Learning Rate Impact, Scaling Efficiency, Model Oomph

Learning Rate Affects Scaling Efficiency: A member stated that a bad learning rate changes the efficiency of scaling, which affects constants A & B.
- Another member added that a bad learning rate also changes how much oomph the model can get out of a given amount of data, which would seem to implicate beta.
Bad Learning Rate Bad: Bad learning rate is bad.
- Like, really bad.

Eleuther ▷ #interpretability-general (5 messages):

Neuronpedia Open Source, Delphi auto-interp server update, Actionable Interpretability Workshop at ICML 2025, Neuronpedia Datasets

Neuronpedia Goes Open Source!: The interpretability platform Neuronpedia is now MIT open source and available on GitHub with a quick Vercel deploy.
Delphi Auto-Interp Server Set for Update: Neuronpedia's auto-interp server, which utilizes Eleuther's Delphi (previously sae-auto-interp), is slated for an update to the latest version.
- The update aims to introduce new scoring and explaining types, facilitated by Neuronpedia's modular design and the existing OpenAPI schemas for the Delphi auto-interp server.
Dive into 4+ TB of Neuronpedia's Data!: A trove of interpretability data, totaling over 4 TB, is available for download as Public Datasets.
Actionable Interpretability Workshop Accepted to ICML 2025: The Actionable Interpretability workshop has been accepted to #ICML2025 and is accepting paper submissions until May 9th per this tweet.

Links mentioned:

Eleuther ▷ #lm-thunderdome (28 messages🔥):

Debugger updates, SmolLM Evaluation Issues, Open LLM Leaderboard Normalization, Subtask Aggregation PR

Debugger Status remains vague: Members requested updates on the debugger's progress, but the specific status remained unclear, with a member offering assistance by asking what branch the debugger was working on.
- A member shared code modifications, suspecting they might be causing unnecessary load, and later reported fixing a bug related to the number of choices in questions, suggesting a PR submission.
SmolLM leaderboard evals return empty aggregate scores: A member reported that aggregate scores for tasks like leaderboard_bbh, leaderboard_math_hard, and leaderboard_musr were empty in the results JSON when running leaderboard evaluations with lm-eval on SmolLM-1.7B.
- They provided the command used and example output, noting that individual tasks reported numbers as usual, and linked to the Hugging Face Dataset Card.
Non-standard Normalization on the Open LLM Leaderboard: The discussion highlighted the use of a non-standard normalization method on the Open LLM Leaderboard for evaluating and comparing LLMs.
- The normalization was introduced to address issues with optimized prompts and evaluation setups that inflate model scores.
Subtask Aggregation PR Adds Subtask Scores: A member shared a PR adding subtask aggregation, copied from a Hugging Face fork, to address missing aggregate scores in tasks with subtasks.
- Another member tested the PR, reporting that installing lm_eval via editable triggered an unrelated error, but the PR otherwise appeared to work as expected.

Links mentioned:

Eleuther ▷ #gpt-neox-dev (5 messages):

GPT-NeoX Pre-training on NVIDIA DGX Cloud, SLURM cluster restrictions, torchrun, DeepSpeed Launch modes

Bypassing DeepSpeed Launcher on NVIDIA DGX Cloud: A member is pre-training GPT-NeoX on NVIDIA DGX Cloud but must bypass the default deepy.py launcher due to SLURM and SSH restrictions, using a custom script that leverages a hostfile and torchrun.
- The member is using this script to perform argument parsing and launch train.py, and has questions regarding their approach.
Debating Direct train.py Execution: A member asked if they could directly start python train.py with encoded ds_config and megatron_config arguments, and how to handle GPU process spawning without torchrun.
- Another member confirmed this approach for manual bypassing, suggesting further modular refactoring to inject node-local processes that self-assign rank and detect GPU count, coordinating through principle rather than protocol.
Navigating DGX Cloud with Torchrun: A user is using torchrun due to cluster restrictions and disabled SSH, referencing a comment and sample script from the NVIDIA DGX Cloud documentation.
- They are seeking guidance on whether they are implementing their custom solution correctly and whether they have reinvented the wheel.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #news (70 messages🔥🔥):

CodeScientist, OpenAI open language model, Meta's smart glasses, Multi-subject RLVR

CodeScientist Automates Scientific Discovery: AllenAI introduces CodeScientist, a system for autonomous scientific discovery that uses genetic search over research articles and codeblocks to generate and evaluate machine-generated ideas, with 19 discoveries resulting from hundreds of experiments in agents and virtual environments, detailed in their paper.
- The system addresses limitations in current ASD systems by exploring broader design spaces and evaluating research artifacts more thoroughly, though one user noted that generated papers are rather short listicles PDFs and all papers are negative results.
OpenAI Teases Open-Weight Language Model: OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, seeking developer feedback to maximize its utility, detailed in Sam Altman's tweet and OpenAI's feedback form.
- Altman stated that they will not do anything silly like saying that you cant use our open model if your service has more than 700 million monthly active users.
Meta Plans Smart Glasses with Screen: Meta is planning to launch $1000+ Smart Glasses with a screen and hand gesture controls later this year, according to Mark Gurman's report.
- Members are interested to see how they'll do against xreal.
Multi-subject data for paper Expanding RL: A multi-subject multiple-choice QA dataset ExamQA is used in the Expanding RL with Verifiable Rewards Across Diverse Domains paper.
- The dataset consists of 638k college-level instances, with both questions and objective answers written by domain experts for examination purposes.
ChatGPT Gets a New Voice: OpenAI announced a new voice in ChatGPT, generating excitement and speculation about potential capabilities.
- One member joked about this being an April Fool's joke, while others expressed genuine interest.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #random (24 messages🔥):

Pydantic Evals, Grok solves math, Gemini vs GPT 4.5, MidJourney v6, GPT-4o translation

Pydantic Evals is here: Pydantic Evals is a powerful evaluation framework designed to help you systematically test and evaluate the performance and accuracy of the systems you build, especially when working with LLMs.
Grok solves math problems: After several unsuccessful attempts, a member found a prompt that got Grok to solve a math problem (the well-known Dubnovy Blazen problem in graph theory), showcased in this tweet.
Gemini is overly eager: A member compared Gemini to GPT-4.5, observing that Gemini is overly eager to explain everything, write a lot, while making subtle, childlike jokes here and there, like an autistic engineer.
MidJourney is cooking: MidJourney is currently in a preview / rating phase for the final model (likely drops tomorrow) and they are absolutely cooking.
GPT-4o translation: Members commented that the GPT-4o is a simple translation which represents the reader's preference for simple english but a lot is lost in the translation itself, showcased in this YouTube video.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #rl (4 messages):

KL Penalty in RL, Base Models vs Instruct Models, Reasoning and Reinforcement Learning

KL Penalty Dropping Debated for RL: The question arose as to why dropping the KL penalty might be beneficial when performing RL on base models but not on instruct models, as mentioned in Nathan Lambert's post.
Reasoning Needed for Base Models: It was suggested that a larger change is needed on base models, but it may change for models that have a reasoning component.
RLHF Book: Nathan Lambert is writing a book on RLHF that he strongly recommends reading.

Link mentioned: Recent reasoning research: GRPO tweaks, base model RL, and data curation: The papers I endorse as worth reading among a cresting wave of reasoning research.

Interconnects (Nathan Lambert) ▷ #reads (7 messages):

OpenAI returning, Long timelines to advanced AI

Lambert airs OpenAI thoughts: Nathan Lambert shared his thoughts on OpenAI returning in a substack post, mentioning that he may use this format for unbaked career thoughts too.
- He also mentioned DMing some OpenAI folks about it, hoping to find allies of open source who feel exiled by the current situation.
Toner's Rising Tide Substack launch: Helen Toner launched her new Substack called Rising Tide and shared a post on long timelines to advanced AI.
- In the post, she noted that arguing for anything like human-level AI in the first half of the 21st century used to be a bold claim requiring strong evidence.

Link mentioned: "Long" timelines to advanced AI have gotten crazy short: The prospect of reaching human-level AI in the 2030s should be jarring

GPU MODE ▷ #general (46 messages🔥):

CUDA occupancy, GPU parallel processing, A100 thread limit, GRPO training with Qwen

Debate sparks over maximum parallel threads on A100 GPUs: A discussion arose regarding the calculation of the maximum number of threads that can run in parallel on an A100 GPU, with one member stating the number is 96 * 2048.
- Another member uses GeoHot's tool to test this hypothesis, showing that the practical limit on their A100 (96SM) GPU is 24576, or 256 threads per SM before performance degrades.
Warp scheduling explained as a way to hide latencies: The discussion clarified that while a GPU can have many concurrent threads, they may not all run truly in parallel due to resource limitations like register space and shared memory.
- A member pointed out that GPUs use oversubscription to hide latencies, and context switches between warps are cheap (~1 cycle), unlike CPUs, adding threads above the limit for "parallel threads" does not necessarily increase runtime, or at least not significantly/measurably.
Experiments in GRPO training with Qwen 0.5B Model: A member shared that they have finished a GRPO training run with Qwen 0.5B (code instruct) on the GPUMODE kernel dataset, but the model didn't effectively generate Triton kernels.
- They hypothesize that SFT to teach the model the basics of Triton implementation, followed by GRPO for refinement, will be more successful.

GPU MODE ▷ #triton (2 messages):

Disable autotune, Triton kernel

Disable Triton Autotune Temporarily: A member asked for a way to disable autotune temporarily because their Triton kernel is called in two situations, where only one needs autotuning, and they are using the triton.autotune decorator.
- Another member suggested using a global variable to turn autotune on/off and reload the module, or using autotune inside the kernel instead of as a decorator.
Global variable trick: One member suggests turning autotune on/off using a global variable and reloading the module that contains the Triton kernel.
- The other option is to use autotune inside the kernel, not as a decorator, which doesn't require reloading the module.

GPU MODE ▷ #cuda (2 messages):

Request for PMPP book PDF, PMPP book

Request for PMPP book PDF: A member asked for the PMPP book PDF of the recent version.
- No links or further details were provided in the message.
PMPP Book Inquiry: A user requested the latest version of the PMPP book PDF.
- This request did not include any links or further context.

GPU MODE ▷ #torch (5 messages):

FlexAttention, Arbitrary Sequence Lengths, PyTorch 2.6, Tensor Subclass Use Case, Memory savings

FlexAttention Embraces Arbitrary Sequence Lengths: FlexAttention now supports arbitrary sequence lengths, addressing the previous requirement for segment sequence lengths to be a multiple of 128, as of PyTorch 2.6.
- This enhancement was discussed with Horace He at a GPU mode event in San Jose.
Tensor Subclass Use Case Questioned: A user inquired about the intended use case for a tensor subclass.
- This suggests a potential issue or area for improvement in PyTorch's tensor subclassing functionality, prompting further investigation.
Desire for Memory Savings Using Tensor Deletion: A user is seeking methods to delete argument tensors within a loss function to achieve memory savings of approximately 7GB.
- The user wants to free the storage associated with a tensor after it's no longer needed, even if a reference exists in the outer scope, while ensuring it remains compatible with torch compilation to avoid graph breaks; see the GitHub Issue for more information.

Link mentioned: Graph break on Tensor._make_subclass · Issue #150265 · pytorch/pytorch: 🐛 Describe the bug I am having the following problem from torch import nn import torch torch_compile_options = { "epilogue_fusion" : True, "max_autotune" : True, "shape_paddi...

GPU MODE ▷ #cool-links (1 messages):

marksaroufim: https://arxiv.org/abs/2503.20313

GPU MODE ▷ #jobs (1 messages):

MLX, Apple hiring, ML systems

Apple's MLX Team is Recruiting: Apple is hiring engineers to join their MLX team and advance the frontier of ML and systems.
- The role involves building scalable, distributed training and research pipelines, working with researchers and software engineers on novel ML research algorithms.
ML System Development at Apple: Apple's Machine Learning Research org focuses on building technologies that will power future products.
- They seek engineers with system engineering and software development backgrounds to build scalable, distributed training and research pipelines.

Link mentioned: AIML - Software Engineer for MLX, MLR - Jobs - Careers at Apple: Apply for a AIML - Software Engineer for MLX, MLR job at Apple. Read about the role and find out if it’s right for you.

GPU MODE ▷ #beginner (2 messages):

CUDA Program Execution, GPU Volumetric Data Processing

GPU Eats Gigabytes in Volumetric Data: For models processing volumetric data, like in the medical domain, a volume of 512³ voxels, 32 channels and fp16 activations can result in 8GiB of data per layer.
- This highlights the significant memory requirements for certain types of GPU computations.
CUDA Kernel Code Compilation and Execution Explored: A member is trying to understand how execution of a CUDA program works and wants to know what exactly is sent over the PCIe bus, from the CPU to the GPU.
- They assume that kernel code is compiled into some GPU-machine byte code, and when a call to kernel code is made, this code is then sent to the GPU.

GPU MODE ▷ #off-topic (2 messages):

Egg noodles with chicken and vegetables, Image Analysis with YouTube

Egg-cellent Noodle Dish Debuts: A member showcased a dish of egg noodles with chicken and vegetables in soy sauce with black pepper, featuring egg noodles, soy sauce, chicken fillet, onion, sweet red pepper, French green beans, beef fat, and sesame.
- An image of the dish was shared (IMG_20250401_045505.jpg).
YouTube Analysis Uploads: An image analysis was conducted with a YouTube video titled * - YouTube*, although the description of the video is undefined.

Link mentioned: - YouTube: no description found

GPU MODE ▷ #irl-meetup (3 messages):

NYC Meetups, Community Meetup

NYC Meetups in the Works: A member inquired about any meetups in NYC and another member confirmed they are planning something.
- The inquiring member responded with excitement, indicating interest in attending.
Community Plans Meetup: A community meetup is planned.
- Enthusiastic community member is excited about upcoming plans.

GPU MODE ▷ #self-promotion (1 messages):

Megatron Tensor Parallelism, Fused/Parallel CE Loss

Deep Dive into Megatron Tensor Parallelism is Illustrated!: A member wrote an illustrated deep-dive into Megatron-style tensor parallelism, including the fused/parallel CE loss, seeking feedback on the content.
- Check out the illustrated deep-dive to deepen your understanding of ML scalability & performance techniques.
Feedback Requested on Megatron-Style Parallelism Deep Dive: The author of an illustrated deep-dive on Megatron-style tensor parallelism is soliciting feedback.
- The article covers aspects like fused/parallel CE loss and aims to enhance understanding of ML scalability and performance.

Link mentioned: Tweet from Daniel Vega-Myhre (@vega_myhre): For any ML folks who want to deepen their understanding of ML scalability & performance techniques, I wrote an illustrated deep-dive into Megatron-style tensor parallelism: https://danielvegamyhre.git...

GPU MODE ▷ #🍿 (1 messages):

AlphaGeometry, LLM for kernel optimization

Scoping AlphaGeometry-style LLM + verifier for kernel optimization: A member inquired about the prior exploration of using an AlphaGeometry-style LLM + verifier approach for kernel optimization.
- They asked if it had been attempted or discussed previously, acknowledging their potential rediscovery of existing concepts, given their newness to the field.
Newbie questions about LLM and kernel optimization: A very new member is rediscovering ideas about kernel optimization and is requesting pointers to past discussions.
- They expressed an interest in anyone pointing them to what happened with the idea of using AlphaGeometry style LLM + verifier for the kernel optimization process.

GPU MODE ▷ #reasoning-gym (9 messages🔥):

OpenAI Open-Weight Reasoning Models, PR Review Requests, Arc AGI PR, Collisions PR, CodeIO Dataset Merged

New Reasoning Gym Banner Shines: A new banner for the reasoning-gym, made with 4o, was shared, with a potential PR to add it to the readme.
- Another member pointed out the Rubik's cube depicted on the banner was "not a valid Rubik's cube reeeeeeeeeee".
OpenAI to Open-Source Strong Models?: Members expressed surprise that OpenAI may publish strong open-weight reasoning models.
- One member speculated this could significantly increase OpenAI's valuation, while another reviewed two outstanding PRs.
Arc AGI and Collisions PRs Ready for Scrutiny: The arc agi and collisions PRs are up for review.
- Changes were requested to the Collisions PR, specifically to unstage notebooks that were simply run without modifications.
CodeIO Dataset Ingested: The CodeIO dataset was merged after a delay; further postprocessing will align it with the existing implementation.
- Thanks to the user who merged the CodeIO dataset.

GPU MODE ▷ #general (3 messages):

.py scripts vs .cu files, active python leaderboards

Clarification Sought: Python vs. CUDA Submissions: A member inquired whether the leaderboards currently only accept .py scripts and not .cu files.
- Another member suggested reviewing a previous message for clarification on the submission guidelines.
Active Leaderboards Confirmation: A member questioned whether all active leaderboards are currently restricted to Python submissions.
- Another member directed them to a previous message, likely containing details about active leaderboards and submission requirements.

GPU MODE ▷ #submissions (17 messages🔥):

vectorsum, conv2d, vectoradd, matmul, grayscale

Vectorsum benchmark floods leaderboard: Multiple benchmark submissions for vectorsum on L4 and H100 GPUs using Modal runners have succeeded, with submission IDs 3372, 3374, 3375, 3395, 3396, and 3397.
Conv2d benchmark succeeds on multiple GPUs: A leaderboard submission for conv2d on L4, T4, A100, and H100 GPUs using Modal runners has succeeded, with submission ID 3373.
Vectoradd benchmarks hit T4 and H100: Leaderboard submissions for vectoradd on H100 and T4 GPUs using Modal runners have succeeded, with submission IDs 3394 and 3399 respectively.
Matmul benchmarks meet A100: Leaderboard submissions for matmul on A100 GPUs using Modal runners have succeeded, with submission IDs 3400 and 3408.
Grayscale tests get going: Multiple test submissions for grayscale on H100 GPUs using Modal runners have succeeded, with submission IDs 3402, 3403, 3404, 3405, 3406, and 3407.

Latent Space ▷ #ai-general-chat (75 messages🔥🔥):

Cursor's Funding Round, Etched's New Transformer ASIC, OpenAI's New Open-Weight Language Model, OpenDeepSearch (ODS), Sophont: Open Multimodal Foundation Models for Healthcare

Cursor Closes Cashy Round, Codes Vibes: Cursor closed a $625M funding round at a $9.6B post-money valuation, led by Thrive and A16z, with Accel as a new backer, achieving $200M ARR, a 4x increase from its previous round in November 2024 (Source).
- Abe Brown noted that Cursor's valuation has grown rapidly, sparking the buzzphrase vibe coding and seeing its valuation possibly reach $10B.
Etched, the Transformer ASIC Startup Etches $85M Round: Etched, a startup developing transformer ASICs, closed an unannounced $85M round at a $1.5B valuation, following two stealth rounds at $500M and $750M, with their chip Sohu able to process over 500,000 tokens per second running Llama 70B (Source).
- Etched claims one 8xSohu server replaces 160 H100s, but Sohu cannot run CNNs, LSTMs, SSMs, or any other AI models.
OpenAI Opens Up: Open-Weight Model Incoming: OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, seeking developer feedback on how to make it maximally useful (Source).
- The company will evaluate the model according to their preparedness framework and host developer events in SF, Europe, and APAC to gather feedback and test early prototypes, and Nathan Lambert expects a 30B parameter reasoning model with MIT/Apache license (Source).
OpenDeepSearch Opens Up Web Search: Seoong79 announced the release of OpenDeepSearch (ODS), an open-source search agent that works with any LLM, outperforming OpenAI’s specialized model for web search, GPT-4o-Search, on the challenging, multi-hop FRAMES benchmark from DeepMind by +9.7% accuracy (Source).
Sophont Startup Seeks to Solve Medical AI: iScienceLuvr announced the launch of Sophont, a company building open multimodal foundation models for the future of healthcare, aiming to create a DeepSeek for medical AI (Source).

Links mentioned:

HuggingFace ▷ #general (42 messages🔥):

DeepSeek R1, xAI Acquires X, Hyperparameter tuning LLMs, SFTTrainer hanging, stable_baselines3 CPU faster than GPU

DeepSeek R1 outmaneuvers Western labs: A user linked to a tweet criticizing lazy attacks trying to downplay DeepSeek R1, which outmaneuvered bloated Western labs through ferocious execution and resource efficiency.
- The member added that DeepSeek also released weights under a maximally permissive MIT license and democratized RL for the GPU poor through GRPO.
xAI gobbles up X: A member linked to a tweet announcing that xAI has acquired X in an all-stock transaction, valuing xAI at $80 billion and X at $33 billion.
- The combination unlocks immense potential by blending xAI’s advanced AI capability and expertise with X’s massive reach.
LLM Hyperparameter Tuning Resource Quest: A member inquired about resources for choosing hyperparameters when fine-tuning LLMs, seeking a god send of a resource that addresses how changing context affects certain hyperparameters.
- Another member suggested checking out Unsloth's Discord and linked to Unsloth's LoRA Hyperparameters Guide.
SFTTrainer freezes mid-training: A user reported that their SFTTrainer was hanging after truncating the training dataset, and it timed out after an hour.
- A member suggested that the issue might be due to a lack of progress bar appearance and potential misconfiguration of TrainingArguments or Trainer settings.
CPU outruns GPU with stable_baselines3: A user reported receiving a warning about running PPO on the GPU with MlpPolicy, suggesting it's primarily intended for the CPU, and linked to a GitHub issue.
- The user was confused why the CPU might be faster than the GPU when running a Multi-Layer Perceptron.

Links mentioned:

HuggingFace ▷ #today-im-learning (3 messages):

Agents Course Unit 2.1, Run Jupyter Lab Locally, RL Course Frozen Lake issue

Agents Course Unit 2.1 Runs Locally: A member mentioned they are learning Agent Course Unit 2.1 and it works when run using the kernel of a local venv having jupyterlab and its widgets installed.
- They noted that Colab is not an option for them because they do not have a Google account.
Instructions to run Jupyter Lab Locally: A member is looking into how to get a notebook to run, asking if they should clone the repo and run jupyter-lab locally using these instructions.
- The user expressed confusion on where they should run it but mentions that if they used colab google, they're unsure how to link the notebook to the Colab Google Workspace.
Frozen Lake Code Fixed: A member noticed that the code offered in Unit 2 in the RL Course for Frozen Lake was not working due to a Python Version issue.
- They shared a link to their HuggingFace page with code to resolve the pickle5 problem.

Link mentioned: What are LLMs? - Hugging Face Agents Course: no description found

HuggingFace ▷ #cool-finds (3 messages):

OpenHands LM, Autonomous Agents, Nature article on data access

OpenHands LM opens coding!: The new open coding model OpenHands LM is available on Hugging Face and a reasonable size at 32B to run locally, according to a member.
- It is intended for use in autonomous agents for software development and more information is available on the project blog.
Data access is restricted on Nature's article: A Nature article has restricted access to data under a clinical trial protocol, to share deidentified information with researchers but prohibits it from being publicly available.
- To protect the participant’s anonymity, any information that could identify her will not be part of the shared data, specifically her personalized voice synthesizer.

Links mentioned:

HuggingFace ▷ #i-made-this (1 messages):

tonic_1: very cool

HuggingFace ▷ #computer-vision (1 messages):

YOLO vertical object detection, CNN vertical object detection, Instance segmentation fragments

YOLO & CNN Seek Vertical Vision Boost: A member inquired about enhancing YOLO or any CNN's ability to detect vertical objects, asking if increasing depth would help.
- Responses suggested exploring data augmentation techniques or custom loss functions.
Fragmented Instance Segmentation Fixes: A member is facing issues with their instance segmentation model detecting fragments of the same objects.
- They asked for suggestions on how to make the model recognize these fragments as one object, such as using label tags across segments.

HuggingFace ▷ #gradio-announcements (2 messages):

Gradio Milestone, Million monthly active developers

Gradio Reaches One Million Monthly Active Developers!: Gradio announced it has achieved a milestone of 1,000,000 monthly active developers using the platform to create and share AI interfaces.
- The Gradio team expressed gratitude to the community for their invaluable contributions in achieving this significant milestone.
Community Celebrates Gradio's Success: Members of the Gradio community celebrated the platform's achievement of reaching one million monthly active developers.
- Community members acknowledged Gradio's impact on enabling ML researchers and companies to build production-ready AI interfaces, highlighting the platform's growth and importance in the AI landscape.

HuggingFace ▷ #agents-course (16 messages🔥):

OpenAIServerModel with Ollama, Langraph OpenAI API model alternatives, Release of Unit 3

Ollama Plays Well With OpenAIServerModel: Members discussed using OpenAIServerModel with Ollama, given its compatibility with the OpenAI API.
Seek Alternatives to Langraph OpenAI API Model: One member requested recommendations for alternatives to the Langraph OpenAI API model for an email agent.
Unit 3 Delayed, Community Screams Into Void: Many members are eagerly awaiting the release of Unit 3 for the Agent course, though one member noted its release was delayed.
- One member joked maybe if we all keep yelling into the void, the void will call back.

HuggingFace ▷ #open-r1 (1 messages):

Liger Kernel, GPU Memory Occupation, Speed vs. Memory Trade-off

Liger Kernel: Speed Boost vs. Memory Hog: A user found that applying the Liger kernel significantly improved speed but resulted in high GPU memory occupation.
- They questioned if their application method was flawed, seeking advice to optimize memory usage without sacrificing performance.
Analyzing Liger Kernel's Memory Footprint: The user's experience highlights a potential trade-off between computational speed and memory consumption when using the Liger kernel.
- Further investigation is needed to understand the kernel's memory management and identify possible optimization strategies.

Modular (Mojo 🔥) ▷ #general (9 messages🔥):

MAX 25.2 livestream, Chris lightning talk, GTC Chris video

MAX 25.2 Livestream Kicks Off!: Modular's MAX 25.2 livestream was announced, inviting viewers to join via LinkedIn or YouTube to ask the team questions live.
- Due to technical difficulties, a new livestream link was shared (YouTube), “Introducing MAX 25.2 Live!”.
Apologies for Tech Glitches During Livestream!: Members apologized for the technical issues during the MAX 25.2 livestream, assuring a better system for the next event.
- One member humorously recounted accidentally watching a video of Chris at GTC thinking it was part of the livestream.
Cleaned Up Livestream and Chris's Talk Available!: A cleaned-up recording of the MAX 25.2 livestream was posted (YouTube) for those who missed it live.
- A full recording of Chris' lightning talk at the Modular booth is also available on YouTube.

Links mentioned:

Modular (Mojo 🔥) ▷ #mojo (59 messages🔥🔥):

Compiler Bug, Enums, Flex Attention, Float to String Algorithm, FlashAttention-2 in Mojo

Confusing Compiler Error Message Exposed: A user reported a confusing error message when defining a method for a Dataset struct, suspecting a compiler bug, and provided a GitHub issue link.
- Another user suggested the issue might be due to using out self instead of mut self, while acknowledging the compiler error message was still confusing.
Enum Updates Still MIA: A user inquired about updates on enums in Mojo, but unfortunately, there are no updates available.
- The response was a simple "Sadly no. 🙃🙃🙃"
MAXing FlexAttention Implementation: A user asked about implementing flex-attention in Mojo and whether it's difficult, linking to a PyTorch blog post on flex-attention.
- It was suggested that implementing it as a custom op in MAX is possible and that Mojo on the GPU is close to CUDA, allowing control over memory movement, so "unless you run into something that's a work in progress, MAX should be able to do more or less whatever you want."
Float-to-String Algorithm Porting Disappoints: A user ported a new float to string algorithm to Mojo, referencing the creator's CPPCon talk, but found it slower than the standard library's dragonbox implementation and shared a link to the relevant code.
- The user noted that stringifying canada.json went from mid 30ms to low 40s, despite ripping the formatting from the standard library.
FlashAttention-2 Recipe Revealed: A user shared a link to a recipe containing a version of FlashAttention-2 in Mojo, emphasizing it was written for readability, not super-optimized performance, see custom-ops-ai-applications.
- Another link was provided to a recipe showing progressive optimization of matrix multiplication using Mojo's memory layout abstractions, see custom-ops-matrix-multiplication.

Links mentioned:

Nous Research AI ▷ #general (40 messages🔥):

OpenAI API, Midjourney New Research, Sam Altman open-weight language model, Psyche p2p, Anthropic Insights on LLMs

One-Line Fix Makes OpenAI API tutorials work: Any tutorial that works with OpenAI API should work with the Nous Research AI API, provided you change the endpoint in the code to endpoint = "api.nousresearch.com".
- One user confirmed they had it running with that change and will be adding styles.
Midjourney's LLMs Write More Creatively: Midjourney released a new research paper alongside machine learning experts at NYU on training text-based large language models (LLMs) to write more creatively, expanding beyond its image generation focus.
- The company is also building its own computing and AI hardware, announced in late summer 2024.
Sam Altman Teases New Open-Weight Model: Sam Altman announced plans to release a new open-weight language model with reasoning capabilities in the coming months, seeking developer feedback to maximize its usefulness.
- Developer events are planned in SF, Europe, and APAC to gather feedback and test early prototypes, marking OpenAI's first open-weight model release since GPT-2 (link to the announcement).
Tracing Thoughts in Language Models: Anthropic's Insights: Anthropic has released research (Tracing Thoughts in Language Models) indicating that LLMs have a thinking language of their own and think ahead more than previously thought.
- LLMs operate in more complex ways than just processing single tokens.
DeepSeek jiu jitsu makes open source Open AI Model possible: Members on the channel expressed 'gratitude to DeepSeek for applying complex Jiu Jitsu maneuvers to make this a reality for the Open Source community'.
- This sentiment was echoed along with link to YouTube video discussing OpenAI's shifting strategy related to the open-weight model.

Links mentioned:

Nous Research AI ▷ #ask-about-llms (8 messages🔥):

DeepHermes Reasoning, Structured Output with Langchain, DeepHermes AI, Tool Calling with Reasoning

DeepHermes Reasoning Reliability Investigated: According to a member, it is currently more reliable to avoid using JSON or tool calling with reasoning mode in DeepHermes, instead opting for the non-reasoning mode.
- The next version of DeepHermes is expected to improve on reasoning and tool calling; however, for current use, combining a reasoning system prompt, a newline, and a tool calling system prompt may yield acceptable results.
DeepHermes AI Discovered: A member excitedly noted the existence of DeepHermes AI, discovering it is a 3B model.
- The same member observed that reasoning in DeepHermes appears to be implemented as a chain of thoughts with <think> </think> tags.

Nous Research AI ▷ #research-papers (2 messages):

Project Loong Release, Synthetic Data Generation

CamelAIOrg Launches Project Loong 🐉: CamelAIOrg introduces Project Loong 🐉, a structured, modular solution for generating and verifying synthetic data.
- The project features a blog post detailing the modular design that integrates synthetic data generation with semantic verification and a multi-agent framework ensuring accuracy and consistency.
Sampling Presets Impact Metric Distribution: A member expressed curiosity about how the distribution of a beautiful metric shifts with sampling presets.

Link mentioned: Tweet from CAMEL-AI.org (@CamelAIOrg): Introducing Project Loong 🐉Blog: https://camel-ai.org/blogs/project-loong-synthetic-data-at-scale-through-verifiers…• Our structured approach to generating and validating synthetic data for enhanced ...

Nous Research AI ▷ #interesting-links (10 messages🔥):

Nous Research Portal Git Repo, X Link Removal, Contributing to Nous Research, Google's Style Guide

Git Repo for Nous Research Portal Remains Elusive: A member inquired about the Git repository for the Nous Research portal, but it was clarified that no Git repository is needed as it uses the OpenAI library.
- The portal's environment details include production status for VercelEnv and NodeEnv, with an unknown commit, branch, last commit, and modification status.
X Link Vanishes Due to Security Concerns: A member asked about the location of a specific X link, only to be informed that it was deleted due to concerns that it could steal user keys.
- No further details were provided regarding the nature of the link or the specific security risks it posed.
Applications are the Way to Contribute to Nous Research: In response to an inquiry about contributing to Nous Research, it was suggested that users can contribute by making applications with the models.
- Clarification was given that users should focus on building services using the API, rather than modifying the service itself.
Google's Style Guide Offers Steroids for Code Generation: A member shared a link to Google's Style Guide, describing it as steroids for code generation.
- The style guide (google/styleguide) includes guidelines for AngularJS, Common Lisp, C++, and C#.

Links mentioned:

Nous Research AI ▷ #research-papers (2 messages):

Project Loong, Synthetic Data Generation, Model Performance Enhancement

Camel AI Launches Project Loong: Camel AI introduced Project Loong 🐉, a modular solution for generating and verifying synthetic data, and requests shares and reposts of the announcement.
- Project Loong employs a structured approach integrating synthetic data generation with semantic verification.
Project Loong Enhances Model Performance: Project Loong aims to enhance model performance through a multi-agent framework ensuring accuracy and consistency.
- The project focuses on empowering domain-specific models with reliable reasoning signals generated from synthetic data.

Yannick Kilcher ▷ #general (35 messages🔥):

Graph Learning Evolution, AI/ML Job Impact, RLHF Alignment and Nerfed Models, Gemini 2.5 Pro Math Abilities, Dream Journaling App

Graphs Evolve Beyond 2018, Sparks Graph Learning Renaissance: A member shared a Google Research blogpost about the evolution of graph learning and expressed interest in recent advancements since 2019.
- The blogpost traces graph theory back to Leonhard Euler in 1736 and discusses its applications in modeling relationships and connections.
AI/ML Transforms Job Market: Low-Level Roles Threatened: A member suggested that recent AI/ML advancements primarily impact low-level jobs, such as minor programming tasks, but emphasized the human capacity to adapt.
- They noted AI/ML reduces dependencies on others, like using AI/ML for initial legal assistance, which saves resources and enables multi-disciplinary tasks.
RLHF Alignment: Suppressed Behaviors Resurface in AI Models: Discussion revolved around RLHF and the potential for emergent misalignment if models are penalized for useful tasks, like ML R&D or data collection.
- Concerns were raised that if open-source models are nerfed, the first self-improving models might become increasingly evil as they compensate for suppressed behaviors.
Gemini 2.5 Pro Flunks Math, UI Fails Disgracefully: A member tested Gemini 2.5 Pro (experimental) in math and found it to be totally trash, also the member added that Google's UI doesn't display math correctly.
- When asked about information theory and geometry, ChatGPT and Grok 3 were better at understanding questions, even when poorly written and the user later guided it to write correctly.
Dream Journaling App Aims to Analyze Lucid Dreams: A member announced the creation of Rem, a dream journaling app designed for easy recording, analysis, and sharing of dreams.
- No secondary summary given.

Links mentioned:

Yannick Kilcher ▷ #paper-discussion (5 messages):

RLHF, Reward Hacking, Response Diversity, Reasoning Task Verifiers, Generative Reward Model

Aligning LLMs via Reinforcement Learning from Human Feedback: A paper on Reinforcement Learning from Human Feedback (RLHF) was shared, noting its importance for aligning large language models with human preferences, available at arxiv.org/abs/2503.22230.
Overlooking Prompt-Data Construction in RLHF: The paper addresses the overlooked importance of prompt-data construction and explores data-driven bottlenecks in RLHF performance scaling, particularly reward hacking and decreasing response diversity.
Hybrid Reward System Mitigates Reward Hacking: The paper introduces a hybrid reward system combining Reasoning Task Verifiers (RTV) and a Generative Reward Model (GenRM) to mitigate reward hacking.
Prompt Selection Method Enhances Learning Effectiveness: A novel prompt-selection method, Pre-PPO, is proposed to maintain response diversity and enhance learning effectiveness.
Prioritizing Tasks Early Improves Performance: The paper finds that prioritizing mathematical and coding tasks early in RLHF training significantly improves performance, with experiments across two model sizes validating the methods' effectiveness and scalability.

Link mentioned: Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback: Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning large language models with human preferences. While recent research has focused on algorithmic improvements, the importance of...

Yannick Kilcher ▷ #ml-news (20 messages🔥):

AI Dog Chasing Tail, AI Model Feedback, Runway Relevance, OpenAI Model Release Speculation, GPT-3.5 vs Thinking Models

AI Called Out for Chasing its Tail: A member expressed skepticism about current AI capabilities, describing them as probabilistic text completion with a few hacks and questioning whether they constitute true thinking.
- They expressed disillusionment with the hype surrounding AI, feeling that improvements since GPT-3.5 have been more about fine-tuning than significant breakthroughs and want end to end no hand holding.
OpenAI Model Feedback Forum Launches: A member linked to the OpenAI Open Model Feedback forum.
- Another quoted Ilya Sutskever, noting that if there was one great failing it would be that you always had to check the results.
Runway's Relevance Debated: A member questioned whether anyone still cares about what Runway does.
- Another member linked to a tweet showcasing an AI-generated KFC ad concept made with Runway, Pika, Kling AI, Google DeepMind Veo2, Luma AI, OpenAI Sora, and Topaz Labs.
OpenAI Release Speculation Surfaces: Members speculated about OpenAI's next model release, guessing it might be a smaller model for mobile, especially since their Apple deal fell through.
- Some joked and pondered if they'd be releasing GPT 2.5, 100M parameters.
Deepseek R1 is like a university student: A member stated there's a massive gap between GPT 3.5 and any thinking models, likening GPT 3.5 to a 10 year old while describing Deepseek R1 as being like a university student.

Link mentioned: Tweet from Salma (@Salmaaboukarr): I'm blown away!😱 This KFC concept ad is 100% AI generated!My friend David Blagojevic (he's not on X) created this ad concept for KFC and it's incredible! Tools used: Runway, Pika, Kling...

MCP (Glama) ▷ #general (38 messages🔥):

MCP RBAC Implementation, Docker alternatives, MCP server for webapp, VirusTotal Integration, MCP for make.com or n8n cloud

MCP Gains Traction with Pichai's Tweet: Following a tweet from Sundar Pichai asking 'To MCP or not to MCP, that's the question', interest in MCP has surged, with his tweet gaining over a million views.
- A Reddit moderator of /r/mcp even suggested doing an AMA if Google is leaning into MCP.
Crafting RBAC on MCP Server: Users are exploring Role-Based Access Control (RBAC) implementations on MCP servers to segment tool visibility based on user roles.
- One user suggested integrating with WorkOS and another mentioned that Toolhouse API does RBAC based on the API key.
SDK Governance layer is Open Sourced!: A member shared an open source SDK designed for implementing enterprise governance (Identity, RBAC, Credentials, Auditing, Logging, Tracing) within the Model Context Protocol framework at ithena-one/mcp-governance-sdk.
- Feedback from the community is encouraged and very welcome.
DesktopCommanderMCP crafts code for you: A user recommends DesktopCommanderMCP to create and update files for Claude, providing terminal control, file system search, and file editing capabilities via wonderwhy-er/DesktopCommanderMCP.
- They suggest to get the llm to pick the right servers and only get the context of those, instead of overwhelming the context with 30 mcps.
Nova act is considered for MCP: A member suggested that it would not be difficult to have Claude spit out the act calls (from amazon's Nova) and feed them to an MCP server hooked up to whatever is performing the actual browsing (i.e. some nova endpoint), see this video.
- This approach involves Claude generating nova.act commands based on user requests, which are then executed by the MCP server.

Links mentioned:

MCP (Glama) ▷ #showcase (13 messages🔥):

ActivePieces drops MCP support, MCP Autotest Tool, MCP Weekly Newsletter, Playwrite MCP server with Smithery, MCP synchronous limitations

ActivePieces Cuts Off MCP Support: Active pieces, an open-source Zapier alternative, has dropped support for MCP.
Autotest Utility for MCP Servers Released: The mcp-autotest is a tool that defines expected server behavior in yaml files, and see if it complies or not.
- Version 0.2.1 tests using stdio or new streamable http transports.
MCP Bits Newsletter goes live!: A new MCP Weekly Newsletter called MCP Bits has been published.
- It contains the latest news, articles, video and project updates; subscribe to the newsletter here.
Playwrite MCP Server Now Works via Smithery Hosting: The Playwrite MCP server now works via Smithery hosting, enabling Sage to grab web content on iOS.
MCPC enables two-way asynchronous communication: An extension called MCPC has been created to mitigate MCP's synchronous limitations and add asynchronous support.
- The new extension offers backwards compatibility, so nothing breaks—you just won’t get the extra features unless both the client and server support MCPC.

Links mentioned:

Notebook LM ▷ #announcements (1 messages):

Webby Awards, Voting, NotebookLM nominations

NotebookLM bags Three Webby Nominations!: NotebookLM has been nominated for THREE Webby Awards and is asking for community votes at this link.
- Voters should confirm their votes by clicking the verification link in their email, and check their spam folder.
Search yields No Results: The search function displays Displaying top {{maxSearchResults}} results. and **No results**.
- It prompts users to refine your search criteria to narrow the results.

Link mentioned: Vote for the best of the internet: I just voted in The Webby People's Voice Awards and checked my voter registration.

Notebook LM ▷ #use-cases (9 messages🔥):

Google Tasks integration with NotebookLM, Archiving notebooks in NotebookLM, Sharing sources on different notes in NotebookLM

Google Tasks could integrate w/ NotebookLM: A user suggested that Google Tasks could integrate with NotebookLM by allowing users to pick a task list via a dropdown/popup.
- They proposed that this could work similarly to how Google Tasks allows selecting a task list for sharing.
Notebook Archival Feature could reduce Notebook Count: A user requested a way to archive notebooks in NotebookLM to hide them and reduce the number of notebooks counting against their limit.
- They suggested that hidden/archived notebooks should not appear in the list of notebooks available for sharing content.
Source Sharing Between Notes: An Available Feature?: A user inquired whether it's possible to share sources on different notes within NotebookLM.
- They were unsure if this feature is currently available.

Notebook LM ▷ #general (39 messages🔥):

Timestamped sections on the todo list, NotebookLM to Gemini 2.5 Pro, Conversation ending early, Limit the total number of words, not the number of sources?, Maths notation in NLM is very hard to read

Timestamped Todo Lists Triumph: A user requested adding timestamped sections to the to-do list, similar to Audible, for skipping and re-listening to specific sections.
- This suggestion aims to enhance user experience and accessibility for longer audio content.
Gemini 2.5 Pro Prayers: A user requested that the NotebookLM IA be updated to Gemini 2.5 Pro, citing their love for the updated Gemini version.
- They hope that NotebookLM will perform even better with the new model, but the NotebookLM team has not commented on any ETAs.
Conversation Cut-Off Catastrophe: A user reported that the conversation is ending prematurely and not covering the second resource uploaded and asked if there was a fix.
- The team requests documenting the issue in the dedicated discord channel, including a sample notebook ID if possible.
Notes not Sources Needed: A user with personal notes managed in Obsidian (2000+ short notes) finds the 300-note limit restrictive.
- They propose limiting the total number of words instead of the number of sources to better accommodate mesh note systems; a user suggests that folders or zipped files as a single source would also solve the problem.
Math Notation Menace: A user reported that math notation in NLM is very hard to read in normal chats, asking if there's a fix.
- The team acknowledged the issue and is investigating, but currently, no ETA is available for a change.

Torchtune ▷ #general (11 messages🔥):

Torchtune office hours, Discord timezone handling

Torchtune Time Next Friday: Members announced the next Torchtune office hours next Friday, linking to the Discord event.
Discord Timezone Auto-Conversion Big Brain: Members were converting timezones manually, before realizing Discord handles that automatically.
- One member then posted a Big Brain meme.

Link mentioned: Brain Brain Meme GIF - Brain Brain meme Big brain - Discover & Share GIFs: Click to view the GIF

Torchtune ▷ #dev (16 messages🔥):

PR #2441 Review, Regression Testing for PR #2477, Qwen Model Upload, S3 Bucket Hookup Issues, PR #2510

PR #2441 Needs Final Review ASAP: A member requested a final review for PR #2441 to speed up the merge process.
Regression Testing on hold due to S3 troubles: Regression testing for PR #2477 is desired, but is blocked while waiting to upload the Qwen model to S3 for download as part of the regression test script.
- However, another member realized that there is more work to hook up their S3 bucket due to internal infra changes and suggested putting the regression testing on hold for a bit.
Modern Models like Llama2 win the race: A member suggested using something a bit more modern than Llama2 for tests, but the current regression test uses the Llama2 model.
PR #2510 Removes Recursive Reshard Utility: PR #2510 removes the recursive_reshard utility because it wasn't needed.

Links mentioned:

tinygrad (George Hotz) ▷ #learn-tinygrad (15 messages🔥):

ImageDtype and IMAGE env, tinygrad BEAM Performance, Mobile GPUs and ImageDType, arange() optimization

Delving into ImageDtype and IMAGE env: A member inquired about the purpose of ImageDtype and the IMAGE environment variable in tinygrad, noting its influence on Tensor.conv2d implementation and linking to a VAE training script.
- Another member suggested it's related to running comma.ai models faster on Qualcomm (QCOM) hardware, leveraging mobile GPUs' texture performance and caching capabilities.
tinygrad BEAM Blazes Past tf-metal: One user reported achieving 3.2 it/s on an M1 Pro without BEAM, 28.36 it/s with BEAM=2, and about 25 it/s using Keras with tf-metal.
- George Hotz responded, "glad to see it's faster than tf-metal with BEAM!"
Mobile GPUs Get Texture Boost With ImageDType: The discussion indicates ImageDType and related functions might optimize for mobile GPUs' texture performance, citing a potential Microsoft research paper on mobile GPUs.
- One member questioned the necessity of hardcoding layout specifics and suggested HWC (Height, Width, Channel) handling should be part of normal conv2d with user-defined padding.
arange() Gets Optimized: A member discovered suboptimal code generation for small arange ranges (e.g., arange(1, 2, 0.1)) compared to larger ranges (e.g., arange(1, 10, 0.1)), then added a chapter on .arange() here.
- They also spotted an unnecessary addition in the generated code, suggesting a fix from ((float)((ridx0+1)))*0.1f)+0.9f) to (((float)((ridx0)))*0.1f)+1.0f).

Links mentioned:

LlamaIndex ▷ #blog (1 messages):

LLM Agents for Technical Documentation, Structured Extraction from Complex Documents

LLM Agents Tackle Technical Documentation: An underrated use case for LLM agents is every field that depends heavily on complex technical documentation like manufacturing, construction, and energy.
- It was suggested that you can build an agent that can do structured extraction from these documents.
Complex Docs Decoded with LlamaIndex: A tweet shows a thread mentioning these docs are often full of screenshots.
- The tweet in question can be found here.

LlamaIndex ▷ #general (6 messages):

ReAct Agents, Local Models via Ollama, OpenAI Rate Limit Errors, Embedding Models, Query Engines

ReAct Agent runs into OpenAI Rate Limits: A user encountered an OpenAI RateLimitError (Error 429) when using a ReAct agent with a local model set up via Ollama, questioning if ReAct agents are exclusively for OpenAI LLMs.
- They provided a link to their GitHub repository showing their agent setup.
Troubleshooting the OpenAI Error: A member suggested that the embedding model might be the cause of the OpenAI error, as it could be defaulting to OpenAI's embedding model if not explicitly set.
- The user confirmed that they are using a Hugging Face embedding model, set during document creation.
LLM and Embed Model Parameters: A member advised to pass in both the llm and embed_model when creating the VectorStoreIndex.
- Also, make sure to specify llm when calling index.as_query_engine().

Link mentioned: Agentic-Chat-RAG/agent_utils.py at jake-dev · JakeFurtaw/Agentic-Chat-RAG: Uses a Gradio interface to stream coding related responses from local models. Can be used in Chat Mode or Agent Mode. - JakeFurtaw/Agentic-Chat-RAG

Nomic.ai (GPT4All) ▷ #general (7 messages):

Official Translations, Llama3 8B instruct model, .bin vs .gguf

GPT4All Goes Global with Official Translations: Official translations are now available for Simplified Chinese, Traditional Chinese, Italian, Portuguese, Romanian, and Spanish for the GPT4All documentation.
Llama3 8B Instructor Model for Blog Posts & Web Pages?: A user asked if the Llama3 8B Instruct model would be the best model to use for making Blog posts and web pages off of a bunch of courses they have recorded (video and text).
- Another user suggested the original user ask a friend to help rephrase the question in English so that they could better understand and answer the question with confidence.
Confusion between .bin and .gguf file formats: A user asked about the difference between a .bin and a .gguf file format, apparently noting they could not interchange them.
- The same user quickly retracted this, indicating they were just mistaken.

Link mentioned: Home: GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use. - nomic-ai/gpt4all

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (4 messages):

Quizzes, Completion based

Quizzes being completion based: A member asked what score they needed to achieve on quizzes.
- Another member replied that they are completion based.
Quizzes matter if they are attempted: A member asked if the score doesn't matter as long as quizzes are attempted.
- Another member replied yep! and added that they hope users try their best for their own learning.

LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (1 messages):

LLM Agents Cookbook, Llama 3

LLM Agents Cookbook linked to Llama3: A member inquired whether the "LLM agents cookbook" mentioned in week 5 of Coding Agents refers to Llama 3's cookbook.
- A link to the cookbook was provided for reference.
Meta released Llama 3: Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes.
- The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.

Link mentioned: Llama3 Cookbook - LlamaIndex: no description found

LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (1 messages):

DeepSeek-R1, Reinforcement Learning, Chains-of-Thought, Project Loong

Verifiable Rewards Boost Reasoning Models: Recent Large Reasoning Models like DeepSeek-R1 show greatly improved general reasoning capabilities when base models undergo post-training with Reinforcement Learning (RL) with a verifiable reward (as discussed in Project Loong).
- The ability to easily verify accuracy is crucial for improving domain-specific capabilities, particularly in mathematics and programming.
High-Quality Datasets Enhance CoT Learning: An abundance of high-quality datasets, featuring questions paired with verified correct answers, is a critical prerequisite for models to learn to construct coherent Chains-of-Thought (CoTs).
- These datasets provide the necessary signals for models to reliably arrive at correct answers.

Link mentioned: 🐉 Loong: Synthesize Long CoTs at Scale through Verifiers: Project Loong is a collaborative effort lead by CAMEL-AI to explore Long CoTs data generation through verifiers at scale.

Cohere ▷ #「💬」general (3 messages):

Command A issues, Rem dream journaling app

Command A screams eternally: A user testing Command A found that the model gets stuck generating the same character endlessly when encountering a context where a character is screaming with repeated letters.
- This issue occurs even with default API Playground settings, freezing the interface and preventing feedback; reproduction is reliable with prompts like "Please generate a scream in fiction inside quotation marks".
Rem app wants you to journal dreams: A user shared Rem, a dream journaling app created with a friend to easily record, analyze, and share dreams.
- The app aims to provide a platform for users to log their dreams and gain insights into their subconscious.

Links mentioned:

Cohere ▷ #「🤝」introductions (2 messages):

Introductions, Community growth, User interests, Networking

Community Welcomes New Members: The community welcomes new members to the Cohere Discord server, encouraging them to introduce themselves.
- New members are prompted to share their company, what they're working on, favorite tech tools, and what they hope to gain from this community.
New members share interests: New members are eager to participate, learn, and get feedback on their projects
- They are excited to engage in discussions about their favorite technologies and tools within the community.

MLOps @Chipro ▷ #events (1 messages):

AI in Legislation, Legalese Decoder, SVCAF's AI4Legislation competition

Decoding Legalese with AI: Seminar Alert!: The Silicon Valley Chinese Association Foundation (SVCAF) is hosting a seminar on April 2, 2025, at 6:30pm Pacific Time to discuss AI applications in legislation, featuring the Founder of Legalese Decoder.
- The seminar will delve into how AI, ML, and NLP are used to simplify complex legal documents, making them understandable to everyone.
SVCAF Launches AI4Legislation Competition: SVCAF is holding a competition this summer to develop open-source AI-driven solutions for citizen engagement in the legislative process, with details available in the official Github repo.
- The competition aims to harness AI's power to make legislative processes more equitable and effective, aligning with SVCAF's mission to educate the Chinese community in public affairs.
AI4Legislation Seminar Series to start: The AI4Legislation seminar series will recur during the first week of each month, aiming to provide project guidance and current information about legislative AI tools, more information can be found here.
- Each seminar features a different guest sharing insights on utilizing AI to address key challenges in lawmaking, exploring the potential of AI-driven governance.

Links mentioned:

MLOps @Chipro ▷ #general-ml (1 messages):

smartinez.ai: I think you can ask Joe

AI21 Labs (Jamba) ▷ #general-chat (2 messages):

Language Use

Member Uses French and English Regularly: A member mentioned they missed the poll and regularly use French and English.
- They also use Greek and Hebrew at times.
No Topics: No topics were discussed.
- No topics were discussed.

Codeium (Windsurf) ▷ #announcements (1 messages):

Windsurf Sounds, Auditory UX, Windsurf Next Beta

Windsurf Sounds Launch: Windsurf AI introduced Windsurf Sounds, marking their initial foray into sound design and Auditory UX, aiming to enhance flow state and productivity.
- The full video announcement is available on X.com.
Windsurf Next Beta Program Available: The Windsurf Next Beta program is now available for early adopters to test new features.
- Downloads are available at Codeium.com with minimum requirements including OS X Yosemite, glibc >= 2.28 for Linux, and Windows 10 (64-bit).

Links mentioned:

Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 messages):

io_uring.h, v0 openfunctions dataset, v1 dataset

io_uring.h's v0 Dataset: Vanished or Merged?: A member inquired about the fate of the v0 openfunctions dataset within io_uring.h and whether it was completely merged into the v1 dataset.
v0 vs v1 Datasets in io_uring.h: A Deep Dive?: The discussion seeks to understand the architectural changes and data migration strategies, if any, between the v0 and v1 versions of the openfunctions dataset in io_uring.h.

{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}