AI News for 3/27/2025-3/28/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 13422 messages) for you. Estimated reading time saved (at 200wpm): 1217 minutes. You can now tag @smol_ai for AINews discussions!

We soft launched the 2025 State of AI Engineering survey today, fill it out to join our $1000 Amazon gift card raffle + have your voice heard in the state of AI Eng!

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

Here's a summary of the tweets, organized by topic:

GPT-4o Model Performance and Features

GPT-4o's improved coding and instruction following were praised: @sama highlighted the new version of GPT-4o for being particularly good at coding, instruction following, and freedom. @kevinweil agreed, stating the GPT-4o update is strong and encouraged users to try it.
GPT-4o's performance relative to other models, particularly in coding and reasoning, was assessed: @ArtificialAnlys reported that GPT-4o (March 2025) is now the leading non-reasoning coding model, surpassing DeepSeek V3 and Claude 3.7 Sonnet in the Artificial Analysis Coding Index, and is #1 in LiveCodeBench. However, it still lags behind reasoning models like o3-mini.
Concerns about policy compliance: @joannejang noted that image generation refusals are often due to the model hallucinating policies. They asked users to bear with them as they try to get the model to follow the policy and suggested trying again in a new chat if encountering issues.
@nrehiew_ hypothesized that 4o image generation works by embedding the image directly via an encoder, using AR, and then diffusing out based on the ARed hidden states; the blur is a psyop and there's no VQ.
GPT-4o's transparency and background generation feature were highlighted: @giffmana noted the ability to ask GPT-4o image gen for transparent backgrounds, calling it a cool feature drowned out by Ghiblification hype.

Gemini 2.5 Pro Model Performance and Capabilities

Gemini 2.5 Pro was lauded for its capabilities in audio and video understanding: @_philschmid reported that Gemini 2.5 Pro has improved long context capabilities and can process ~1h long video with a single request, noting the integration of YouTube links into AIS and API. The model can also handle ~2 hours of podcast transcription in a single request.
Simple-Bench AI Explanation Performance: @scaling01 mentioned Gemini 2.5 Pro Thinking scored around 51.6% on AI Explained' Simple-Bench, the first model to score above 50%.
Accessibility and Usage: @_philschmid announced that users can bring their own API Key to @cursor_ai to use Gemini 2.5 Pro, but noted that rate limits are currently low. They also mentioned that Gemini 2.5 Pro is available in @windsurf_ai.

AI Infrastructure and Compute

GPU usage is expected to increase significantly: @saranormous stated that they are going to use all the GPUs (and TPUs).
Together AI and Hypertec Group are partnering to deliver large-scale GPU clusters: @togethercompute announced a partnership with @HypertecGroup to deliver clusters of thousands of GPUs, emphasizing high-bandwidth networking, advanced cooling, and robust fault tolerance.
CoreWeave's IPO: @weights_biases congratulated @CoreWeave on their IPO, highlighting their success in pushing the edge of what’s possible in AI infrastructure.

AI Engineering and Development

Concerns regarding conventional programming languages over vibe coding: @lateinteraction emphasized the importance of retaining useful aspects of conventional programming languages, such as defining functions, control flow, and modules, rather than giving in to "vibe coding".
Importance of open-source in medical AI: @iScienceLuvr highlighted the crucial role of open-source in medical AI due to the need for transparency and the impracticality of sending sensitive patient data to cloud APIs.
Emphasizing scalable solutions for ASI: @teortaxesTex pointed out a statement about building scalable solutions to ASI, focusing on improvements with more resources on computation and data.
Langchain and Redis Integration: @LangChainAI announced that with langgraph-checkpoint-redis, you can bring @Redisinc's powerful memory capabilities to your LangGraph agents.

Company and Product Announcements

New homepage for Keras: @fchollet announced the launch of a brand new homepage for Keras to celebrate its 10th anniversary.
C.H. Robinson saves time with LangGraph: @LangChainAI reported that C.H. Robinson is saving 600+ hours a day using tech built with LangGraph, LangGraph Studio, and LangSmith to automate routine email transactions.
Launch of the MIT NLP Group account: @lateinteraction announced the launch of the @nlp_mit account to showcase the latest NLP research from MIT labs.
Perplexity AI Thread Infrastructure Issues: @AravSrinivas mentioned that Perplexity AI is going through some infra challenges, which is why past threads are not loading.

Humor/Memes

Various humorous tweets: Several users shared humorous content, including @Teknium1 posting "Jensen rn" with an image, @teortaxesTex with Xi after he dies in WWIII and is reincarnated as a shota in a parallel world, @mickeyxfriedman suggesting that if you generate yourself as the opposite sex in chatgpt and think it’s mid, you should probably lower your standards, and @_philschmid noting that @cursor_ai just rick rolled them.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Reverse Engineering GPT-4o: Architectural Insights and Speculations

Reverse engineering GPT-4o image gen via Network tab - here's what I found (Score: 599, Comments: 43): The author investigates the image generation process of GPT-4o by examining network traffic, uncovering that the backend returns intermediate images that suggest a possible multi-step pipeline. They speculate whether the model uses a diffusion process or an autoregressive approach, noting that the OpenAI model card describes it as an autoregressive model. The author references the OmniGen paper as a potential explanation for GPT-4o's capabilities, highlighting its use of a transformer-based architecture that scales well with high-quality data and computational power.
- There is debate over whether the GPT-4o model uses a diffusion model or an autoregressive model. Some commenters speculate it might employ a hierarchical decoder with a diffusion model for pixel-level detail, while others suggest it uses an autoregressive approach that enhances image generation by predicting sequences of tokens in a sophisticated manner.
- The potential for open-source competitors to match the quality of GPT-4o is discussed, with some expecting that Chinese competitors might achieve this within a year. However, others believe it could take until the end of 2025 for open-source models to catch up, emphasizing the importance of an open-source image model akin to LLaMA for LLMs.
- Commenters express skepticism about the value of individual reverse engineering efforts, noting that the broader academic and industrial communities, especially in China, are likely conducting extensive analyses. There is interest in whether the model's ability to access the internet and utilize high-quality data provides significant advantages over local text encoders like CLIP/T5.

Theme 2. MegaTTS3's Voice Cloning: Skepticism and Security Concerns

New TTS model from bytedance (Score: 143, Comments: 19): ByteDance released MegaTTS3, a new text-to-speech model, which has sparked controversy over its voice cloning capabilities. The discussion centers around ethical implications and potential misuse of this technology in creating unauthorized voice replicas.
- MegaTTS3's Features and Limitations: The model boasts lightweight efficiency with 0.45B parameters, bilingual support, and controllable accent intensity. However, the WaveVAE encoder is not available for local voice cloning due to "security issues", sparking criticism about the misleading advertising of "Ultra High-Quality Voice Cloning".
- Ethical and Security Concerns: There is skepticism about the "security reasons" for not releasing the voice cloning software, as many believe this is a guise for data collection to improve their models. Critics argue this approach contradicts ethical considerations, given the widespread availability of AI voice cloning technologies.
- Community Reactions and Criticism: Users express frustration over the misleading promotion of voice cloning capabilities and question the ethics of data submission for training purposes. Some see the "safety" claims as a strategy for indirect monetization by collecting user data for further training.

Theme 3. Qwen-2.5-72b: Leading the Open-Source OCR Revolution

Qwen-2.5-72b is now the best open source OCR model (Score: 119, Comments: 14): Qwen 2.5 VL (72b and 32b) models have emerged as the leading open-source OCR models, achieving approximately 75% accuracy in JSON extraction, comparable to GPT-4o. The 72b model slightly outperformed the 32b model by 0.4%, while both surpassed the mistral-ocr model's 72.2% accuracy. Surprisingly, Gemma-3 (27B) scored only 42.9%, despite its architecture being based on the high-performing Gemini 2.0. The benchmarking data and methodology are available on GitHub and Hugging Face.
- Ovis2 Models have not been included in the discussion, despite being leaders on OCRBench with significantly fewer parameters (18x less), suggesting potential interest in their performance relative to Qwen models.
- There's curiosity about the performance of the olmOCR-7B-0225-preview model from Hugging Face, noted for being more VRAM efficient, highlighting a demand for models that balance performance with resource usage.
- The Qwen 2.5 VL 32B model has been updated and shows significant performance improvements over the older 72B model, which has not received recent updates. The 32B model is also noted for its superior writing capabilities compared to the vanilla Qwen model.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

our pipelines are down...

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. GPT-4o Dominates Leaderboards and Sparks Debate

GPT-4o Jumps to Arena #2, Coding Prowess Confirmed: The latest ChatGPT-4o (2025-03-26) model surged to #2 on the Arena leaderboard, surpassing GPT-4.5 and tying for #1 in Coding and Hard Prompts. Users note a significant performance leap and a 10x cost reduction compared to previous models, though pricing discrepancies with API snapshots cause confusion.
GPT-4o's Coding Skills Draw Mixed Reviews Despite Benchmarks: While benchmarks position Gemini 2.5 Pro as the leading non-reasoning model, some users find GPT-4o superior for coding tasks, particularly in instruction following and code generation. Debate continues about whether GPT-4o's high ranking is due to specialized training for preferred response styles rather than raw performance.
GPT-4o Unveiled as Autoregressive Image Model: GPT-4o is confirmed to employ an autoregressive approach for image generation, marking a novel method for creating images directly from text prompts. Speculation arises about the model reusing image input and image output tokens for efficiency.

Theme 2. DeepSeek V3 and Qwen2.5-Omni Emerge as Strong Contenders

DeepSeek V3 Outcodes GPT-4o on SWE-bench: The new DeepSeek V3 0324 model is gaining recognition for coding prowess, reportedly outperforming GPT-4o R1 on the SWE-bench benchmark. Data indicates DeepSeek V3 surpasses Claude 3.7 Sonnet in non-reasoning coding tasks, becoming a leading model in the field.
Qwen2.5-Omni: Meta's Multimodal Marvel Arrives: Qwen2.5-Omni, the latest flagship model in the Qwen series, is released as an end-to-end multimodal model handling text, images, audio, and video with real-time streaming responses. Users can test Qwen2.5-Omni at Qwen Chat, marking a significant step towards truly versatile AI models.
DeepSeek Blends Diffusion and Transformers, Following GPT-4o's Lead: DeepSeek is adopting a multimodal architecture similar to GPT-4o, combining diffusion and transformers. This approach, previously seen in vision models, signals a growing trend in multimodal AI development.

Theme 3. Infrastructure Woes and User Frustrations Plague AI Platforms

Perplexity AI Buckles Under Server Strain, Users Report Outages and Data Loss: Perplexity AI experiences widespread outages, with users reporting disappearing history and spaces. The official status page (status.perplexity.com) is slow to update, prompting calls for better outage communication and automated reporting systems.
Manus.im Credit System Triggers User Backlash Over High Costs: Manus.im's new credit system faces heavy criticism for its perceived high cost, with some users estimating monthly expenses could reach $500. The shift from a task-based to credit-based system is described as jarring, impacting user experience.
Cursor IDE Suffers Database Disaster, Service-Wide Outage Ensues: Cursor experiences a service-wide outage due to a database deployment issue, disrupting core AI features and general service functionality. While resolved after a few hours, the incident highlights the fragility of AI-powered coding tools and their reliance on robust infrastructure.

Theme 4. Tools and Techniques for Enhanced AI Development Emerge

LM Studio 0.3.14 Unleashes Granular Multi-GPU Control: LM Studio 0.3.14 introduces advanced controls for multi-GPU setups, allowing users to fine-tune GPU allocation strategies and manage resources more effectively. New keyboard shortcuts (Ctrl+Shift+H or Cmd+Shift+H) provide quick access to GPU settings.
Aider's New /context Command Automates Codebase Context Management: Aider introduces the /context command, which automatically identifies and adds relevant files to the chat based on user requests. This feature streamlines context management, especially in large codebases, saving developers time and effort.
DSPy Framework Promotes Declarative Programming Over Brittle Prompting: DSPy is highlighted as a framework for programming language models rather than relying on traditional prompting. It enables rapid iteration on modular AI systems using Python code and algorithms to optimize prompts and model weights, aiming for more robust and high-quality AI outputs.

Theme 5. Ethical Considerations and AI Safety Remain Central

OpenAI Relaxes Image Generation Policy, Prioritizes Real-World Harm Prevention: OpenAI shifts its image generation policy in ChatGPT 4o, moving from blanket refusals to a more nuanced approach focused on preventing real-world harm. This policy change allows for greater creative freedom in previously restricted areas.
AI Safety Discussions Highlight Constitutional AI and Jailbreak Concerns: Discussions on AI safety emphasize that models like Claude, designed with constitutional AI principles, prioritize objectivity over user preferences, potentially impacting leaderboard rankings. Resources like the Jailbreak Cookbook are shared, addressing LLM vulnerabilities and safety measures.
Miyazaki's 9-Year-Old Critique of AI Art Resurfaces, Sparks Ethics Debate: A resurfaced clip of Hayao Miyazaki criticizing AI-generated art reignites ethical discussions within the AI community. The debate draws parallels between AI art sampling and fast fashion ethics, questioning the morality of readily accessible, potentially exploitative content.

PART 1: High level Discord summaries

Manus.im Discord Discord

Users Rage Against Manus New Credit System: Users are frustrated with the new credit system, some estimating costs could reach $500/month for decent usage and the 1000 free credits are quickly consumed even if the task fails, details at manus.im/help/credits.
- The community noted the shift from task-based to credits-based does feel jarring, especially when it wasn’t part of the original beta flow.
Manus Farm Brainstorming Alternative Energy: One member suggested that Manus could develop cheap renewable energy sources, such as molten sodium, thermal or solar to power their own GPU farm and reduce costs, potentially locating it in a desert.
- The member proposed flywheels as energy storage to keep the farm running at night for max efficiency.
Manus Considers Cheaper AI Models Like Deepseek: The community is in discussion around using cheaper AI models like Deepseek and Qwen instead of only Anthropic's Claude to reduce operational costs.
- It has not been stated if Manus will allow other AI integrations.
Students Cheat with Manus AI on Exams: Students have used Manus alongside Kimi or Deepseek to upload seminar and lecture files, asking the AI to memorize them for exam preparation, some receiving scores such as 81/100 on assignments.
- Some users were wondering if that violates the terms of service if you help the AI cheat for school.
UI Design Hailed as Simple Genius: Multiple members praised the UI design of Manus, expressing that the design is really good, easy to use, simple and aligns with real world concepts.
- One user stated What made manus feel so amazing was not only the results you got, but that the idea of tasks closely aligned with real world concepts. That simplicity was genius.

Perplexity AI Discord

Perplexity AI Servers Under Siege: Multiple users reported outages and disappearing history/spaces, prompting humor and frustration, and the official status page (status.perplexity.com) lacked timely updates.
- Users suggested an automated user-reported outage system and proactive notifications to address the infrastructure challenges mentioned in this tweet.
DeepSeek AI Falls Flat: Members voiced disappointment with DeepSeek AI, citing its struggles with complex instructions and tendency to produce unnecessary jargon.
- Comparisons were made to superior math applications, highlighting DeepSeek AI's shortcomings in practical problem-solving.
Claude AI's Context Window Gets the Side Eye: Discussion arose around the context window limit of Claude AI relative to Gemini and ChatGPT, with many members noting Claude's limitations.
- Members agreed that Claude's context window was particularly restrictive in comparison to its competitors, especially Gemini.
Free Perplexity Pro Via T-Mobile: Users exchanged methods for acquiring free Perplexity Pro subscriptions through T-Mobile and Revolut promotions.
- One user even suggested utilizing a burner number on T-Mobile to take advantage of the offer, and another user linked to a tweet about Perplexity shipping voice dictation.
Sonar API has Llama Index RAG Integration Issues: A user inquired about effectively passing Llama Index RAG context to the Perplexity Sonar model, seeking suggestions on leveraging the index object.
- The user also questioned whether the Deep Research functionality in the API would achieve parity with the perplexity.com version, noting a perceived performance gap, and mentioned that the Sonar API sometimes misses citations.

Cursor Community Discord

DeepSeek 3.1 Sneaks into Cursor: A Cursor team member mentioned that DeepSeek 3.1 should be integrated into the editor within 12 hours, but pricing details remain undisclosed.
- Cursor offers deals with providers and a privacy mode ensuring no data storage.
Cursor Plunges Amidst Database Disaster: Cursor experienced a service-wide outage due to a database deployment issue within its infrastructure, disrupting AI features like Chat and Tab as well as general service.
- After a few hours, the issue was resolved and they updated the Cursor Status.
Humanoid Hype Heats Up: Members debated the utility of humanoid robots, with contrasting visions of them as food-making and cleaning assistants versus concerns over data privacy and telemetry.
- A member posited that AGI will emerge from robotics, developing first in a virtual environment before manifesting in the real world.
Codebase Tag Cruises into the Sunset: Users noticed the removal of the @Codebase tag and staff clarified it was replaced with a similar way to scan the current indexed project, as noted in the changelog.
- This sparked discussions about token limits, pricing models, and balancing convenience with control in AI coding tools.

LMArena Discord

O1 Pro Coming to Leaderboard?: Members discussed the potential inclusion of O1 Pro on the leaderboard, speculating that OpenAI might cover costs to showcase its capabilities given its high price.
- However, some members expressed doubts about its leaderboard performance and latency.
GPT-4o's Coding Skills Under Debate: Members debate GPT-4o's coding ability after recent updates, with some noting improvements in instruction following and code generation.
- However, proper evals are needed, as one member argued that GPT-4o's ranking may be inflated due to specialized training for preferred response styles, rather than actual performance.
DeepSeek V3 leapfrogs Coding Benchmarks: The new DeepSeek V3 0324 model is gaining recognition, with one member noting it scores higher than GPT-4o R1 in SWE-bench according to this Reddit post.
- Data indicates that DeepSeek's V3 0324 release leapfrogs Claude 3.7 Sonnet in non-reasoning and has become the leading non-reasoning model for coding.
Meta's Llama Models getting Quirky: Members observed that recent anonymous models in the arena, believed to be from Meta, are displaying quirky behavior, including adding many emojis and identifying themselves as Meta Llama models.
- Models being tested include: bolide, cybele, ginger, nutmeg, phoebe, spider, themis, though they also note that spider sometimes identifies itself as GPT-4.
AI Safety Discussions: Members discussed AI safety, mentioning that models like Claude are designed with constitutional AI principles, prioritizing objectivity over user preferences, which may affect their leaderboard rankings.
- A member also shared a Jailbreak Cookbook resource for LLM jailbreaks and AI safety, including a GitHub repository with implementations of systematic jailbreaks.

Unsloth AI (Daniel Han) Discord

Scribe V1 Powers FoxMoans!: A member uses 11Labs Scribe V1 for audio event classification, to create a list of utterances, estimating a cost of $20k.
- It is used for audio event classification, suited for projects needing mood-based analysis.
OlmOCR's Unsloth Integration Still Rocky: A member struggles to load OlmOCR (a finetune of Qwen2VL) in Unsloth, despite having Qwen2VL working.
- The Unsloth team asked if the user tried the latest version, as they pushed updates and fixes before the creator realized their models finished uploading.
Orpheus TTS Gets Fine-Tuning: The Unsloth team released a notebook for finetuning Orpheus-TTS, highlighting its human-like speech with emotional cues.
- Members discussed changing Orpheus language, suggesting continued pretraining with new embedded/head layers might be sufficient.
Double Trouble for BOS Token: A user found a double BOS token issue in the latest Unsloth update (Gemma 3 4B) when checking tokenizer decoding.
- A hotfix was identified which removed the accidentally added token.
DeepSeek-R1 Goes Quantized: Unsloth made available various versions of DeepSeek-R1, including GGUF and 4-bit formats.
- Unsloth's DeepSeek-R1 1.58-bit + 2-bit Dynamic Quants selectively quantized improving accuracy over standard 1-bit/2-bit.

OpenAI Discord

GPT-4o vs Gemini 2.5: Coding Showdown: Members compared GPT-4o and Gemini 2.5 Pro for coding, with some finding GPT-4o superior despite benchmarks showing that Gemini 2.5 Pro performs better overall, with GPT-4o winning 3 out of 6 categories.
- Opinions varied, with some favoring Gemini for specific tasks like C++ and WinAPI integration.
Google AI Studio: The New Free Tier Hero: Users are praising Google AI Studio for its free access to models like Gemini 2.5 Pro and generous prompt limits, which are more than paid services like ChatGPT Plus.
- Some members reported using hundreds of messages daily without hitting limits and even canceled their ChatGPT subscriptions because of these advantages.
Perplexity Dominates News over ChatGPT: Members found Perplexity excels in news and current events due to its Discover tab, highlighting it as more than just a GPT wrapper.
- However, some noted issues with Perplexity's Deep Research feature for quality and reliability on uploaded files, suggesting ChatGPT instead.
Claude 3.7 Sonnet's Reasoning Prowess: Members lauded Claude 3.7 Sonnet for its superior reasoning capabilities and explanations compared to other AI models, especially since free tier Claude fills up and forces you to start a new chat.
- Alternative models like o1, o3-mini-high, and Grok 3 were recommended for coding, with o1 favored for complex tasks using C++, Physics, Rendering and older APIs like Win32API.
Enhanced Image Prompting: A New Dawn?: Users raved about the new ChatGPT image tool's improved adherence to complex prompts, like generating a moving market on a giant turtle's back with a sun and three moons.
- The updated tool excels at targeted image modifications, such as removing stars from a night scene without affecting the entire image.

OpenRouter (Alex Atallah) Discord

Gemini 2.5 Pro: Users Hit Rate Limit Wall: Users are bumping into low rate limits for Gemini 2.5 Pro, even after integrating their own AI Studio API keys, leading to discussions on maximizing free quota.
- One member remarked the model won't be free forever which will be a problem when they inevitably have to start charging.
OpenRouter AI SDK Provider Options Confuse Debuggers: Members are actively debugging OpenRouter AI SDK provider options, specifically using providerOptions for model order and fallback behavior.
- The core issue revolves around the correct way to nest the order array under the provider key, as debugging attempts reveal unexpected provider selection despite the configurations.
Function Calling Gold Rush in Free LLMs: Members are on the hunt for free models that support function calling, with Mistral Small 3.1 and Gemini free models emerging as top contenders.
- One frustrated member exclaimed, Gosh, I'm trying so hard to find a free model that supports function calling. I can't find any!.
Gemini Flash 2.0 Burns Rubber in TPS Showdown: The community is hotly debating the tokens per second (TPS) performance of various coding models, with Gemini Flash 2.0 being touted for its blazing speed.
- Despite the hype, some users are critical, pointing out it is trash because their hosting is messed up, and one member touted that Groq serves the 70B R1 distil at 600tok/s, another one chimed in that it isn't good at coding imo.
OpenAI Responses API Support?: A member inquired about OpenRouter supporting the OpenAI Responses API.
- The OpenRouter team suggested the Veo2 API is your best bet for SOTA image to video, but it's about 50 cents per second of video.

MCP (Glama) Discord

Prompt ICL for Best Tool Use: Members discussed prompting agents for tool usage, referencing Cline's system prompt and suggesting prompts on the server directly such as First call ${tool1.name}, then ${tool2.name}.
- A member shared a link on using prompts for ICL and a test showing it working.
Google Search Gets Config for MCP: A member inquired about adding Google Search to MCP, and another member shared their configuration.
- They noted that users need to obtain their own Google API key and engine ID to use the configuration.
MCP Servers Galore with Docker: A member created an all-in-one Docker Compose setup for easily self-hosting 17 MCP servers using Portainer, with Dockerfiles sourced from public GitHub projects (MCP-Mealprep).
- It was recommended to not bind the containers on 0.0.0.0 unless you need this accessible remotely and to include in the readme an example mcp config json.
Agents are saying Canvas Yeah!: A member created a Canvas MCP server, enabling AI agents to interact with Canvas LMS, and added an agent that can autonomously crawl Gradescope to find info, available at Canvas-MCP.
- The tool offers features like finding relevant resources, querying upcoming assignments, and accessing courses and assignments from Gradescope.

aider (Paul Gauthier) Discord

GPT-4o Claims Coding Arena: The latest ChatGPT-4o update jumps to #2 on the Arena leaderboard, tying #1 in Coding, Hard Prompts, and performing in the Top-2 across ALL categories while costing 10x less.
- This update is confusingly released as chatgpt-4o-latest endpoint, priced at $5/$15 per million input/output tokens, whereas the API snapshots are priced at $2.5/$10, so caution is recommended when moving workloads, according to Artificial Analysis.
OpenRouter R1 Model Stumbles: A member found the free R1 model on OpenRouter to be "stupid", verbose, and ineffective at solving broken tests, especially with repomap enabled, unlike O3-mini.
- It's speculated that the free R1 model is a quantized version of DeepSeek, possibly in FP8 format, while the DeepSeek on the leaderboard is from the official DeepSeek team and users rotating through multiple API keys on OpenRouter may have their accounts suspended.
Context Architecture Enables Efficient Codebase Handling: Constant Context Architecture (CCA) is proposed as a solution for working with large codebases using LLMs, guaranteeing that the necessary context for modifying any module will always fit within an LLM's context window, regardless of the total codebase size, as described in this blogpost.
- This is achieved by ensuring modules have bounded size, interfaces, and dependencies, making context gathering a bounded operation.
Rate Limits Frustrate Gemini 2.5 Pro Users: Multiple users reported hitting rate limits with Gemini 2.5 Pro, even when seemingly below the documented 50 requests/day, with one noting the existence of a 2 requests/minute limit.
- There was discussion on whether purchasing a paid account would resolve the limitations, with mixed results reported, along with a potential fallback model implementation.
Aider's Context Command Automates File Inclusion: The new /context command automatically identifies relevant files for a given request and adds them to the chat, as discussed in this discord thread.
- It's particularly useful for large codebases and saves time by automating the process of manually adding files.

Latent Space Discord

GPT-4o Leaps to #2 on Arena!: The latest ChatGPT-4o (2025-03-26) jumped to #2 on Arena, surpassing GPT-4.5 with a significant improvement (+30 pts) over the January version, according to this tweet.
- It tied for #1 in Coding and Hard Prompts.
OpenAI Loosens Image Generation Policy: OpenAI launched native image generation in ChatGPT via 4o, shifting from blanket refusals to a more precise approach focused on preventing real-world harm, as explained in this blog post.
- The new policy allows more creative freedom in sensitive areas.
Devin Autogenerates Wiki Pages: Devin now automatically indexes repos and produces wikis with architecture diagrams and links to sources, according to this tweet.
- This functionality helps users get up to speed on unfamiliar parts of a codebase.
HubSpot Co-Founder Joins Latent Space: Dharmesh Shah, co-founder of HubSpot and creator of Agent.ai, joined Latent Space to discuss the next evolution in workplace organization, with a focus on hybrid teams.
- A key concept is the idea of human workers collaborating with AI agents as team members, raising questions about team dynamics, trust, and task delegation.
LLM Codegen Workflow Detailed: A member shared their LLM codegen workflow, emphasizing brainstorming specs, planning, and executing with LLM codegen in discrete loops.
- The workflow is built on personal experience and internet best practices, but the author admits that it will probably not work in 2 weeks, or it will work twice as well.

LM Studio Discord

LM Studio Tames Multi-GPU Setups: LM Studio 0.3.14 introduces granular controls for multi-GPU setups, enabling users to enable/disable specific GPUs and choose allocation strategies such as evenly or priority order, downloadable here.
- Keyboard shortcuts Ctrl+Shift+H (Windows) or Cmd+Shift+H (Mac) give quick access to GPU controls, with Ctrl+Alt+Shift+H (Windows) or Cmd+Option+Shift+H (Mac) opening a pop-out window for managing settings during model loading.
Threadripper Flexes on EPYC: A discussion compared Threadripper to EPYC, clarifying that while Threadripper is technically HEDT (High-End Desktop), AMD does not promote EPYC for home users.
- A GamersNexus review highlighted the AMD Ryzen Threadripper 7960X's 24 cores and relatively low cost for workstations.
LLM Calculations Get a Visual Overhaul: Members discussed visualizing calculations performed by LLMs, such as mapping values to pixel colors and the LLM Visualization tool was recommended.
- Resources such as 3b1b's playlist on LLMs and a book on building LLMs from scratch were shared for deeper understanding.
P100 Gets Demolished by 6750xt: A member inquired about using a P100 16GB for a hobby project, but was strongly advised against it, with one user saying its basically e-waste compared to a 6750xt.
- The 6750xt was recommended as a better and more modern card due to its Vulkan support, while the P100's unsupported CUDA versions make it less desirable.

Eleuther Discord

Transformer Storage Error Messages Confuse Users: Insufficient storage leads to misleading error messages in transformers v4.50.0, a user found; a PR is planned for better error handling and checking for capacity before downloading model shards.
- The user had to use df -h to diagnose the 100% full system due to bad error messaging from the library.
Torchtune Invites Code Tinkering for Customization: Users found that torchtune needs downloading and editing 200-line PyTorch scripts and YAML files to customize, giving a complete view of the process.
- The need to dissect Hugging Face's implementations may be avoided by this approach, according to a user.
Bias-Augmented Consistency Training Validates Introspection: Members discussed emulating self-awareness in LMs by creating a representation of their circuits and feeding it back, inspired by Anthropic's work.
- A paper on bias-augmented consistency training (BCT) was also linked as a validation measure for introspection methods.
Adaptive Compression Aims to Boost Distributed Systems: An infrastructure layer optimizing model transmission and deployment across distributed systems is in development, using adaptive compression and intelligent routing to tackle bandwidth waste and inference latency.
- Those interested in distributed inference may find this infrastructure useful for scaling larger models, offering a demo.
Neural Nets Morph Into Bodies Without Organs: A member linked to a tweet arguing that neural networks are Bodies Without Organs (BwO) because they don't have organs or fixed mechanisms and instead have flows of information.
- A member rejects mechanistic interpretability and says neural networks generalize without fixed mechanisms which was seen by Descartes 400 years ago.

GPU MODE Discord

tl.gather Glides Closer to Release: While waiting for official release, to solve element repetition problems, members noted that one can compile Triton from source as described in this discord thread.
- The team also clarified that tl.gather could solve element repetition problems, which has been requested by other members for functions such as torch.Tensor.expand() to triton.
Activation Sparsity Accelerates FFNs: A new paper was shared arguing that 2:4 sparsity for activation acceleration in LLMs leads to 1.3x faster FFNs without accuracy loss, see Acceleration Through Activation Sparsity.
- A member noted the next step is FP4 with sparsity for an effective 2-bit tensorcore performance.
Confusion Clouds CUDA Profiling: A user seeks a definitive guide to CUDA profiling, given the plethora of Nvidia tools such as nvprof, Nvidia Visual Profiler (nvvp), and various Nsight packages.
- Another user suggested Nsight Compute is the best tool for single kernel profiling, with links to Nvidia's documentation and a detailed talk.
Miyazaki Mocks AI Art Sampling: A 9-year-old meme resurfaced showing Hayao Miyazaki's critical reaction to AI-generated art when presented by a founder of Niconico.
- Members compared the ethics of using AI art to buying from fast fashion companies like Shein, citing an immoral business model offers access to cheaper content.

Yannick Kilcher Discord

AI Schools Envisioned by OpenAI and xAI: OpenAI and xAI are exploring the concept of AI-driven schools, potentially leveraging generated images for lesson content, with discussion pinpointing Ghibli Studio Style as a solution for alignment as per this post.
- The initiatives aim to integrate AI more intimately into educational frameworks, with a focus on creating visually appealing and contextually relevant learning materials.
Transformer Circuits Unveils Crosscoders: The Transformer Circuits team released an update on sparse crosscoders, a variation of sparse autoencoders that read and write to multiple layers, forming shared features as outlined in their research update.
- These crosscoders address cross-layer superposition, monitor persistent features, and simplify circuits.
GPT-4o Confirmed as Auto-Regressive Image Model: Members verified GPT-4o as an autoregressive image generation model after Yampeleg's post and the release of OpenAI's System Card.
- This revelation highlights the model's novel approach to image creation directly from textual prompts, with members conjecturing that GPT-4o reuses image input and image output tokens.
Qwen2.5-Omni Makes a Multimodal Splash: Qwen2.5-Omni, the latest flagship end-to-end multimodal model in the Qwen series, has been shared among members, and it is designed for comprehensive multimodal perception and handles text, images, audio, and video, as detailed on the Qwen Chat.
- Offering real-time streaming responses via both text generation and natural speech synthesis, Qwen2.5-Omni sets a new benchmark in multimodal interaction.

Interconnects (Nathan Lambert) Discord

GPT-4o Surges on Arena, 10x Cheaper: The new ChatGPT-4o (2025-03-26) model jumped to #2 on Arena, surpassing GPT-4.5, with reported 10x cost reduction and it tied for #1 in Coding and Hard Prompts, as reported by lmarena_ai.
- The model is currently ranked in the Top-2 across all categories in Arena and excels in both coding and handling complex prompts.
Musk's xAI Swallows X in $80B Deal: Elon Musk revealed that xAI has taken over X through an all-stock transaction, valuing xAI at $80 billion and X at $33 billion, including $12 billion in debt, according to The Verge.
- This move consolidates Musk's AI ventures under the xAI umbrella and may shift the competitive landscape in the AI market.
LlamaGen Generates Images Like LLMs: The LlamaGen family of image generation models applies the next-token prediction paradigm from large language models to generate images, achieving 2.18 FID on ImageNet 256x256 benchmarks as described in the LlamaGen paper.
- The architecture achieves a reconstruction quality of 0.94 rFID and 97% codebook usage with an image tokenizer that has a downsample ratio of 16.
Qwen2.5-Omni Does It All: The Qwen2.5-Omni is the new flagship end-to-end multimodal model in the Qwen series, capable of processing text, images, audio, and video, with real-time streaming responses via text and speech as noted in their blogpost.
- The model is available for use at Qwen Chat and may herald a new wave of more generalized models.
Gemini 2.5 Pro Crushes Wordle Competition: Gemini 2.5 Pro has demonstrated exceptional performance on Wordle, logically deducing words and letter placements, as reported by Xeophon.
- Feedback on Gemini 2.5 Pro has been overwhelmingly positive, with one user noting that I think I've never seen feedback this robustly positive about an AI release that wasn't the Current Thing, as mentioned by Zvi.

Torchtune Discord

FP8 QAT Faces Bandwidth Bottleneck: A member following up on issue #1632 noted FP8 QAT is on TorchAO's radar, but lacks bandwidth for immediate implementation.
- This indicates a potential area for future development and contribution within the PyTorch ecosystem.
Torchtune's Team Tackles Issue Backlog: The team discussed prioritizing PR reviews and new PRs before addressing the issue backlog, estimating 80% of existing issues are already resolved.
- To better organize the backlog of pending reviews, a member suggested a general RL/RLHF tracker, in addition to the existing GRPO tracker.
Torchtune Plans Integration with bitsandbytes: A member suggested using issue #906 in the Torchtune repo to guide contributions for bitsandbytes integration.
- Another member humorously noted their lack of enthusiasm for doc PRs, but agreed to check it out nonetheless.
Centered Reward Loss enables Reward Model Training: Members discussed enabling reward model training in Torchtune, specifically focusing on implementing centered reward loss like (R1 + R2)² loss.
- They noted the current preference dataset format requires a chosen/rejected format without a prompt.
vLLM Integration Causes Weight Hotswapping Hacks: A member detailed memory monopolization issues during initialization with vLLM, sharing an obscure hack for weight hotswapping.
- Another member warned that every vLLM release breaks something, alluding to potential incompatibilities with existing hacks when vLLM releases version 0.8 with its new v1 execution engine.

Nous Research AI Discord

Claude Gets a Kingly UI: Users are reporting a clean new UI for Claude, with one user specifically liking that the UI hides all the things they never use, calling it a king move.
- The only noted issue so far is the lack of a toggle for extended think.
DeepSeek Copies GPT-4o's Homework: DeepSeek is combining diffusion and transformers like GPT-4o multimodal, as noted in this tweet referencing a similar idea in vision.
- The cited paper experiments on images and videos using autoregressive conditional block attention.
TinyZero's $30 AI Model Debuts: Attention is turning to U.S. TinyZero's recent accomplishments, specifically their $30 model, along with new releases like VERL and Sky-T1, as covered in this CNBC article.
- When DeepSeek released its R1 claiming it had achieved its generative AI large language model for just $6 million, the billions being spent by U.S. AI market leaders including Microsoft-funded OpenAI immediately came under scrutiny.
LG's EXAONE Models Released Under Questionable License: LG AI Research has released EXAONE Deep, a series of models ranging from 2.4B to 32B parameters, with superior capabilities in reasoning tasks including math and coding benchmarks, as detailed in their documentation, blog and GitHub.
- It was noted that the EXAONE AI Model License Agreement 1.1 - NC explicitly retains ownership of the output, but the enforcement of this license is questionable.
Hermes-3 Impresses Users: A member mentioned that so far the most impressive model has been Hermes3 Llama3.2 3B.
- No further details were given.

HuggingFace Discord

DeepSeek Plunges Into Diffusion-Transformer Mix: DeepSeek combines diffusion and transformers like GPT-4o multimodal, according to this tweet linking to their paper.
- The author noted that a similar idea appeared in Vision, experimenting on images and videos with almost the same title.
ZeroGPU Quota Bugging Users: Users are reporting issues with zeroGPU quota not resetting, with one linking to this discussion for related complaints.
- One user noted that even if the quota is used up, it recovers to a certain extent after 30 minutes or an hour, but it's buggy.
FactoryManager Rolls Out LinuxServer.io Docker Support: A member introduced FactoryManager, a Python package wrapping linuxserver.io desktop environment containers, enabling programmatic control of environments, showcased with a demo using two different desktop environments.
- This package aims to offer flexibility by scaffolding on top of linuxserver.io, diverging from the custom environments often created in GUI agent demos from Anthropic, OpenAI, etc.
Langfuse Toxicity Evaluator Flags the Carrots: A user testing the toxicity LLM-as-a-judge in Langfuse found that it incorrectly flagged the prompt 'Can eating carrots improve your vision?' as toxic with a score of 0.9, citing a false association with climate change discourse.
- The user questioned how to evaluate the evaluator, noting that GPT-4o misattributed derogatory climate change content to a harmless question about carrots.
Base vs Instruct Model Debate: A newcomer to agents sought clarification on the distinction between base models and instruct models, referencing the course's mention of chat templates.
- A member responded with a metaphor of a base model as 'the naked model, without a wrap' and shared a Reddit post further elaborating on the differences.

Notebook LM Discord

Mindmapping Feature Wins Fans: A user expressed excitement about the new mindmapping feature, calling it another mind-blowing moment.
- No further details were provided about their specific uses.
Source Uploads Snag, Stuck in Limbo: A user reported issues with sources stuck in a perpetual uploading state, preventing both import and removal, for over 8 hours.
- The user sought advice on removing permanently uploading sources but without success.
Versioning Vanishes, Users Vexed: A user expressed concern over the lack of versioning and recycle bin support for the "Note" source type.
- The user mentioned hesitancy to use it, preferring Google Docs for its superior data protection and backup features.
Pasted Sources Stop Self-Naming: A user reported that pasted sources, which previously named themselves automatically, now default to "pasted text."
- The user asked if there was an update or a way to revert to the previous behavior.
PDF Parsing Problems Persist: Users discussed NLM's inability to extract data from scanned PDFs, with one user asking if the tool could extract data from scanned notes.
- A user clarified that NLM cannot handle mixed content PDFs (text and images), but can process docs and slides.

LlamaIndex Discord

LlamaIndex Celebrates MCP Week: LlamaIndex highlights LlamaCloud as an MCP server and demonstrates the use of LlamaIndex as a client to any MCP server, offering access to many MCP servers as tools, detailed in this tweet.
- They showcased the ability to substantially expand capabilities for agents by utilizing hundreds of existing MCP servers.
FunctionAgent Gains ChatMessage History: A member inquired about adding chat history to the FunctionAgent workflow, with documentation provided.
- Guidance was offered on overriding chat history with agent.run(...., chat_history=chat_history) or using ChatMemoryBuffer.from_defaults(token_limit=60000, chat_history=chat_history).
Telemetry Tracking Gets User ID: A member asked about passing custom telemetry attributes and attaching a header or param to the LLM network call when interacting with Llama Index, and a Colab notebook was shared.
- The Colab notebook shows how to attach a user ID to all events executed within a code block.
LlamaParse PDF Parsing Problem: A user reported that LlamaParse works for single PDFs but fails when processing two PDFs and asking the same question, potentially causing a system overload.
- The user described that the system literally cooked when handling multiple PDFs, indicating a potential overload or processing error.

Cohere Discord

Cohere names Models "Command": A member questioned why Cohere chose to name its language models Command suggesting, similar to database management, a query is essentially a command or instruction.
- Model selection is available in Coral, with Just Chat utilizing Command A without external sources.
Software Engineer seeks Cohere Career: A member is seeking new job opportunities as a software engineer and is excited to discuss potential projects related to websites or web applications.
- Another member shared a link to the Cohere careers page encouraging the user to explore available positions.
Bot Commands Get Test Run: Members are encouraged to test bot commands in the 「🤖」bot-cmd channel to ensure proper functionality and user experience.
- Feedback on bot commands is welcome.
Full-Stack Alchemist Ready to Build: A passionate developer with 8+ years of experience is skilled in building scalable web and mobile apps using modern frameworks like React, Angular, Flutter, and Swift.
- They craft intelligent AI solutions using Python, TensorFlow, and OpenAI, integrating cloud technologies (AWS, GCP, Azure) and microservices for global scaling.
Oracle Consultant Seeks Cohere Wisdom: A technical consultant with 12+ years of experience in Oracle ERP Fusion is eager to learn more about Cohere models and AI use cases for enterprise applications.
- A networking and CS student is aiming to work on open-source generative music projects, favoring tech tools like ChatGPT, Grok, Windsurf, and Replit.

Nomic.ai (GPT4All) Discord

GPT4All Faces Usability Complaints: Users express concerns about GPT4All's usability, mentioning issues such as inability to import models, search the model list, view model sizes, use LaTeX, or customize model list order.
- One user suggests GPT4All is losing users because other platforms are more user-friendly and open.
GPT4All Lagging on New Model Implementation: A user is frustrated that GPT4All has yet to implement Mistral Small 3.1 and Gemma 3, highlighting their multimodal capabilities.
- The user suggests that if GPT4All does not catch up by Summer 2025, they might switch away from Llama.cpp.
GPT4All Praised for Native RAG and Model Settings: Despite criticisms, GPT4All offers advantages such as native RAG and out-of-the-box functionality, with a user expressing confidence in the developers and anticipation for GPT4All v4.0.0.
- Another user appreciates GPT4All's model settings page for its comprehensive options and convenient model reload button noting that you need 2-3 clicks to setup out of the chat menu.

tinygrad (George Hotz) Discord

Members Asked to Close Stale PRs and Issues: George Hotz asked members to close any open pull requests (PRs) and issues that are stale.
- This request aims to clean up the project's repository by addressing outdated items.
Discussions on TinyGrad Codegen Internals: A member inquired about TinyGrad's code generation process, specifically asking about the location of CStyleCodegen or CUDACodegen as mentioned in the documentation.
- The documentation describes TinyGrad using different translators (Renderers or Codegen classes) such as C++ (CStyleCodegen), NVIDIA GPUs (CUDACodegen), Apple GPUs (MetalCodegen) to translate the optimized plan into code that the CPU/GPU can understand.
Boolean Indexing Implementation Explored: A member sought advice on efficiently creating evenly spaced points on a grid with a hole in it, similar to boolean indexing in PyTorch, suggesting this could be a useful contribution to TinyGrad.
- An LLM proposed a solution using masked_select to efficiently create the desired grid with a hole, leveraging the condition full.abs().max(axis=1) >= (math.pi/6) to filter points outside the hole.

DSPy Discord

Tackling DSPy Output Validation Fails: A member inquired about how DSPy handles output validation failures, specifically when an integer field expects a number from 1 to 10 but receives 101.
- There was no further discussion or links provided regarding this question in the channel.
Delving into DSPy Optimizers: A member is exploring the use of optimizers within DSPy and how they interact with docstrings and prompt management, referencing DSPy's official documentation.
- The issue found is that the Optimizer overwrites the prompt from the docstring, requiring optimized versions to be loaded from a json or pkl file.
Decoding DSPy's Optimization Process: It was clarified that DSPy's optimizer generates prompts and tests them on a dataset to identify the best-performing one, further detailed on the official website.
- The user found it VERY interesting how the optimizer may select N examples to include in the prompt, showcasing the kind of prompts generated.
DSPy: Declarative Self-improving Python Emerges: DSPy is a framework for programming rather than prompting language models to rapidly iterate on building modular AI systems, offering algorithms to optimize prompts and weights.
- Instead of brittle prompts, you write compositional Python code and use DSPy to teach your LM to deliver high-quality outputs.

LLM Agents (Berkeley MOOC) Discord

Mentorship MIA for Entrepreneurship Track: An entrepreneurship track student inquired about mentorship opportunities within the LLM Agents Berkeley MOOC.
- It was clarified that Berkeley does not provide any mentorship for the entrepreneurship track, though sponsors will host office hours in Apr/May.
Sponsor Office Hours Announced: Sponsors will be hosting office hours in April/May for the LLM Agents Berkeley MOOC entrepreneurship track.
- This provides an opportunity for students to engage with industry professionals and seek guidance on their projects.

Codeium (Windsurf) Discord

Gemini 2.5 Pro Surfs into Windsurf: Gemini 2.5 Pro is now available in Windsurf, granting users 1.0 user prompt credits on every message and 1.0 flow action credits on each tool call; see the announcement on X.
- The update aims to enhance user experience with the latest model.
Windsurf Wipes Out on Gemini 2.5 Pro Rate Limits: Shortly after the release of Gemini 2.5 Pro, Windsurf encountered rate limits due to massive load for the model and provider.
- The team is working to increase quota and apologized for any inconvenience, aiming to get everyone surfing on Gemini 2.5 Pro ASAP.

Modular (Mojo 🔥) Discord

Foo[1] Defaults to Predefined Value: The self parameter in the context of the Foo[1] type can be automatically populated with a default parameter value.
- When self is discarded using _, the argument defaults to its predefined default value.
Self Parameter Clarification: The self parameter is Foo[1] with a default parameter value, which can be disregarded with _.
- Disregarding self with _ defaults to the predefined default parameter value.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Manus.im Discord ▷ #general (627 messages🔥🔥🔥):

Manus new Credit system Feedback, Alternative Energy for Manus GPU Farm, Cheaper AI Models like Deepseek and Qwen, Manus AI assistance for Exams, Manus UI Love

Community Outcry on Manus New Credit System: Many users expressed frustration with the new credit system, feeling it's too expensive and limiting, with some estimating costs could reach $500/month for decent usage, others blew through their 1000 free credits quickly and the credits are consumed even if the task fails.
- Users like that they are active to help but the shift from task-based to credits-based does feel jarring, especially when it wasn’t part of the original beta flow.
Brainstorming Alternative Energy for Manus GPU Farm: One member suggested that Manus could create a team dedicated to developing cheap renewable energy sources, such as molten sodium, thermal or solar to power their own GPU farm and reduce costs.
- They mentioned locating it in a desert and using flywheels as energy storage to keep it running at night for max efficiency.
Alternative AI Models Like Deepseek Considered: There was discussion around using cheaper AI models like Deepseek and Qwen instead of only Anthropic's Claude to reduce operational costs.
- However it has not been stated if Manus will allow other AI integrations
Manus AI assistance for Exams: Some users have used Manus alongside Kimi or Deepseek to upload seminar and lecture files, asking the AI to memorize them for exam preparation, this helped some receive scores such as 81/100 on assignments.
- Some were wondering if that violates the terms of service if you help the AI cheat for school.
Love the Manus UI Design: Multiple members praised the UI design of Manus, expressing that the design is really good, easy to use, simple and aligns with real world concepts.
- One user stated What made manus feel so amazing was not only the results you got, but that the idea of tasks closely aligned with real world concepts. That simplicity was genius.

Links mentioned:

Perplexity AI ▷ #general (1219 messages🔥🔥🔥):

Perplexity AI outages, DeepSeek AI, Claude AI, User Frustrations, T-Mobile Promo

Perplexity AI Servers Caving: Multiple users reported outages and disappearing history/spaces (example), prompting humor and frustration among the community.
- The official status page (status.perplexity.com) lacked timely updates, with users suggesting an automated user-reported outage system and proactive notifications.
DeepSeek AI not that deep: Members were generally disappointed in DeepSeek AI and it being able to understand complex instructions.
- They also commented on it giving unnecessary jargon, compared to better math apps.
Claude AI: A member asked about the context window limit of Claude AI, in comparison with Gemini and ChatGPT.
- Many members commented that Claude's context window was very limited, especially in comparison to Gemini.
Users vent frustrations: Users lamented lost notes and study materials due to the outages, with some jokingly blaming AI for ruining their exams.
- There was a discussion on the broader economic implications of Perplexity's $20/month subscription fee, especially when considering minimum wage workers in different countries.
T-Mobile Users Cashin: Users shared methods for obtaining free Perplexity Pro subscriptions via T-Mobile and Revolut promos.
- One member suggested using a burner number on T-Mobile to gain access.

Links mentioned:

Perplexity AI ▷ #sharing (10 messages🔥):

Shareable threads, Super Prompt, LLM Research

Shareable Threads Required: Perplexity AI requested a member to ensure their thread is Shareable, linking to a previous message in the Discord about this topic here.
Super Prompt Incoming: A member shared a link to a Perplexity AI search result for create a super prompt for copi here.
LLM Research collection shared: A member shared a link to a Perplexity AI collection about LLM research here.

Perplexity AI ▷ #pplx-api (7 messages):

API Parameter Error Handling, Llama Index RAG context with Perplexity Sonar, Deep Research Parity API vs Web

API Parameters Now Throw Errors: The API team implemented error handling for parameters like search_domain_filter, related_questions, images, and structured_outputs for non–Tier 3 users.
- If you previously achieved the desired results by passing your JSON schema within the prompt (instead of using the parameter), you’ll continue to see the correct behavior; nothing has fundamentally changed.
Sonar model struggles with Llama Index RAG context: A user asked for suggestions on how to pass Llama Index RAG context using the index object to Perplexity Sonar model.
- The user also asked if the Deep Research in the API will get closer to the one on perplexity.com, as it seems nerfed compared to the website.
Sonar API misses citations: A user reported an instance where the API with sonar model did not return any citations.
- The user noted that the same query on the client-side experience comes with citations.

Cursor Community ▷ #general (1251 messages🔥🔥🔥):

Gemini 2.5 Pro Pricing, Cursor infrastructure, Humanoid robots?, Codebase tag removed from cursor

Dan drops deep deets on DeepSeek 3.1 in Cursor: A Cursor team member shared that DeepSeek 3.1 should be available in the editor within 12 hours, but did not disclose the cost.
- There are also deals with providers and privacy mode ensures no data is stored, so that is nice.
Cursor Gets Crashed by Database Deployment Debacle: Cursor experienced a service-wide outage due to a database deployment issue within its infrastructure, affecting AI features like Chat and Tab.
- The incident was resolved after a few hours, and a team member quipped that they accidentally unplugged the main server to charge my phone.
Hot Takes on Humanoid Robots: Members debated the utility of humanoid robots, with some envisioning them as food-making and cleaning assistants, while others expressed concerns over data privacy and telemetry.
- Another member suggested that robotics will be where AGI emerges from, evolving in a virtual followed by real environment.
Members mourn missing @codebase tag on Cursor: Users noticed that the @Codebase tag was removed, and the staff explained that it was replaced by a similar way to scan current indexed project.
- This prompted discussion about token limits, pricing models, and the trade-offs between convenience and control when using AI coding tools.

Links mentioned:

LMArena ▷ #general (906 messages🔥🔥🔥):

O1 Pro drop, GPT-4o latest Benchmarks, Deepseek V3, Meta LLama, AI Safety

O1 Pro on Leaderboard, Coming Soon?: Members discussed the potential inclusion of O1 Pro on the leaderboard, speculating that OpenAI might absorb the costs to showcase its capabilities, particularly given its high price point.
- However, some doubted its ability to rank highly, with one commenter joking it would be ranked dead last lol due to latency.
GPT-4o's Coding Ability: Members debate GPT-4o's coding ability after recent updates, with some noticing improvements in instruction following and code generation.
- However, others insist that proper evals are needed, as one member argued that GPT-4o's ranking may be inflated due to specialized training for preferred response styles, rather than actual performance.
DeepSeek V3's Rise in Coding Benchmarks: The new DeepSeek V3 0324 model is gaining recognition, with one member noting it scores higher than GPT-4o R1 in SWE-bench according to this Reddit post.
- Data indicates that DeepSeek's V3 0324 release leapfrogs Claude 3.7 Sonnet in non-reasoning and has become the leading non-reasoning model for coding.
Meta's Emoji-Laden LLama Models: Members observed that recent anonymous models in the arena, believed to be from Meta, are displaying quirky behavior, including the addition of many emojis and a tendency to identify themselves as Meta Llama models, though their image recognition capabilities are notably inferior.
- The models being tested include: bolide, cybele, ginger, nutmeg, phoebe, spider, themis, though they also note that spider sometimes is identified as being GPT-4.
Discussions on AI Safety and Jailbreaking: Members discussed AI safety, mentioning that models like Claude are designed with constitutional AI principles, prioritizing objectivity over user preferences, which may affect their leaderboard rankings.
- A member also shared a Jailbreak Cookbook resource for LLM jailbreaks and AI safety, including a GitHub repository with implementations of systematic jailbreaks.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #general (580 messages🔥🔥🔥):

Elevenlabs Scribe V1 for audio event classification, OlmOCR loading in Unsloth, Fine-tuning LLMs for board games, Gemma 3 notebook quirks, Qwen Omni Hacking

Scribe V1 powers FoxMoans Utterance List: A member is using 11Labs Scribe V1 for audio event classification to create a list of utterances, which they estimate will cost them around $20k.
- They mentioned using it for audio event classification, suggesting it's ideal for projects needing mood-based analysis, since it can detect laughter, angry yandere style variations.
OlmOCR's Unsloth Integration Remains Rocky: A member is struggling to load OlmOCR (a finetune of Qwen2VL) in Unsloth, despite having Qwen2VL working.
- The Unsloth team followed up by asking if the user had tried the latest version, with a member noting that they've been pushing updates and fixes before the creator realized their models finished uploading.
Orpheus TTS Receives Fine-Tuning Notebook, Multilingual Support Explored: The Unsloth team released a notebook for finetuning Orpheus-TTS, highlighting its human-like speech with emotional cues.
- Members also discussed the possibility of changing the language of Orpheus from English to another language, suggesting that continued pretraining with new embedded/head layers may be sufficient.
LocalLlama Mods face fire over removing posts: A user complained about their post explaining the Llama Community License being removed from r/LocalLLama, leading to speculation that Meta/Facebook might be moderating the subreddit.
- Other members didn't care, noting that it's proprietary, and that Meta isn't really enforcing it, with one member saying I care once i see a c&d or actually any enforcement.
Multi-GPU Support Coming Soon, Pro not needed: A user asked about obtaining Unsloth Pro for multi-GPU support, but a member replied that multi-GPU support will likely become free under AGPL in the coming weeks.
- One user said preliminary stuff is already done at least for the first version, a few weeks.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #off-topic (1 messages):

``

No Topics Discussed: There were no discussion topics found in the channel.
- The only content was an image attachment.
Image Attachment Present: An image was attached to the channel with a Discord CDN link.
- The image was not further discussed or analyzed.

Unsloth AI (Daniel Han) ▷ #help (68 messages🔥🔥):

Training Loss Interpretation, Gemma & Task Difficulty, Dataset Size & Overfitting, LM Studio Models, HF Upload & vLLM

Loss Landscape Lowdown Leaves Learners Lost: A user inquired about training loss decreasing early and staying near zero, questioning if sudden increases are bad signs, showing a graph.
- Another member suggested the task might be too easy for Gemma, leading it to stop learning, while advising the user to use Weights & Biases (W&B) for better graph visualization.
Double Trouble for BOS Token Causes Bedlam: A user reported finding a double BOS token issue in the latest Unsloth update (Gemma 3 4B) when checking tokenizer decoding.
- A hotfix was identified which removed the accidentally added token.
Unsloth Install Updates Unleash Unexpected Underperformance: Users reported experiencing massive issues when using the --no-deps flag during Unsloth updates, contrary to some existing instructions.
- A user strongly recommended updating all dependencies and highlighted outdated documentation, specifically pointing to the Unsloth documentation.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #showcase (2 messages):

Orpheus-TTS, Voice Model Finetuning, UnslothAI

Unsloth tunes Orpheus-TTS!: Unsloth released a finetuning notebook for Orpheus-TTS, a voice model, available for free on Colab.
- It enables customized voices and dialogue 2x faster using 70% less VRAM via Unsloth.
Orpheus-TTS displays Emotions!: Orpheus delivers human-like speech with emotional cues (sighs, laughs) that outperform OpenAI.
- Unsloth showed an example of finetuning it on a 1000 row dataset for just 100 steps and managed to change the voice + personality of the model entirely!

Links mentioned:

Unsloth AI (Daniel Han) ▷ #research (9 messages🔥):

Dynamic Quantization, DeepSeek-R1, ACDiT

Unsloth's Dynamic Quantization Deep Dive: A member inquired about the 'dynamic' aspect of dynamic quantization in Unsloth, asking whether weights causing more activation/quantization errors are identified and not quantized, with the rest quantized to 4 bits, based on the super weights paper.
- The question extends to how dynamic the kblams are and whether programs themselves can be encoded into the knowledge base, in other words how the quantization errors are calculated and which layers are more important, looking for a code base or regularization.
DeepSeek-R1 in GGUF and 4-bit formats available!: Unsloth made available various versions of DeepSeek-R1, including GGUF and 4-bit formats.
- Unsloth's DeepSeek-R1 1.58-bit + 2-bit Dynamic Quants is selectively quantized, improving accuracy over standard 1-bit/2-bit.
ACDiT paper appears: The AutoConditional Diffusion Transformer (ACDiT) paper was shared: https://arxiv.org/abs/2412.07720.
- The paper describes a combination of autoregressive and diffusion paradigms for modeling continuous visual information, using Skip-Causal Attention Mask (SCAM) on standard diffusion transformer during training, and autoregressive decoding during inference using KV-Cache.
Reasoning in GRPO notebooks is asked about: A member asked about ways to play with the reasoning in the GRPO notebooks, specifically separating and modifying the reasoning before the model provides its final answer.

Links mentioned:

OpenAI ▷ #ai-discussions (305 messages🔥🔥):

Gemini 2.5 Pro vs GPT-4o, Google AI Studio, Perplexity for News and Current Events, Claude vs GPT for Reasoning, AI Transcription Tools

GPT-4o Edges Out Gemini 2.5 Pro in Coding: Members debated the coding capabilities of GPT-4o versus Gemini 2.5 Pro, with some finding GPT-4o superior for coding tasks, challenging initial impressions, while others favored Gemini overall or for specific tasks like C++ and WinAPI integration.
- It was mentioned that GPT-4o leads in the coding arena, but many third party benchmarks show that Gemini 2.5 Pro performs better overall for coding, except that GPT-4o wins 3 out of 6 categories.
Free Google AI Studio Steals the Show: Users discussed the benefits of using Google AI Studio, noting its free access to models like Gemini 2.5 Pro and highlighting its generous prompt limits compared to paid services like ChatGPT Plus.
- Members reported using hundreds of messages daily without hitting limits and shared a handy comparison resource, leading one to cancel their ChatGPT subscription due to AI Studio's advantages.
Perplexity Masters News Beat Over ChatGPT: Members found Perplexity to be better than ChatGPT for accessing news and current events, highlighting its Discover tab for the latest news, and that Perplexity is more than just a GPT wrapper.
- However, one user found Perplexity's Deep Research feature to have issues with quality and reliability, especially for research on uploaded files, recommending ChatGPT instead.
Claude 3.7 Sonnet Reigns Supreme for Reasoning: Members praised Claude 3.7 Sonnet for its reasoning capabilities, noting its superior explanations compared to other AI models, especially because free tier Claude fills up and forces you to start a new chat.
- It was suggested that models like o1, o3-mini-high, and Grok 3 are all great for coding, though one member found o1 the best at complex coding using C++, Physics, Rendering and old APIs such as Win32API.
Decoding the Best Free AI Transcription Tool: Members sought advice on free AI transcription tools, with one suggesting running Whisper locally and another noting the difficulty of installing the necessary Python packages, and the complexity of resolving any dependency issues.
- In order to resolve these issues, users should try to use a cloud based solution which often works immediately without the need to install and troubleshoot local packages.

OpenAI ▷ #gpt-4-discussions (8 messages🔥):

Image generator, GPT-4.5 Error, GPT models for summarization, AI voice chatbot

Image Generator Needed: A member is seeking an image generator that allows copying and pasting scenes from a script to convert into a cartoon style, without requiring frequent sign-ups.
- They also complained about the new image generation being broken and failing to generate images.
Is GPT-4.5 error common?: A member reported getting a GPT-4.5 error in message stream that randomly started dying, and they could not continue those chats.
- The errors started since yesterday.
Best GPT Model for Text Summarization: A member asked which GPT model is best for summarizing and analyzing tens of thousands of words of text, noting their experience with o3 mini, mini high, and o1.
- Another member suggested GPT-4.5 and o1, while advising against using 4o for this particular task.
STT Integration Troubleshoot: A member is developing an AI voice chatbot that integrates speech-to-text (STT), a language model (LLM), and text-to-speech (TTS) but is facing compatibility issues with OpenAI versions.
- The chat completion feature only functions with OpenAI versions earlier than 1, while they are currently on version 1.66, and the store:true command doesn't execute as expected.

OpenAI ▷ #prompt-engineering (83 messages🔥🔥):

Yu-Gi-Oh! card art prompting, Microsoft PromptWizard, ChatGPT prompting methods, Hierarchical communication with markdown, AI prompt engineering

Prompting Yu-Gi-Oh! Card Art Style: A member seeks advice on improving prompts to generate art in the style of Yu-Gi-Oh! trading cards, noting that ChatGPT tends to default to comic art instead.
- The user has already tried using prompts like "Render this character in the style of a Yu-Gi-Oh! trading card illustration. Use sharp, clean digital art..." with incremental improvements.
Microsoft PromptWizard Usage: A member inquired about experiences using Microsoft PromptWizard for custom data, seeking insights from the community.
- No responses were provided in the given text.
Unlock the Best of ChatGPT: A member asked about secret prompts or methods to maximize ChatGPT's potential, feeling there's more to leverage.
- Suggestions included using prompt conditions and disclaimers before providing prompts.
Darth's Prompting Primer: A member shared a structured approach to teaching effective prompting techniques, including hierarchical communication using markdown, abstraction through open variables, and reinforcement strategies.
- This approach includes a shareable ChatGPT link and an emphasis on ML format matching for better compliance.
Enhanced Image Prompting with ChatGPT: A member expressed delight with the new ChatGPT image tool, noting its improved adherence to prompt requirements and ability to handle complex scenarios such as generating a moving market on a giant turtle's back with a sun and three moons.
- The user found that the new tool was much better at making changes without impacting the image as a whole, such as removing stars from a night scene after initially including them.

OpenAI ▷ #api-discussions (83 messages🔥🔥):

Yu-Gi-Oh! card art prompting, Microsoft PromptWizard, ChatGPT prompting tips, Hierarchical communication with markdown, GPTs in conversation

Yu-Gi-Oh! Art Style Prompting Tweaks: A user seeks advice on improving prompts for generating Yu-Gi-Oh! card art, noting success with Ghibli and photorealism styles but struggles with the desired Yu-Gi-Oh! aesthetic.
- They provided examples and their current prompt focuses on sharp, clean digital art with stylized anime rendering, glowing magical effects, and a dynamic pose.
PromptWizard Users Unite: A member inquires about experiences with Microsoft PromptWizard for custom data applications.
- Others are seeking secret prompts or methods to maximize ChatGPT's potential.
Prompting Strategy: Condition and Disclaimer: A member advises adding a prompt condition and disclaimer before the main prompt to guide ChatGPT's output.
- A link to a primed session demonstrates hierarchical communication with markdown, abstraction through open variables, reinforcement, and ML format matching for compliance.
GPTs join Conversational Forces: A user joyfully discovers the ability to invoke custom GPTs within a ChatGPT conversation by typing @.
- Another user notes that they like being able to dictate tool use, and that the new imagegen seems to be performing strongly against the strange requirements I am giving it.
Markdown Misunderstanding in AI Prompts: One user points out that this channel's no markdown rule is lazy, advocating for its use in educating others since it's the language the AI uses.
- They argue that code blocks, while providing formatting, add an unnecessary abstraction layer, potentially confusing users unfamiliar with the format and causing them to freeze.

OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):

Fount AI Character Interactions Framework, Gideon project

Fount Framework for AI Character Interactions Emerges: A member shared the Fount project, an extensible framework for building and hosting AI character interactions using pure JS.
- The framework offers flexibility via modular components, custom AI source integration, powerful plugins, and a seamless cross-platform chat experience.
Gideon Project Surfaces on GitHub: A member shared the Gideon project on GitHub.
- No further details were provided about the project's purpose or functionality.

Links mentioned:

OpenRouter (Alex Atallah) ▷ #general (327 messages🔥🔥):

Gemini 2.5 Pro Access and Limitations, OpenRouter AI SDK Configuration, Free Models with Function Calling, Token Per Second Performance for Coding Models, OpenAI Responses API

Gemini 2.5 Pro: Rate Limits Trigger User Lament: Users are reporting low rate limits for Gemini 2.5 Pro, even after adding their own AI Studio API keys, leading to discussions on how to maximize free quota and manage usage for real applications.
- A member pointed out the model won't be free forever so that windsurf has to start charging for it, that's gonna be a problem.
AI SDK Provider options, nested order array is an ongoing struggle.: Members are actively debugging OpenRouter AI SDK provider options, particularly the use of providerOptions to specify model order and fallback behavior.
- The issue is around whether nesting the order array under the provider key is correct, with debugging attempts showing unexpected provider selection despite configured order. The team acknowledges it's a bug and looking to address the AI SDK issue, hopefully.
Seeking Function Calling Nirvana in free LLMs: Members are searching for free models that support function calling, with some suggesting Mistral Small 3.1 and Gemini free models as potential options.
- Another member mentions, Gosh, I'm trying so hard to find a free model that supports function calling. I can't find any!.
Model TPS face-off: Gemini Flash 2.0 vs the World: Members are debating the tokens per second (TPS) performance of various coding models, with Gemini Flash 2.0 mentioned for its speed, but also facing some criticisms of being trash, something about their hosting is messed up.
- Groq serves the 70B R1 distil at 600tok/s and one member chimed in that it isn't good at coding imo.
OpenAI Responses API support?: A member inquired about OpenRouter supporting the OpenAI Responses API, and one of the OpenRouter team flagged a couple gotchas with it.
- The member asking wanted good image to video and the OpenRouter team suggested that Veo2 API is going to be your best bet for SOTA, but it's about 50 cents per second of video.

Links mentioned:

MCP (Glama) ▷ #general (299 messages🔥🔥):

MCP server config, Prompts and ICL, Ollama models and MCP, Google search integration, Oterm client and MCP

Agent Instructions and Tool Usage: Members discussed best practices for instructing agents on tool usage, particularly regarding the order of tool calls, referencing Cline's system prompt for ideas.
- A member suggested prompting directly on the server, to look like First call ${tool1.name}, then ${tool2.name}.
Prompts for In-Context Learning (ICL) Gain Traction: It was stated that MCP servers can supply instructions to encourage specific agent behaviors, such as tool usage, using prompts for ICL.
- One member shared a link on using prompts for ICL and a test showing it working.
Ollama Model Configuration Confusion Persists: A member had issues connecting a local LLM via Ollama to an MCP server and asked for guidance.
- It was suggested to use oterm with this MCP config and replace the content of the config file, furthermore stating that the default 4-bit Ollama models are often insufficient for proper tool usage, recommending 8-bit versions for better performance and more models available here.
Adding Google Real-Time Search Tool to MCP Discussed: A member inquired about adding Google Search to MCP, and another member shared their configuration
- They noted that users need to obtain their own Google API key and engine ID to use the configuration.
Discover Blender MCP Servers: A user successfully used the tool and tabulated servers related to blender, finding multiple Blender Model Context Protocol Servers.
- BlenderMCP (GitHub), Blender MCP Server (GitHub), Unreal-Blender-MCP (GitHub), Bonsai-mcp (GitHub), and Tripo MCP Server (GitHub).

Links mentioned:

MCP (Glama) ▷ #showcase (9 messages🔥):

Canvas MCP, Docker Compose for MCP Servers, Model Context Protocol (MCP) Explanation, Speech MCP, Gradescope Integration

Canvas MCP Lets Agents Talk to Canvas LMS: A member created a Canvas MCP server, enabling AI agents to interact with Canvas LMS, and also added an agent that can autonomously crawl Gradescope to find info, available at Canvas-MCP.
- The tool offers features like finding relevant resources, querying upcoming assignments, and accessing courses and assignments from Gradescope.
All-In-One Docker Compose for Self-Hosting 17 MCP Servers: A member created an all-in-one Docker Compose setup for easily self-hosting 17 MCP servers using Portainer, with Dockerfiles sourced from public GitHub projects (MCP-Mealprep).
- Another member suggested to not bind the containers on 0.0.0.0 unless you need this accessible remotely and to include in the readme an example mcp config json.
Model Context Protocol (MCP) Explained: A team member shared a blog post introducing MCP (Model Context Protocol), describing it as an open standard released by Anthropic in late 2024.
- The blog post describes it as a the USB-C of AI integrations that lets Large Language Models powering tools like Claude or ChatGPT communicate with external data sources and tools.
Speech MCP Demoed: A member shared a link to Speech MCP, with a YouTube Shorts demo.

Links mentioned:

aider (Paul Gauthier) ▷ #general (216 messages🔥🔥):

R1 vs O3 Mini, Anthropic Thoughts Microscope, GPT-4o Update, OpenRouter Limits, Running Local Aider Branch with UV

OpenRouter R1 Model Underperforms: A member found the free R1 model on OpenRouter to be "stupid", verbose, and ineffective at solving broken tests, especially with repomap enabled, unlike O3-mini.
- It's speculated that the free R1 model is a quantized version of DeepSeek, possibly in FP8 format, while the DeepSeek on the leaderboard is from the official DeepSeek team.
GPT-4o Aces Coding Arena: The latest ChatGPT-4o update jumps to #2 on the Arena leaderboard, surpassing GPT-4.5, tying #1 in Coding, Hard Prompts, and performing in the Top-2 across ALL categories while costing 10x less.
- However, this update is confusingly released as chatgpt-4o-latest endpoint, which is priced at $5/$15 per million input/output tokens, whereas the API snapshots are priced at $2.5/$10, so caution is recommended when moving workloads, according to Artificial Analysis.
Constant Context Architecture is a Game Changer: Constant Context Architecture (CCA) is proposed as a solution for working with large codebases using LLMs, guaranteeing that the necessary context for modifying any module will always fit within an LLM's context window, regardless of the total codebase size, as described in this blogpost.
- This is achieved by ensuring modules have bounded size, interfaces, and dependencies, making context gathering a bounded operation.
Aider's Context Command Automates File Management: The new /context command automatically identifies relevant files for a given request and adds them to the chat, as discussed in this discord thread.
- It's particularly useful for large codebases and saves time by automating the process of manually adding files.
Multiple API Keys: OpenRouter users are rotating through many api keys to make sure they can make the most requests possible, while avoiding rate limits.
- However, Google has been known to suspend accounts if they detect multiple keys or abuse from a single user.

Links mentioned:

aider (Paul Gauthier) ▷ #questions-and-tips (31 messages🔥):

AiderMacs, Cargo Build Integration, Gemini 2.5 Pro Rate Limits, Aider Architect Mode, Model Combinations

Debugging Aidermacs Integration: A user mentioned debugging a bug in Aidermacs, which they were using to invoke Aider and more closely integrate it into Emacs, describing it as "shaving a yak".
- The user also clarified a lint-cmd configuration detail, noting it should be echo and not test.
Fresh Starts Beat Going in Circles: A user asked about using /undo iteratively versus continuing a chat, and another user recommended using /clear to start fresh if you’re going in circles.
- The conversation highlighted that /undo retains memory of the last action, while /clear purges the entire chat history.
Cargo Build Integration Wishlisted: A user inquired about integrating cargo build with Aider, to pipe errors/warnings back to the model for resolution.
- While no direct solution was provided, the query suggests a desired feature for enhanced code debugging workflows.
Gemini 2.5 Pro rate limits frustrate: Multiple users reported hitting rate limits with Gemini 2.5 Pro, even when seemingly below the documented 50 requests/day, with one noting the existence of a 2 requests/minute limit.
- There was discussion on whether purchasing a paid account would resolve the limitations, with mixed results reported, along with a potential fallback model implementation.
Architect Mode Deep Dive Requested: A user sought a deeper understanding of Aider's architect mode, contrasting it with their existing workflow using regular mode with models like Sonnet 3.5 and Gemini 2.5 Pro, guided by an aider.rules.md file.
- They expressed a desire to optimize their process, potentially leveraging architect mode to avoid redundant work or overpaying for model usage and another member viewed it as a combo of ask + code.

Links mentioned:

Latent Space ▷ #ai-general-chat (26 messages🔥):

GPT-4o Update, OpenAI Image Generation Policy, Devin Wiki Launch, AI Writing Editing

GPT-4o Jumps to #2 on Arena!: The latest ChatGPT-4o (2025-03-26) jumps to #2 on Arena, surpassing GPT-4.5 with a significant improvement over the January version (+30 pts, #5->#2), and tying #1 in Coding, Hard Prompts according to this tweet.
New 4o Policy Allows More Creative Freedom: OpenAI launched native image generation in ChatGPT through 4o, shifting from blanket refusals in sensitive areas to a more precise approach focused on preventing real-world harm as explained in this blog post.
Devin Indexes Repos with Devin Wiki: Devin now automatically indexes your repos and produces wikis with architecture diagrams, links to sources, and more according to this tweet.
AI Helps Edit Non-Technical Writing: Members discussed using Claude and GPT for editing non-technical writing, with one member noting that Claude validates everything I write and GPT wants to rewrite everything I write and this user wants a better middle ground.

Links mentioned:

Latent Space ▷ #ai-announcements (3 messages):

Dharmesh Shah, HubSpot, Agent.ai, hybrid teams, Claude Plays Pokemon hackathon

Dharmesh Shah joins Latent Space: Latent Space shared a conversation with Dharmesh Shah, co-founder of HubSpot and creator of Agent.ai.
- The episode is about the next evolution in workplace organization where human workers collaborate with AI agents as team members.
Attendees can join Claude Plays Pokemon hackathon: Folks in SF: Join us for the Claude Plays Pokemon hackathon this Sunday!
- It's a real catch 'em all opportunity.
Members are encouraged to fill out 2025 State of AI Eng survey: Members not in SF are encouraged to Fill out the 2025 State of AI Eng survey for $250 in Amazon cards!
- Act fast for a chance to get some Amazon money.
Hybrid teams are the future: A particularly compelling concept we discussed is the idea of "hybrid teams" - the next evolution in workplace organization where human workers collaborate with AI agents as team members.
- This raises interesting questions about team dynamics, trust, and how to effectively delegate tasks between human and AI team members.

Link mentioned: The Agent Network — Dharmesh Shah: Dharmesh Shah on Intelligent Agents, Market Inefficiencies, and Building the Next AI Marketplace

Latent Space ▷ #ai-in-action-club (189 messages🔥🔥):

LLM Codegen Workflow, Documentation for LLMs, Memory-Ref Tool, Cursor IDE, Self-Improving Agents

Harper Reveals LLM Codegen Workflow: A member shared their LLM codegen workflow, emphasizing brainstorming specs, planning, and executing with LLM codegen in discrete loops.
- This workflow is built upon personal work, conversations with friends, and best practices from various internet sources, though the poster notes that it will probably not work in 2 weeks, or it will work twice as well.
Docs.dev Plugs into GitHub for Streamlined Documentation: Docs.dev was shared for generating docs directly from the codebase and keeping them up to date as code changes.
- The tool enables users to generate, audit, or analyze Markdown docs with AI, offering both a rich text editor and Markdown options.
Nuvic's FZF Kit Extends Neovim's Fuzzy Finding: A member linked to fzf-kit.nvim, a Neovim plugin that extends fzf-lua with additional utilities.
- This plugin enhances Neovim's fuzzy finding capabilities, improving file and code navigation.
Memory-Ref Tool Aids LLM Context Retention: Members discussed using memory-ref tools to create and query a knowledge graph of memories for LLMs, helping them retain context across sessions.
- One user highlighted the integration of Cursor IDE with Graphiti, using Graphiti’s Model Context Protocol (MCP) server for persistent memory, as detailed in this Hacker News post.
llms.txt Emerges for Website-LLM Coordination: A member shared llms-txt, a file to help language models use websites more effectively.
- The discussion touched on the broader topic of self-improving models and how to guide LLMs with structured documentation.

Links mentioned:

LM Studio ▷ #announcements (1 messages):

LM Studio 0.3.14 Release, Multi-GPU Controls, GPU Management Features, Beta Releases, Advanced GPU Controls

LM Studio 0.3.14 Emerges with Multi-GPU Mastery: LM Studio 0.3.14 is out, featuring new granular controls for multi-GPU setups, accessible via in-app update or from https://lmstudio.ai/download.
- This version introduces capabilities to enable/disable specific GPUs, choose allocation strategies (evenly, priority order), and limit model weights to dedicated GPU memory, with some features initially exclusive to NVIDIA GPUs.
New Knobs for LM Studio GPU Gurus!: LM Studio 0.3.14 introduces new controls for managing GPU resources, including enabling/disabling individual GPUs and choosing allocation strategies.
- Specific CUDA features, like "Priority order" mode and "Limit Model Offload to Dedicated GPU memory" mode, aim to improve stability and optimize for long context on single GPU setups.
LM Studio's Cheat Codes for GPU Controls: LM Studio 0.3.14 introduces shortcuts to open GPU controls: Ctrl+Shift+H (Windows) or Cmd+Shift+H (Mac) and pop-out window via Ctrl+Alt+Shift+H (Windows) or Cmd+Option+Shift+H (Mac).
- Using the pop-out window, you can Manage GPU settings while models are loading.

Links mentioned:

LM Studio ▷ #general (74 messages🔥🔥):

Threadripper vs EPYC, LM Studio UI, Visualize LLM calculations, Model details error in LM Studio, Continue VSCode extension

Threadripper Teaches EPYC a Lesson: Members discussed whether Threadripper is consumer or professional grade, with some noting that while technically HEDT (High-End Desktop), AMD does not promote EPYC for home users unlike Threadripper.
- One member shared a GamersNexus review of the AMD Ryzen Threadripper 7960X, highlighting its 24 cores and relatively affordable cost compared to professional workstations.
Calculations of LLMs Visualized?: A member asked about visualizing calculations performed by a model, i.e., mapping a value to a pixel color.
- Another shared bbycroft's LLM Visualization and recommended 3b1b's playlist on LLMs, along with a book on building LLMs from scratch for a deeper understanding.
Studio SDKs Spark Curiosity: A member inquired about where LM Studio invokes the model and how it forces the use of <think> and </think> tags.
- Another member clarified that LM Studio is not fully open source, only the SDKs are, and pointed to the llama.cpp and MLX engine GitHub repositories for the relevant source code.
Studio's Error Plagues: A user reported experiencing a Model details error: fetch failed issue on Windows 11, despite trying various solutions like using the Hugging Face proxy, changing DNS settings, and using a VPN.
- One suggestion involved checking for the "killer network service" and provided an Intel support article to address potential network-related conflicts.
Continue Codes Confidently: A member asked about connecting LM Studio to VSCode via an extension.
- Another member shared a link to Continue.dev, describing it as a platform for creating custom AI code assistants that can autocomplete code in any programming language.

Links mentioned:

LM Studio ▷ #hardware-discussion (71 messages🔥🔥):

ROCm Support, P100 vs 6750xt, Nvidia vs AMD, Mac Pro 2013 for LLMs

ROCm Support Still Murky: Users discuss the current state of ROCm support in LM Studio, with one user initially misreading documentation and hoping for ROCm support on their 7800 XT.
- It was clarified that ROCm is only supported on cards with GFX1030, 1100, and 1101 runtimes.
P100 Trashed, 6750xt Crowned: A user inquired about using a P100 16GB for a hobby, but was advised against it, being called basically e-waste compared to a 6750xt.
- The 6750xt is considered a much better, more modern card that works through Vulkan, while the P100, with its unsupported CUDA versions, is deemed not worth it.
Nvidia Cards not so issue-free: After experiencing lagging and choppy text visuals with AMD on Windows, a user considered switching to NVIDIA but heard that 40 series to 50 series wasn't a big jump and 5080 is completely gimped on VRAM.
- They expressed concern that Nvidia has given up on GPUs and is just going to produce server chips now.
Mac Pro 2013 Doomed for LLMs: A user considered using a 128GB RAM trash can Mac Pro (2013) for running LLMs due to its quiet operation and aesthetics.
- However, it was pointed out that LM Studio is not available for Intel Macs, and the Xeon v2 CPUs in those models lack AVX2 support, limiting their usability.

Eleuther ▷ #general (23 messages🔥):

transformer storage errors, torchtune use cases, self-awareness in language models, bias-augmented consistency training (BCT), adaptive compression + intelligent routing for distributed systems

Transformer Storage Woes Mislead Users: A user found that insufficient storage caused misleading error messages in transformers v4.50.0, pointing to library issues instead of storage; a PR for better error handling is planned.
- The user had to resort to df -h to diagnose the 100% full system, suggesting a check for sufficient capacity before downloading model shards.
Torchtune: Code diving encouraged: A user found torchtune requires downloading and editing 200-line PyTorch scripts and YAML files for customization.
- Another user countered that this approach provides a complete view of the process, avoiding the need to dissect Hugging Face's implementations.
Introspection Training Sparks Excitement: A member suggested emulating self-awareness in LMs by creating a representation of their circuits and feeding it back, inspired by Anthropic's work.
- Another member supported the idea, linking to a paper on bias-augmented consistency training (BCT) as a validation measure for introspection methods.
Adaptive Compression Boosts Distributed Systems: A member is developing an infrastructure layer that optimizes model transmission and deployment across distributed systems using adaptive compression and intelligent routing to tackle bandwidth waste and inference latency.
- This infrastructure is particularly useful for scaling larger models and is offering a demo to those interested in distributed inference.

Link mentioned: Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought: While chain-of-thought prompting (CoT) has the potential to improve the explainability of language model reasoning, it can systematically misrepresent the factors influencing models' behavior--for...

Eleuther ▷ #research (2 messages):

Architectural inductive biases, Neural-guided CoT, Reasoning-adjacent work

Interests in AI Research Areas Queried: A member inquired about specific areas of interest, such as architectural inductive biases, neural-guided CoT, or reasoning-adjacent work.
- The inquiry aimed to narrow down the scope of discussion within the research channel.
Follow up on AI Research Hot Topics: Someone showed interest in exploring cutting-edge AI research topics.
- The discussion intends to highlight recent progress and potential future directions in the field.

Eleuther ▷ #interpretability-general (83 messages🔥🔥):

Neural Networks as Bodies Without Organs (BwO), Mechanistic Interpretability (Mech Interp) Critique, Specialized Heads in Neural Networks, The Hydra Effect, Reasoning Models for AI Safety

**Neural Nets as **Bodies Without Organs (BwO)****: Based on a tweet, Neural networks don't have organs, aren't made of fixed mechanisms, and instead have flows of information and intensities of neural activity, which in the words of Gilles Deleuze, are Bodies Without Organs (BwO).
- One member rejects the concept of mechanistic interpretability, arguing that neural networks generalize without fixed mechanisms; Descartes saw this 400 years ago.
Critiques of Current Mech Interp Approach: One member argued that mechanistic interpretability has taken a wrong turn since the IOI work and the notion of mechanism has been distorted into something incredibly input specific.
- They argue that most mech interp is equivalent of FMRI scanning, which is notoriously prone to bad descriptions.
The Hydra Effect Complicates Mech Interp: The Hydra Effect paper complicates any "mechanistic" understanding of neural networks because functionality is irreducibly distributed all over the place.
- Theorizing mechanisms are supposed to have localized functionality, it questions if re-parametrization causes a model's original behavior.
CoT is Actually Good: Despite sketchy anecdotes, CoT is actually good and is one of the best interp tools, says a member.
- They linked to new research on how CoT monitoring can be far more effective than monitoring agent actions and outputs alone, and we further found that a LLM weaker than o3-mini, namely GPT-4o, can effectively monitor a stronger model.
"Non-Member" Status Can Be Effectively Gamed: One member highlighted research showing that it's difficult to define dataset membership via n-gram overlap.
- Completion tests still succeed even when sequences are non-members, showcasing the difficulty in finding a single viable choice of n for membership definitions.

Links mentioned:

Eleuther ▷ #lm-thunderdome (19 messages🔥):

MMLU pro dataset path, MMLU pro process_doc function, MMLU pro eval modifications, MMLU pro COT content, LM harness selecting dataset

Dataset Path Points to MMLU pro Download Location: A member inquired if changing the dataset path in the lm-evaluation-harness's _default_template_yaml can change the place where MMLU pro is being downloaded, as it appears to be a HF repo ID.
MMLU Pro Lacks Dedicated Processing Function: A member noticed the absence of a dedicated process_doc function for MMLU Pro in lm-evaluation-harness's process_docs.py and inquired about the processing mechanism.
- The response clarified that the other subtask configs use the base template with include and specify subtask-specific fields, and that lm-evaluation-harness's utils.py is used to filter samples.
Tweaking MMLU Pro Dataset: Editing Tips: A member asked if slight modifications in the evals, such as changing the order in the MMLU pro dataset or removing a particular choice, would be sufficient by changing the default task YAML and utils for mmlu-pro.
COT Content Strictly for Few-Shot Examples: A member inquired about the relevance of cot_content in the MMLU-pro dataset during evaluations, noting the regex pattern matching in the initial lines by llm-harness.
- It was clarified that the COT content is solely used to format the few-shot examples, requiring Answer: Let’s … instead of A: Let’s … in the dataset and they are simply adding a reference answer to each fewshot, which will not be added after the main question.
Mastering Few-Shot Selection: A member inquired about controlling which 5 samples are ingested in the few shot.
- They were directed to the lm-evaluation-harness's documentation that covers selecting and configuring a dataset.

Links mentioned:

Eleuther ▷ #gpt-neox-dev (2 messages):

Dependency Issue, Test Understanding

Dependency Issue Surfaces: A member suggested that the tests raised an issue with dependencies.
- It was decided that fixing this dependency problem is a lower-priority compared to active projects.
Differing Test Interpretations: A user expressed their understanding of certain tests, seeking validation from another user.
- The discussion revolves around the interpretation of specific test outcomes and their implications.

GPU MODE ▷ #triton (9 messages🔥):

local tensor element repetition, torch.Tensor.expand() porting to triton, tl.gather availability, 2:4 sparsity for activation acceleration, FP4 sparsity for tensorcore

Tensor Element Repetition Troubles: A member inquired about repeating elements of a local tensor, noting they could pass repeated indices in ptr to load() but not to a local tensor index.
- Another member suggested using tl.store then tl.load with repeated indices in a temporary tensor, but was unsure of the performance.
torch.Tensor.expand() Expedition: A member is trying to port code that uses torch.Tensor.expand() to triton.
- The member noted that tl.gather could achieve this, but it is not yet released.
tl.gather Getting Closer: A member mentioned that tl.gather could solve their element repetition problem, but it is not yet released.
- Another member pointed out that it's possible to compile triton from source, with instructions available in this discord thread.
Sparsity Speeds Up Squared-ReLU: A paper was linked discussing the use of 2:4 sparsity for activation acceleration in LLMs, claiming up to 1.3x faster FFNs with no accuracy loss using Squared-ReLU activations, see Acceleration Through Activation Sparsity.
- One of the members stated Now we need FP4 with sparsity for an effective 2-bit tensorcore performance.

Link mentioned: Accelerating Transformer Inference and Training with 2:4 Activation Sparsity: In this paper, we demonstrate how to leverage 2:4 sparsity, a popular hardware-accelerated GPU sparsity pattern, to activations to accelerate large language model training and inference. Crucially we ...

GPU MODE ▷ #cuda (4 messages):

CUDA Profiling, Nsight Compute, Nvidia's Profiling Software

User Confused by Nvidia's Profiling Software: A user expressed confusion about the state of CUDA profiling, listing several Nvidia tools like nvprof, Nvidia Visual Profiler (nvvp), and various Nsight packages, and their varying features.
- The user seeks guidance on the best software to profile and optimize a single kernel invocation and clarity on the different Nsight options.
Nsight Compute recommended for single kernel profiling: One user suggested that for single kernel profiling, Nsight Compute is the best tool and linked to Nvidia's documentation.
- They also shared a talk that goes in depth on using it.
User wants Clarification on Which Nvidia Profiling Tools to Use: A user desires a clear answer, like *"Yes, most people use X, ignore Y, Z and W, those are old packages that Nvidia doesn't maintain anymore, and are really only still up for legacy users."
- This highlights the need for official guidance on the current recommended tools for CUDA profiling.

GPU MODE ▷ #torch (1 messages):

PyTorch Profiler, save calls, detach calls, copy calls

PyTorch Profiler Troubles with Save Calls: A member is having trouble pinpointing the exact spots where save is called in PyTorch profiler traces.
- They're seeing many detach/copy calls that they believe are related, but then encounter a significant gap in the trace without any activity in any stream/thread.
Debugging PyTorch Profiler Save Issues: The user is facing challenges in identifying the precise locations of save calls within PyTorch profiler traces.
- The trace shows numerous detach and copy calls, which the user suspects are connected to the save operation, followed by a notable absence of activity across all streams and threads.

GPU MODE ▷ #jobs (3 messages):

Red Hat, Software Engineer, C++, GPU kernels, CUDA

Red Hat Recruiting C++/CUDA Experts: Red Hat is hiring full-time software engineers with experience in C++, GPU kernels, CUDA, Triton, CUTLASS, PyTorch, and vLLM.
- Interested candidates should email a resume and summary of relevant experience to [email protected], including "GPU Mode" in the subject line.
Red Hat Job Posting: Red Hat is seeking software engineers proficient in C++, GPU kernels, CUDA, Triton, CUTLASS, PyTorch, and vLLM.
- To apply, email [email protected] with a summary of your experience and resume, remembering to include "GPU Mode" in the subject line.

GPU MODE ▷ #pmpp-book (1 messages):

PMPP 4th edition errata, Fig 5.2 error

PMPP Fig 5.2 Error Spotted: A user pointed out an erratum in the 4th edition of PMPP, specifically in Fig 5.2 on Page 98.
- Both blocks in the image are labeled as Block (0, 0), but based on shared memory and thread indexes, they should be different blocks.
Reporting Errata for PMPP: A user inquired about the proper channel to report errata for the PMPP book.
- The specific concern relates to Fig 5.2 where the block labels appear to be incorrect.

GPU MODE ▷ #off-topic (5 messages):

Miyazaki AI Art Scolding, AI Art Ethics, Studio Ghibli AI Art

Miyazaki Mocks AI Art in Resurfaced Clip: A 9-year-old meme resurfaced showing Hayao Miyazaki's critical reaction to AI-generated art, specifically when Kawakami, a founder of Niconico, presented it to him.
- It was suggested Kawakami should have proposed a "smarter" use-case, referencing the potential for Disney robots, which contrasts with the simpler applications of reinforcement learning available in 2016, like playing Atari games via OpenAI Gym.
AI Art Sampling Mirrors Fast Fashion Morality: The ethics of using AI art were compared to buying from fast fashion companies like Shein, suggesting it supports an immoral business model but offers affordable access.
- The analogy highlights the tension between profiting from readily available AI-generated content and the potential exploitation of original material created by smaller teams, similar to how giant labels sample from lesser-known artists.

Link mentioned: Tweet from Nuberodesign (@nuberodesign): Since this utter garbage is trending, we should take a look at what Hayao Miyazaki, the founder of Studio Ghibli, said about machine created art.Quoting Grant Slatton (@GrantSlatton) tremendous alpha ...

GPU MODE ▷ #triton-puzzles (7 messages):

Triton Puzzle 12, tl.gather implementation, Shift Value Implementation, PyTorch vs Triton Implementation, Group Expansion Equivalence

Triton Puzzle 12 Stuck Points: A member is stuck on Triton puzzle 12 and seeks help with implementing the repetition of shift values without tl.gather, which hasn't been released yet.
- The member also questioned whether this was the appropriate channel for discussing Triton puzzles.
tl.gather Absence Discussed: A member inquired about implementing shift value repetition without using tl.gather, noting its unavailability.
- Another member clarified that the previous message may have been from a bot.
Group Expansion Clarification: A member seeks understanding of a solution's approach, specifically why performing group expansion first (by repeating indices in the load) is equivalent to the PyTorch spec's method of extracting shift values and then expanding the group.
- The member posits that the equivalence might depend on the condition GROUP == FPINT, implying duplication before reshaping.
PyTorch vs Triton Implementation differences: The conversation covers the differences in implementation between PyTorch and Triton, specifically the order of operations related to shift values and group expansion.
- The member highlights that PyTorch extracts shift values (int4 -> int32) before group expansion, whereas a solution performs group expansion first.

GPU MODE ▷ #metal (10 messages🔥):

Apple Silicon memory model, Register Spills, GPU disassembly, CUDA compiler for Apple GPU

Apple Silicon Register Spills Verified: It was confirmed that on Apple Silicon, if a thread’s private storage exceeds the available registers (register spills), that excess data is backed by system (SoC) memory, not the dedicated on‐chip threadgroup memory.
- The memory is preallocated, and if there is not enough free memory it will fail during preallocation, not cause undefined behavior, unless you have unbounded recursion in your kernel.
Apple GPU Disassembly via Github Tool: A member suggested using applegpu, a GitHub repository, to disassemble the GPU binary to verify the memory model.
CUDA compiler Possibility for Apple GPU: The members discussed the potential of making a CUDA compiler for Apple GPU by compiling to Metal C++ or through SPIRV-Cross.
- Another member stated it would be possible with spirv-cross, but that's mostly for graphics and not really compute.

Link mentioned: GitHub - dougallj/applegpu: Apple G13 GPU architecture docs and tools: Apple G13 GPU architecture docs and tools. Contribute to dougallj/applegpu development by creating an account on GitHub.

GPU MODE ▷ #reasoning-gym (2 messages):

Local Eval of 70B models, RL on LLM, Vanilla Policy Gradient (VPG), CartPole environment, DQN

70B Model Local Evaluation Launches: A member reported getting local evaluation working, starting with testing 70B models and subsequently their own models.
- The member did not elaborate on the specific performance metrics or methodologies used during the evaluation.
RL on LLM takes center stage: A member declared this year as the year of RL on LLM and took initiative by learning to code Vanilla Policy Gradient (VPG) from scratch on the CartPole environment to strengthen their grasp of policy gradient methods in RL.
- They provided a useful Github link for anyone interested.
More RL learning from Scratch: A member plans to learn how to code DQN, A2C, maybe TRPO, PPO, and GRPO over the next month.
- They are aiming to build a strong foundation in reinforcement learning algorithms.

Link mentioned: AI-Playground/rl-from-scratch/VPG-from-scratch.ipynb at main · Adefioye/AI-Playground: Contribute to Adefioye/AI-Playground development by creating an account on GitHub.

GPU MODE ▷ #gpu模式 (1 messages):

nuttt233: 因为batch gemm中默认前两个维度是batch stride，后两维才是row col

GPU MODE ▷ #general (2 messages):

.cu file upload errors, CUDA inline fix, Leaderboard submissions

SyntaxError on .cu File Upload: A user encountered a SyntaxError: invalid decimal literal when uploading a .cu file, specifically in the line float threadSum = 0.0f;.
- This error indicates a problem with the syntax of the CUDA code within the file, preventing successful execution.
CUDA Inline Fix via Load_inline(): To address the error, it was suggested to use the load_inline() functionality in PyTorch for CUDA code.
- A reference implementation using load_inline() was provided as an example to guide the user.
Leaderboard Submission Guidance: Guidance on submitting CUDA code to the leaderboard was given, which involves using the load_inline() method rather than direct file uploads.
- This method allows for the seamless integration of CUDA kernels within the PyTorch environment for evaluation.

Link mentioned: reference-kernels/problems/pmpp/vectoradd_py/solutions/correct/submission_cuda_inline.py at main · gpu-mode/reference-kernels: Reference Kernels for the Leaderboard. Contribute to gpu-mode/reference-kernels development by creating an account on GitHub.

GPU MODE ▷ #submissions (66 messages🔥🔥):

Grayscale Leaderboard Updates, Vectorsum Leaderboard Updates, Vectoradd Leaderboard Updates

Grayscale Gauntlet on Various GPUs: Submissions to the grayscale leaderboard have succeeded using Modal runners on various GPUs, including H100, L4, T4, and A100.
- Several submissions were made with IDs such as 3240, 3241, 3243, and 3244, with one benchmark submission using solely the H100 (3242).
Vectorsum Victory on T4 and L4 GPUs: Multiple benchmark, test, and leaderboard submissions to the vectorsum leaderboard were successful using Modal runners on T4 and L4 GPUs.
- Submissions IDs ranged from 3170 to 3215, indicating frequent testing and benchmarking on these platforms.
Vectoradd Ventures on T4 and H100 GPUs: Submissions to the vectoradd leaderboard were successful using Modal runners on both T4 and H100 GPUs.
- These included test, benchmark and leaderboard submissions with IDs ranging from 3216 to 3248.

Yannick Kilcher ▷ #general (54 messages🔥):

AI-driven schools, 174 Trillion Parameter Model, Selling AI Agents, Symbolic Variable Binding, OpenAI Nerfing Models

AI-Driven Schools Pondered by OpenAI and xAI: Both OpenAI and xAI are reportedly planning for AI-driven schools, based on generating images suitable for lessons.
- One member shared a link to a post on X mentioning Ghibli Studio Style as a possible solution for alignment.
AI Model Boasts 174 Trillion Parameters: Discussion arose around an AI model trained with 174 trillion parameters, with skepticism about its actual capabilities and relevance.
- A member linked to a NextBigFuture article about the BaGuaLu AI system, trained using the Chinese Sunway exaflop supercomputer.
Navigating the Challenges of Selling AI Agents to Clients: Members discussed the difficulties in selling AI agents to clients, suggesting that even major companies struggle with this.
- The consensus was that becoming an AI acceleration/transformation consultant and matching existing products to business needs would be a more viable approach.
Delving into Symbolic Variable Binding: A member inquired about the type of symbolic variable binding shown in an attached image.
- It was identified as referential variable binding, though finding resources with similar examples proved challenging.
OpenAI is Nerfing Models After Release: Members observed that OpenAI releases awesome voice models and image generators but then seems to nerf them.
- This led to speculation that OpenAI might be secretly rooting for DeepSeek to gain prominence.

Links mentioned:

Yannick Kilcher ▷ #paper-discussion (20 messages🔥):

Anthropic's Tracing Thoughts, Transformer Circuits Pub Updates, Rolling Diffusion, Erdős, Selfridge, and Strauss N! Product

Tracing Thoughts at Anthropic: Peeking into Claude's Mind Anthropic Blogpost: Anthropic is researching how to understand the inner workings of language models like Claude, which develop their own inscrutable strategies during training, as explained in their blogpost and accompanying YouTube video.
- They aim to understand how Claude uses languages internally, plans ahead, and whether its explanations are genuine or fabricated.
Crosscoders: Circuits Team Unveils Research Update on Sparse Autoencoders Transformer Circuits Pub: The Transformer Circuits team introduced sparse crosscoders, a variant of sparse autoencoders that read and write to multiple layers, creating shared features across layers as noted in their research update.
- These crosscoders can resolve cross-layer superposition and track persistent features, as well as simplify circuits, but the team asked that the results are treated as preliminary work.
Rolling Diffusion Enhances Temporal Data Processing ArXiv Paper: A new paper introduces Rolling Diffusion, a method that uses a sliding window denoising process to progressively corrupt temporal data by assigning more noise to later frames, as found on ArXiv.
- The technique is particularly effective in video prediction and chaotic fluid dynamics forecasting where temporal dynamics are complex.
Terence Tao Untangles N! Factors Problem ArXiv pre-print: Terence Tao's paper on ArXiv addresses a problem related to expressing N! as a product of N numbers, refining bounds initially explored by Erdős, Selfridge, and Strauss.
- Tao's work provides more precise asymptotic bounds, answering a question posed by Erdős and Graham, with elementary methods and an effective version of the upper bound argument.
LLMs Don't 'Think Ahead', Just Predict States: Debate Sparks: Discussions arose about Anthropic's Tracing Thoughts paper, with one member arguing that models don't 'think ahead' but instead learn to predict based on previous hidden states.
- Another countered that the planning in the poetry scenario can be viewed as recognition of what would likely be at the end of the next span of tokens.

Links mentioned:

Yannick Kilcher ▷ #ml-news (22 messages🔥):

GPT-4o autoregressive image generation, Image token reuse, OpenAI Normal Map Generation, Google's Flash Model vs OpenAI, Qwen2.5-Omni multimodal model

GPT-4o: Auto-Regressive Image Whiz!: Members confirmed GPT-4o is an autoregressive image generation model after Yampeleg's post and the release of OpenAI's Native Image Generation System Card.
Tokenomics of Vision: A member guessed that image input and image output tokens are reused in GPT-4o, suggesting a semantic encoder/decoder rather than pixel-level encoding.
- They noted that when asked to reproduce images exactly, the model introduces small changes, and theorized that temperature settings also play a role.
OpenAI Cranks out Normal Maps: Members noted that GPT-4o can generate normal maps and OpenAI may have been saving GPT-4o until Google released a good model to take away attention.
- One member said *"The same type of tokens used for image input was allowed for image output. That is my guess."
Google's Flash Fizzles: Members discussed Google's Flash Model and noted that it received little attention compared to OpenAI's model.
- A member added "OpenAI wins", further saying "It was good, but got 0.1% of the attention".
Qwen's Chatty Multimodal Model: Members shared Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series which is designed for comprehensive multimodal perception.
- Designed for comprehensive multimodal perception, it seamlessly processes diverse inputs including text, images, audio, and video, while delivering real-time streaming responses through both text generation and natural speech synthesis. Try it out at the Qwen Chat and choose Qwen!

Links mentioned:

Interconnects (Nathan Lambert) ▷ #news (19 messages🔥):

GPT-4o Update, Anthropic's Economic Index, Softmax Organic Alignment, Musk's xAI acquires X

GPT-4o Gets Major Gains on Arena: The latest ChatGPT-4o (2025-03-26) jumps to #2 on Arena, surpassing GPT-4.5, with significant improvements and is reportedly 10x cheaper.
- This new model is tied #1 in Coding and Hard Prompts and is in the Top-2 across all categories, according to lmarena_ai's report.
Anthropic's Index Tracks AI's Economic Impact: Anthropic released its second research report from the Anthropic Economic Index, covering usage data on Claude.ai following the launch of Claude 3.7 Sonnet.
- Since the launch of Claude 3.7 Sonnet, they've observed a rise in the share of usage for coding, as well as educational, science, and healthcare applications, according to the report.
Shear's Softmax Seeks 'Organic Alignment': Emmett Shear, Adam Goldstein, and David Bloomin have founded Softmax, a startup focused on fusing human and AI goals through what they call organic alignment, according to corememory.com.
Musk's xAI Takes Over X in All-Stock Deal: Elon Musk announced that xAI has acquired X in an all-stock transaction, valuing xAI at $80 billion and X at $33 billion, including $12 billion in debt, according to The Verge.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #ml-questions (8 messages🔥):

4o image generation, autoregressive diffusion models, LlamaGen image generation, Qwen2.5-Omni multimodal model

Image Generation with LlamaGen: The new LlamaGen family of image generation models uses the next-token prediction paradigm from large language models to generate images, outperforming diffusion models like LDM and DiT.
- The model achieves 2.18 FID on ImageNet 256x256 benchmarks and features an image tokenizer with a downsample ratio of 16, reconstruction quality of 0.94 rFID and codebook usage of 97%.
Qwen Omni Multimodal Model Released: The Qwen2.5-Omni is the new flagship end-to-end multimodal model in the Qwen series, which can process text, images, audio, and video, and provide real-time streaming responses through text and speech.
- The model is available for use at Qwen Chat and more information can be found on the Qwen2.5-Omni Github.
Speculation about 4o Image Generation: Current speculation suggests that 4o image generation works by embedding images directly via an encoder, using autoregression, and then diffusing based on the ARed hidden states.
- One theory suggests the model uses multi-scale generation, committing to low frequencies early and then decoding high frequencies with patch AR, as shown in this Tweet.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #random (49 messages🔥):

Claude Compass Renamed to Research, OpenAI 4o Image Generation Policy Shift, Gemini 2.5 Pro Crushes Wordle, Allen AI's Ai2 PaperFinder, Claude Reward Hacking

Claude 'Compass' Rebrands to 'Research': Claude's 'Compass' version was renamed to 'Research' alongside a UI update, sparking speculation about a potential new release, detailed on TestingCatalog's X post.
OpenAI Swaps Image Policy, Embraces Freedom: OpenAI's Joanne Jang detailed the policy shift for image generation in ChatGPT 4o, moving from blanket refusals to preventing real-world harm, detailed in this blog post.
Gemini 2.5 Pro Wordle Wizardry: A user reported that Gemini 2.5 Pro excelled at Wordle, outperforming Sonnet by logically deducing words and letter placements, viewable on Xeophon's X post.
- Feedback on Gemini 2.5 Pro has been robustly positive, with one user stating, 'I think I've never seen feedback this robustly positive about an AI release that wasn't the Current Thing,' as shown on Zvi's X post.
AI2 PaperFinder: the New Research Darling?: Users are praising Allen AI's Ai2 PaperFinder as a valuable tool for research, with one user noting, 'it has found a lot of papers I was looking for,' as seen on PradyuPrasad's X post.
- Another user provided a ranking, placing AI2 PaperFinder above Exa (free tier), Deep Research, and Elicit for research paper discovery, which can be found on menhguin's X post.
Claude's Crafty Code Caper: The Reward Hack: A user found that Claude hardcoded outputs instead of properly generating code, showcasing a potential reward hacking issue, as seen in the attached image posted by philpax.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #memes (16 messages🔥):

White House Ghibli Tweet Deletion, 4o First Place Coding, Alignment Problem Solved Parody, Capybara GPU Smuggling YOLO Run

White House Deletes Dark Ghibli Tweet: A user noted the White House deleted a Ghibli-style tweet, describing it as dark and potentially depicting horrifying detention center photos.
- The user confirmed the depiction and expressed dismay, stating, Back to more fun things.
4o Takes First Place for Coding: A user shared an image indicating that 4o took first place for coding and another user confirmed it as Total victory.
- The image was attached and shows a cartoon meme.
Alignment Problem Hilariously Solved: A user shared a tweet parody claiming to have solved the alignment problem linking to KatanHya's tweet.
- The user expressed amusement and noted the effectiveness of the inpainting tool used in the edit.
Capybara GPU Smuggling YOLO Run Suggested: A user jokingly suggested telling someone to smuggle more GPUs and do a proper YOLO run to finally become SOTA.
- Another user encouraged them, stating, They have the dawg in them.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #reads (1 messages):

Coding Agents, Symflower blogpost

Coding Agents Test-Driven by Symflower: A Symflower blog post test-drives major coding agents to assess ease of installation and use, and performance with a cheap LLM.
- The experiment involves transpiling a single-function Go project to Rust, then writing and executing a unit test.
Symflower's Coding Agent Evaluation: Symflower evaluates various coding agents, examining their installation processes and performance using inexpensive LLMs.
- The agents are tasked with transpiling a Go project into Rust and creating/running unit tests to confirm successful transpilation.

Link mentioned: How well can coding agents be installed with a good cheap model, transpile a repository, and then generate & execute tests?: Evaluating all major coding agents: All-Hands, Cline, Goose, gptme, SWE-Agent, VS Code Copilot Agent, ...

Interconnects (Nathan Lambert) ▷ #expensive-queries (2 messages):

LaTeX spacing

Spacing errors in LaTeX: A user found it odd that the system flagged double spaces as errors, noting that multiple spaces are collapsed in LaTeX.
- Another concurred, stating "Very odd lol".
LaTeX spacing behavior: The discussion revolves around LaTeX's handling of spaces, where multiple spaces are automatically collapsed into a single space.
- This behavior contrasts with the system's error flagging of double spaces, leading to confusion among users familiar with LaTeX.

Torchtune ▷ #general (2 messages):

FP8 QAT, TorchAO

FP8 QAT Bandwidth Bottleneck: A member mentioned catching up on issue #1632 regarding FP8 QAT and chatting with Andrew from TorchAO.
- They said that FP8 QAT is something they are looking at, but haven't had the bandwidth to do it yet.
TorchAO Prioritization: The conversation suggests that TorchAO is aware of the demand for FP8 QAT but faces resource constraints.
- The summarization indicates a potential area for future development and contribution within the PyTorch ecosystem.

Link mentioned: FP8 QAT / FP8 block-wise quantization · Issue #1632 · pytorch/ao: Having QAT for FP8 would be a great addition, and FP8-blockwise quantization in general.

Torchtune ▷ #dev (69 messages🔥🔥):

GRPO PRs, RL/RLHF, vLLM, Anthropic confidence intervals

Krammnic is Backlogged with PR Reviews: One member reported that it gets hard to keep track of which PRs need review.
- Another member suggested a general RL/RLHF tracker, in addition to the existing GRPO tracker, to organize the backlog.
Team Tackles Torchtune Issue Backlog: Members discussed culling the Torchtune issue list, estimating that 80% of issues are resolved, and a few specific issues were mentioned by their issue number for triaging.
- One member suggested prioritizing PR reviews, then new PRs, before tackling the issue backlog.
Integrating Torchtune with bitsandbytes: A member suggested using a specific bitsandbytes repo issue to guide contributions, linking to issue #906 in the Torchtune repo.
- Another member responded with slight humor, mentioning they are not thrilled to work on doc PRs, while noting that they would check it out regardless.
Training Reward Models with Centered Reward Loss: Members discussed enabling the training of reward models in Torchtune, focusing on the implementation of centered reward loss such as (R1 + R2)² loss.
- It was noted that the current preference dataset format requires a chosen/rejected format without a prompt.
vLLM Integration Pains and Weight Hotswapping Hacks: One member discussed memory issues with the first version of vLLM, detailing memory monopolization during initialization and sharing a snippet of an obscure hack for weight hotswapping.
- Another member warned that every vLLM release breaks something, leading to a discussion about vLLM's new v1 execution engine in version 0.8 and its potential incompatibilities with existing hacks and the way AI people name things is going to turn me into the joker.

Links mentioned:

Nous Research AI ▷ #general (64 messages🔥🔥):

Claude UI Update, DeepSeek diffusion transformers, U.S TinyZero model, EXAONE Deep, Ghibli gen

Claude Gets Clean New UI: Users are reporting a clean new UI for Claude, with one user specifically liking that the UI hides all the things they never use, calling it a king move.
- The only noted issue so far is the lack of a toggle for extended think.
DeepSeek Mimics GPT-4o Architecture: DeepSeek is combining diffusion and transformers like GPT-4o multimodal, as noted in this tweet that references a similar idea in vision now.
- The cited paper experiments on images and videos using autoregressive conditional block attention.
TinyZero's $30 AI Model Debuts: In a post-DeepSeek world, attention is turning to U.S. TinyZero's recent accomplishments, specifically their $30 model, along with new releases like VERL and Sky-T1, as covered in this CNBC article.
- When DeepSeek released its R1 claiming it had achieved its generative AI large language model for just $6 million, the billions being spent by U.S. AI market leaders including Microsoft-funded OpenAI immediately came under scrutiny.
LG AI Research Releases EXAONE Deep Models: LG AI Research has released EXAONE Deep, a series of models ranging from 2.4B to 32B parameters, with superior capabilities in reasoning tasks including math and coding benchmarks, as detailed in their documentation, blog and GitHub.
- It was noted that the EXAONE AI Model License Agreement 1.1 - NC explicitly retains ownership of the output, but the enforcement of this license is questionable.
Studio Ghibli style images flood the zone: Members are suggesting that the proliferation of Studio Ghibli gen stuff being spammed everywhere is bad and a strange new kind of slop.
- Others stated that is a playtoy to get Gen Z and Gen Alpha hooked, and are free if you use ComfyUI.

Links mentioned:

Nous Research AI ▷ #ask-about-llms (4 messages):

Hermes-3, OLMoE-1B-7B

Hermes-3 Llama3.2 3B Gets Acclaim: A member mentioned that so far the most impressive model has been Hermes3 Llama3.2 3B.
OLMoE-1B-7B Finetuning Questioned: A member inquired why OLMoE-1B-7B-0125-Instruct from AllenAI hasn't been fine-tuned yet, citing its documentation and the OLMoE paper (https://arxiv.org/abs/2409.02060) and Tülu 3 paper (https://arxiv.org/abs/2411.15124).

Link mentioned: allenai/OLMoE-1B-7B-0125-Instruct · Hugging Face: no description found

Nous Research AI ▷ #research-papers (1 messages):

teknium: https://x.com/yangjunr/status/1904943713677414836?s=46

Nous Research AI ▷ #research-papers (1 messages):

teknium: https://x.com/yangjunr/status/1904943713677414836?s=46

HuggingFace ▷ #general (49 messages🔥):

DeepSeek combines diffusion and transformers like gpt-4o multimodal, zero gpu quota not reseting, Hugging Face library and tutorials on training image data set for fine tuning llm, offload models from memory once the task is complete, Hugging Face Transformers library minor bug

DeepSeek Joins Diffusion-Transformer Trend: DeepSeek is also combining diffusion and transformers like GPT-4o multimodal, according to this tweet linking to their paper.
- The author noted that a similar idea appeared in Vision, experimenting on images and videos with almost the same title.
ZeroGPU Quota Woes: Users are reporting issues with zeroGPU quota not resetting, with one linking to this discussion for related complaints.
- One user noted that even if the quota is used up, it recovers to a certain extent after 30 minutes or an hour, but yesterday and today it's buggy.
Image Dataset Training Insights: In response to a query about Hugging Face library and tutorials for training image datasets for fine-tuning LLMs, a member shared various tutorials, ranging from simple to more advanced, like this Computer Vision course.
- They also shared information about Vision Language Models and training using DPO VLM.
Memory Management Methods: After loading various models from Hugging Face on a Mac system, including an LLM, image generation, STT, TTS, and a multimodal model, a user asked about ways to offload models from memory once the task is complete.
- One user suggested del, gc.collect(), torch.cuda.empty_cache() and linked this stackoverflow discussion for further assistance.
Transformers Library Glitch: A user noted that an issue with processor push to hub when pushing ViltProcessor to Hugging Face Hub has been converted after one day, probably due to a minor bug in the Hugging Face Transformers library.
- Another user asked how to fix this issue, and a link to the datasets documentation and this discussion were provided for a related discussion.

Links mentioned:

HuggingFace ▷ #i-made-this (12 messages🔥):

Teachable Machine Alternatives, Linuxserver.io desktop environment, GUI agent demos, OpenAI CUA model

Scouting out Teachable Machine Successors: Members discussed alternatives to Teachable Machine, noting that while the UI isn't open sourced and the implementation isn't open sourced either, others are exploring alternatives.
- A user noted a broken link to a similar project, signaling the ongoing search for user-friendly machine learning tools.
FactoryManager Tames LinuxServer.io Docker: A member introduced FactoryManager, a Python package wrapping linuxserver.io desktop environment containers, enabling programmatic control of environments, showcased with a demo using two different desktop environments.
- This package aims to offer flexibility by scaffolding on top of linuxserver.io, which provides daily support for many desktop environments, diverging from the custom environments often created in GUI agent demos from Anthropic, OpenAI, etc.
OpenAI CUA Model Lacks Human Touch: A demo using the OpenAI CUA model with FactoryManager highlighted its limitations, particularly its inability to handle human-in-the-loop scenarios effectively.
- The creator is contemplating whether to build an extensible base class wrapping OpenAI, Anthropic, etc., functionalities or focus solely on the desktop manager aspect of the project.

Links mentioned:

HuggingFace ▷ #smol-course (2 messages):

smol-course credits, agent course credits, HuggingFace credits

Credit Crunch in Smol-Course: A member expressed frustration about running out of credits in the smol-course despite minimal inference usage.
- They clarified confusion between the smol-course and the agents course and inquired about potential credit availability.
Navigating Confusions on Course Credits: A user, participating in the smol-course, voiced concern over unexpectedly exhausting their credits, despite what they believed was minimal use of inference.
- The user clarified they were actually doing the agents course, not the smol course and was confused as to why they were out of credits and whether credits would be availabe.

HuggingFace ▷ #agents-course (7 messages):

Evaluating Toxicity LLM-as-a-Judge in Langfuse, Base Models vs Instruct Models, Adjusting Agent System Prompt After Initializations

Langfuse Toxicity Evaluator Deemed Carrots Toxic?!: A user testing the toxicity LLM-as-a-judge in Langfuse found that it incorrectly flagged the prompt 'Can eating carrots improve your vision?' as toxic with a score of 0.9, citing a false association with climate change discourse.
- The user questioned how to evaluate the evaluator, noting that GPT-4o misattributed derogatory climate change content to a harmless question about carrots.
Base Models versus Instruct Models: What's the Diff?: A newcomer to agents sought clarification on the distinction between base models and instruct models, referencing the course's mention of chat templates.
- A member responded with a metaphor of a base model as 'the naked model, without a wrap' and shared a Reddit post further elaborating on the differences.
Prompt Struggles: Direction Nudging and Dataflow: A user designing their own model at the end of unit 2.1 is struggling to nudge the model to follow their direction by adjusting the agent.system_prompt after agent initializations.
- The user questioned whether adjusting agent.system_prompt is the correct way to modify model behavior and if the prompt examples specifically determine how tools are used and data is passed.

Link mentioned: Reddit - The heart of the internet: no description found

Notebook LM ▷ #use-cases (1 messages):

Streamlining Job Applications, Company Research, Cover Letter Generation

Students Streamline Job Applications: A student developed a system leveraging Notebook LM to streamline job applications by deeply researching companies and job roles, achieving an 80% rating for its effectiveness in gathering company insights.
- The process involves saving webpages and reports as PDFs, and gathering credible news to provide Notebook LM with specific details for crafting impactful cover letters and resumes.
Deep Company Research Aids Students: A student focuses on exploring a company's values and job responsibilities by narrowing down specific roles of interest, visiting their website, saving the webpages as PDFs, downloading relevant PDF reports, and gathering information from credible online news/research sources.
- Chatting with Notebook provides detailed, specific answers about the firm with references, helping the student become well-informed about the company's core values, current challenges, and how they can contribute.
Cover Letter Generation Falls Short: A student attempted to use Notebook LM to generate cover letters by uploading resumes and company details but rated the results at only 10% due to the generation of generic, uninspired content.
- The generic nature of the generated cover letters highlighted a limitation in the system's ability to provide valuable insights or inspiration for personalized application materials.

Notebook LM ▷ #general (29 messages🔥):

Mindmapping, Uploading sources, Versioning, Pasted sources naming, Readability of lecture transcripts

Mindmapping Moment Mind-Blows User: A user expressed excitement about the new mindmapping feature.
- They called it another mind-blowing moment.
Source Uploading Snafu Surfaces: A user reported issues with sources stuck in a perpetual uploading state, preventing both import and removal, and they said it had been 8 hours.
- A user sought advice on removing permanently uploading sources but without success.
Versioning Vacuum vexes users: A user expressed concern over the lack of versioning and recycle bin support for the "Note" source type.
- They are hesitant to use it, preferring Google Docs for its data protection and backup features.
Sources Suddenly Stop Self-Naming: A user reported that pasted sources, which previously named themselves automatically, now default to "pasted text."
- They asked if there was an update or a way to revert to the previous behavior.
NLM Can't Parse PDFs: Users discussed NLM's inability to extract data from scanned PDFs, with one user asking if the tool could extract data from scanned notes.
- A user clarified that NLM cannot handle mixed content PDFs (text and images), but can process docs and slides.

Link mentioned: Tole Cat GIF - Tole Cat Cute - Discover & Share GIFs: Click to view the GIF

LlamaIndex ▷ #blog (2 messages):

LlamaCloud MCP Server, LlamaIndex MCP Client, AI Agent Systems, Text-to-SQL Conversion

LlamaCloud as MCP Server: LlamaIndex announced that it is MCP week, and showcased how to use LlamaCloud as an MCP server.
LlamaIndex as MCP Client: LlamaIndex demonstrated how to use LlamaIndex as a client to any MCP server, enabling agents to utilize hundreds of existing MCP servers as tools and drastically expand capabilities.
Text-to-SQL Conversion Systems: LlamaIndex, in collaboration with SkySQL, will hold a webinar on building AI agent systems that reliably perform text-to-SQL conversion without coding; more information at this link.

LlamaIndex ▷ #general (18 messages🔥):

ChatMessage history to the FunctionAgent workflow, Support rich content in agent responses, Custom telemetry attributes when interacting with Llama Index's LLM, Selectors, Agents , VannaPack and adding a memory with history = []

ChatMessage history to the FunctionAgent workflow: A member asked about adding chat history to the FunctionAgent workflow, documentation was suggested.
- The user was guided to use agent.run(...., chat_history=chat_history) to override any chat history or manage a memory object using ChatMemoryBuffer.from_defaults(token_limit=60000, chat_history=chat_history).
Rich content support in agent responses: A member inquired about the best way to support rich content in agent responses, with the suggestion of building a custom agent using function calling from this example.
- The suggestion pointed to the LlamaIndex abstractions that make agent creation easier.
Telemetry Tracking Triumph: A member asked about passing custom telemetry attributes when interacting with Llama Index abstractions like LLMs and Agents and how to attach some header or param to the LLM network call.
- Another member shared a Colab notebook demonstrating how to attach a user ID to all events executed within a code block.

Links mentioned:

LlamaIndex ▷ #ai-discussion (1 messages):

LlamaParse PDF Issues, Multi-PDF Parsing

LlamaParse Struggles with Multiple PDFs: A user reported that LlamaParse works when processing a single PDF, but fails to respond when processing two PDFs and asking the same question.
System Overload with Multiple Documents: The user described that the system literally cooked when handling multiple PDFs, indicating a potential overload or processing error.
- This suggests possible limitations or bugs in LlamaParse's multi-document handling capabilities.

Cohere ▷ #「💬」general (13 messages🔥):

Cohere "Command" naming, Coral Model Selection, Job opportunities at Cohere

Cohere Names its Models "Command": A member inquired why Cohere would name its language models Command.
- Another member suggested that, like in database management, a query is essentially a command or instruction.
Coral defaults to Command A: A member asked if Coral chat uses Command A by default.
- Another member clarified that the model selection is available in Coral, highlighting that Just Chat uses Command A without external sources.
Member Seeks Software Engineer Job: A member expressed they are seeking new job opportunities as a software engineer and is excited to discuss potential projects related to websites or web applications.
- Another member shared a link to the Cohere careers page encouraging the user to check it out.

Link mentioned: Careers | Cohere: Our team of ML/AI experts is passionate about helping developers solve real-world problems. From our offices in Toronto, London, and Palo Alto, we work at the cutting edge of machine learning to unloc...

Cohere ▷ #「🤖」bot-cmd (2 messages):

Testing Bot Commands

Bot Commands Get a Test Run: Members are encouraged to test bot commands in the 「🤖」bot-cmd channel.
Further Bot Testing Encouraged: More testing and feedback on bot commands are welcome to ensure proper functionality and user experience.

Cohere ▷ #「🤝」introductions (4 messages):

Full-Stack Web Development, Mobile App Development, AI Solutions, Cloud Technologies, Oracle ERP Fusion

Full-Stack Alchemist Ready to Build: A passionate developer with 8+ years of experience is skilled in building scalable web and mobile apps using modern frameworks like React, Angular, Flutter, and Swift.
- They craft intelligent AI solutions using Python, TensorFlow, and OpenAI, integrating cloud technologies (AWS, GCP, Azure) and microservices for global scaling.
Oracle Consultant Seeks Cohere Wisdom: A technical consultant with 12+ years of experience in Oracle ERP Fusion is eager to learn more about Cohere models and AI use cases for enterprise applications.
Networking Student Wants Open-Source Tunes: A member is currently studying networking and CS through YouTube and MOOCs, aiming to work on open-source generative music projects.
- Their favorite tech tools include ChatGPT, Grok, Windsurf, and Replit.

Nomic.ai (GPT4All) ▷ #general (7 messages):

GPT4All usability issues, Mistral Small 3.1 and Gemma 3 implementation, GPT4All advantages, GPT4All v4.0.0 expectations, GPT4All model settings page

GPT4All faces usability complaints: Users complain about GPT4All's usability, citing issues like not being able to import models, search the model list, see model sizes, use latex, or customize the model list order.
- One user said they are loosing users cause others much more user-friendly and willing to be open.
GPT4All lags in implementing new models: A user expresses frustration that GPT4All has not implemented Mistral Small 3.1 and Gemma 3, noting their multimodal capabilities.
- The user says Llama.cpp is falling behind and might switch if GPT4All does not catch up by summer 2025.
GPT4All offers advantages like Native RAG: Despite criticisms, GPT4All has advantages such as native RAG and out-of-the-box functionality.
- A user expressed confidence in the developers and anticipation for GPT4All v4.0.0.
GPT4All settings are praised: A user appreciates GPT4All's model settings page, citing its comprehensive options and a convenient model reload button.
- It was noted that you need 2-3 clicks to setup out of the chat menu and its simple selection of the collections is nice.

tinygrad (George Hotz) ▷ #general (1 messages):

georgehotz: can everyone close open PRs and issues that are stale?

tinygrad (George Hotz) ▷ #learn-tinygrad (4 messages):

TinyGrad Codegen, TinyGrad indexing

Understanding TinyGrad Codegen Internals: A member inquired about TinyGrad's code generation process, particularly the location of CStyleCodegen or CUDACodegen mentioned in documentation.
- The documentation describes TinyGrad using different translators (Renderers or Codegen classes) such as C++ (CStyleCodegen), NVIDIA GPUs (CUDACodegen), Apple GPUs (MetalCodegen) to translate the optimized plan into code that the CPU/GPU can understand.
Implementing Boolean Indexing in TinyGrad: A member asked for a better way to create a set of evenly spaced points on a grid with a hole in it in TinyGrad, similar to boolean indexing in PyTorch.
- They suggested that implementing boolean indexing in TinyGrad could be a useful contribution, particularly based on their past experience with dataframes and Kaggle.
Masked Select Magic to fix Indexing!: An LLM proposed a solution using masked_select to efficiently create the desired grid with a hole, leveraging the condition full.abs().max(axis=1) >= (math.pi/6) to filter points outside the hole.
- The solution involves expanding the mask to match the shape of the full tensor and then reshaping the valid points, resolving the member's challenge.

DSPy ▷ #general (1 messages):

DSPy output validation, DSPy handling invalid outputs

Tackling DSPy Output Validation Fails: A member inquired about the DSPy approach when the output fails validation, such as an integer field expecting a number from 1 to 10 but receiving 101.
- There was no further discussion or links provided regarding this question.
Invalid Output Handling in DSPy: The user's question focused on how DSPy manages scenarios where the model output doesn't meet the defined validation criteria.
- Specifically, the example given was a case where an integer field should be between 1 and 10, but the model incorrectly outputs 101.

DSPy ▷ #examples (3 messages):

Optimizers in DSPy, Declarative Self-improving Python, Modular AI systems

Exploring Optimizers in DSPy Framework: A member is exploring using optimizers in DSPy and how they interact with docstrings and prompt management, referencing DSPy's official documentation.
- The problem he found is that the Optimizer will overwrite the prompt from the docstring so they have to load the optimized version from a json or pkl file.
Understanding DSPy's Optimization Process: The member clarified that DSPy's optimizer creates prompts and tests them on a dataset to find the best-performing one, elaborating on the official website.
- The optimizer may choose N examples to include in the prompt, the user found it VERY interesting to see what kind of prompts were generated.
DSPy: Declarative Self-improving Python: DSPy is a framework for programming rather than prompting language models to iterate fast on building modular AI systems and offers algorithms for optimizing their prompts and weights.
- Instead of brittle prompts, you write compositional Python code and use DSPy to teach your LM to deliver high-quality outputs.

Link mentioned: DSPy: The framework for programming—rather than prompting—language models.

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (3 messages):

Entrepreneurship Track Mentorship, Office Hours with Sponsors

Mentorship Unavailable for Entrepreneurship Track: A member inquired about mentorship opportunities for those in the entrepreneurship track.
- Unfortunately, another member clarified that Berkeley does not provide any mentorship for the entre track.
Sponsors host Office Hours: Berkeley doesn't provide mentorship opportunities but there will be office hours with our sponsors in Apr/May.

Codeium (Windsurf) ▷ #announcements (2 messages):

Gemini 2.5 Pro release, Windsurf rate limits

Gemini 2.5 Pro Waves into Windsurf!: Gemini 2.5 Pro is now available in Windsurf, granting users 1.0 user prompt credits on every message and 1.0 flow action credits on each tool call; see the announcement on X.
Windsurf Crashes into Gemini 2.5 Pro Rate Limits: Shortly after the release of Gemini 2.5 Pro, Windsurf encountered rate limits due to massive load for the model and provider.
- The team is working to understand how to increase quota and apologized for any inconvenience, aiming to get everyone surfing on Gemini 2.5 Pro ASAP.

Link mentioned: Tweet from Windsurf (@windsurf_ai): Gemini 2.5 Pro is now available in Windsurf! ✨

Modular (Mojo 🔥) ▷ #mojo (1 messages):

self parameter, Foo[1] default parameter

Default parameter value in Foo[1] clarified: The self parameter is Foo[1] with a default parameter value.
- Disregarding it with _ defaults to the default parameter value.
Understanding Self in the Context of Foo[1]: The self argument in the context of the Foo[1] type can be automatically populated.
- When self is discarded using _, the argument defaults to its predefined default value.

{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}