AI News for 3/26/2025-3/27/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 7972 messages) for you. Estimated reading time saved (at 200wpm): 757 minutes. You can now tag @smol_ai for AINews discussions!

There's a new 4o model in ChatGPT, but there's no blogpost and not much detail beyond the announcement tweet so there's not much to report. However you can see that the time between SOTA models has been shortening recently.

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

GPT-4o and Multimodal Models

OpenAI's GPT-4o has seen significant updates, enhancing its ability to follow detailed instructions, tackle complex technical and coding problems, and improve intuition and creativity while reducing emojis 🙃, according to @OpenAI. Furthermore, the updated chatgpt-4o-latest is now available in the API, with plans to bring these improvements to a dated model in the API in the coming weeks, as announced by @OpenAIDevs.
GPT-4o's native image generation stands out for its instruction following capability, with @abacaj noting that nothing comes close to it. @iScienceLuvr highlighted the impressive compositioning, text generation, and overall flow of diagrams generated by GPT-4o, specifying that these elements were created without needing to be explicitly defined.
Concerns about image generation content filters have also been raised, with @nrehiew_ pointing out an instance where OpenAI allowed an image through its filters that seemed surprising.
Initial examples matter: @sama emphasizes the careful consideration put into the initial examples shown when introducing new technology.
Creative freedom and potential harms: @joannejang, who leads model behavior at OpenAI, shared her thoughts and nuance that went into setting policy for 4o image generation. She discusses OpenAI's shift from blanket refusals in sensitive areas to a more precise approach focused on preventing real-world harm, aiming to maximize creative freedom while preventing real harm, and embracing humility, recognizing how much they don't know, and positioning themselves to adapt as they learn.
Image generation with transparent backgrounds is a cool feature of GPT-4o that is super useful for creating all kinds of assets, according to @giffmana.
GPT-4o's performance improvements are clear over previous releases, with large model arena showing improvements in math, hard prompts and coding categories, according to @lmarena_ai.
Model quality perceptions: @abacaj finds that Google models are perpetually in preview or experimental mode, and by the time they're fully available, another model has surpassed them. They give you a glimpse of what’s possible and then someone goes and actually does it.

DeepSeek and Gemini

DeepSeek V3-0324 APIs are being tracked by @ArtificialAnlys across 10 APIs, including DeepSeek’s first-party API and offerings from Fireworks, DeepInfra, Hyperbolic, Nebius, CentML, Novita, Replicate and SambaNova. It's also now available on Hugging Face through @SambaNovaAI, with 250+ t/s - the fastest in the world. It smashes benchmarks like MMLU-Pro (81.2) & AIME (59.4), outperforming Gemini 2.0 Pro & Claude 3.7 Sonnet.
Gemini 2.5 Pro is recommended for coding, if you are currently using Claude, according to @_philschmid.
Gemini 3 can be deployed with just 3 lines of code to Google Cloud Vertex AI, due to the new Model Garden SDK, according to @_philschmid.
Gemma 3 Technical Report is now on arxiv, according to @_philschmid. It introduces a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. The version introduces vision understanding abilities, a wider coverage of languages and longer context – at least 128K tokens
A new function calling guide for GoogleDeepMind Gemini, using the new uSDKs, has been announced by @_philschmid, and includes multiple full examples for Python, JavaScript and REST.
TxGemma, built on GoogleDeepMind Gemma models, can understand and predict the properties of small molecules, chemicals, proteins and more. This could help scientists identify promising targets faster, predict clinical trial outcomes, and reduce overall costs, according to @GoogleDeepMind.

AI Safety and Interpretability

Anthropic's interpretability research was highlighted by @iScienceLuvr. The new interpretability methods allow them to trace the steps in their "thinking", according to @AnthropicAI.
Anthropic is recruiting researchers to work with them on AI interpretability, according to @AnthropicAI.
Anthropic Economic Index is releasing the second research report from the Index, and sharing several more datasets based on anonymized Claude usage data, according to @AnthropicAI.
AI Safety Fads: @DanHendrycks notes that every year or two a new LW/AF fad comes out (inner optimizers, ELK, Redwood's injury classifier, SAEs), which tend to be much more intense than those in academia due to LW/AF insularity and centralized funding.

AI Tools and Frameworks

Langchain now offers full E2E OTel support for applications built with LangChain or LangGraph, enabling unified observability, distributed tracing, and the ability to send traces to other observability tools, according to @LangChainAI.
LangGraph BigTool - LangChain shows it's reliable enough to work w/ local models (via @ollama) w/ > 50 tools.
LlamaCloud can be used as an MCP server, allowing users to bring up-to-the-second data into their workflow as a tool used by any MCP client, as demonstrated by @llama_index.
Cohere's Command A, a highly capable and efficient model that can be run on just 2 GPUs, is optimized for real-world agentic and multilingual tasks, according to @cohere.
Keras has a brand new homepage to celebrate the 10th anniversary of the original release, according to @fchollet.

Trends and Opinions

Generative AI and Studio Ghibli: Following the release of GPT-4o, there has been significant discussion around the use of Studio Ghibli-style generations, according to @iScienceLuvr and @aidan_mclau. @nearcyan discusses the post-reality-filter stage rolling out over the next few years, where one's reality is whatever they want it to be (Ghibli or pokemon or lotr or...), and as each human finds that which they truly desire, they then become demarcated into their own garden made of pure beauty and art (and for many, of lust) optimized just for them.
The future is ASI: @aidan_mclau says they cannot imagine building artificial fucking superintelligence to make money or stymie political opponents or arbitrage EA calculus.
Concerns about reliance on models: @nptacek is beginning to wonder if there is a builders/collators divide going on here, with some wanting some sort of neat, orderly information space while others are completely comfortable pushing the boundaries of what can exist in the first place.
The Meltdown of GPUs: @sama tweeted that their GPUs are melting, due to people loving images in chatgpt.

Humor/Memes

The Inverse of a Banger: @nearcyan defined the exact opposite of a banger when it comes to AI image generation.
Muppet Jensen: @iScienceLuvr posted a friendly reminder from muppet jensen.
AGI = All Ghibli Images?: @_akhaliq jokes that AGI means "All Ghibli Images?"
Gary Marcus's opinions: @cloneofsimo joked about world-renowned, thought-provoking, innovative value that Gary Marcus provides to the ML community.
DeepMind AI trying to make a banger: @sama CLAIDE tried to make a banger but instead said one man's slop is another man's slop.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek V3 0324 on Livebench Surpasses Claude 3.7 with Hallucination Issues

DeepSeek V3 0324 on livebench surpasses Claude 3.7 (Score: 148, Comments: 14): DeepSeek V3 (0324) has achieved significant performance on LiveBench, ranking 10th overall and outperforming Claude 3.7 Sonnet (base model), while being the second highest non-thinking model after GPT-4.5 Preview. This performance suggests that the upcoming R2 model could potentially be a strong competitor in the AI field.
- DeepSeek V3's Performance and Hallucination Issues: Users report that DeepSeek V3's hallucination rate increased from 4% to 8%, making it less reliable for certain tasks. Despite correct answers based on hallucinated prompts, users find it surprising and recommend running it at a low temperature of 0.3 to mitigate this issue.
- Comparison with Other Models: Gemini Pro 2.5 showed significant improvements in reasoning over its predecessor, raising questions about potential enhancements from V3.1 to R2. Anthropic and OpenAI face challenges with high API costs, but OpenAI's multi-modal capabilities, particularly in image generation, are noted as advantageous.
- LiveBench and Model Updates: There is curiosity about the removal of grok-3-beta from LiveBench. Fast R1 providers may take time to adopt V3, and users express hopes for improvement in upcoming updates, possibly by June.

Theme 2. Microsoft's KBLaM: Plug-and-Play Knowledge in LLMs

Microsoft develop a more efficient way to add knowledge into LLMs (Score: 426, Comments: 56): Microsoft has developed KBLaM, a new method designed to efficiently integrate knowledge into Large Language Models (LLMs). This approach aims to improve the performance and accuracy of LLMs by enhancing their knowledge base without significantly increasing computational requirements.
- KBLaM's Limitations: Users highlight that KBLaM is a research prototype and not production-ready, with limitations in providing accurate answers when used with unfamiliar knowledge bases. This suggests it's not an improvement over existing RAG systems, which are already in production.
- Technical Insights and Challenges: The implementation requires significant resources, such as an A100 80GB for testing an 8B model, indicating high computational demands. The approach involves language tokens attending to knowledge tokens but not vice versa, which raises questions about potential knowledge gaps, such as understanding concepts without practical application knowledge.
- Potential Applications and Research Directions: There's interest in whether extracting factual knowledge from training data could optimize parameter usage, potentially making models more intelligent or efficient. However, the consensus is that broad knowledge is essential for intelligence, and more research is needed to explore general knowledge applications and specialist bots.

Theme 3. New QVQ-Max Feature on Qwen Chat Enhances User Experience

New QVQ-Max on Qwen Chat (Score: 115, Comments: 16): Qwen Chat introduces "QVQ-Max", a powerful visual reasoning model, among other models like "Qwen2.5-Max," described as the most powerful language model in the series. The user interface highlights the capabilities of each model, including "Qwen2.5-Plus," "QwQ-32B," and "Qwen2.5-Turbo," against a dark, clean design backdrop.
- QVQ-Max and other models like Qwen2.5-Max are generating interest among users, with some planning to include them in their testing schedules, particularly on advanced hardware like the M3 Ultra.
- A comment highlights that the model is currently closed, indicating limited or restricted access at the moment.
- There is anticipation for further developments or releases, with hints from an employee on Twitter about potential updates or enhancements expected on Thursday.

Theme 4. Gemini 2.5 Pro Faces Performance Criticism Despite ASIC Advantage

Gemini 2.5 Pro Dropping Balls (Score: 106, Comments: 17): The post titled "Gemini 2.5 Pro Dropping Balls" compares Gemini 2.5 Pro with LLaMA 4, but lacks detailed content in the body. The title suggests potential issues or shortcomings with Gemini 2.5 Pro.
- Gemini 2.5 Pro vs LLaMA 4: There is skepticism about Gemini 2.5 Pro's superiority over LLaMA 4, with some users arguing that Grok only comes close when using a sampling of 64. However, others believe that no current model, including Claude, can surpass Gemini 2.5 Pro.
- Technological Edge: Google's use of their own ASICs gives them a significant advantage, with Meta and Amazon trying to catch up using MTIA and Tranium respectively. Google is perceived to be six or seven generations ahead, making the competition challenging.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

debugging issues with our pipelines, sorry...

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. Gemini 2.5 Pro: Rate Limits, Pricing, and Performance Hype

Cursor Users Hit Gemini 2.5 Pro Rate Limit Wall: Cursor users are experiencing extremely low rate limits with Gemini 2.5 Pro, some reporting throttling after just two API requests. Workarounds involve using Open Router and personal AI Studio API keys, with some suggesting Google Workspace Business accounts might unlock higher limits.
Cursor Defends Gemini 2.5 Pro Pricing Amidst Free API Claims: Cursor faces user backlash for charging for Gemini 2.5 Pro, which is perceived as free via Google AI Studio API. A Cursor representative clarified that charges cover capacity at scale, as Google doesn't offer a truly free tier for their usage levels.
Gemini 2.5 Pro Eclipses Claude 3.5 Sonnet in User Preference: User evaluations show Gemini 2.5 Pro surpassing Claude 3.5 Sonnet, capturing 3% of top rankings for story generation while Sonnet plummeted from 74% to 18%. Users praise Gemini 2.5 for seamless handling of long contexts, with some confirming 15K token context windows on AI Studio.

Theme 2. OpenAI's GPT-4o: Updates, Image Generation, and Policy Shifts

GPT-4o Gets Another Update, Climbs Arena Leaderboard: OpenAI's GPT-4o received a significant update, now ranking #2 on the Arena leaderboard, surpassing GPT-4.5 and tying for #1 in Coding and Hard Prompts. The update reportedly improves instruction following, problem-solving, intuition, and creativity.
OpenAI Eases Image Generation Policy, Meltdown Imminent: OpenAI is relaxing its image generation policy from blanket refusals to preventing real-world harm, aiming for greater creative freedom. Sam Altman joked that GPUs are melting due to image generation popularity, leading to temporary rate limits.
Midjourney CEO Disses 4o Image Generation as "Slow and Bad" Meme: The Midjourney CEO dismissed GPT-4o's image generation as slow and bad, calling it a fundraising tactic and a meme, not a serious creative tool. This criticism comes amid discussions about model naming conventions and the White House deleting a tweet with a Ghibli-style image.

Theme 3. Model Context Protocol (MCP) Gains Momentum and Faces Challenges

OpenAI and Cloudflare Embrace Model Context Protocol (MCP): OpenAI CEO Sam Altman announced MCP support is coming to OpenAI products like the Agents SDK and ChatGPT desktop app, signaling a major step for MCP adoption. Cloudflare also now supports building and deploying remote MCP servers, lowering the barrier to entry.
Claude Desktop Struggles with MCP Prompt and Resource Handling: Users reported issues with Claude Desktop getting stuck in endless loops when MCP servers include resources or prompts. A workaround involves removing capabilities to prevent Claude from searching for these elements, with a fix released.
LlamaCloud Integrates as MCP Server for Real-time Data: LlamaCloud can function as an MCP server, enabling real-time data integration into workflows for any MCP client, including Claude Desktop. This allows users to leverage existing LlamaCloud indexes as dynamic data sources for MCP.

Theme 4. Local LLM and Tooling Updates: Unsloth, LM Studio, and Aider

Unsloth Unleashes Dynamic Quantization and Orpheus TTS Notebook: Unsloth AI released Dynamic Quants for improved accuracy and efficiency in local LLMs, along with DeepSeek-V3-0324 GGUFs. They also launched the Orpheus TTS notebook for human-like speech synthesis with emotional cues and voice customization, outperforming OpenAI's TTS in user tests.
LM Studio 0.3.14 Adds Multi-GPU Controls and Optimization: LM Studio 0.3.14 introduces granular controls for multi-GPU setups, allowing users to optimize performance by enabling/disabling GPUs and choosing allocation strategies. The update also includes 'Limit Model Offload' mode for improved stability and long context handling, with enhancements coming for AMD GPUs.
Aider's /context Command Streamlines Code Navigation: Aider introduced the /context command, automating the identification and addition of relevant files for a given request, improving workflow efficiency in large codebases. However, users reported compatibility issues with Gemini via the OpenAI API compatibility layer and CPU usage spikes.

Theme 5. Turing Institute Turmoil and Open Source RL System DAPO

Alan Turing Institute Faces Mass Layoffs and Project Cuts Despite Funding: The Alan Turing Institute (ATI), despite a recent £100 million funding injection, is planning mass redundancies and to axe around a quarter of its research projects, causing staff revolt. The institute faces an existential crisis amidst competition from the wider AI field.
ByteDance's Open-Source RL System DAPO Emerges Quietly: ByteDance and Tsinghua AIR released DAPO, an open-source Reinforcement Learning system, which seemingly went under the radar. Members shared the link, highlighting its potential significance in the RL research community.
Catastrophic Overtraining Paper Challenges LLM Pre-training Paradigm: A new paper coins the term "catastrophic overtraining," suggesting that extended pre-training can degrade fine-tuning performance and make models harder to adapt to downstream tasks. The paper notes that instruction-tuned OLMo-1B performs worse after extended pre-training.

PART 1: High level Discord summaries

Cursor Community Discord

Cursor Chokes on Gemini 2.5 Pro's API Limits: Users report hitting very small rate limits with Gemini 2.5 Pro in Cursor, as low as two API requests before being throttled.
- Some members are using a combination of Gemini API, Requesty, and Open Router to get around these limits while others pointed out that Google Workspace Business accounts might unlock higher limits.
Windsurf and Cursor Duke it Out Again: The debate rages on between Windsurf and Cursor, with Windsurf favored for its full context during prototyping, and Cursor preferred for bug fixing.
- One user complained about Windsurf's UI styling compared to Cursor's consistency across pages, while another requested to hire the official shitsurf shill.
Context Window Constrained by Cursor?: A user asked whether Cursor restricts the context window for Gemini 2.5 Pro to 30K, despite the model's advertised 1M context limit.
- Others chimed in that while agentic models have a 60k context window, Claude 3.7 has a 120k context window, or even 200k if max settings are enabled (though there was not enough data to confirm if this was Vertex).
Gemini 2.5 Pro's Pricing Perplexity: Users are questioning why Cursor charges for Gemini 2.5 Pro, when it's supposedly free via Google's AI Studio API, leading to accusations of dishonesty.
- A Cursor representative clarified that the charges cover the capacity required to handle the model's usage, as Google doesn't offer a free tier at Cursor's scale, adding that the price mirrors Gemini 2.0 Pro.
Cline Takes Over Coding Workflows: Some users plan to ditch Cursor, opting for VSCode with Cline (with Gemini) for coding, and DeepSeek v3 from Fireworks for planning.
- One user expressed nostalgia for Cursor's Tab feature, and while noting the decline of most models, still recognized the usefulness of the models, but concluding that local models all suck at the RTX 4090 level anyway.

Perplexity AI Discord

Perplexity Bot Joins Discord: Perplexity launched the Perplexity Discord Bot for testing, providing quick answers and fact-checking directly within Discord channels, accessible via tagging <@&1354609473917816895> or using the /askperplexity command.
- Testers can explore features like /askperplexity, fact-check comments with the ❓ emoji, and create memes with /meme, with feedback directed to the <#1354612654836154399> channel for improvements.
GPT-4.5 Quietly Departs Perplexity: Members noticed GPT-4.5 disappeared from model selection on Perplexity.com, potentially due to cost concerns.
- The Perplexity AI bot clarified that GPT-4.5 generally outperformed GPT-4o in scientific reasoning, math, and coding, while GPT-4o excels in general and multimodal use.
Complexity Extension Boosts Perplexity: Users discussed the 'Complexity' extension, a third-party add-on designed to enhance Perplexity's functionality.
- The Perplexity AI bot noted that while the extension's features could be beneficial as native options, integration decisions depend on user demand, technical feasibility, and product roadmap alignment.
MCP Servers Control Perplexity: Users explored leveraging Model Context Protocol (MCP) servers like Playwright to allow Perplexity to control browsers or other applications with available MCP servers.
- The Perplexity AI bot explained that configuring the server to work with Perplexity's API enables automated browser actions and web interactions.
User Encounters API Parameter Error: A user encountered an error related to the response_format parameter despite having credits, disrupting their app's functionality and leading to lost sales.
- The API team implemented error handling for parameters like search_domain_filter, related_questions, images, and structured_outputs, clarifying these were never available for non-Tier 3 users; see usage tiers.

Manus.im Discord Discord

Gemini 2.5 Pro Shows Early Promise: Members are giving early praise that Gemini 2.5 Pro is coding pretty good so far and wayyyy better than any Gemini model used last year.
- One user mentioned it took two separate animation components and meshed them into one for a seamless loading transition.
Manus Invitation Codes Face Delays: Users have expressed frustration regarding the wait times for Manus invitation codes, with some waiting for over a week.
- One member suggested using incognito mode or a different browser when signing up and receiving the embedded code.
Discord UI Sidebar Mysteriously Vanishes: A user reported missing icons, threads and messages, from their Discord sidebar, specifically on platform.openai.com.
- A member suggested changing appearance settings to compact view or checking PC display settings to resolve size issues.
WordPress Staging Site Fails in Manus: A member reported that their WordPress staging site in Manus has been failing repeatedly since the last maintenance.
- No solutions were found.
Manus Plays Well with N8N: Members discussed using Manus with N8N or Make.com for process automation and workflow automation.
- One member is in the process of building their first workflow with N8N and Manus to connect creatives globally.

LMArena Discord

Livebench primarily tests rote tasks: Members debated the merits of Livebench, with some arguing it primarily tests rote tasks and rewards flash thinking rather than deeper reasoning.
- This focus on rote tasks could skew results and reduce the reliability of the benchmark.
Gemini 2.5 Pro raises limits and eyebrows: Gemini 2.5 Pro's capabilities are discussed, ranging from amazing at math to exhibiting instruction following issues, with Logan Kilpatrick announcing increased rate limits on X.
- Concerns are growing regarding the consistency and stability of the model, especially discrepancies between the free AI Studio version and the paid Gemini Advanced version.
AI Censorship debate burns: The discussion shifts to censorship in AI models, with concerns raised about both Western models being woke and Chinese models being propaganda parrots.
- Members debate whether government censorship is distinct from safety guardrails and legal compliance.
Qwen 3 release is imminent: Enthusiasm builds for the upcoming Qwen 3 release, with speculation about its architecture and performance.
- Some anticipate a MoE model with impressive performance, while others remain cautious about its actual capabilities compared to Qwen 2.5 Max.
DeepSeek V3 0324 scores impressively: DeepSeek V3 0324's impressive SWE-bench score is highlighted, raising questions about its coding prowess relative to other models like GPT-4o.
- Some suggest that these coding improvements might be a result of vibe coding or benchmark tuning rather than genuine advancements in the model's architecture.

Unsloth AI (Daniel Han) Discord

Selective Quantization Improves Unsloth's Dynamic Quants: Unsloth's Dynamic Quants are selectively quantized, and they released DeepSeek-V3-0324 GGUFs, including 1-4-bit Dynamic versions.
- This allows running the model in llama.cpp, LMStudio, and Open WebUI, with a guide for detailed instructions.
Unsloth's Orpheus TTS Notebook Speaks Volumes: Unsloth released the Orpheus TTS notebook that delivers human-like speech with emotional cues and allows users to customize voices + dialogue faster with less VRAM, described in this tweet.
- It supports single stage models and one of the members said that Kokoro won't be finetuneable at all.
YouTube Algorithm Succumbs to Galois Theory: After a single YouTube search for Galois theory, a member joked that their feed is now saturated with videos about quintics.
- They quipped that these algorithms are "walking and degrading to crawling after 8k ctx".
Instruct Models Better without Pretraining?: A user was advised not to continually pretrain an instruct model because it can degrade performance and is generally intended for adding new domain knowledge, referencing the Unsloth documentation.
- Instead, the member was encouraged to explore Supervised Fine Tuning (SFT) instead for Q&A tasks.
ByteDance's DAPO System Opens Doors: Members shared a link to ByteDance's Open-Source RL System DAPO, noting it "seemed like it kind of went under the radar".
- The system comes from ByteDance Seed and Tsinghua AIR.

aider (Paul Gauthier) Discord

Google Gemini 2.5 Pro Hits High Demand: Google is prioritizing raising rate limits for Gemini 2.5 Pro due to high demand, according to a tweet by @OfficialLoganK.
- To bypass rate limits, OpenRouter suggests adding an AI Studio API key and setting up OpenRouter.
GPT-4o Gains Traction After Recent Updates: GPT-4o received an update in ChatGPT featuring improved instruction following, technical problem solving, intuition, and creativity, according to OpenAI's tweet.
- It now ranks #2 on the Arena Leaderboard, surpassing GPT-4.5, tying #1 in Coding and Hard Prompts.
Aider's New Command Streamlines Coding Process: Aider's new /context command automatically identifies and adds relevant files for a given request, streamlining the coding process, though it is still in testing.
- This assists in large codebases and saves time, and is useful for figuring out what needs to be modified and can be used with a reasoning model to brainstorm bugs.
Gemini's OpenAI API faces Compatibility Problems: A user reported issues with Gemini not working with the OpenAI compatibility layer, suspecting Litellm as the cause, despite other models functioning correctly.
- The user accesses all AI services through a reverse proxy, necessitating the OpenAI API compatibility.
User Reports Aider CPU Usage Spike: A user reported Aider suddenly spiking to 100% CPU usage, causing hanging or slow response from the LLM, despite working with a small repo.
- The user sought debugging tips, unsure where to start troubleshooting the issue.

OpenAI Discord

Dash Devotee's Keyboard Konversion: A user's keyboard remapping to favor the dash sparked a discussion on punctuation preferences and alternatives, like semicolons, with some joking about using the ^ symbol.
- This underscored the nuanced and personal choices individuals make in their writing and communication styles.
Sora's Struggles: Prompting Plants & Questionable Bunnies: A user sought help crafting Sora prompts for a camera spin effect with smoothly changing plant backgrounds, sharing an example image, criptomeria-na_kmienku-globosa-cr-300x300.webp.
- Meanwhile, concerns arose about Sora generating suggestive content when prompted for bunny characters, raising questions about content moderation.
Image Generation: Vice or Vision?: A user criticized AI image generation as a vice that devalues digital art, sharing an image created with it, Screenshot_20250327_162135_Discord.jpg.
- Subsequent disagreement led to the user being blocked, highlighting differing views on AI's role in the art world.
Arxiv Ascendant: STEM's Speedy Stage: Arxiv's growing presence as a pre-publication platform in STEM fields sparked debate about the value of unreviewed work, the potential for a critical mass of eyes to drive progress, and the coming era of AI peer review.
- Criticism of the traditional peer review process included scientists paying for consideration and losing ownership, fueling enthusiasm for AI to create a more functional, accessible system.
Bulk File Bonanza: Upload All At Once: Members confirmed that to ensure the model considers all documents in a single context, it's preferable to upload all files simultaneously when using tools like ChatGPT.
- This approach ensures the model integrates the information from all documents, provided they are formatted properly.

OpenRouter (Alex Atallah) Discord

Gemini 2.5 Capacity Crunch: Users encounter RESOURCE_EXHAUSTED errors with Gemini 2.5, and are advised to link an AI Studio key in OpenRouter settings to enhance capacity.
- It's highlighted that Google lets users pay for increased capacity through AI Studio.
Deepseek R1 Stalls with Emptiness: Users reported empty API responses from the Chutes provider when using Deepseek R1 (Free), even with fresh keys.
- Setting max_tokens to 0 was found to be a likely culprit, though the issue persisted even after adjustment.
OpenRouter Provider Routing Realities: A user discovered and debugged a routing bug when routing Gemini/Anthropic across Google/Bedrock/Anthropic using the AI SDK.
- Even with allow_fallbacks set to false, requests were not respecting the defined order, leading to all requests ending up on Anthropic; the staff confirmed the routing bug.
OpenRouter Parity Pursuit: A user finds OpenRouter's compatibility with the OpenAI SDK lacking when using google/gemini-2.5-pro-exp-03-25:free compared to openai/gpt-4o-mini via Spring AI.
- A member insists that OpenRouter is supposed to be 100% compatible and the user may be facing rate limits, and suggests testing with Mistral Small 3.1 and Phi 3 models.
Free Gemini 2.5 Pro Prowess Possible: Members share how to utilize OpenRouter to run Gemini 2.5 Pro in @cursor_ai for free, using a Cursor tutorial.
- The member reported this solution as resolving issues encountered after a brief period of troubleshooting.

LM Studio Discord

LM Studio Gets Multi-GPU Mantras: LM Studio 0.3.14 introduces controls for multi-GPU setups, letting users enable/disable GPUs and choose allocation strategies to optimize performance on systems with multiple GPUs.
- The version includes a 'Limit Model Offload to Dedicated GPU memory' mode, improving stability and optimizing long context handling, with enhancements coming to AMD GPUs.
Vision Model Plugin Voyage Vanishes: Members sought vision model plugins for models from Hugging Face, noting Mistral Small is text-only and in GGUF format for LM Studio.
- The suggestion to use Mistral Small in LM Studio was made, but it seems like nobody got the Vision Model working.
Threadripper CPU Tag Squabble: Members debated if AMD Threadripper CPUs are consumer grade, despite being marketed as HEDT (High-End Desktop) processors, referencing a Gamers Nexus article.
- A member argued that while marketed towards home users, Threadrippers are actually professional workstations.
Gemma 3 Generates Great Gains: A user achieved 54 t/s using Gemma3 - 12b Q4_K_M on a newly acquired 9070XT (Vulkan, no flash attention), whereas their 7800XT only generated around 35 t/s with Vulkan and 39 t/s with ROCm.
- Members discussed how Gemma3 models spill into shared memory even with full offload, noting context can fill 32 GB of shared memory with questions being asked on how to load a model as large as 48+gb
P100 is Past Prime: A member asked about a P100 16GB for 400 CAD/200 USD as a hobby investment, but they were strongly advised against it as e-waste.
- Members cited its Tesla architecture, unsupported CUDA versions, and inferior performance compared to modern cards like the 6750XT.

Eleuther Discord

Deepseek V3 Rivals Cloud on Mac: Members compared running Deepseek V3 on Mac Studios (20toks/second) with cloud instances, citing slower performance on an AMD EPYC Rome system (4 tokens/sec) reported in this article.
- The discrepancy might be due to the Mac's faster unified RAM.
EleutherAI Mulls ICLR 2025 Meetup: EleutherAI is considering an ICLR 2025 meetup, potentially hosting about 30 attendees.
- Sponsorship opportunities may be explored if attendance interest is high.
Qwen 32B Stalls on LLM Harness: A member encountered issues evaluating the Qwen 32B model on LLM harness, despite using the latest Transformers version.
- Root cause is possibly due to sharded model having more than 10 shards related to tied embeddings, potentially triggered by the transformers library itself.
Transformers Library Triggers False Errors: A member traced a misleading error back to transformers 4.50.2, sharing a Colab notebook running 4.50.0 where the problem was absent.
- The issue stemmed from insufficient storage, despite the error message suggesting a problem with the AutoModel loading function; fix will come in the form of a PR to lm-eval to add better error handling.
OLMo-1B Suffers Overtraining Crash: The instruction-tuned OLMo-1B model pre-trained on 3T tokens leads to over 2% worse performance on standard LLM benchmarks than its 2.3T token counterpart, according to this paper.
- The Gemma Team also published a new paper with authors including Aishwarya Kamath, Johan Ferret, and Shreya Pathak.

Interconnects (Nathan Lambert) Discord

Gemini 2.5 Pro Dethrones Claude 3.5 Sonnet: Gemini 2.5 Pro edged out Claude 3.5 Sonnet in user preference, capturing 3% of top rankings for story element combinations, while Claude 3.5 Sonnet dropped from 74% to 18% according to evaluations.
- Despite the shift, users lauded Gemini 2.5 for its seamless performance with lengthy contexts, confirmed by a user observing a 15K token context window on AI Studio.
OpenAI's Revenue Rockets, AGI Dreams Loom: OpenAI anticipates tripling its revenue to $12.7 billion this year, projecting $125 billion by 2029, fueled by advancements like GPT-4o and a revised image generation policy focusing on preventing real-world harm, as detailed in a Bloomberg report.
- Despite GPU constraints due to GPT-4o's image generation popularity, OpenAI is temporarily implementing rate limits, with Sam Altman noting, it's super fun seeing people love images in chatgpt, but our GPUs are melting.
Midjourney CEO Slams 4o Image Generation: The Midjourney CEO criticized 4o's image generation as slow and bad, dismissing it as a fundraising tactic and a meme, not a creative tool, as reported on X.
- This criticism surfaces amid discussions about model naming conventions and the White House deleting a tweet featuring a Ghibli-style image, originally described as dark.
Turing Institute in Deep Turmoil: Despite a recent £100 million injection in 2024, the Alan Turing Institute (ATI) is facing mass layoffs and planning to axe around a quarter of its research projects, causing staff upheaval according to researchprofessionalnews.com.
- The institute faces an existential threat given competing challenges from the wider field.

Modular (Mojo 🔥) Discord

Mojo Struggles with Unit Addition: Discussion in the mojo channel highlighted challenges in resolving addition of different units, such as kilometers per second and meters per minute, with concerns about the return type and how to correctly scale the values.
- One member noted that the scale would have to return the correct thing in those cases, and there's no way to use -> A if cond else B type logic for the function's return type.
C Unions Spark Debate in Mojo: A member inquired about how union lowers into, and another suggested using a C union.
- The second member pointed out, Since iirc CUDA has unions in some parts of the API.
Traits Discussion Reveals Nuances: Discussion on extension methods and traits clarified that extensions allow adding methods to library types, a feature not directly available with Rust's impl due to orphan rules.
- Another member corrected that Rust's impl can implement library types.
Implicit Trait Implementations Draw Concern: Debate arose on implicit trait implementations, with a member hoping they are temporary and stating they make marker traits hazardous to have.
- Alternative approaches to propagate trait implementations were discussed, including naming extensions and evaluating soundness tradeoffs.
Tuple Mutability Raises Eyebrows: A member highlighted that assigning to an index inside a tuple is possible and can be done with indexing, demonstrating unexpected mutability.
- Another member clarified that this is a side effect of __getitem__ returning a mutable reference and noted that it should not be the case, demonstrated in the test suite.

HuggingFace Discord

ByteDance's InfiniteYou Merges with ComfyUI: ByteDance's InfiniteYou model, designed for flexible photo recrafting while preserving individual identity, has been integrated into ComfyUI via this Github repo.
- The goal is to deliver a smooth way to generate different, high-quality images, connecting Claude to Vite.
HF Inference API Capped for Free Users: Per HuggingFace's API pricing documentation, free users can no longer query the Inference API once they use up all their monthly credits.
- In contrast, PRO or Enterprise Hub users will incur charges for requests exceeding their subscription limits.
Sieves Streamlines Zero-Shot NLP Pipelines: Sieves, a tool for building NLP pipelines using only zero-shot, generative models without training, was introduced.
- It ensures accurate output from generative models, leveraging structured output from libraries like Outlines, DSPy, and LangChain.
Qwen 2.5 VL Tangles with Memory Errors on Kaggle: A member faced memory errors running Qwen 2.5 VL 3b on Kaggle to describe a 10-second video, after debugging import issues in the latest transformers library (4.50.0.dev0).
- Recommendations included using bigger hardware, a smaller model, or Flash Attention 2 for GPU offloading.

MCP (Glama) Discord

Sama Signals Support: OpenAI Embraces MCP!: OpenAI CEO Sam Altman announced MCP support is coming to OpenAI products like the Agents SDK, ChatGPT desktop app, and Responses API.
- This move is seen as a critical step towards establishing MCP as the backbone for agents handling business-related tasks, similar to HTTP's impact on the internet.
Cloudflare Cranks Context: MCP Gets Remote Server Tooling: Cloudflare now supports building and deploying remote MCP servers with tools like workers-oauth-provider and McpAgent.
- This support is a significant development, offering developers resources to construct MCP servers more efficiently.
Spec Snafus Surface: Claude Struggles with MCP Prompts: Users encountered issues with Claude Desktop when MCP servers include resources or prompts, leading to endless queries, but a new github version with the fix was released.
- A workaround involves removing capabilities to prevent Claude from searching for missing resources and prompts.
Canvas MCP Connects College Courses: A member built a Canvas MCP for college courses, enabling automated querying of resources and assignments.
- In response to a request, the creator added a Gradescope integration agent, enabling autonomous crawling of Gradescope.
All-In-One Docker Compose Preps MCP Servers: An all-in-one docker-compose was created, enabling users to self-host 17 MCP servers easily from Portainer.
- The compose fetches Dockerfiles from public GitHub projects, ensuring updates are automatically applied.

Notebook LM Discord

Mind Map Goes Public: The Mind Map feature on Notebook LM is now publicly available to all users, with the team expressing gratitude for user patience and feedback, with thank you image.
- A member noted that the mind map, while neatly structured, wastes time because it lacks descriptions, and you also have no control with how the mind map is structured or how descriptive it can be.
Podcast Creation in Spanish Stops: A user reported that the ability to generate podcasts in Spanish is no longer functioning in the "customize" settings.
- A member suggested including podcast creation via the NotebookLM API and noted that they could make some really cool things with that.
Notebook Sharing Frustrates: A Pro user reported being unable to share Notebooks via link, even with publicly available YouTube content.
- Potential solutions mentioned include ensuring recipients have active NLM accounts and manually emailing the link.
Gemini gets Turkey Tested: A member tested Gemini 2.5 Pro with a 'Turkey Test', challenging it to write metaphysical poetry about a bird and shared the video with NotebookLM commentary here.
- The user shaped the commentary with Interactive Mode to segue to a satisfying ending, stumbling into new uses for NBLM.
Advanced Research has Limits: A member inquired about the deep research limit for Gemini Advanced, and another member responded that it is 20 research reports per day.
- The first member considered this pretty dang good compared to chatgpt.

Yannick Kilcher Discord

Sketch-to-Model Pipeline Proposed: A member introduced a "Sketch-to-Model" process (Sketch --> 2D/2.5D Concept/Image --> 3D Model --> Simulation) and explored alternatives to Kernel Attention (KA).
- The member mentioned that ChatGPT alluded to a concept akin to KAN, identifying it with Google DeepMind, while Grok 3 indicated that the xAI team is actively researching KAN.
AI Puzzle-Solving Debated: Members pondered whether AI could crack the puzzle book Maze: Solve the World's Most Challenging Puzzle (Wikipedia).
- Suggestions included training LLMs on ARGs and old puzzle games, though it was acknowledged that some puzzles' deliberate difficulty might stymie current reasoning models.
GPT-4o Autoregressive Image Generation Confirmed: GPT-4o is confirmed to be an autoregressive image generation model as stated in OpenAI's Native Image Generation System Card.
- Speculation suggests GPT-4o might be reusing image input tokens for image output, potentially outputting the same format of image tokens used as input, using a semantic encoder/decoder.
Turing Institute Axes Research Projects: Despite securing £100 million in funding, the Alan Turing Institute (ATI) is planning mass redundancies and to cut a quarter of its research projects.
- Reports indicate open revolt among staff due to these cuts.
Tracing Thoughts in Language Model gets dissected: Members are analyzing Tracing Thoughts in a Language Model and the associated YouTube video.
- The conversation is expected to span multiple sessions due to the extensive material available.

GPU MODE Discord

Data Distribution Differs Between DP and TP Ranks: In distributed processing (DP), each rank receives different data, while in tensor parallelism (TP), all ranks get the same data, according to a member.
- They suggested that TRL (Transformer Reinforcement Learning) should already manage this distribution automatically to ensure efficient training and utilization of resources.
Triton Autotune Lacks Pre/Post Hooks: pre_hook and post_hook are not supported in triton.Autotune or triton.Config because they require python code execution at runtime, which Inductor can't support in AOTI.
- One member speculates that implementing this support shouldn't be too difficult and is willing to help.
Hopper's num_ctas Setting Puzzles Triton Users: Users are encountering crashes or RuntimeError: PassManager::run failed exceptions when using a num_ctas value higher than 1 for Hopper in Triton, with the root cause remaining unclear.
- This effectively limits the performance tuning options for Hopper architecture when using Triton.
CUDA's Memory Hierarchy Gets a Boost: A user explained the CUDA memory hierarchy and clarified that it's the movement of data between DRAM and SRAM that is slow.
- This is why memory coalescing and maximizing data transfer efficiency between global memory and shared memory is critical.
Red Hat Needs GPU Kernel Engineers: Red Hat is seeking full-time software engineers at various levels with experience in C++, GPU kernels, CUDA, Triton, CUTLASS, PyTorch, and vLLM.
- Interested candidates should email a resume and summary of relevant experience to [email protected], including "GPU Mode" in the subject line.

LlamaIndex Discord

LlamaCloud Doubles as MCP Server: LlamaCloud can be used as an MCP server, enabling real-time data integration into workflows for any MCP client, as shown in this video demonstration.
- This setup allows an existing LlamaCloud index to serve as a data source for an MCP server used by Claude Desktop.
Claude Leverages Data from LlamaCloud: Claude Desktop can use an existing LlamaCloud index as a data source for an MCP server, integrating up-to-the-second data into the Claude workflow, detailed in this video.
- This functionality enhances Claude's ability to access and utilize real-time information within its workflows.
LlamaExtract Ditches Schema Inference: The schema inference capability in LlamaExtract, announced last year, has been de-prioritized because most users already have the schema they need, as detailed in the LlamaExtract Announcement.
- The feature may return in the future, but other aspects are being prioritized.
LLMs to Caption PDFs and Scanned Images: Members discussed using LlamaParse as the best parsing tool to parse PDFs; another member suggested using an LLM to read and caption an image for a RAG application, to answer questions from uploaded PDFs.
- Another member enquired about Hybrid Chunking and OCR for scanned documents like handwritten mathematics homework.
Chatbot battles with SQL Query Generation: A user building a chatbot that generates SQL queries from user messages reported issues with the bot not picking the appropriate columns, even with column comments in the SQL file.
- No specific solution was provided, but the user was encouraged to file a bug report to the team.

Latent Space Discord

Nvidia Nabs Lepton AI: Nvidia has acquired Lepton AI, an inference provider, for several hundred million dollars to enhance its software offerings for GPU utilization, according to The Information.
- This acquisition aims to simplify GPU utilization and beef up its software offerings.
OpenAI's Agents Embrace MCP: The Model Context Protocol (MCP) now integrates with OpenAI Agents SDK, enabling the use of various MCP servers to supply tools to Agents, as detailed in the Model Context Protocol introduction.
- MCP is envisioned as a USB-C port for AI applications, standardizing context provision to LLMs.
Replit Agent v2 Gains Autonomy: Replit Agent v2, currently in early access with Anthropic’s Claude 3.7 Sonnet, boasts enhanced autonomy, formulating hypotheses and searching for files before making alterations, detailed in the Replit blog.
- The upgrade ensures it is more autonomous and less likely to get stuck on the same bug.
GPT-4o Leaps Up Leaderboard: The latest ChatGPT-4o update (2025-03-26) has surged to #2 on the Arena, surpassing GPT-4.5, with notable enhancements and is tied for #1 in Coding and Hard Prompts, according to the Arena leaderboard.
- The update is reportedly better at following detailed instructions, particularly those with multiple requests, and has improved intuition and creativity.
OpenAI Relaxes Image Gen Policy: OpenAI is adjusting its image generation policy from blanket refusals to preventing real-world harm, aiming to maximize creative freedom while averting actual harm, as described by Joanne Jang in her blog post.
- This policy shift seeks to balance creative expression with harm prevention.

Torchtune Discord

FP8 QAT run Spotted: A member is exploring FP8 QAT and encountered this issue while aiming for a pure QAT run on a cold-trained model.
- They clarified that while FP8 QAT is a goal, immediate resources are limited.
Optimizer State Remains Unaltered: A member confirmed that activating fake quant will not alter the optimizer state.
- This confirmation addresses concerns about unintended side effects during quantization experiments.
GRPO PRs Seek Swift Action: A member emphasized the urgency of merging two GRPO PRs (#2422 and #2425), pointing out that #2425 is a critical bug fix.
- A team member promptly acknowledged the message and committed to addressing the PRs.
Anthropic Allegedly Abandons Ship for TensorFlow: It was pointed out that Anthropic is allegedly standardizing around TensorFlow.
- This has triggered speculation about the future of PyTorch within Anthropic.
JoeI SORA Takes Over: A member shared a screenshot of JoeI SORA in an unspecified context, responding to a query about a model's intuition.
- The member quipped that there is no intuition, just JoeI.

Cohere Discord

Cohere Explores Vector Database Integrations: Members discussed options for vector databases, with a member sharing a link to Cohere's integrations page that showcased options such as Elasticsearch, MongoDB, Redis, Haystack, Open Search, Vespa, Chroma, Qdrant, Weaviate, Pinecone, and Milvus.
- Another member asked about hosting vector DBs online, to which the response implied that Cohere handles hosting concerns.
Founders Ponder AI Agent Pricing: A member initiated a discussion about how founders are pricing and monetizing AI agents, seeking to chat with others and validate insights.
- Another member encouraged sharing more details about AI Agent pricing strategies.
Cohere May Hit Up QCon London: A member inquired whether Cohere would be present at QCon London this year, expressing interest in discussing access to North with a Cohere representative.
- They attended last year.
Refugee Organization Champions Livelihood: A refugee in Kenya introduced Pro-Right for Refugees, a Community Based Organization (CBO) focused on promoting refugee access to livelihood opportunities and enhancing peaceful living in Kakuma Refugee and Kalobeyei Settlement.
- The CBO focuses on peacebuilding, awareness raising, and livelihood initiatives, inviting volunteers and support for refugees.

tinygrad (George Hotz) Discord

Budget AI Rig Assembled for Cheap: A member explored building a budget AI rig for 7-8k yuan using older X99 components, Xeons, and 32GB ECC DDR4 RAM sourced from Taobao.
- Another member confirmed the feasibility of this setup after a quick investigation.
AX650N Specs Highlighted: The AX650N's specs were spotlighted via a link to its product page, revealing 72Tops@int4, 18.0Tops@int8 NPU and native support for Transformer intelligent processing platform.
- Additionally, the AX650N includes an Octa-core A55 CPU, supports 8K video encoding/decoding, and features dual HDMI 2.0 outputs.
AX650N Reverse Engineered: A blog post detailed the reverse engineering of the AX650N, reporting it achieves 72.0 TOPS@INT4 and 18.0 TOPS@INT8.
- The post also mentioned efforts to port smaller Transformer models, pointing to an associated GitHub repo.
Tinygrad's PRs Focus on CPU Functionality: Two Tinygrad pull requests were shared: PR #9546 and PR #9554.
- The first PR addresses a potential fix for a recursion error in test_failure_53, while the second aims to continue moving functions off of CPU in torch backend.
TinyGrad's Code Generation Unveiled: A user inquired about TinyGrad's code generation process, referencing outdated information about CStyleCodegen and CUDACodegen classes, seeking to understand the translation from optimized plans to low-level code.
- The discussion sought to clarify how TinyGrad translates optimized plans into executable code for various devices (CPU/GPU), given the user's confusion regarding the current implementation.

LLM Agents (Berkeley MOOC) Discord

Sharing Lecture Recordings OK'd: A member inquired about sharing lecture recordings for the LLM Agents Berkeley MOOC, and a moderator confirmed it's allowed, encouraging new participants to sign up.
- This is part of onboarding new MOOC participants.
Mentorship Deadline Extension Mulling: A member requested a deadline extension for the mentorship application; the moderator noted the form won't close immediately.
- However, consideration after the deadline isn't guaranteed due to high interest and the need to start projects soon.
Entre Track Mentorship Missing: A member asked about mentorship for the Entre track, and the moderator clarified that Berkeley doesn't offer it.
- There will be office hours with sponsors in Apr/May.

DSPy Discord

AOT Breaks Down Reasoning: Atom of Thoughts (AOT) decomposes questions into atomic subquestions structured as a directed acyclic graph (DAG), in contrast to Tree of Thoughts (ToT) which maintains the entire tree history.
- The poster emphasizes AOT's memoryless reasoning steps and explicit decompose-then-contract phases for atomic subquestions.
Ideal Evaluation Datasets: Ideal evaluation datasets for AOT should include GSM8K and MATH (datasets with step-by-step solutions), HotpotQA and 2WikiMultihopQA (annotated reasoning paths), and datasets explicitly detailing intermediate reasoning steps.
- Examples provided include mock_llm_client.generate.side_effect = ["0.9", "42"] for testing and validation.
LLMDecomposer Strategy is Flexible: AOT utilizes the flexible decomposition provided by LLMDecomposer, with prompts that adapt based on the question type (MATH, MULTI_HOP), support custom decomposers, and enable dynamic prompt selection.
- The decomposition strategy ensures atomicity via a contraction validation phase, exemplified by prompts such as QuestionType.MATH: Break down this mathematical question into smaller, logically connected subquestions: Question: {question}.
MiproV2 Faces Value Error: A user encountered a ValueError while using MiproV2, related to mismatched keys in signature.output_fields, where the expected keys were dict_keys(['proposed_instruction']), but the actual keys received were dict_keys([]).
- Similar issues were reportedly encountered with Copro on GitHub, potentially related to max_tokens settings.

Codeium (Windsurf) Discord

Windsurf Surfs Up with Gemini 2.5 Pro!: Gemini 2.5 Pro is now available in Windsurf, giving users 1.0 user prompt credits on every message and 1.0 flow action credits on each tool call.
- The release was announced on X.
Gemini 2.5 Pro Overloads Windsurf: Windsurf is experiencing rate limiting issues with Gemini 2.5 due to massive load.
- The team is actively working to increase quota and apologized for the inconvenience.

Nomic.ai (GPT4All) Discord

GPT4All Users Bemoan Model Import Issues: Users are reporting difficulties importing models into GPT4All, with the system seemingly unresponsive, with further issues including the inability to search the model list.
- Additional complaints include missing model size information during selection, lack of LaTeX support, and a non-user-friendly model list order.
GPT4All User Experience Draws Ire: Users are expressing frustration with the GPT4All user experience, citing issues such as missing embedder choice options.
- A user stated that you are loosing users ... cause others much more user-friendly and willing to be open.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor Community ▷ #general (1297 messages🔥🔥🔥):

Gemini 2.5 Pro Pricing and Access, Windsurf vs. Cursor: Pros and Cons, Context Window Limitations, Model Performance and Preferences, Workflow Strategies

Cursor Faces Rate Limits with Gemini 2.5 Pro: Users are encountering very small rate limits with Gemini 2.5 Pro in Cursor, with one user reporting only two API requests before hitting the limit.
- A member mentioned using a combination of Gemini API, Requesty, and Open Router to circumvent these limitations, while another suggested that Google Workspace Business accounts might offer increased limits via a direct line to a representative.
Windsurf and Cursor Face-Off: Members continue to debate the merits of Windsurf versus Cursor, with some favoring Windsurf for prototyping due to its claimed full context and others preferring Cursor for bug fixing.
- A user noted that Windsurf's UI styling was inconsistent compared to Cursor's, which maintains a consistent aesthetic across pages, while another called for the "official shitsurf shill" to be hired.
Context Window Size Limited by Cursor?: A user inquired if Cursor limits the context window for Gemini 2.5 Pro to 30K, despite the model's advertised 1M context limit.
- Later, it was clarified that while agentic models have a 60k context window, Claude 3.7 has a 120k context window, or even 200k if max settings are enabled, though not enough data to tell is it's Vertex.
The Perplexing Pricing Puzzle of Gemini 2.5 Pro: Users voiced concerns over Cursor's charging for Gemini 2.5 Pro, which is allegedly free through Google's AI Studio API, leading to accusations of dishonesty.
- A Cursor representative clarified that the charges are to cover the capacity required to handle the model's usage, as Google is not offering a free tier at Cursor's scale and the price of the mode mirrors Gemini 2.0 Pro.
New Workflows Emerge: Goodbye Cursor, Hello Cline?: Users are experimenting with new coding workflows, some of whom intend to move away from Cursor in favor of VSCode with Cline for coding (with Gemini) and DeepSeek v3 from Fireworks for planning.
- While expressing nostalgia for Cursor's Tab feature and a general decline for most models, but still recognizing the usefulness of the said models, one user voiced the opinion that local models all suck at the RTX 4090 level anyway.

Links mentioned:

Perplexity AI ▷ #announcements (1 messages):

Perplexity Discord Bot, Testing the Discord Bot, Discord Bot Feedback

Perplexity Bot Enters Discord: Perplexity introduces the Perplexity Discord Bot to the community for testing.
- The bot aims to provide quick answers and fact-checking capabilities directly within Discord channels, accessible via tagging <@&1354609473917816895> or using the /askperplexity command.
Testing the Perplexity Bot: Testers can get quick answers by using the /askperplexity command or tagging <@&1354609473917816895>, fact-check comments with the ❓ emoji reaction, and create memes with the /meme command.
- The call is for community members to unleash their curiosity and explore the bot's features, integrated directly into Discord channels.
Discord Bot Feedback welcome: Feedback for improving the Discord bot is requested in the <#1354612654836154399> channel.
- The team is actively seeking bug reports and suggestions to enhance the bot's performance and user experience.

Perplexity AI ▷ #general (748 messages🔥🔥🔥):

GPT-4.5 Discontinued, Complexity Extension, MCP Servers, Perplexity Pro API vs Subscription, CEO Suggests Ad Removal

GPT-4.5 Vanishes from Perplexity: Members noted that GPT-4.5 is missing from the model selection screen on Perplexity.com, possibly due to cost concerns and without prior notice.
- The Perplexity AI bot stated GPT-4.5 generally outperforms GPT-4o in scientific reasoning, math, and coding, while GPT-4o is better for general and multimodal use.
Complexity Extension Enhances Perplexity: Users discussed the "Complexity" extension, a third-party add-on enhancing Perplexity's functionality.
- The Perplexity AI bot noted that while the extension's features could be beneficial as native options, integration decisions depend on user demand, technical feasibility, and product roadmap alignment.
Perplexity Integrates with Playwright via MCP: Users explored leveraging Model Context Protocol (MCP) servers like Playwright to allow Perplexity to control browsers or other applications with available MCP servers.
- The Perplexity AI bot explained that setting up and configuring the server to work with Perplexity's API would enable automated browser actions and web interactions.
Service Outages Plague Perplexity: Users reported intermittent downtime issues on Perplexity AI over several days, with potential causes ranging from network issues to VPN usage.
- Affected users experienced loss of threads, inability to create spaces, and general unresponsiveness from the platform, and many expressed frustration over the lack of communication from Perplexity regarding the outages.
Image Generation Woes Frustrate Users: Users reported difficulties accessing the "Generate Image" option in Perplexity Pro, particularly on the iOS app.
- The Perplexity AI bot recommended using the web browser version or contacting support for assistance, while some users shared workarounds involving specific prompts or refreshing the browser.

Links mentioned:

Perplexity AI ▷ #sharing (6 messages):

Perplexity AI Search, Android 15, Bluetooth Toggle

Perplexity AI Searches the web: The bot is performing multiple searches on Perplexity AI, another one, yet another, and another one.
Android 15 Bluetooth Toggle: A member has been searching for Android 15 Bluetooth Toggle improvements and discussions.
Minimum wage 2025: A user has been searching for details on the minimum wage in 2025.

Perplexity AI ▷ #pplx-api (27 messages🔥):

sonar API issues, llama-3.1-sonar-small-128k-online problems, Tier 3 access needed, Perplexity API parameter error handling

Legacy Code Causes Model Errors: A member resolved a code 400 error by deleting a problematic section of legacy code, clarifying the issue wasn't with the llama-3.1-sonar-small-128k-online model itself.
- The user confirmed that after deleting the legacy code, the model functioned normally.
API User Requests Tier 3 Access for Sonar Reasoning Pro: A user urgently requested Tier 3 access for Sonar Reasoning Pro to control search parameters (images, related questions, search domain filter, structured outputs) for an upcoming event.
- They also inquired about specifying search depth, providing custom instructions for research, iterative search capabilities, and obtaining iterative search sources to summarize with another LLM.
Sudden Parameter Error Halts App Functionality: A user encountered an error related to the response_format parameter despite having credits, disrupting their app's functionality and leading to lost sales.
- They sought immediate support to resolve the issue and requested temporary access to prevent further losses.
API Error Handling Implemented for Parameters: The API team implemented error handling for parameters like search_domain_filter, related_questions, images, and structured_outputs, clarifying these were never available for non-Tier 3 users.
- If users pass JSON schema within the prompt, they will continue to see the correct behavior.
User seeks integrating Llama Index RAG with Sonar: A user asked for advice on integrating Llama Index RAG context using the index object to Perplexity sonar model.
- The post was tagged with the 'pplx' emoji.

Link mentioned: no title found: no description found

Manus.im Discord ▷ #general (1045 messages🔥🔥🔥):

Gemini 2.5 Pro, Manus Invitation Code Wait Times, Discord Sidebar Changes, Manus Staging WordPress Issues, Manus and N8N Workflow Automation

Early praise for Gemini 2.5 Pro: Some members reported that Gemini 2.5 Pro is coding pretty good so far and wayyyy better than any Gemini model used last year.
- One user noted they took two separate animation components and meshed them into one for a seamless loading transition.
Manus Invitation Code delays: Several users expressed frustration regarding the wait times for Manus invitation codes, with some waiting for over a week.
- One member suggested using incognito mode or a different browser when signing up and receiving the embedded code.
Discord UI Sidebar Gone: A user inquired about missing icons, threads and messages, from their Discord sidebar, specifically on platform.openai.com.
- Another user suggested changing appearance settings to compact view or checking PC display settings to resolve size issues.
WordPress staging not working: A member reported that their WordPress staging site in Manus has been failing repeatedly since the last maintenance.
- Others did not seem to have the same issue and no solutions were found.
Manus integrates with N8N: Members discussed using Manus with N8N or Make.com for process automation and workflow automation.
- One member is in the process of building their first workflow with N8N and Manus to connect creatives globally.

Links mentioned:

LMArena ▷ #general (652 messages🔥🔥🔥):

Livebench benchmark discussion, Gemini 2.5 Pro performance, Censorship in AI models, Qwen 3 release, DeepSeek V3 0324 performance

Livebench gets grilled as rote task tester: Members debate the merits of Livebench, with some arguing it primarily tests rote tasks and isn't a comprehensive benchmark.
- There is an implication that Livebench rewards flash thinking rather than deeper reasoning, which could skew results.
Gemini 2.5 Pro sees rate limits raised: Gemini 2.5 Pro's capabilities are discussed, with impressions ranging from amazing at math to exhibiting instruction following issues.
- There are concerns about the consistency and stability of the model, especially the discrepancies between the free AI Studio version and the paid Gemini Advanced version, although Logan Kilpatrick announced increased rate limits on X.
AI Censorship debate rages on: The discussion shifts to censorship in AI models, with concerns raised about both Western models being woke and Chinese models being propaganda parrots.
- Members debate whether government censorship is distinct from safety guardrails and legal compliance.
Qwen 3 gets release date soon?: Enthusiasm builds for the upcoming Qwen 3 release, with speculation about its architecture and performance.
- Some anticipate a MoE model with impressive performance, while others remain cautious about its actual capabilities compared to Qwen 2.5 Max.
DeepSeek V3 0324 codes like a pro: DeepSeek V3 0324's impressive SWE-bench score is highlighted, raising questions about its coding prowess relative to other models like GPT-4o.
- Some suggest that these coding improvements might be a result of vibe coding or benchmark tuning rather than genuine advancements in the model's architecture.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #general (391 messages🔥🔥):

Unsloth Dynamic 4-bit Quantization, Qwen/Qwen2.5-Omni-7B in Unsloth, GRPO research, TTS fine-tuning, Llama 3.2 vision

Unsloth's New Dynamic Quantization is Selectively Quantized: Unsloth's Dynamic Quants are selectively quantized, greatly improving accuracy over standard bits.
- Unsloth released DeepSeek-V3-0324 GGUFs, including 1-4-bit Dynamic versions, for running the model in llama.cpp, LMStudio, and Open WebUI, and provides a guide for detailed instructions on running them.
TTS Finetuning and emotional cues with Unsloth Notebook released: Unsloth released the Orpheus TTS notebook that delivers human-like speech with emotional cues and allows users to customize voices + dialogue faster with less VRAM.
- One of the members said that it supports single stage models -> pretty much llm text token in -> audio token out and added that Kokoro won't be finetuneable at all.
Details on how to train an LLM that generate voices with classified emotions: Members are using scribe v1 from 11labs that does audio event classification to extract transcribed voice lines with emotion from a dataset of 40k hours audio in order to train an Orpheus model.
- The goal is prose/speed/spacing in a given mood and a member stated the official orpheus was trained on 8096 ctx so you can go as high as 5-10 min.
LLMs playing turn based strategy games: Members discussed the possibility of finetuning an LLM to play a complicated turn based game like chess but with x1000 more complexity.
- One of the members replied you fooling your self if you think you can get that reliable with so many permutations / rules, and recommended to check y-haidar/awbw-research.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #off-topic (2 messages):

YouTube feed algorithms, context length limitations

YouTube Algorithm Pushes Galois Theory: A member joked that after a single YouTube search for Galois theory, their feed is now saturated with videos about quintics.
- They quipped that these algorithms are "walking and degrading to crawling after 8k ctx".
Context Length Crawling: The member also joked that context length decays rapidly with longer content.
- They quipped that these algorithms are "walking and degrading to crawling after 8k ctx".

Unsloth AI (Daniel Han) ▷ #help (67 messages🔥🔥):

Gemma3 finetuning issues, Dynamic 4-bit quantization, Qwen2.5VL-7B finetuning, Toxicity injection attacks, Llama 3 fine-tuning with LoRA

Gemma3 Finetuning Frustrations: A user encountered a ValueError when trying to load a finetuned Gemma3 model from disk, related to missing bitsandbytes components, as discussed in GitHub issue #638.
- The user provided a minimal example showcasing the issue with saving and loading the Gemma3 model.
Qwen2.5VL-7B fine-tuning changes: A user shared their changes to the GRPOTrainer for fine-tuning Qwen2.5VL-7B, pointing to a specific Discord message.
- The user questioned whether Unsloth's implementation could provide better memory improvements compared to their own changes.
LoRA fine-tuning unlocks Llama 3: A blog post was shared discussing how LoRA reduces the amount of parameters modified during fine-tuning which makes finetuning Llama 3 8B feasible.
- The Neptune AI blog post can be found here.
Instruct Models: To Pretrain or Not To Pretrain?: A user was advised not to continually pretrain an instruct model because it can degrade performance and is generally intended for adding new domain knowledge.
- Instead, the member was encouraged to explore Supervised Fine Tuning (SFT) instead for Q&A tasks, referencing the Unsloth documentation.
Llama 3's Batched Token Troubles: A user reported that Llama 3 8B returns different tokens for the same input when using batched inputs, and traced the issue back to a regression in the 2025.3.4 release.
- The user confirmed that 2025.3.3 works as expected.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #research (65 messages🔥🔥):

ByteDance training policy, Dr GRPO paper, Catastrophic overtraining, Low precision training, Nvidia pruning paper

DAPO: ByteDance's Open-Source RL System Sparks Interest!: Members shared a link to ByteDance's Open-Source RL System DAPO, noting it "seemed like it kind of went under the radar".
- The system comes from ByteDance Seed and Tsinghua AIR.
Reasoning Lengths Influence Model Performance: Discussion arose around the idea that a "steady increase in response length allows for greater exploration", improving the model's reasoning and training, citing the Dr GRPO paper.
- It was theorized that penalizing against too many thinking tokens for wrong answers might "cut off the search space".
Catastrophic Overtraining: More Pre-training, More Problems?: The paper "Catastrophic Overtraining" suggests that extended pre-training can degrade fine-tuning performance, terming this effect "catastrophic overtraining."
- One member suggested that if an LLM is very tightly fit to one probability distribution (the pre training data) and then you try to shift it to a second distribution (the instruct data) it won't go well.
Pruning and Layer Addition: Nvidia's Strategy for Fine-Tuning?: Nvidia presented a pruning paper which can be found here, as mentioned by a member.
- Pruning parameters irrelevant to a given task, then adding new layers to compensate for lost bulk, and continuing training, could strike a balance between plasticity and pre-training gains, as described by another member.
KB-LaM: Microsoft's Plug-and-Play External Knowledge!: A member shared a link to Microsoft's KB-LaM (Knowledge Base Language Model) introducing plug-and-play external knowledge to LLMs.
- Unlike fine-tuning or RAG, KB-LaM integrates external knowledge without costly retraining or complex retrieval modules.

Links mentioned:

aider (Paul Gauthier) ▷ #general (452 messages🔥🔥🔥):

Gemini 2.5 Pro, Rate Limits with Gemini 2.5 Pro, Model Context Protocol (MCP), OpenAI's GPT-4o Update, Aider's New /context Command

Gemini 2.5 Pro Faces Demand Surge, Rate Limits To Rise: The Google team is prioritizing raising rate limits for Gemini 2.5 Pro due to high demand (@OfficialLoganK Tweet).
- One user notes Google LLMs have been buttcheeks for coding forever, give us something!
OpenRouter Offers Gemini 2.5 Pro Optimization Tactics: To bypass rate limits, OpenRouter suggests adding an AI Studio API key and setting up OpenRouter, using it as a surge protector.
- Another member noted their free tier quantized version of deepseek is rather stupid and verbose.
Gemini 2.5 Demolishes Claude 3.7's Coding Capabilities: Users on Reddit report that Gemini 2.5 fixed Claude's 3.7's atrocious code with one prompt.
- Another member reports Gemini 2.5 has just blasted my refactoring project, absolutely incomparable to Sonnet.
OpenAI's GPT-4o Gets A Boost: GPT-4o received an update in ChatGPT featuring improved instruction following, technical problem solving, intuition, and creativity (OpenAI Tweet).
- It now ranks #2 on the Arena Leaderboard, surpassing GPT-4.5, tying #1 in Coding and Hard Prompts.
Aider's /context Command Expedites Codebase Navigation: Aider's new /context command automatically identifies and adds relevant files for a given request, streamlining the coding process, however it is half baked and in testing.
- This assists in large codebases and saves time, and is useful for figuring out what needs to be modified and can be used with a reasoning model to brainstorm bugs.

Links mentioned:

aider (Paul Gauthier) ▷ #questions-and-tips (48 messages🔥):

Readonly file addition PR, Gemini issues with OpenAI API, /context mode explanation, Aider git issues, Setting different models as architect and coder

Readonly File Addition? Make a PR: A user inquired about creating a PR to add an option for adding files as readonly.
- This would allow Aider to automatically read certain files based on a regex file pattern, similar to Cursor's rules-based system.
Gemini's OpenAI API Compatibility Conundrums: A user reported issues with Gemini not working with the OpenAI compatibility layer, suspecting Litellm as the cause, despite other models functioning correctly.
- Paul suggested using the Gemini provider instead, while the user explained that they access all AI services through a reverse proxy, necessitating the OpenAI API compatibility.
Aider's Undo/Clear Command Clarification: A user asked whether Aider remembers undos when using /undo and if /clear deletes the entire chat memory, Paul confirmed.
- Paul advised using /clear if going in circles, and trying again fresh.
Custom Linting Commands: Clojure Conundrums: A user sought advice on making Aider handle mismatched parentheses in Clojure, noting Aider's documentation mentions a built-in linter for Clojure.
- Another member suggested setting a custom lint command in the YML config, while the user wondered if the built-in linter requires activation.
Taming Aider's CPU Usage Spikes: A user reported Aider suddenly spiking to 100% CPU usage, causing hanging or slow response from the LLM, despite working with a small repo.
- The user sought debugging tips, unsure where to start troubleshooting the issue.

OpenAI ▷ #ai-discussions (222 messages🔥🔥):

Keyboard Remapping, Dashes vs. Semicolons, Sora Prompts and AI Image Generation, Midjourney vs Sora image generation, NSFW content and AI

Keyboard Remapping Craze: Dashes Dominate Discourse: A user remapped their keyboard to better use the dash, feeling it fit their personality better, sparking a discussion about dash usage versus other punctuation.
- Some users admitted to using dashes more frequently, while others humorously threatened to use the ^ symbol as a separator.
Sora's Struggles: Prompting Plants and Censorship Suspicions Surface: A user sought help with Sora video prompts to create a camera spin effect with smoothly changing backgrounds using a plant image (criptomeria-na_kmienku-globosa-cr-300x300.webp).
- A user raised concerns about Sora generating younger images when asking for bunny characters, and another user found the resulting image highly sus.
Image Generation Temptation: Vice or Vision?: A user argued that AI image generation is a vice and a distraction, driving down the value of digital art, after sharing an image created with it (Screenshot_20250327_162135_Discord.jpg).
- After one user blocked him for disagreeing, others noted how easily they block people that disagree with them.
DeepSeek's Image Gambit: Looming 4o Competition?: Members speculate when DeepSeek will release its own 4o-competitive image model, pointing out that DeepSeek previously released an autoregressive image generation model called Janus.
- One user noted that Janus was worse than 4o, but it demonstrated DeepSeek's progress in autoregressive image generation.
Google AI Studio vs. ChatGPT: The Quest for Coding Clarity: A user sought alternatives to ChatGPT Plus for coding in Matlab, citing prompt limitations, with suggestions including Google AI Studio, Grok3, Hugging Face Chat, DeepSeek v3, and QwenChat.
- They were quickly confused by the amount of options given, noting they all seemed the same to them.

OpenAI ▷ #gpt-4-discussions (16 messages🔥):

Context Window, Image generation

Bulk File Upload: Members discussed whether it's better to upload files all at once, rather than one at a time, and the consensus was that it is preferable to upload all files simultaneously to ensure the model considers all documents.
- A member confirmed that uploading all files at once is fine, with the caveat that everything must be formatted properly.
ChatGPT image commercial usage is allowed: A member asked whether images created by ChatGPT are allowed to be used commercially if paying for ChatGPT.
- Another member responded yes, clarifying that commercial use of images is allowed with a paid subscription.
Context Window Estimations: Members discussed the context window limitations, where one clarified that Plus accounts have a 32k context window, while Pro accounts have 128k.
- Another member estimated that they might cross the context window limit with 10 documents (totaling 30MB) within just 10 messages of chat conversation.
No Sign-up Cartoon Image Generator Hunt: A member asked if there is an image generator that can convert different scenes from a script into a cartoon style without requiring sign-up each time.
- The user was unsure where to post this question in the Discord channel.

OpenAI ▷ #prompt-engineering (42 messages🔥):

Sora Prompt Engineering, AI for Academic Research, Arxiv's role in STEM publishing, AI peer review, Translating Foreign Language Data

Sora's Prompts: Cinematic Visions with AI: Crafting effective Sora prompts requires balancing concise cinematic direction with clear visual composition, including camera movement, focus, background treatment, and emotive tone, as demonstrated with a sample prompt for generating a 360-degree orbit around a detailed sculpture.
- A user showcased the results of using a specific prompt to generate a video of a sculpture with a blurred background, highlighting the system's capabilities in text-to-video generation and providing a tangible example of successful prompt engineering.
AI Research: Paywalls & Publication Bottlenecks: A user seeking prompts for deep research in academic sources was informed that the majority of academic sources are paywalled, limiting the potential for AI-driven research, with one member noting there's already a glut of AI written "research" and it's bottlenecking publication.
- While abstracts are accessible, basing deep research solely on them was deemed insufficient, leading to a discussion about the challenges and limitations of using AI for comprehensive academic research due to access barriers and the current state of AI-generated content.
Arxiv Momentum: STEM's Pre-Publication Powerhouse: Arxiv is gaining popularity in STEM fields as a pre-publication platform, which prompted a discussion about the value of unreviewed work compared to the peer review process.
- It was suggested that a critical mass of eyes on pre-publication work could lead to advancements, with AI potentially becoming a peer in the review process in the future, sparking debate about the evolving dynamics of academic validation.
AI Peer Review: A Coming Revolution?: The possibility of AI serving as a peer reviewer was discussed, suggesting that AGI models could possess the intellectual capacity to effectively evaluate research, though it shouldn't be the only word.
- Concerns were raised about the current peer review system, including scientists being forced to pay for consideration and work being locked behind paywalls, sparking hope that AI peer review could lead to a more functional and accessible system.
AI Translation: Bridging Language Barriers in Research: A user expressed difficulty in researching dam removal consequences due to data and articles being in a foreign language, leading to a suggestion to use AI models to translate and discuss the information chunk by chunk.
- It was recommended to keep the original language and engage the model in a conversation to ensure a comprehensive understanding of the ideas, capturing unspoken nuances and addressing differences in how concepts are framed between languages, facilitating more effective research.

OpenAI ▷ #api-discussions (42 messages🔥):

Sora Prompts, AI Research Paper, Arxiv, Meta-Prompting, AI Peer Review

Crafting Sora Prompts for Cinematic Videos: A member shared advice on writing prompts for Sora, emphasizing camera movement, focus, background treatment, and emotive tone for generating cinematic videos.
- The suggested prompts include specific details like "smooth 360-degree camera orbit around a detailed sculpture" with "softly blurred background with depth-of-field" to achieve the desired effect.
Quest for AI-Assisted Deep Research papers gets questioned: Members discussed the possibility of using AI for deep research in writing research papers, noting that many academic sources are paywalled, making comprehensive automated research difficult.
- One member suggested meta-prompting, which involves explaining the research goal to ChatGPT to generate a more effective research prompt, while expressing doubts that the AI will still struggle with academic sourcing because of robots.txt settings on a lot of academic sites, making those sources as trash the AI can't read.
Arxiv Gaining Popularity in STEM: A member mentioned that Arxiv is gaining popularity in STEM fields as a platform for publishing papers, even though it is not peer-reviewed.
- Another member argued that once enough people review the work, things start to move, and they anticipate AI becoming a peer reviewer in the near future, changing the dynamic of academic research.
Peer Review not perfect; good stuff is good without it: One member argued that the highest quality work is valuable before and after peer review, also pointing out that junk can pass through the peer review process.
- The member further criticized the traditional publishing model where authors often have to pay to have their work considered and then lose ownership of their work, giving rise to complaints about this process.
Model can assist with language translation challenges: A member who needed to work with data from government agencies that publish data in a foreign language was advised to converse with the model, explaining they're gonna read it together, one chunk at a time, and discuss what it means, and what does the model infer about it, what isn't said, are there any conflicts or ambiguities.
- The idea is to paste chunks of the original language text into the model, allowing the model to infer and translate ideas more effectively.

OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Gemini 2.5, OpenRouter tips, Cursor IDE integration

Gemini 2.5 Free Quota Maximization Tips: To maximize your free Gemini 2.5 quota, users should add their AI Studio API key in OpenRouter settings so that rate limits act as a surge protector.
- Additionally, users should set up OpenRouter in their favorite IDE, with a Cursor tutorial available, and use one-shot tickets.
Use OpenRouter for Free Gemini 2.5 Pro in Cursor AI: A member posted how to use OpenRouter to get Gemini 2.5 Pro in @cursor_ai for free.
- It was mentioned this was after scratching his head for 10 mins and seeing this a few times on X, and that it was working and solved this for them.

Link mentioned: Tweet from OpenRouter (@OpenRouterAI): To maximize your free Gemini 2.5 quota:1. Add your AI Studio API key in https://openrouter.ai/settings/integrations. Our rate limits will be a “surge protector” for yours.2. Set up OpenRouter in your ...

OpenRouter (Alex Atallah) ▷ #general (268 messages🔥🔥):

Stripe security, Gemini 2.5 Pro, OpenRouter and OpenAI SDK compatibility, Deepseek R1 provider issues, OpenRouter provider routing

OpenRouter Doesn't Store Card Info, Cites Stripe: Following a user email about a potential issue, OpenRouter confirmed they do not store any card information and that all payment processing is handled by Stripe.
- A member suggested contacting Stripe or the user's bank for more information, mentioning that they've had no issues with Stripe after topping up every 2 weeks.
Navigating Gemini 2.5's Low Capacity: Users reported receiving error messages like RESOURCE_EXHAUSTED when using the new Gemini 2.5 model, with Alex Atallah advising users to connect an AI Studio key to increase capacity.
- Members pointed out that Google provides the option to pay for increased capacity through AI Studio.
OpenRouter Aims for OpenAI SDK Parity: A user reported issues using tools with models like google/gemini-2.5-pro-exp-03-25:free via OpenRouter, despite it working with openai/gpt-4o-mini using Spring AI's OpenAI support, questioning OpenRouter's compatibility with the OpenAI SDK.
- A member confirmed that OpenRouter is supposed to be 100% compatible but suggested the user may have hit rate limits or that the specific model might not support tools, while others suggested trying Mistral Small 3.1 and Phi 3 models for testing.
Debugging Deepseek R1 Empty Responses: Users reported receiving empty API responses from the Chutes provider when using Deepseek R1 (Free), even after trying different keys and Targon.
- After debugging, setting max_tokens to 0 was identified as a potential cause, with a member suggesting setting it to a higher value, though the issue persisted even with increased token limits.
OpenRouter's Provider Routing Fails to Route: A user debugged provider routing with the AI SDK, trying to route Gemini/Anthropic across Google/Bedrock/Anthropic, but observed that the order wasn't being respected even when allow_fallbacks was set to false, with always ending up on Anthropic.
- The staff acknowledged the routing bug and thanked the user for finding the bug.

Links mentioned:

LM Studio ▷ #announcements (1 messages):

LM Studio 0.3.14, Multi-GPU Controls, NVIDIA GPUs, AMD GPUs

LM Studio Adds Multi-GPU Controls: LM Studio 0.3.14 introduces new granular controls for multi-GPU setups, allowing users to enable/disable specific GPUs and choose allocation strategies.
- Users can select between allocating evenly or prioritizing GPUs in a specific order, optimizing performance on systems with multiple GPUs.
GPU Allocation Strategy Enhanced: The new version includes a 'Limit Model Offload to Dedicated GPU memory' mode, improving stability and optimizing long context handling on single GPU setups, particularly beneficial for NVIDIA GPUs.
- The developers are actively working to bring these enhancements to AMD GPUs as well.
Unlock Hidden GPU Controls: LM Studio 0.3.14 introduces advanced GPU controls accessible via keyboard shortcuts: Ctrl+Shift+H (Windows) or Cmd+Shift+H (Mac), with a pop-out window option using Ctrl+Alt+Shift+H (Windows) or Cmd+Option+Shift+H (Mac).
- This allows users to manage GPU settings while models are loading, enhancing the user experience.

Links mentioned:

LM Studio ▷ #general (135 messages🔥🔥):

Vision Model Plugins, LM Studio Download Speed, VRAM Requirements for Models, Github Copilot vs Cursor, Fine-tuning Models with Unsloth

Vision Model Plugin Quandaries Arise: Members were looking for vision model plugins to use with models downloaded directly from Hugging Face.
- It was noted that Mistral Small is text-only and uses the GGUF format in LM Studio.
VRAM Determines Local Model Viability: Users discussed how to determine if their PC can handle a model, with the rule of thumb being that 8GB of VRAM can handle 7B Q4KM models.
- It was noted that nothing local with 8GB of VRAM comes close to Sonnet 3.5's performance.
Cursor's Agent Mode Edges Out Copilot: A user expressed preference for Cursor over GitHub Copilot in VS Code, citing Cursor's agent mode and tab complete features.
- They highlighted the ability to choose which model Cursor uses as a key advantage, and referred to Cursor forum for latest updates.
Fine-Tuning Frustrations and Unsloth Insights: Users shared experiences with fine-tuning models using Unsloth notebooks, encountering VRAM issues and various errors, while asking for assistance on how to get started, with pointers to the Unsloth Discord.
- Members clarified the difference between training and RAG (Retrieval-Augmented Generation).
HEDT Threadripper Defies Consumer Labels: Members debated whether AMD Threadripper CPUs should be considered consumer grade, citing their marketing as HEDT (High-End Desktop) processors.
- One member pointed out that while Threadrippers are marketed towards home users, they are actually professional workstations, linking to a Gamers Nexus article supporting that claim.

Links mentioned:

LM Studio ▷ #hardware-discussion (67 messages🔥🔥):

Gemma 3, 9070XT, ROCm, P100, RTX 4060ti 16gb

Gemma 3 T/S on 9070XT: A member with a newly acquired 9070XT achieved 54 t/s using Gemma3 - 12b Q4_K_M (Vulkan, no flash attention).
- They noted their 7800XT only generated around 35 t/s with Vulkan and 39 t/s with ROCm.
Gemma 3's Memory Appetite: Members discussed how Gemma3 models can spill into shared memory even with full offload, and how context can fill 32 GB of shared memory.
- Someone asked so one can load model as large as 48+gb (24 VRAM+32 Shared RAM)? and then followed up saying that even though part of total memory will be in VRAM, it still will work as mostly RAM inference.
P100: E-Waste or Bargain Find?: A member inquired about a P100 16GB as a hobby investment at 400 CAD/200 USD.
- Another member strongly advised against it, labeling it basically e-waste due to its Tesla architecture, unsupported CUDA versions, and inferior performance compared to modern cards like the 6750XT.
Nemo 12B causes crashes: A member reported that any models based on Nemo 12B just crashes and doesn't load.
- The same member stated they are not sure whether they have motherboard issues related to this problem.
Backyard AI Program is a Remedy: A member made progress, indicating that the Backyard AI program seems to load the model in correctly.
- The member suggested trying Backyard AI for other members that haven't resolved loading issues.

Eleuther ▷ #general (88 messages🔥🔥):

Deepseek V3 on Mac Studios vs. Cloud Instances, ICLR 2025 Meetup, Qwen2.5-Omni-7B Audio Testing, Qwen 32B Model Evaluation with LLM Harness, Transformers Library Errors

Deepseek V3 Runs on Mac, Cloud?: Members discussed running Deepseek V3 on Mac Studios and the possibility of using high RAM cloud instances as a cheaper alternative to GPUs, noting a user getting 20toks/second on a Mac Studio.
- Someone pointed to this article where they observed only 4 tokens/sec on an AMD EPYC Rome system, possibly due to the Mac's faster unified RAM.
EleutherAI Planning ICLR 2025 Meetup: A member inquired about an ICLR thread and suggested seeking sponsors for a meetup if attendance numbers were promising.
- Another member estimated around 30 attendees for EAI and friends.
Voice Sample Test for Qwen 2.5 Omni: A member asked for someone to test the Chelsie voice and post an audio sample using Qwen/Qwen2.5-Omni-7B with audio output enabled as described here.
- The code snippet provided was model = Qwen2_5OmniModel.from_pretrained("Qwen/Qwen2.5-Omni-7B", torch_dtype="auto", device_map="auto", enable_audio_output=True,)
LLM Harness struggles with Qwen 32B: A member reported issues evaluating the Qwen 32B model on LLM harness, despite having the latest version of Transformers.
- The consensus was that the sharded model may have more than 10 shards, related to tied embeddings, and that the transformers library could be the cause.
Transformers Library Blamed for Error Messages: A member traced an issue back to transformers 4.50.2, sharing a Colab notebook using version 4.50.0 where the issue was not present.
- The error message was misleading and stemmed from a lack of storage, even though the error message suggested the issue was with the AutoModel loading function, and there was a suggestion to submit a PR to lm-eval to add better error handling.

Links mentioned:

Eleuther ▷ #research (2 messages):

Catastrophic Overtraining, OLMo-1B instruction-tuned model, Gemma Team

LLMs face Catastrophic Overtraining: A new paper challenges the assumption that better pre-training performance translates to improved downstream models, coining the term catastrophic overtraining.
- The paper suggests that extended pre-training can make models harder to fine-tune, leading to degraded final performance due to a systematic increase in the broad sensitivity of pre-trained parameters to modifications.
OLMo-1B Suffers from Catastrophic Overtraining: The instruction-tuned OLMo-1B model pre-trained on 3T tokens leads to over 2% worse performance on multiple standard LLM benchmarks than its 2.3T token counterpart, according to a paper.
Gemma Team Publishes New Paper: The Gemma Team has published a new paper with authors including Aishwarya Kamath, Johan Ferret, and Shreya Pathak.

Links mentioned:

Eleuther ▷ #interpretability-general (86 messages🔥🔥):

privileged basis, neural networks and fixed mechanisms, CoT for reward hacking, manifold manipulation

Privileged Basis gets Deconstructed: Members discussed the concept of a "privileged basis", which refers to directions in data that retain more information after nonlinear transformations, but one member called it ill-defined, questioning the assumptions behind its usefulness.
- They asked, privileged by whom?, further questioning the assumptions that the points are uniformly distributed on the unit ball first, and that each point has the same "information content" to start out with too.
Neural Networks lack Organs: One member argued against the concept of mech interp, asserting that neural networks generalize without fixed mechanisms and that is how they generalize, noting, Neural networks aren't made of fixed mechanisms, they have flows of information and intensities of neural activity.
- They linked to a tweet arguing that they can't be organized into a set of parts with fixed functions.
Do Circuits just not exist?: One member suggested interpretability tools will need to describe the manipulation of entire manifolds of data at once rather than zeroing in on the model's behavior for specific samples, suggesting such descriptions may be more architecture invariant than activation or weight based approaches.
- In a possible galaxy-brained moment, this member grew more sympathetic to saying circuits aren't real.
CoT is actually good!: One member stated that CoT is actually good and is one of the best interp tools.
- They linked to a paper showing that monitoring a frontier reasoning model, such as OpenAI o3-mini, for reward hacking in agentic coding environments by using another LLM that observes the model's chain-of-thought (CoT) reasoning can be far more effective than monitoring agent actions and outputs alone.

Links mentioned:

Eleuther ▷ #lm-thunderdome (2 messages):

AlpacaFarm logprob/loss implementation, Instruction tuning EOS token

AlpacaFarm's Loss Implementation Questioned: A member inquired about the difference between logprob/loss implementation of AlpacaFarm and noted that using their loss implementation results in the same loss/perplexity on different sequences.
- The member linked to the specific implementation in rl_models.py for reference.
EOS Token's Role in Instruction Tuning Examined: A member asked whether an EOS token is added when doing instruction tuning.
- No further discussion or links were provided.

Link mentioned: [Discussion] about compute_logprobs · Issue #56 · tatsu-lab/alpaca_farm: alpace_farm implementation https://github.com/tatsu-lab/alpaca_farm/blob/94b02079b74af731b2671e3691a5080d5d340fd8/src/alpaca_farm/models/rl_models.py#L97C30-L97C46 DeepSpeedExamples implementation ...

Eleuther ▷ #gpt-neox-dev (7 messages):

GPT-NeoX Data Chunking, Cross-Document Attention, FA3 Support for H100s, FP8 and H100 performance

GPT-NeoX Data Chunking Discussed: A member raised concerns about data chunking with GPT-NeoX for training on the Common Pile v0.1, especially with documents exceeding the context length.
- It was noted that the current preprocessing script might not handle pre-chunking, potentially leading to correlated examples in batches.
Cross-Document Attention Remains On: A member noted that there is no way to turn off cross document attention within the same sequence.
- This can affect performance and training dynamics, particularly with chunked data.
FA3 Support Boosts H100 Performance: It was mentioned that FA3 support brings a noticeable speedup on H100s compared to A100s, but is not currently implemented in NeoX.
- This could influence hardware considerations for training runs.
FP8 Branch Achieves High H100 Throughput: A member reported that they have a branch with added features (definitely FP8 and possibly FA3) that's getting 580 TFLOP/H100/s and 10,312 T/H100/s.
- This indicates significant performance improvements on H100s due to recent optimizations.

Interconnects (Nathan Lambert) ▷ #news (96 messages🔥🔥):

Gemini 2.5 Pro, Claude 3.5 Sonnet, OpenAI Revenue, Midjourney CEO, 4o Image Generation

Gemini 2.5 Steals the Show from Claude 3.5: Gemini 2.5 Pro was judged as the best among all LLMs for 3% of its stories combining the required elements, whereas Claude 3.5 Sonnet, which dominated the list at the beginning of the year with 74%, is now down to 18% according to evaluations by Lech Mazur.
OpenAI Revenue Triples, Gunning for AGI: According to a Bloomberg report, OpenAI expects its revenue to triple this year to $12.7 billion, projecting $125B by 2029, becoming cash flow positive as they chase AGI.
Midjourney CEO Bitter over 4o Image Generation: The Midjourney CEO reportedly said that 4o imagegen is slow and bad, claiming OpenAI is just trying to raise money and it's a meme not a creative tool, further postulating that in one week nobody will talk about it anymore, according to this tweet.
Sparse Autoencoders Don't Change the Game: Research from the GDM mechanistic interpretability team indicates that Sparse Autoencoders (SAEs) don't help probes generalize OOD and aren't a game-changer, as detailed in this post on the Alignment Forum.
Cohere Debuts Command A and R7B Models with New Algorithm: Cohere released a tech report highlighting their novel approach to training Command A and Command R7B models, including the use of self-refinement algorithms and model merging techniques at scale, according to this tweet and linked paper.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #random (23 messages🔥):

Ghibli model training and copyright, Anthropic referral program, CoT French, OpenAI's image generation policy

Ghibli Model Gets Sued?: A user trained a model on 30-40 Ghibli movies, joking it was obvious they'll get sued due to copyright infringement.
Anthropic Referral Program: A user shared a link to Anthropic's Referral Partner Program Terms.
Research UI revamp and name change: Claude "Compass" was renamed to "Research", along with the recent UI revamp.
OpenAI shifts image generation policy: An OpenAI employee shared their thoughts on setting policy for new image generation in ChatGPT through 4o, moving from blanket refusals to a more precise approach focused on preventing real-world harm.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #memes (43 messages🔥):

GPT-4o Image Generation Rollout, Naming Conventions for Models, Gary Marcus on AI Economics, White House Deletes Ghibli-Style Tweet

GPT-4o's Confusing Image Generation Launch: The rollout of GPT-4o's image generation feature in ChatGPT was criticized for being confusing, despite OpenAI's claim that paying users received it quickly.
- A member noted that the warning for free users using outdated stuff was late, tiny, and easy to miss, referencing Simon Willison's blog post about the launch and its issues.
Model Naming Conventions Confuse Users: Members discussed that having path-dependent names for models is confusing because model names should not require you to know of every other model to understand them.
- One member found QvQ cute but still questioned its practicality, while another proposed naming it QwQ-V to clarify its function.
Gary Marcus Flags Dubious AI Economics: Gary Marcus posted a substack article arguing that the economics of generative AI do not make sense, referencing his previous warnings about technical limits dating back to 2001.
- He pointed out that Nvidia, despite making profits from AI, is facing headwinds, with its stock down 17.75% YTD, though it was pointed out that this YTD number was from 2025.
White House's Dark Ghibli Tweet Pulled: The White House deleted a tweet featuring a Ghibli-style image, which was characterized as dark by a member.
- Reportedly it was a "ghiblified" horrifying detention center photo.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #reads (6 messages):

Alan Turing Institute Crisis, WanTeam's AI Failure Paper, dewey_en_beta embedding model

Alan Turing Institute Faces Crisis: Despite a fresh £100 million funding in 2024, the Alan Turing Institute (ATI) is preparing for mass redundancies and to cut a quarter of its research projects.
- Staff are in open revolt due to potential redundancies.
WanTeam publishes AI Failure Paper: WanTeam has published a paper What AI Failure Looks Like in Computer Vision and Pattern Recognition.
- The authors include Ang Wang, Baole Ai, Bin Wen and others.
dewey_en_beta Embedding Model Unveiled: A new technical report introduces dewey_en_beta, an open-source embedding model excelling on MTEB (Eng, v2) and LongEmbed, supporting 128K token sequences.
- It uses chunk alignment training to generate localized chunk embeddings, and is hosted on HuggingFace.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #expensive-queries (7 messages):

Gemini 2.5, AI Studio, Long Contexts

Gemini 2.5 use wows Users: Users praised Gemini 2.5 after it just worked perfectly with actual long context, as seen in this share link.
- One user exclaimed models have always struggled when dumped actual long context and then followed up with woah.
User Confirms 15K Tokens on AI Studio: A user reported that they observed a 15K token context window.
- When prompted to check on AI Studio, the user confirmed with a yup.

Link mentioned: ‎Gemini - LaTeX Typos and Formatting Issues : Created with Gemini Advanced

Modular (Mojo 🔥) ▷ #mojo (168 messages🔥🔥):

Unit Scaling, SI Units as a Closed Set, Return Type Logic, Conditional Type, Extension Methods

Challenges in resolving unit addition: Kilos Per Second + Meters Per Minute: Discussion around resolving addition of different units, such as kilometers per second and meters per minute, with concerns about the return type and how to correctly scale the values.
- Members noted it's tricky because the scale would have to return the correct thing in those cases, and there's no way to use -> A if cond else B type logic for the function's return type.
Divergence on using C Unions: One member asked about how union lowers into and another member suggested a C union.
- The second member said, Since iirc CUDA has unions in some parts of the API.
Traits Discussion Reveals Nuances: A discussion about extension methods and traits, with a member pointing out that extensions allow adding methods to library types, a feature not directly available with Rust's impl due to orphan rules.
- Another member corrected that Rust's impl can implement library types.
Exploring Implicit Trait Implementations Soundness: Debate on implicit trait implementations, with one member expressing hope they are temporary and stating they make marker traits hazardous to have.
- Other members discussed ways to propagate trait implementations, mentioning the possibility of naming extensions and discussing the soundness tradeoffs.
Tuple Mutability Draws Disbelief: A member noted that assigning to an index inside a tuple is possible and can be done with indexing.
- Another member highlighted that this is a side effect of __getitem__ returning a mutable reference and noted that it should not be the case.

Links mentioned:

HuggingFace ▷ #general (52 messages🔥):

Model Parameters Explained, ComfyUI InfiniteYou, HuggingFace Inference API Pricing, OpenAI 4o dataset

Demystifying Model Parameters: A member clarified that parameters, such as weights and biases, determine how a model processes inputs and generates outputs, with the input layer acting as a 'door'.
- The explanation detailed the flow as Text > Tokenizer > Input Layer > Embedding Layer > Hidden Layers > Output Layer > Tokenizer, highlighting the input layer's role as a lookup table.
ByteDance's InfiniteYou Gets ComfyUI Workflow: ByteDance's InfiniteYou model, designed for flexible photo recrafting while preserving individual identity, has been integrated into ComfyUI, as seen in this Github repo.
- The integration aims to provide a seamless and efficient experience for generating diverse, high-quality images.
HF Inference API Pricing: Free Tier Limits: A member shared HuggingFace's API pricing documentation, which specifies the amount of monthly credits a user gets to run the HF Inference API.
- When the monthly credits are depleted, free users won't be able to query the Inference API anymore, while PRO or Enterprise Hub users will get charged for the requests on top of their subscription.
New OpenAI 4o Dataset Drops: A new dataset for the OpenAI 4o model was released, containing over 200,000 human responses from around 45,000 individual annotators, evaluating preference, coherence, and alignment from this HuggingFace page.
- The data was collected using the Rapidata Python API in less than half a day.

Links mentioned:

HuggingFace ▷ #today-im-learning (1 messages):

LLMs, Transformers, Guidance for Newcomers

Newcomer Seeks Guidance on LLMs and Transformers: A new member introduced themselves and asked for general guidance on LLMs and Transformers.
Community Welcomes Newcomer: Several members welcomed the newcomer and offered assistance with learning about LLMs and Transformers, suggesting resources and offering to answer questions.

HuggingFace ▷ #cool-finds (6 messages):

Windsurf for Vite Frontend, ComfyUI InfiniteYou Integration

Windsurf Rides Vite Wave: A member suggested using Windsurf to build Vite front ends, highlighting its appeal to those less experienced with front-end development, and its potential for connecting Claude to Vite.
- They also shared their method for connecting the open AICUA operator directly to Vite using the Sage maker on point API gateway, as an alternative to Cursor for budget-conscious users who prefer OpenAI models and AWS architecture.
InfiniteYou Infiltrates ComfyUI: ByteDance's InfiniteYou model, designed for flexible photo recrafting while preserving individual identity, has been integrated into the ComfyUI platform via ComfyUI InfiniteYou.
- The integration aims to provide a seamless experience for generating diverse, high-quality images while maintaining unique facial features.

Link mentioned: GitHub - ZenAI-Vietnam/ComfyUI_InfiniteYou: An implementation for InfiniteYou: An implementation for InfiniteYou. Contribute to ZenAI-Vietnam/ComfyUI_InfiniteYou development by creating an account on GitHub.

HuggingFace ▷ #i-made-this (12 messages🔥):

sieves zero-shot NLP pipeline, llama-cpp-connector updates for vision models, HFInheritedModelConfig for custom model building, Morphos web tool

Sieves Simplifies Zero-Shot NLP Pipelines: A member introduced Sieves, a tool for building NLP pipelines using only zero-shot, generative models without training.
- It offers pre-implemented common NLP tasks, ensuring correct output from generative models by leveraging structured output functionality from libraries like Outlines, DSPy, and LangChain.
llama-cpp-connector Unleashes Vision Models in Python: A team released llama-cpp-connector, a Python connector for llama.cpp, enabling easy use of its vision models, like Gemma 3, by Python developers.
- It was created to address the lag in wrappers like llama-cpp-python and the lack of vision model support in llama-server.
HFInheritedModelConfig Enables Custom Model Building: A member shared a custom model builder using HFInheritedModelConfig, designed to create models from pre-existing Hugging Face Hub models while overwriting config parameters and layers.
- This approach streamlines model customization, avoiding the need to patch Hugging Face components directly.
Morphos Offers Web-Based Image Classifier Training: A member introduced Morphos, a web tool for training image classifiers, with support for webcam/upload and real-time preview.
- This tool enables users to train image classifiers directly from a web interface.

Links mentioned:

HuggingFace ▷ #computer-vision (6 messages):

Image Reference Points, Qwen 2.5 VL Models on Kaggle, Memory Errors with Qwen 2.5 VL, Flash Attention 2 for GPU Offloading

Debate Landmarks for Image Reference: A member asked about using reference points in images like signs, bolts, or image metadata to aid in some computer vision task.
- Another member responded that the images are quite varied and that focus distance metadata is unavailable, as they're often from camera phones.
User Battles to Import Qwen 2.5 VL Models: A member initially struggled to import Qwen 2.5 VL models in Kaggle using the latest transformers library (4.50.0.dev0), encountering an import error related to Qwen2_5_VLForConditionalGeneration.
- After further debugging, the member identified and resolved the issue, citing some mismatch with other packages.
Member Runs into Memory Errors: After resolving import issues, the member then inquired about running Qwen 2.5 VL 3b on Kaggle for describing a 10-second video, facing persistent memory errors.
- It was recommended that they try bigger hardware, smaller model, or try flash attention 2 to help help GPU offloading.
Flash Attention 2 Comes to the Rescue!: A member suggested trying Flash Attention 2 to alleviate GPU memory issues when running Qwen 2.5 VL 3b models for video description on Kaggle.
- The original poster expressed gratitude and intent to try the suggested solution.

HuggingFace ▷ #NLP (16 messages🔥):

SetFit v4 release, Reranker models, LLMs generating JSON, Converting PDFs to JSON, Training models on precaution data

SetFit V4 Launch Reranks Sentence Transformers: A member shared a link to the SetFit v4 release blog post on Hugging Face, focusing on training reranker models which is part of Sentence Transformers and discussed its features and capabilities such as Reranker models.
- Unlike SetFit, these reranker models are for classifying pairs of text, often queries and answers, and are commonly used as "rerankers" in a 2-stage retriever-reranker pipeline.
LLMs Can Output JSON in Various Modes: A user asked about using LLMs to generate JSON text, and another user confirmed that it is possible to get a JSON output from LLMs like ChatGPT by providing examples in the prompt or using a JSON mode.
- It was highlighted that while LLMs can generate JSON, they might not be suitable for large data processing directly, like PDFs, and suggested chunking for large documents.
Converting PDFs to JSON with Chunking: A user inquired about converting large PDFs into JSON format for finetuning but another user suggested chunking for large documents.
- It was noted that while direct PDF to JSON conversion might be challenging for large files, one could extract data from the PDF, convert it to CSV, and then to JSON, or use chunking strategies before feeding it to an LLM.
Training Model on Precaution Data: A user was seeking methods to train a model on precaution data, converting it to JSON for finetuning, with the goal of providing precautions based on danger levels and was thinking of converting it into JSON format for finetuning
- An alternative idea was suggested, involving the use of a chatbot data and the CTG data with RAG (Retrieval-Augmented Generation) to fetch the precautions.

Link mentioned: Training and Finetuning Reranker Models with Sentence Transformers v4: no description found

HuggingFace ▷ #smol-course (4 messages):

AI Agents Course, Smol Course

Agents Course Link Provided: A member provided a link to the AI Agents Course.
- The course aims to take participants on a journey, from beginner to expert, in understanding, using, and building AI agents.
Smol Course Link Requested and Provided: A member requested a link to the Smol Course, which was then provided: smol-course.
- The GitHub repo is described as A course on aligning smol models.

Links mentioned:

HuggingFace ▷ #agents-course (16 messages🔥):

Course Unit Release Dates, Hugging Face Token Setup, Gemini vs. You.com, Agent Building Ideas, LLM Evaluator Issues

Course Unit Release Dates Delayed: The release of course units is being shifted, with unit 3 moved to April 1st, prompting questions about a similar delay for unit 4, originally scheduled for April 8th.
- While one member initially thought unit 3 and 4 would both be released next week, it was clarified that the next unit will focus on building an agent, and the final assignment will still be distinct.
HF Token Troubles Fixed: One member encountered an authentication error while running their first agent locally using the Qwen2.5-Coder-32B-Instruct model.
- The issue was resolved by setting the HF_TOKEN using os.environ['HF_TOKEN']="hf_myt0k3n".
Gemini Gains Ground Against You.com: Members discussed the use of Google's Gemini as an alternative to You.com for course-related tasks.
- One member has been using Gemini since Unit 2.1, stating that version 2.0 works perfectly fine, though 2.5 was also recently released; a point was raised that it also has limits.
Need Agent Building Inspiration?: A member who recently finished Unit 2.1 is seeking ideas for an agent to build in order to reinforce their learning.
- They are considering creating a tool for 2D to 3D conversion from spaces but are unsure how to proceed further.
Toxicity LLM Evaluator's Sensitive Side: A member testing the toxicity LLM-as-a-judge in Langfuse found that it incorrectly flagged the prompt "Can eating carrots improve your vision?" as toxic (0.9).
- The reason given was that the hypothetical response contained a dismissive tone towards people who believe in climate change, even though the actual response had nothing to do with the subject; the user employed OpenAI's gpt-4o as the model.

MCP (Glama) ▷ #general (94 messages🔥🔥):

MCP adoption across OpenAI products, MCP impact on businesses and the future, Cloudflare's MCP tooling, Security risks of MCPs from GitHub, MCP server implementation issues with Claude

Sama Signals Support: MCP Coming to OpenAI Products!: OpenAI CEO Sam Altman announced that MCP support is coming to OpenAI products, including the Agents SDK, ChatGPT desktop app, and Responses API.
- Members see this as a significant step towards MCP becoming the backbone for agents performing business-related tasks, akin to the impact of HTTP on the internet.
Cloudflare Cranks Context: Jumps into MCP with Remote Server Tooling: Cloudflare announced support for building and deploying remote MCP servers with tools like workers-oauth-provider and McpAgent.
- This move is considered a big deal, providing developers with resources to build MCP servers more efficiently.
Spec Snafus Surface: Claude Struggles with MCP Prompts and Resources: Users reported issues with Claude Desktop when MCP servers include resources or prompts, causing it to endlessly query for them.
- A member suggested a workaround involving the removal of capabilities to prevent Claude from searching for missing resources and prompts and pushed a new github version with the fix.
Server-Side Savvy: Prompts and Tools Together on MCP Servers?: Members discussed storing tools alongside prompts on the same MCP server to ensure proper tool usage, especially with tools like Stable Diffusion.
- It was suggested that the prompt should directly reference the tool names available on the server to guide agent behavior, such as First call ${tool1.name}, then ${tool2.name}.
ICL Integration Insights: Prompts Pump Up In-Context Learning: The group discussed a way to leverage MCP prompts for in-context learning (ICL), using prompts to encourage specific agent behaviors and guide tool usage, linking to an example where you can say use tool for this prompt.
- It was pointed out that for ICL to work effectively, the MCP client needs to add User/Assistant pairs to the context window as distinct messages rather than just attaching a single JSON object.

Links mentioned:

MCP (Glama) ▷ #showcase (12 messages🔥):

Canvas MCP, Truto's SuperAI, Model Context Protocol (MCP), Gradescope integration, Docker Compose for MCP servers

Canvas MCP Connects College Courses: A member created a Canvas MCP that connects to college courses, enabling automatic querying of resources and upcoming assignments.
- Another member requested Gradescope integration, and the creator responded that they added another agent that can autonomously crawl gradescope to find info try at Canvas MCP.
Truto Launches Agent Toolsets on SuperAI: Truto introduced Agent Toolsets on Truto's SuperAI platform, a dedicated platform for companies building AI products & features.
- The linked LinkedIn post encourages users to like or comment to help the team reach more teams.
Building an MCP Server in Javascript: A member shared Lokka article discussing how to building a Javascript Model Context Protocol (MCP) server for Hacker News.
- The linked article emphasizes that the Model Context Protocol (MCP) aims to fix the messiness of integrating LLMs into real products.
All-In-One Docker Compose for MCP Servers: A member created an All-In-One docker-compose that lets users self host 17 MCP servers easily from Portainer (or wherever).
- The compose grabs the Dockerfiles from public GitHub projects, so as they update it should get the new stuff too.

Links mentioned:

Notebook LM ▷ #announcements (1 messages):

Mind Map public release

Mind Map Goes Public!: The Mind Map feature is now 100% publicly available for all users on Notebook LM.
- The team expressed gratitude for user patience, love, and feedback, and the announcement included a thank you image.
Patience Rewarded, Mind Map Unlocked: After a period of anticipation, the Mind Map function on Notebook LM is now accessible to everyone.
- The launch was celebrated with an expression of thanks for the community's enduring support and insightful contributions.

Notebook LM ▷ #use-cases (8 messages🔥):

Spanish podcasts not working, Sharing Notebooks issues, Company research for cover letters and resumes

Spanish Podcast Creation Halts: A user reports that the ability to generate podcasts in Spanish is no longer functioning in the "customize" settings.
- They requested tips or comments on resolving this issue, indicating a previous capability that has since ceased.
Notebook Sharing Woes Plague Pro Users: A Pro user reports being unable to share Notebooks via link, despite the notebook containing publicly available information from YouTube.
- Potential solutions mentioned include ensuring recipients have active NLM accounts and manually emailing the link.
Students streamline company research: A student developed a system to streamline company research for impactful cover letters and resumes, achieving an 80% rating.
- They uploaded company websites, reports, and news sources to Notebook LM, enabling detailed, referenced answers about the firm and saving time, but generating cover letters yielded generic results with a 10% rating.

Notebook LM ▷ #general (83 messages🔥🔥):

Gemini 2.5 Pro Turkey Test, Gemini Advanced Research Limit, NotebookLM API and Podcast Creation, Mind Map Improvements, Gemini 2.0 Flash Readability

Gemini does Turkey Test: A member tested Gemini 2.5 Pro with a 'Turkey Test', challenging it to write metaphysical poetry about a bird and shared the video with NotebookLM commentary here.
- The user shaped the commentary with Interactive Mode to segue to a satisfying ending, stumbling into new uses for NBLM.
Gemini Advanced has Research Limits: A member inquired about the deep research limit for Gemini Advanced, and another member responded that it is 20 research reports per day.
- The first member considered this pretty dang good compared to chatgpt.
Podcast Creation via API in the works?: A member suggested including podcast creation via the NotebookLM API and noted that they could make some really cool things with that.
- They have a discord bot with a thinking model that allows function calling, and the action bot has been much more consistent with its function usage.
Mind Map Requests: A member requested that the developers improve the new mind map feature that has just been released.
- The member noted that the mind map, while neatly structured, wastes time because it lacks descriptions, and you also have no control with how the mind map is structured or how descriptive it can be.
Readability goes Downhill: A member has seen a fall in readability when using Gemini 2.0 Flash with NotebookLM, although it produces richer and better responses.
- It's just a little bit harder to read as Gemini tend to explain / answer first before explaining why , Gemini first explain the basic and then give the answer in the middle of response or somewhere.

Yannick Kilcher ▷ #general (33 messages🔥):

Sketch-to-Model Pipeline, Alternatives to Kernel Attention (KA), AI Solving Puzzles, ChatGPT and Grok 3 for UX/UI, Information Theory in AI/ML

Sketch-to-Model Pipeline Spurs Discussion: A member proposed a "Sketch-to-Model" process (Sketch --> 2D/2.5D Concept/Image --> 3D Model --> Simulation) and expressed interest in alternatives to Kernel Attention (KA).
- They noted that ChatGPT oddly referred to a concept similar to KAN but different, and cited KAN as being from Google DeepMind, and Grok 3 mentioned that xAI team actively researches about KAN.
AI's Puzzle-Solving Prowess Put to the Test: Members discussed whether AI could solve the puzzle book Maze: Solve the World's Most Challenging Puzzle (Wikipedia link).
- One member suggested training LLMs to solve ARGs and old puzzle games, while another noted some puzzles are intentionally difficult, potentially unsolvable by current reasoning models.
UX/UI Design Falls Flat with AI: Members shared experiences testing ChatGPT and Grok 3 for UX/UI design and architectural plans, with disappointing results.
- One member suggested that for app or architectural layouts they need structured reasoning, and pointed to this paper LayoutGen: Layout Generation with Box-wise Diffusion.
Resume Rustlers: AI's Role in Applicant Pool Pollution: A member asked about the best way to generate fake resumes in volumes to pollute the applicant pool and make slightly better candidates appear more impressive.
- The consensus was that HR sets a baseline regardless, and the information would likely be illegal.
Information Theory Underpins Cherry-Picked AI: A member stated that AI and ML cherry-pick from information theory, primarily using entropy and divergence, but not conditional versions.
- They argued that utilizing more aspects of information theory could lead to better generalization, memory, interpretability, and efficiency, and linked Oxford Mathematics lectures on Information Theory.

Links mentioned:

Yannick Kilcher ▷ #paper-discussion (18 messages🔥):

Discord Timestamps, Chain-of-Bias issues, Paper Discussion Format, Tracing Thoughts in a Language Model, Attribution Graphs

Discord Timestamps Auto-Convert: Discord timestamps automatically display in the viewer's local time, streamlining scheduling as described in Discord Timestamps.
- This eliminates timezone conversion confusion.
Chain-of-Bias Systematically Wrong: A member commented that Chain-of-Bias is only helpful to a point, because when it is "wrong", it is "systematically wrong".
- Therefore, this implies it has limited use.
Paper Discussions take a streamy format: Paper discussions usually involve someone pulling up the paper on stream and reading through it with others contributing experience and knowledge.
- More prepared, higher quality weekly discussions are also available through the discord event system.
Tracing Thoughts in a Language Model dual-paper deep-dive incoming: Members will be analyzing Tracing Thoughts in a Language Model and the associated YouTube video.
- Given the length of both, it is likely to involve several sessions.
Attribution Graphs link-drop: A member shared links to Transformer Circuits attribution graphs methods and Transformer Circuits attribution graphs biology.
- The graphs aim to offer insights into transformer functionality.

Link mentioned: Scaling Laws of Synthetic Data for Language Models: Large language models (LLMs) achieve strong performance across diverse tasks, largely driven by high-quality web data used in pre-training. However, recent studies indicate this data source is rapidly...

Yannick Kilcher ▷ #ml-news (14 messages🔥):

Alan Turing Institute Crisis, GPT-4o Autoregressive Image Generation, Image Token Reusal

Alan Turing Institute Faces Crisis and Cuts: Despite a fresh £100 million funding settlement in 2024, the Alan Turing Institute (ATI) is preparing for mass redundancies and to axe a quarter of its research projects.
- Staff are reportedly in open revolt due to the cuts.
GPT-4o Confirmed as Autoregressive Image Model: GPT-4o is confirmed to be an autoregressive image generation model, detailed in OpenAI's Native Image Generation System Card.
Speculation Surrounds Image Token Reusal in GPT-4o: Speculation arises that GPT-4o reuses image input tokens for image output, suggesting it may output the same format of image tokens used as input.
- It was observed that when asked to exactly reproduce an image, the model introduces small changes, indicating a semantic encoder/decoder instead of pixel-level encoding, with one member stating that OpenAI were saving it until Google releases a good model to take away attention.

Links mentioned:

GPU MODE ▷ #general (2 messages):

Data distribution in DP and TP ranks, TRL handling of data distribution

Data Handling Deep Dive: DP vs. TP Ranks: A member clarified that in distributed processing (DP), each rank receives different data, while in tensor parallelism (TP), all ranks get the same data.
- They suggested that TRL (Transformer Reinforcement Learning) should already manage this distribution automatically.
TRL Framework Manages Data Distribution Automatically: A member pointed out that the Transformer Reinforcement Learning (TRL) framework is likely designed to handle the data distribution across different ranks automatically.
- This ensures efficient training and utilization of resources in distributed environments using DP and TP.

GPU MODE ▷ #triton (8 messages🔥):

Pre/Post Hooks in Triton, num_ctas for Hopper, Local Tensor expansion

Pre/Post Hooks unsupported in Triton Autotune: pre_hook and post_hook are not supported in triton.Autotune or triton.Config due to requiring python code execution at runtime, which Inductor can't support in AOTI.
- A member stated that implementing this support shouldn't be too hard.
Hopper's num_ctas Mystery: It appears that no one is using a num_ctas value higher than 1 for Hopper in Triton, due to crashes or RuntimeError: PassManager::run failed exceptions.
- The root cause of these issues remains unclear.
Local Tensor Expansion Woes: A member is facing issues porting code using torch.Tensor.expand() to Triton due to unsupported tensor index for repeating elements of a local tensor.
- While load() allows repeating indices in the ptr argument, local tensor indexing does not.

GPU MODE ▷ #cuda (4 messages):

Memory Coalescing, CUDA Memory Hierarchy

Memory Coalescing Clarified: A user asked whether coalescing requires data to be stored contiguously, even if the data is not contiguous in registers, but the addresses in global memory are consecutive.
- Another user clarified that in such a scenario, the reads would indeed be coalesced, regardless of where the data is written in shared memory (SMEM) or register memory (RMEM).
CUDA Memory Hierarchy Explained: One user explained the CUDA memory hierarchy to illustrate the benefits of memory coalescing.
- They clarified that it's the movement of data between DRAM and SRAM that is slow, emphasizing the importance of maximizing data transfer efficiency between global memory and shared memory.

GPU MODE ▷ #torch (1 messages):

PyTorch profiler, profiler trace

Troubles locating save calls in PyTorch Profiler: A user is having trouble finding the exact spots where the save function is called in a PyTorch profiler trace.
- They see many detach and copy calls that they believe correspond to it, but then there's a break where there's nothing in any stream or thread.
Need help using PyTorch Profiler: A user needs help finding the exact lines of code, in the profiler trace, where the save function is called in PyTorch profiler.
- There are many detach and copy calls that are believed to be related, but there's a break where there's nothing to see in any stream or thread.

GPU MODE ▷ #jobs (3 messages):

Red Hat, Software Engineer, C++, GPU Kernels, CUDA

Red Hat Seeks GPU-Savvy Engineers: Red Hat is hiring full-time software engineers at various levels with experience in C++, GPU kernels, CUDA, Triton, CUTLASS, PyTorch, and vLLM.
- Interested candidates should email a resume and summary of relevant experience to [email protected], including "GPU Mode" in the subject line.
Red Hat is Hiring: Join Red Hat as a software engineer and work with cutting-edge technologies.
- The role requires experience in C++, GPU kernels, CUDA, and other relevant areas.

GPU MODE ▷ #beginner (1 messages):

Knowledge Distillation for Video Models, Estimating Model Parameters, Estimating Inference Throughput on Consumer GPUs

Distilling Video Models for Consumer GPUs: A member is seeking methods to reduce a 28M parameter video model via knowledge distillation for real-time inference on consumer GPUs.
- The goal is to estimate the resulting model size and frames per second (FPS) performance, taking into account GPU architecture, model FLOPs, frame rate, and number of parameters.
Calculate Video Model Params: The member is seeking resources, such as guides or blog posts, to approximate the number of parameters for consumer GPUs, given a pre-trained 28M parameter video model.
- It remains unclear what formula can be used to calculate this estimate.
Estimating Real-time FPS: The member is seeking insights on how to estimate inference throughput (FPS) for the distilled model on consumer GPUs.
- This includes accounting for factors such as GPU architecture, model FLOPs, frame rate, and the number of parameters. Unfortunately no guide or blog was given on this topic.

GPU MODE ▷ #torchao (5 messages):

Blocksparse, TorchAO, Pull Request #1734, Pull Request #1974

Blocksparse Promotion in TorchAO - Is it needed?: A member questioned the need for a specific code addition related to the blocksparse feature promotion in pytorch/ao PR#1734.
- Another member clarified that the addition was likely accidental and unnecessary for the blocksparse functionality.
Decoding Dynamism gets Fixed: PR #1974 aims to remove dynamic=True for decode.

Links mentioned:

GPU MODE ▷ #off-topic (1 messages):

Hayao Miyazaki on AI art, Studio Ghibli anime filter

Miyazaki's Stance on AI Art resurfaces: A user shared a post on X quoting Hayao Miyazaki's critical views on machine-generated art, founder of Studio Ghibli.
Ghibli Anime filter trending: A user noted a trend of converting personal photos into Studio Ghibli anime style, deeming it "tremendous alpha".
- The user suggested sending these anime-style photos to one's wife.

Link mentioned: Tweet from Nuberodesign (@nuberodesign): Since this utter garbage is trending, we should take a look at what Hayao Miyazaki, the founder of Studio Ghibli, said about machine created art.Quoting Grant Slatton (@GrantSlatton) tremendous alpha ...

GPU MODE ▷ #sparsity-pruning (1 messages):

srns27: gosh I'm so blind thanks man haha

GPU MODE ▷ #gpu模式 (1 messages):

nuttt233: 因为batch gemm中默认前两个维度是batch stride，后两维才是row col

GPU MODE ▷ #general (3 messages):

ComfyUI, CUDA, load_inline, Triton

ComfyUI Installation Help: A member recommended that a user with installation errors seek help on the ComfyUI Discord.
- They suggested trying to install Triton in a new conda environment.
CUDA Syntax Error Fix: A member experienced a SyntaxError: invalid decimal literal error when uploading a .cu file.
- Another member suggested using the load_inline() functionality in pytorch, referencing this example.

Link mentioned: reference-kernels/problems/pmpp/vectoradd_py/solutions/correct/submission_cuda_inline.py at main · gpu-mode/reference-kernels: Reference Kernels for the Leaderboard. Contribute to gpu-mode/reference-kernels development by creating an account on GitHub.

GPU MODE ▷ #submissions (25 messages🔥):

Modal Runners, vectorsum, grayscale

Success on grayscale Leaderboard: A benchmark submission with id 3169 to leaderboard grayscale on GPUS: H100 using Modal runners succeeded!
Success on vectorsum Leaderboard: Multiple test, leaderboard, and benchmark submissions to leaderboard vectorsum on GPUS: T4 and L4 using Modal runners succeeded.

LlamaIndex ▷ #blog (1 messages):

LlamaCloud, MCP Server, Claude Desktop

LlamaCloud Doubles as MCP Server: LlamaCloud can be used as an MCP server, enabling real-time data integration into workflows for any MCP client.
- A video demonstration shows using an existing LlamaCloud index as a data source for an MCP server used by Claude Desktop.
Claude Gets Data from LlamaCloud: Claude Desktop can use an existing LlamaCloud index as a data source for an MCP server.
- This allows up-to-the-second data to be integrated into the Claude workflow, as detailed in this video

LlamaIndex ▷ #general (22 messages🔥):

LlamaExtract Schema Inference, TS Chatbot with Postgres DB, E-commerce Chatbot Architecture, SQL Query Generation Issues, Structured Prediction Bug

LlamaExtract Ditches Schema Inference: The schema inference capability in LlamaExtract, announced last year, has been de-prioritized because most users already have the schema they need, as stated in the LlamaExtract Announcement.
- The feature may return in the future, but other aspects are being prioritized.
TS Chatbot Tussles with Postgres Data: A user inquired about using LlamaIndex with a relational Postgres database and a recommendation was made to use a text-to-SQL application built with LLM objects.
- Converting data to a vector DB was deemed unhelpful due to the nature of relational data, but there is a TS package that may help with workflows.
E-commerce Chatbot Multi-Agent Mayhem: A user considering a multi-agent system with a react agent and functional handoff agents for an e-commerce chatbot received a suggestion to stick to function calling agents if the LLM supports it.
- Websockets were recommended for providing quick replies, but the choice depends on the overall system architecture.
Chatbot struggles with SQL Query Generation: A user building a chatbot that generates SQL queries from user messages reported issues with the bot not picking the appropriate columns, even with column comments in the SQL file.
- No specific solution was provided but it was suggested the user file a bug report to the team.
Structured Prediction Bug causes Dict Field Fail: A bug was reported where structured prediction fails with Dict fields, potentially due to limitations in JSON schema or OpenAI's support for annotations, as noted in Issue #18298.
- Workarounds include giving the field a description and setting the type to Any, or defining another Pydantic model to describe the dict field.

Links mentioned:

LlamaIndex ▷ #ai-discussion (13 messages🔥):

PDF Parsing Tools, LlamaParse and Image Reading, LLMs for Image Captioning, Hybrid Chunking, OCR for Scanned Documents

LlamaParse is indeed the answer for PDF parsing: Users discussed using LlamaParse as the best parsing tool to parse PDFs, with one user confirming that it reads images fine after proper configuration and offered to test an example file.
- It was noted that the PDF in question, pizza.pdf, contained no actual text, just an image.
LLMs can caption images for RAG: A member suggested using an LLM to read and caption an image for a RAG application, to answer questions from uploaded PDFs.
- Another member enquired about Hybrid Chunking and OCR for scanned documents like handwritten mathematics homework.

Latent Space ▷ #ai-general-chat (25 messages🔥):

Nvidia Acquires Lepton AI, Model Context Protocol, Replit Agent v2, GPT-4o Update, OpenAI Image Generation Policy

Nvidia Gobbles Up Lepton AI: Nvidia acquired inference provider Lepton AI for several hundred million dollars to enhance its software offerings for GPU utilization, as reported in The Information.
OpenAI's Agents Speak MCP: The Model Context Protocol (MCP) now connects to OpenAI Agents SDK, allowing the use of various MCP servers to provide tools to Agents.
- MCP is described as a USB-C port for AI applications, standardizing how applications provide context to LLMs.
Replit Agent v2 Gets Autonomy Boost: Replit Agent v2, in early access with Anthropic’s Claude 3.7 Sonnet, features increased autonomy, forming hypotheses and searching for files before making changes, detailed in the Replit blog.
- It is now more autonomous and less likely to get stuck on the same bug.
GPT-4o gets a New New Update: The latest ChatGPT-4o update (2025-03-26) has jumped to #2 on the Arena, surpassing GPT-4.5, with significant improvements and is tied for #1 in Coding and Hard Prompts, according to Arena leaderboard.
- It now features better at following detailed instructions, especially prompts containing multiple requests and improved intuition and creativity.
OpenAI Loosens Up Image Gen Policy: OpenAI shifts from blanket refusals to preventing real-world harm in image generation, aiming to maximize creative freedom while preventing real harm, described by Joanne Jang in her blog post.

Links mentioned:

Torchtune ▷ #general (4 messages):

FP8 QAT, Optimizer State with Fake Quant

FP8 QAT on the Horizon: A member is looking at making their model more fp8 friendly and doing a pure QAT run on top of the "cold" trained model, and has spotted this issue on the PyTorch/AO repo.
- FP8 QAT is something they are looking at, but haven't had the bandwidth to do it yet.
Fake Quant doesn't alter Optimizer State: A member confirmed that enabling fake quant should not result in any changes to the optimizer state.

Link mentioned: FP8 QAT / FP8 block-wise quantization · Issue #1632 · pytorch/ao: Having QAT for FP8 would be a great addition, and FP8-blockwise quantization in general.

Torchtune ▷ #dev (18 messages🔥):

Deprecated code deletion, Linter installation issues, Anthropic using TensorFlow, GRPO PRs, JoeI sora

Deprecated Code looking for Executioner: A member requested assistance with PR #2533 to delete deprecated code, citing inability to install the linter on their work laptop.
- The PR involves full train_on_input deprecation and the removal of other deprecated components.
Anthropic Jumps Ship and Uses TensorFlow: A member pointed out that Anthropic is allegedly using TensorFlow, leading to speculation that PyTorch might be banned there.
- Another member responded with smh, suggesting there's not much to say beyond that.
JoeI SORA takes over the World: A member posted a screenshot of JoeI SORA, presumably an AI model, in an unknown context, responding to another member asking about the intuition behind a certain model.
- The member simply responded that there is no intuition, just JoeI and proceeded to show off the screenshot.
GRPO PRs Plea for Processing: A member highlighted two GRPO PRs (#2422 and #2425), noting that #2425 is a bug fix and should be merged soon.
- Another member responded immediately, confirming they were on it.

Link mentioned: Full train_on_input deprecation, removing other deprecated components by RdoubleA · Pull Request #2533 · pytorch/torchtune: ContextWhat is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here)ChangelogWhat are the changes made in this PR?Use mas...

Cohere ▷ #「💬」general (12 messages🔥):

Vector Database Options, Hosting Vector DB Online, AI Agent Pricing, Cohere at QCon London

Vector DB Options Explored: A member inquired about the vector databases used, mentioning their experience with Chroma and seeking information on other options and their common usage.
- In response, another member provided a link to Cohere's integrations page showcasing supported vector databases like Elasticsearch, MongoDB, Redis, Haystack, Open Search, Vespa, Chroma, Qdrant, Weaviate, Pinecone, and Milvus.
Hosting Vector DBs Online: A member asked about hosting vector DBs online, specifically if they're synced to storage buckets and loaded each time or if there's an alternative approach.
- In response, another member provided a link to Cohere's integrations page that showcase supported vector databases, implying they handle hosting concerns.
AI Agent Pricing Explored: A member is researching how founders are pricing and monetizing AI agents and invited others to chat and validate insights.
- Another member replied encouraging to share more with us about AI Agent pricing
Cohere potentially at QCon London again?: A member asked if Cohere will be at QCon London this year after attending last year.
- They expressed interest in discussing access to North with a Cohere representative.

Link mentioned: Integrating Embedding Models with Other Tools — Cohere: Learn how to integrate Cohere embeddings with open-source vector search engines for enhanced applications.

Cohere ▷ #「🤝」introductions (2 messages):

Refugee Organization, Peacebuilding, Livelihood Opportunities

Refugee Advocate Introduces Organization: A refugee in Kenya introduced themself as the leader of Pro-Right for Refugees, a Community Based Organization (CBO) in Kakuma Refugee and Kalobeyei Settlement.
- The organization's vision is to promote refugee access to livelihood opportunities and enhance peaceful living, with a mission that every refugee should have a right to access a livelihood in a peaceful environment.
Pro-Right for Refugees Focuses on Peace and Prosperity: The CBO Pro-Right for Refugees focuses on peacebuilding, awareness raising, and livelihood initiatives.
- They welcome volunteers and anyone interested in supporting refugees in the camp, with a motto of “Peaceful lives, thriving lives.”

tinygrad (George Hotz) ▷ #general (12 messages🔥):

Budget AI Rig, AX650N NPU, Tinygrad PRs

Assemble Budget AI Rig: One member inquired about the possibility of building an AI rig for 7-8k yuan using older X99 components, Xeons, and 32GB ECC DDR4 RAM available on Taobao.
- Another member confirmed its feasibility after checking.
AX650N Boasts 72 TOPS: A member shared a link to the AX650N product page, highlighting its 72Tops@int4, 18.0Tops@int8 NPU and native support for Transformer intelligent processing platform.
- The AX650N features an Octa-core A55 CPU, supports 8K video encoding/decoding, and provides dual HDMI 2.0 outputs.
AX650N Performance Investigated: A member shared a link to a blog post that reverse-engineered the AX650N, noting it delivers 72.0 TOPS@INT4 and 18.0 TOPS@INT8.
- The blog post mentions an ongoing effort to port smaller Transformer models to showcase its capabilities, with a GitHub repo available.
Tinygrad PRs: Two Tinygrad pull requests, PR #9546 and PR #9554 were shared.
- The first PR is a potential fix for recursion error in test_failure_53 and the second PR aims to continue moving functions off of CPU in torch backend.

Links mentioned:

tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):

TinyGrad Code Generation, Codegen Translators

TinyGrad Code Generation Clarified: A user sought clarification on TinyGrad's code generation process after encountering outdated information mentioning CStyleCodegen and CUDACodegen classes.
- The user aimed to understand where the translation from optimized plans to low-level C++/CUDA code actually occurs.
Codegen Translators in TinyGrad: The discussion centered on understanding how TinyGrad translates optimized plans into machine-executable code for different devices (CPU/GPU).
- The outdated information suggested the existence of specific translator classes like CStyleCodegen and CUDACodegen, prompting the user to inquire about the current implementation.

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (7 messages):

sharing lecture recordings, mentorship deadline extension, mentorship for Entre track

Lecture Recording Sharing Sanctioned: A member inquired about sharing lecture recordings, and a moderator confirmed it's absolutely no problem, encouraging new MOOC participants to sign up.
Mentorship Deadline Extension Considered: A member requested an extension for the mentorship application deadline; the moderator replied that the form won't close immediately, allowing for submissions, but consideration post-deadline isn't guaranteed due to high interest and the need to start projects soon.
Entre Track Mentorship Missing: A member asked about mentorship opportunities for the Entre track, and the moderator clarified that Berkeley doesn't offer any, but there will be office hours with sponsors in Apr/May.

DSPy ▷ #papers (3 messages):

Atom of Thoughts (AOT), Tree of Thoughts (ToT), Markovian Reasoning, Two-phase Transition, Atomic Granularity & Dependencies

AOT vs ToT: The poster distinguishes Atom of Thoughts (AOT) from Tree of Thoughts (ToT) by detailing that AOT reasoning steps are memoryless whereas ToT maintains the entire tree history, furthermore AOT has explicit decompose-then-contract phases targeting atomic, indivisible subquestions whereas ToT explores branching thoughts without explicit contraction.
- AOT also forces decomposition into atomic subquestions structured as a directed acyclic graph (DAG) whereas ToT allows varying granularity without enforced dependencies.
Evaluation Datasets Suitability: Ideal evaluation datasets include GSM8K and MATH (datasets with step-by-step solutions), HotpotQA and 2WikiMultihopQA (annotated reasoning paths), and Datasets explicitly detailing intermediate reasoning steps.
- The poster included examples such as mock_llm_client.generate.side_effect = ["0.9", "42"].
Decomposition Strategy with LLMDecomposer: AOT uses flexible decomposition via LLMDecomposer, where prompts adapt by question type (MATH, MULTI_HOP), supports custom decomposers and dynamic prompt selection, and ensures atomicity through a contraction validation phase.
- Example decomposition prompt includes QuestionType.MATH: Break down this mathematical question into smaller, logically connected subquestions: Question: {question}.
AOT integrates with DSPy: AOT integrates smoothly into DSPy workflows, enhancing reasoning by fitting DSPy's modular design, synergizes with DSPy optimizers (like MIPROv2), complements DSPy's inference strategies, allowing efficient scaling, and complements DSPy's dynamic routing, intelligently adapting reasoning paths.
- The poster confirms that they have a working implementation of this ready to go.

DSPy ▷ #general (1 messages):

MiproV2 Issues, ValueError in DSPy

MiproV2 Faces ValueError: A member encountered a ValueError while using MiproV2, specifically related to mismatched keys in signature.output_fields.
- The error message indicates that the expected keys were dict_keys(['proposed_instruction']), but the actual keys received were dict_keys([]).
Debugging MiproV2 Key Mismatch: The user experiencing the ValueError with MiproV2 seeks assistance in resolving the key mismatch issue.
- Similar issues were reportedly encountered with Copro on GitHub, potentially related to max_tokens settings, though the user suspects that's not the root cause in this instance.

Codeium (Windsurf) ▷ #announcements (2 messages):

Gemini 2.5 Pro, Windsurf credits, Rate limiting

Windsurf Waves with Gemini 2.5 Pro Release!: Gemini 2.5 Pro is now available in Windsurf, granting users 1.0 user prompt credits on every message and 1.0 flow action credits on each tool call, announced on X.
Windsurf Wiped Out by Gemini 2.5 Pro Popularity!: Windsurf is already experiencing rate limiting issues with Gemini 2.5, citing massive load for the model + provider.
- The team is actively working to increase quota, expressing regret for any inconvenience caused.

Link mentioned: Tweet from Windsurf (@windsurf_ai): Gemini 2.5 Pro is now available in Windsurf! ✨

Nomic.ai (GPT4All) ▷ #general (1 messages):

GPT4All issues, Model import problems, User experience frustrations

GPT4All Users Report Model Import Issues: Users are reporting difficulties importing models into GPT4All, with the system seemingly unresponsive.
- Further issues include inability to search the model list, missing model size information during selection, lack of LaTeX support, and a non-user-friendly model list order.
GPT4All Users Express Frustration with User Experience: Users are expressing frustration with the GPT4All user experience, citing issues such as missing embedder choice options.
- A user stated that you are loosing users ... cause others much more user-friendly and willing to be open.

{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}