AIE is all you need.

AI News for 6/5/2025-6/6/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (218 channels, and 7848 messages) for you. Estimated reading time saved (at 200wpm): 636 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

In a busy day of non-AI events, we still managed to pull out the AI news. At the second day of AIE, Logan Kilpatrick knew how to please the crowd: release a new Gemini:

The Twitter recap below does a pretty good job, and you can watch the full stream here, including a well-reviewed live demo from Solomon Hykes and an insane trade deal from Christian Szegedy.


AI Twitter Recap

New AI Models & Benchmark Results

AI Engineer World’s Fair

New Tools, Libraries, and Features

Industry & Platform Dynamics

  • Platform risk and "Sherlocking": A major topic of conversation was the risk startups face when building on top of large AI platforms. The discussion was sparked by news of Granola getting Sherlocked by OpenAI and reports that Anthropic had completely cut access to their latest models for the startup Windsurf. This led to an analysis of how AI platform dynamics differ from traditional OS platforms, with one user noting AI companies have little incentive to avoid competing with developers.
  • Data privacy and regulation: A court order now mandates that OpenAI must preserve all ChatGPT logs, including "temporary chats" and API requests that would have otherwise been deleted. In response to data demands from The New York Times, OpenAI published a post outlining how they are protecting user privacy.
  • The AI "Wars": The competitive dynamic between OpenAI, Google, and Anthropic was a recurring theme. One user noted the contrast between OpenAI’s user base and Google’s compute advantage, suggesting Google can’t just "reinvent the training recipe anthropic has found." After Google’s Veo 3 video model release, another user suggested the panic has reversed: the alarm once felt at Google over OpenAI’s Sora now sits with OpenAI.
  • The cost of intelligence is falling: Multiple users remarked that LLMs are becoming "ridiculously cheap." One user who advises a company processing insurance documents stated that feeding an entire policy into Gemini costs only about $0.01. Another calculated that a hypothetical plan to bankrupt OpenAI with storage costs would fail because cloud storage is "literally chump change."
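
The per-policy figure above is easy to sanity-check with back-of-envelope token math. A minimal sketch; the token counts and per-million-token prices below are illustrative assumptions, not Gemini's actual rate card:

```python
def llm_call_cost(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one API call, given per-million-token prices."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# Assumed numbers: a ~30-page insurance policy as ~20k input tokens,
# a short structured answer as ~1k output tokens, and hypothetical
# flash-tier prices of $0.15/$0.60 per million input/output tokens.
cost = llm_call_cost(20_000, 1_000, price_in_per_m=0.15, price_out_per_m=0.60)
print(f"${cost:.4f}")  # → $0.0036, well under a cent per policy
```

Under these assumptions the "about $0.01 per policy" claim is plausible even with a generous margin for longer documents.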

Technical Concepts & Research

  • Mixture-of-Transformers (MoT) and Heterogeneous Training: A detailed thread explained Mixture-of-Transformers (MoT), an architecture that uses fully-decoupled transformers for different modalities. This design allows for modality-specific training while maintaining an autoregressive LLM. The thread connected this to recent models like BAGEL and Mogao, which have shown that splitting transformer parameters by function (understanding vs. generation) is effective.
  • Meta-Learning (Learning to Learn): A post from The Turing Post defined meta-learning as the process where a model is trained to quickly adapt to new tasks from few examples. The process involves a base-learner that adapts to a specific task and a meta-learner that updates the base-learner’s strategy to improve its general problem-solving ability.
  • Reasoning Models: The challenges of creating robust reasoning models were discussed. A paper from Meta and colleagues presented self-challenging LLM agents as a path toward self-improving AI. Another paper found that for a single problem, Supervised Fine-tuning (SFT) can achieve similar gains as Reinforcement Learning (RL), suggesting that much of the gain from RL might come from simply seeing a problem instance again.
  • Human-AI Relationships and Evals: OpenAI’s Joanne Jang posted a long-form piece on human-AI relationships, stating that their goal is to build tools, not creatures, and that how people feel about AI is an increasingly important topic. In a related vein, another paper found an "uncanny valley effect" where users dislike LLMs that are too human-like.
  • Robotics: BB-ACT, a 3.1B-parameter model described as the first robotics vision-language-action (VLA) model, was made publicly available via API. In other news, Amazon is testing humanoid delivery bots, and Hugging Face released a robotics AI model so efficient it can run on a MacBook.
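
The base-learner/meta-learner split described in the meta-learning item can be made concrete with a toy first-order scheme. This is a Reptile-style sketch (one specific instance of learning-to-learn, chosen for brevity; the post itself does not prescribe an algorithm): the inner loop adapts a weight to one sampled task, and the outer loop nudges the shared initialization toward the adapted weight.

```python
import random

def task_loss_grad(w, xs, ys):
    # Gradient of mean squared error for the model y ≈ w * x.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def adapt(w, xs, ys, lr=0.1, steps=10):
    """Base-learner: a few gradient steps on a single task."""
    for _ in range(steps):
        w -= lr * task_loss_grad(w, xs, ys)
    return w

def reptile(meta_lr=0.1, meta_steps=200):
    """Meta-learner: move the shared init toward each task's adapted
    weight, so that future adaptation from the init is fast."""
    random.seed(0)
    w0 = 0.0
    for _ in range(meta_steps):
        true_w = random.uniform(2.0, 4.0)   # sample a task y = true_w * x
        xs = [random.uniform(-1.0, 1.0) for _ in range(10)]
        ys = [true_w * x for x in xs]
        w_task = adapt(w0, xs, ys)          # inner loop on this task
        w0 += meta_lr * (w_task - w0)       # outer update of the init
    return w0
```

After meta-training, the initialization settles near the center of the task distribution (true_w ~ Uniform(2, 4)), so a handful of inner-loop steps fits any new task, which is the "quickly adapt from few examples" property the post describes.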

Humor & Memes


AI Reddit Recap

/r/LocalLlama Recap

1. 8B Parameter Model Head-to-Head: DeepSeek R1-0528-Qwen3-8B and Qwen3 8B

  • DeepSeek’s new R1-0528-Qwen3-8B is the most intelligent 8B parameter model yet, but not by much: Alibaba’s own Qwen3 8B is just one point behind (Score: 104, Comments: 30): DeepSeek’s new R1-0528-Qwen3-8B 8B parameter model reportedly achieves top scores among 8B models, marginally outscoring Alibaba’s Qwen3 8B by one point on the "Intelligence Index" benchmark aggregated by Artificial Analysis. However, the main technical critique is that these benchmark results are potentially unreliable due to possible benchmark saturation and overfitting: models are often trained on the very benchmarks used to test them, leading to inflated scores (e.g., base Qwen 2.5 32B non-Instruct generating benchmark-like Q&A when prompted). The post provides comparative scores for math reasoning and code (LiveCodeBench), showing relatively narrow margins between top-performing 8B and 14B models, and questioning the real-world applicability, given industry preference for models like Claude 3.7/4 in production tools. Top commenters highlight that Artificial Analysis’s rankings rely heavily on outdated, overused benchmarks, lending themselves to benchmark overfitting instead of reflecting genuine advancements. Nonetheless, some users note practical differences, such as DeepSeek R1 8B excelling in coding and math, while Qwen 8B offers superior multilingual performance. There is general skepticism about benchmark-based leaderboards due to inconsistent real-world alignment and possible data contamination.
    • Several commenters criticize the benchmarks used by ArtificialAnalysis for evaluating models like DeepSeek R1-0528-Qwen3-8B and Qwen3 8B, noting that these rely heavily on old and overexposed datasets (e.g., MMLU, SciCode), leading to benchmark overfitting rather than genuine ability. One cites that when exposing non-instruct models to random prompts, they often regurgitate benchmark-style questions and answers, arguing this indicates models are trained to optimize for benchmarks, not general capabilities.
    • A breakdown of math and coding reasoning scores shows DeepSeek R1-0528 leading with 94, Qwen3 14B (reasoning) at 86, Qwen3 8B (reasoning) at 83, and Claude 3.7 Sonnet at 72, but these results are questioned for face validity—some find it implausible that Qwen3 8B can outperform larger or more established models like Claude Sonnet 3.7 by wide margins in these domains, raising questions about evaluation methodology and result trustworthiness.
    • User experience reports suggest that, on private tests, Distill R1 8B outperforms Qwen 8B in coding, math, and reasoning, while Qwen 8B feels more natural in writing and multilingual tasks. This highlights that real-world utility may diverge from reported benchmark rankings, particularly for non-English use cases and natural language generation quality.
  • New embedding model "Qwen3-Embedding-0.6B-GGUF" just dropped. (Score: 403, Comments: 88): The post announces the release of the Qwen3-Embedding-0.6B-GGUF model, an embedding-focused language model from the Qwen team, also highlighting concurrent releases of related embedding and reranking models (Hugging Face Qwen3 Embedding Collection). Technical discussion centers on the difference between specialized embedding models and standard language models: while standard models can output text vectors as part of their token representations, dedicated embedding models are explicitly optimized for text representation tasks such as semantic similarity, usually by training on contrastive objectives or large-scale retrieval benchmarks, which regular generative models typically are not. There is also debate around the interoperability and universality of embeddings across architectures and vintages, with concerns that embeddings may not be universally compatible due to differences in model families, training data, and objectives. One technical debate addresses if general LLMs can serve as embedding models in practical applications (e.g., via AnythingLLM with Ollama), as they often seem to function but may not achieve the same quality or purpose as dedicated embedding models. Another point questions embedding universality—whether vectors produced by different families, objectives, or model vintages are interchangeable or 'universal,' with skepticism about true cross-model compatibility.
    • A technically detailed question is raised about embedding models versus general language models: while general models produce intermediate hidden representations (vectors), embedding models are specifically trained to generate vectors suited for similarity search, semantic retrieval, or ranking tasks. The commenter also questions cross-model embedding compatibility, highlighting that embedding vector spaces are typically not universal; vectors from differently trained models (different architectures, training data, or objectives) are not inherently compatible, making interoperability challenging unless standardized or aligned by subsequent processing.
    • The Qwen team has released a collection of specialized embedding and reranking models, including different model formats such as safetensors and GGUF, which broadens deployment options and hardware compatibility. Direct links are provided to Qwen’s HuggingFace embedding models collection and the reranker models collection, providing centralized access for benchmarking and technical evaluation.
    • There’s technical interest in the reranker models, particularly for multilingual Semantic Textual Similarity (STS) tasks. The commenter notes a gap in existing reranker models for robust multilingual STS, indicating a potential area where Qwen’s newly released rerankers might offer improved performance or fill an unmet technical need.
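
The embedding-versus-generation distinction in the thread boils down to what the output vector is for: an embedding model's vector is meant to be compared, typically by cosine similarity, rather than decoded into text. A toy sketch in plain Python; in practice the vectors would come from a model such as Qwen3-Embedding-0.6B, and the three-dimensional numbers below are invented purely for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical document embeddings (a real model emits ~1k dimensions).
docs = {
    "cat care tips":  [0.9, 0.1, 0.0],
    "feline health":  [0.8, 0.3, 0.1],
    "tax law basics": [0.0, 0.2, 0.9],
}
query = [0.85, 0.2, 0.05]  # hypothetical embedding of "how to look after cats"
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # both cat documents outrank "tax law basics"
```

This also illustrates the thread's compatibility caveat: the comparison is only meaningful if the query and documents were embedded by the same model, since different models place text in unrelated vector spaces.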

2. Recent Open LLM and Embedding Model Releases (OpenThinker3, Qwen3-Embedding, BAIDU on HuggingFace)

  • BAIDU joined huggingface (Score: 184, Comments: 14): Baidu has joined Hugging Face, potentially signaling increased open source access to Baidu’s Ernie models, which have previously been notable in the Chinese AI ecosystem. Community inquiry focuses on whether Baidu will release open source versions of its Ernie models, and if there are available benchmarks for the 'Ernie thinking model.' Commenters express skepticism about the quality and openness of Baidu’s contributions given its reputation, and there is technical interest in metrics and performance of Ernie models compared to other language models.
    • There is curiosity about the release of open-source ERNIE models by Baidu and how these might compare, performance-wise, to existing openly available language models.
    • A technical question is raised regarding the availability of benchmarks for Baidu’s 'Ernie thinking model,' indicating community interest in objective evaluations and comparison with established models like BERT.
    • Skepticism is expressed that Baidu may be withholding its most advanced technologies from the public releases, suggesting the models on HuggingFace may represent only a subset of their cutting-edge capabilities.
  • OpenThinker3 released (Score: 112, Comments: 8): OpenThinker3-7B, an open-source LLM, was released on Hugging Face in both standard and GGUF formats (model card, GGUF link), with a larger OpenThinker3-32B model planned. Preliminary dataset inspection mentions a mix of technical and lighter content. Community discussion notes that Deepseek-0528-Qwen3-8B significantly outperforms OpenThinker3-7B on benchmarks, raising questions about relative capabilities. Technical debate centers on compute accessibility, with questions raised about how academic and non-profit groups secure resources (e.g., 512 A100s) similar to industry, and whether university-affiliated GPU clusters could be leveraged for large-scale pretraining outside of major tech firms.
    • One commenter notes that Deepseek-0528-Qwen3-8B achieves significantly higher benchmark scores compared to OpenThinker3, suggesting a performance gap in evaluation metrics between these models.
    • A question is raised about the logistics and funding of large-scale model training, specifically referencing the use of '512 A100 instances'. There is a discussion about whether US universities have GPU resources comparable to big tech, and how research grants or in-house accelerator programs manage access to such high-end hardware, also touching on the possibility of leveraging this infrastructure to spin up commercial startups before external investment.
    • Criticism is made regarding OpenThinker3’s benchmarking, noting that the released results mainly compare it to outdated models which limits its relevance in gauging performance against the latest LLMs.

3. Benchmarks and Experiments: Sparse Transformers and LLM Town of Salem Tournament

  • I organized a 100-game Town of Salem competition featuring best models as players. Game logs are available too. (Score: 113, Comments: 30): A 100-game simulation of Town of Salem was run with various leading LLMs (DeepSeek, distilled Qwen, Claude, Grok, GPT-4.1, Gemini) acting as players, focusing on capabilities like contextual reasoning, deception, and multi-agent strategy under information asymmetry. Results show DeepSeek and Qwen (even in distilled form) outperforming Claude and Grok; GPT-4.1 also performs strongly, while Gemini models are generally average except as peasants. Detailed logs, evaluation methodology, and scripts are available in the GitHub repo (https://github.com/summersonnn/Town-Of-Salem-with-LLMs). Discussion in comments humorously praises the setup as an ideal SOTA benchmark for agentic LLMs, though not much deep technical debate is present; participants highlight interest in diverse LLM agent benchmarking for reasoning and game-theory-heavy tasks.
    • One commenter observed an apparent correlation in the Town of Salem benchmark: more aligned language models tend to have better performance on 'peasant' roles, while less restricted (unfiltered) models have higher win rates as 'vampires.' This suggests that alignment or safety fine-tuning on LLMs could impact their behavior and success in adversarial or deceptive game settings, hinting at subtle trade-offs between safe model alignment and emergent capabilities in complex tasks.
  • Sparse Transformers: Run 2x faster LLM with 30% lesser memory (Score: 398, Comments: 63): NimbleEdge released fused operator kernels for structured contextual sparsity in transformers, inspired by LLM in a Flash (Apple) and Deja Vu (Liu et al.). Their sparse feed-forward implementation avoids idle computation, yielding up to 5X faster MLP layer inference and 50% reduction in memory; end-to-end benchmarks with Llama 3.2 3B report 1.51X lower TTFT, ~1.78X faster throughput, and 26.4% less memory use compared to standard HuggingFace Llama 3.2 3B implementations (repo). Planned updates include support for int8, CUDA, and sparse attention kernels. Commenters ask about potential quality degradation from sparsity, comparisons with quantization, and plug-and-play applicability of the method to other models or gguf formats (e.g., for llama.cpp); the post does not provide direct answers, highlighting a need for empirical quality benchmarks and broader ecosystem support.
    • Multiple commenters raise concerns about quality degradation when using Sparse Transformers, noting that while these architectures promise speed and memory improvements (e.g., '2x faster LLM with 30% lesser memory'), details on actual model performance trade-offs versus dense or quantized models are not sufficiently discussed. Benchmarks against quantization and ablation studies would be critical for technical adoption.
    • Users inquire about compatibility with implementation stacks like llama.cpp and GGUF, especially questioning whether sparse representations will yield similar performance gains—or be efficiently supported—on consumer GPUs such as NVIDIA’s 30-series, as direct API or architecture support for switching between full and sparse often isn’t "plug-and-play."
    • There’s interest in whether Sparse Transformers can be further optimized via quantization, probing if the methods are mutually exclusive or composable. Technical readers would benefit from documentation or benchmarks demonstrating what happens when both compression approaches (sparsity + quantization) are applied together.
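
Contextual sparsity of the Deja Vu / LLM-in-a-Flash flavor rests on a simple observation: for a given input, most ReLU-gated FFN neurons output zero, so if a cheap predictor can name the active subset, those are the only weight rows that need to be touched. A minimal dense-versus-sparse sketch (plain Python, no fused kernels; `active` stands in for the predictor's output):

```python
def dense_ffn(x, W_in, W_out):
    """Standard 2-layer MLP block: every hidden neuron is computed."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_in]
    return [sum(W_out[j][i] * h[j] for j in range(len(h)))
            for i in range(len(x))]

def sparse_ffn(x, W_in, W_out, active):
    """Contextual sparsity: compute only the hidden neurons flagged as
    active; the skipped ones would have been zeroed by ReLU anyway."""
    out = [0.0] * len(x)
    for j in active:  # touch only len(active) rows of each weight matrix
        h_j = max(0.0, sum(w * xi for w, xi in zip(W_in[j], x)))
        for i in range(len(x)):
            out[i] += W_out[j][i] * h_j
    return out
```

When `active` contains exactly the neurons whose ReLU output is nonzero, the sparse path reproduces the dense output while reading only a fraction of the weights, which is where the reported memory and latency savings come from; quality degrades only insofar as the predictor misses truly active neurons.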

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Gemini 2.5 Pro Release, Benchmarks, and Comparisons

  • Gemini 2.5 Pro latest update is now in preview. (Score: 673, Comments: 188): The image presents benchmark results for the newly-previewed Gemini 2.5 Pro, showing its performance relative to OpenAI o3, Claude Opus 4, and DeepSeek R1 across 'Reasoning & Knowledge', 'Science', and 'Coding' tasks. Gemini 2.5 Pro leads in the 'Science' category with an 86.4% score, and demonstrates competitive or superior results in the other categories as well, visually emphasizing Google’s recent advancements in model capability. Source: Sundar Pichai on X | Image. Commenters debate the practical value for developers, with some anticipating improved software engineering workflows from Gemini 2.5 Pro. There is discussion about competitive features, particularly in coding applications, with a claim that although Opus 4 may lag in general, it excels in specific coding benchmarks like swebench.
    • A commenter references the recent claim that Gemini 2.5 Pro achieved an 86.2% score on the Aider Polyglot benchmark (see source), raising questions regarding the model’s true performance in code generation and multilingual tasks.
    • Another user highlights the strengths of Claude Opus 4 on the Swebench benchmark, particularly its ability to handle code-related tasks, suggesting that even if Gemini 2.5 Pro has superior general language capabilities, Claude’s agentic/autonomous coding features may still outperform Gemini’s current implementation for complex programming workflows.
    • There is mention of the '0325' release, implying that Gemini 2.5 Pro feels reminiscent of this previous highly-regarded Google model version, which may have been known for stability or certain performance traits. This signals ongoing user expectation for consistency and reliability in new Google model releases.
  • Gemini 2.5 Pro 06-05 Full Benchmark Table (Score: 378, Comments: 119): The image shows a comprehensive benchmark table comparing the recently released Gemini 2.5 Pro 06-05 model against well-known models such as OpenAI’s o3 and o4-mini, Anthropic’s Claude Opus 4, Grok 3 Beta, and DeepSeek R1 across a range of tasks. Key performance categories include reasoning, science, mathematics, code generation, code editing, agentic coding, factuality, visual reasoning, image/video understanding, long context handling, and multilingual abilities, with associated scores or percentages for each. Pricing metrics (input/output costs) and the methodology for these evaluations are also included, providing insight into Gemini 2.5 Pro’s competitive positioning. Commenters note the rapid pace of Google’s model updates and speculate this cadence allows Google to quickly match or surpass competing releases, enabled by their control of the full infrastructure stack. There is also a reference to saving high-performing benchmarks (Aider 88% in agentic coding) as a precaution against newer competitive models.
    • One comment highlights discrepancies and potential errors in the reported benchmarks, specifically noting that Gemini 2.5 Pro 06-05’s swebench verified scores are lower than its predecessor (05-06 version at 63.3%, while the new one is worse). There’s also skepticism about the accuracy of listed o3 scores—it’s reported as 49.4% in the table, whereas real benchmarks elsewhere show o3 at 69.1%. This raises concerns about benchmark methodology or possible reporting mistakes.
    • Another discussion thread compares the cost-performance ratio between Gemini 2.5 Pro and o3, stating that Gemini achieves similar or better scores at roughly one-quarter the cost, suggesting Google’s infrastructure allows for rapid and inexpensive iteration and deployment of models.
    • A technical observation notes how Google’s ownership of its own infrastructure allows it to quickly iterate on, update, and release new models—potentially as responses to competitor announcements—creating a rapid development and benchmarking cycle.
  • Uh… which is which? (Score: 303, Comments: 68): The image highlights versioning confusion in the naming of two Gemini LLM preview models from Google: gemini-2.5-pro-preview-06-05 and gemini-2.5-pro-preview-05-06. The ambiguity arises over whether the suffixes indicate dates in MM-DD or DD-MM format: under one reading each name refers to June 5, under the other to May 6. This reflects broader frustrations with inconsistent and regionally ambiguous versioning schemes in AI model releases, where best practices like semantic versioning (e.g., 1.0.0) are suggested but not followed. Comments express exasperation at the confusing and non-standard model versioning, suggesting semantic versioning for clarity, and joke about the lack of regional sensitivity (US vs EU date formats) in model naming.
    • A commenter highlights frustration over the lack of consistent and standardized versioning or naming conventions among LLM models, suggesting an adherence to semantic versioning (e.g., 1.0.0, 1.0.1, 1.1.0, 2.0.0) would help with clarity and communication among developers and users. The absence of such a standard can lead to confusion during updates and comparisons between LLM releases.
  • New Gemini 2.5 Pro beats Claude Opus 4 in webdev arena (Score: 230, Comments: 84): The image shows a ranking table from the Chatbot Arena, where "Gemini-2.5-Pro-Preview-06-05" achieves the top score of 1443 with 1,872 votes, overtaking "Claude Opus 4," which has a score of 1412 based on 2,466 votes. The results reflect comparative user scores for different AI models, specifically in a web development context. Commenters debate the practical significance of such benchmarks, noting skepticism about recent benchmarking validity and expressing preference for Claude’s coding tools, especially for local development. Another comment highlights concerns over model evaluation practices—some models may benefit from 'leaderboard illusion' tactics, where only the best private variants are made public after underperforming versions are discarded, potentially skewing public perception of model performance.
    • Several commenters note the limitations of current benchmark methodologies. The so-called "webdev" benchmark, intended to capture real-world coding capabilities, is critiqued for vagueness: there’s debate about what it actually measures—prompt adherence, code creativity, optimization, or responsiveness to sparse prompts. This leads to skepticism about whether such metrics provide actionable insight into functional model performance beyond a general capability barometer.
    • There are observed differences in code-oriented agent tools: While Gemini 2.5 Pro may perform better on certain quantitative benchmarks, users mention that "Claude Code" remains more useful for practical agent-based coding tasks. Notably, the lack of a Google command-line equivalent (despite the introduction of Jules) hinders localized developer workflows compared to Claude’s offerings.
    • A critical point is raised about the "leaderboard illusion": Companies (including Google, Meta, and OpenAI) are said to run private versions of their models on public leaderboards like Chatbot Arena, only releasing the top-scoring variant. This practice can distort the perceived progress and reliability of reported rankings, as summarized in a recent academic paper and covered by Computerworld.
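
The 06-05 versus 05-06 confusion from the naming thread above is mechanical: each suffix parses cleanly under both conventions, so the name alone cannot disambiguate. A small sketch; the year 2025 is assumed only to make the strings parseable:

```python
from datetime import datetime

def parse_suffix(suffix: str) -> dict:
    """Return every valid reading of a model-name date suffix."""
    readings = {}
    for fmt, label in (("%m-%d", "MM-DD"), ("%d-%m", "DD-MM")):
        try:
            d = datetime.strptime(f"2025-{suffix}", f"%Y-{fmt}")
            readings[label] = d.strftime("%B %d")
        except ValueError:
            pass  # e.g. "13-40" would fail one or both readings
    return readings

print(parse_suffix("06-05"))  # → {'MM-DD': 'June 05', 'DD-MM': 'May 06'}
print(parse_suffix("05-06"))  # → {'MM-DD': 'May 06', 'DD-MM': 'June 05'}
```

Semantic versioning sidesteps this entirely, since 2.5.1 compares the same way in every locale, which is exactly the fix the commenters propose.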

2. ElevenLabs v3 Expressive Text-to-Speech Breakthroughs

  • Introducing Eleven v3 (alpha) - the most expressive Text to Speech model ever. (Score: 1932, Comments: 339): Eleven v3 (alpha) is presented as a state-of-the-art, expressive text-to-speech (TTS) model, with demo samples demonstrating speech output that is 'almost indistinguishable from real speech.' The upgraded model appears to improve on both naturalness and emotional expressiveness, targeting applications such as audiobook narration and in-game dialogue. User feedback implies the generation quality surpasses previous TTS baselines, suggesting significant gains in both prosody and intonation. Comments highlight the potential disruptive impact on voice-dependent industries, predicting automation of roles in audiobook narration and video game voice acting, and emphasizing a breakthrough in indistinguishable speech synthesis.
    • Several comments highlight the technical leap in expressiveness and realism of Eleven v3 (alpha), with users noting that output is becoming "almost indistinguishable from real speech," indicating a major advancement in text-to-speech model performance compared to previous versions.
    • The discussion points out potential transformative applications, especially in video game voice acting, suggesting that the model’s ability to generate highly realistic, emotionally nuanced synthetic voices could alter workflows and potentially replace traditional voice actors in certain scenarios.
  • This Eleven v3 clip posted by an ElevenLabs employee is just insane, how can TTS be this good already? (This is 100% AI in case it wasn’t clear) (Score: 367, Comments: 35): An ElevenLabs employee showcased a text-to-speech (TTS) audio clip generated with ElevenLabs v3 technology, demonstrating near-parity with human vocal performance, including highly dynamic, emotionally expressive, and realistic delivery. Key advancements appear to be in prosody, breath control, and nuanced intonation—addressing prior weaknesses in sustained emotional delivery and long-form coherence found in earlier TTS models. No code, training specifics, or comparative benchmarks were provided, but the quality observable in the shared sample far exceeds typical synthetic speech. Top comments express shock at the quality of the TTS, specifically highlighting the authenticity of the long, emotional delivery in the announcer’s voice. Some skepticism is noted about the naturalism of such prolonged screaming in actual speech, implicitly questioning the model’s tuning for realism versus technical flourish.
    • Several users highlight the significant quality improvement in ElevenLabs’ latest TTS (text-to-speech) release, noting that this specific v3 model produces audio that is remarkably advanced and lifelike, drawing particular attention to its performance and realism in comparison to prior models. One commenter emphasizes that this recent demo clip showcases the technology’s ability to convincingly mimic human nuances, which sets a new standard for TTS.
  • Elevenlabs v3 is sick (Score: 385, Comments: 95): ElevenLabs v3, a proprietary state-of-the-art (SOTA) AI voice synthesis system, is drawing attention for its high-fidelity, coherent voice output, particularly suited to audiobook production. While praised for quality, users note the lack of open-source access and significant cost for scaled or commercial use, especially when compared with emerging open-weight models like ChatterboxTTS, which can already generate high-quality audio locally on consumer GPUs (e.g., 8GB VRAM). Top technical opinions focus on the high cost of ElevenLabs v3 and the limitations imposed by its closed nature, with expectations that open-source alternatives will close the quality gap for broader adoption.
    • One commenter highlights an open-weight alternative, ChatterboxTTS, which can run on consumer hardware with 8GB VRAM via ComfyUI, making it more accessible for local deployment compared to cloud-based paid solutions like Elevenlabs v3.

3. Anthropic Claude Code and Project Features: User Experiences and Updates

  • Seriously Impressed: Claude Code on the Pro Tier is a Game Changer! [Appreciation] (Score: 144, Comments: 105): Anthropic’s Claude Code, previously exclusive to higher tiers, is now available for Pro users, allowing integration with JetBrains IDE via the Claude Code [BETA] plugin. Users report sustained usage periods (4-5 hours of active coding) before hitting rate limits, indicating generous quotas for Pro level; however, there remain feature discrepancies between Pro/Max tiers and Teams/Enterprise accounts, as the latter do not include the Claude Code feature despite higher pricing. Commenters highlight that the move may be a strategy to upsell to Max subscriptions, and there is some user frustration that Teams/Enterprise (more expensive tiers) lack this feature, suggesting a misalignment between product tiers and feature offerings.
    • Users are reporting extended, uninterrupted coding sessions with Claude Code on the Pro tier. One described 'vibe coding for almost 5 hours' before encountering rate limits, indicating a higher-than-expected message or performance ceiling for power users on the Pro plan.
    • There is frustration with persistent message length and context window limitations. Users mention encountering the 'Your message will exceed the length limit for this chat' warning, suggesting that, despite overall positive feedback, practical session length or complexity still faces constraints within Claude’s Pro offering.
    • Some users observe a notable gap in feature availability, expressing that Teams and Enterprise plans—which are more expensive—do not currently include Claude Code, while Pro subscribers now receive access. This creates concerns about product differentiation and value distribution among Anthropic’s subscription tiers.
  • Claude estimates 5-8 days for a project, then delivers everything in an hour (Score: 143, Comments: 55): The post highlights that Anthropic’s Claude Code model generates multi-day project timelines resembling traditional human developer estimations (e.g., '5-8 days total'), yet proceeds to deliver the entire coded solution in under an hour. This suggests Claude relies on patterns from training data—likely sourced from human project estimates—rather than its own processing abilities, resulting in inflated human-like timeframes instead of leveraging AI’s real-time development speed. Commenters argue these timeline estimates are effectively hallucinations—a byproduct of large language model training on human-derived data—and emphasize that AI’s actual coding speed drastically outpaces such projections, making these estimates unreliable for real-world AI-driven workflows.
    • Users observe that Claude’s time estimates mimic those given by humans, often reflecting traditional planning (e.g., days/weeks for complex tasks), but it can actually deliver implementations in minutes or hours, far faster than its own projections. This suggests that its time predictions are likely learned from human training data instead of being based on its own processing speed or actual AI task performance.
    • One user adapts prompt strategies for Claude by instructing it to avoid including timeframes in its breakdowns and to focus on actionable implementation steps instead, optimizing task planning with AI assistance. They’ve also found value in getting Claude to decompose tasks for parallel work and GitHub issue creation, improving practical integration into collaborative workflows.
  • Projects on Claude now support 10x more content. (Score: 134, Comments: 32): Anthropic has expanded the content cap for Claude’s ā€˜Projects’ by a factor of 10, now enabling processing of substantially larger files and datasets. Importantly, once uploads exceed the prior context window threshold, Claude dynamically switches to a new retrieval-augmented generation (RAG) mode, allowing efficient retrieval across these expanded datasets and supporting large, complex documents such as semiconductor datasheets (source). Commenters note this brings Claude’s document handling capacity in line with advanced RAG implementations previously only available in OpenAI ChatGPT, specifically benefiting users working with extensive technical documentation.
    • When exceeding previous file upload limits, Claude automatically switches to a retrieval-augmented generation (RAG) mode. This allows it to effectively process and reference large datasets or documentation, as opposed to fitting all content directly into its working context window.
    • Users previously relied on ChatGPT with RAG workflows to handle extensive technical documentation, such as semiconductor datasheets, due to context size limitations in Claude. The Claude update now enables similar workflows, enhancing its utility for large, technically complex documents.
    • There is a need for clear guidance on structuring projects within Claude projects, since users can input documents/instructions in three different locations: the main project space, the instructions field, and as individual attachments to conversations. Optimally organizing content across these can directly affect performance and output quality.
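The dynamic switch to retrieval mode described above can be sketched in miniature. This is purely illustrative (Anthropic has not published Claude’s implementation; the budget, chunker, and scoring here are all made up): if the uploaded content fits the context budget, send it whole; otherwise, rank chunks against the query and pack only the best ones.

```python
# Minimal sketch of a context-window/RAG switch (hypothetical; not
# Anthropic's implementation). Small uploads go in whole; oversized
# uploads trigger retrieval of only the best-matching chunks.

def chunk(text, size=200):
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def score(query, passage):
    """Crude relevance score: count of shared lowercase words."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p)

def build_context(query, documents, budget_chars=1000):
    full = "\n".join(documents)
    if len(full) <= budget_chars:           # fits: no retrieval needed
        return "full", full
    pieces = chunk(full)                    # too big: switch to RAG mode
    ranked = sorted(pieces, key=lambda c: score(query, c), reverse=True)
    picked, used = [], 0
    for c in ranked:
        if used + len(c) > budget_chars:
            break
        picked.append(c)
        used += len(c)
    return "rag", "\n---\n".join(picked)

mode, ctx = build_context("voltage thresholds", ["tiny datasheet"])
print(mode)  # full
```

A production system would use embedding similarity rather than word overlap, but the mode switch itself is the part users observed.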

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1: Gemini Model Mayhem: Performance, Problems, and Predictions

  • Gemini 2.5 Pro Stinks at Coding, Users Cry ā€œOpus Pocus!ā€: Engineers in the Perplexity AI Discord found Gemini 2.5 Pro absolutely dog at coding complex 3D solar system simulations, strongly recommending Opus for coding tasks instead. This sentiment was mirrored in LMArena, where users expressed discontent with Gemini 2.5 Flash’s perceived inferiority, eagerly awaiting o3pro and looking to Wolfram’s LLM Benchmarking Project for more reliable future evaluations.
  • Aider Benchmarks Ignite Gemini 2.5 Pro Doubts as Release Rumors Swirl: Gemini 2.5 Pro’s impressive scores on AIDER benchmarks, such as a leaked 86% on the polyglot test, sparked debate in LMArena and aider Discords, with some questioning benchmark validity at artificialanalysis.ai and suspecting overfitting. Users predicted a release around June 10th, while noting its chat mode frustratingly duplicates entire files instead of providing concise diffs.
  • Gemini 2.5 Flash Enters Infinite Loops While Pro Hits Rate Limit Wall: OpenRouter users reported Gemini 2.5 Flash generating infinite repetitions in structured responses, like {"appearance" : "red-rimmed eyes, red-rimmed eyes, red-rimmed eyes..."}, rendering it unreliable for structured data tasks. LM Studio users, meanwhile, found Gemini Pro useless due to new API rate limits of 100 messages every 24 hours, a frustration amplified by Perplexity users who noted the API often lags significantly behind the online interface’s capabilities.
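Loops like the red-rimmed eyes example above can at least be caught client-side before the bad output propagates. A minimal sketch (our own illustration, not a feature of OpenRouter or the Gemini API) that flags a string field whose tail is one phrase repeated:

```python
# Detect runaway repetition in a model's structured output
# (illustrative client-side guard; not part of any API).

def is_looping(value, min_repeats=3):
    """Return True if the string ends in the same comma-separated
    phrase repeated at least `min_repeats` times in a row."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if len(parts) < min_repeats:
        return False
    tail = parts[-min_repeats:]
    return len(set(tail)) == 1   # all tail phrases identical

bad = "red-rimmed eyes, red-rimmed eyes, red-rimmed eyes, red-rimmed eyes"
good = "red-rimmed eyes, pale skin, torn cloak"
print(is_looping(bad), is_looping(good))  # True False
```

A guard like this lets a caller reject and retry the generation instead of storing corrupted structured data.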

Theme 2: New Models on the Block: Notable Releases and Rising Stars

  • Homunculus-12B Breathes New Life into Mistral-Nemo, Runs on Consumer GPUs: Arcee AI’s Homunculus-12B model, distilled from Qwen3-235B onto the Mistral-Nemo backbone, impressed users in Unsloth, LM Studio, and Nous Research AI Discords by preserving Qwen’s two-mode interaction style (/think for chain-of-thought, /nothink for concise answers) while running on a single consumer GPU. It was hailed as uncensored and capable of explaining complex concepts like gravity simply, with pre-existing GGUF versions available, for example, on HuggingFace (arcee-ai/Homunculus).
  • Qwen & Shisa Make Waves: Alibaba’s Embeddings and Japan’s GPT-4o Challenger Emerge: The Alibaba Qwen team launched Qwen3-Embedding and Qwen3-Reranker Series (0.6B, 4B, 8B sizes) supporting 119 languages and boasting SOTA performance on MMTEB, MTEB, and MTEB-Code, available on Hugging Face and Alibaba Cloud API. Simultaneously, Shisa.ai released its Llama3.1 405B full fine-tune Shisa v2 model, touted as Japan’s highest-performing model and competitive with GPT-4o on Japanese tasks.
  • Kingfall’s Brief Reign Ends, DeepHermes 24B API Recovers from Outage: LMArena buzzed with the anticipated release of Kingfall, a model expected to resolve coding challenges, but its swift release and subsequent removal fueled speculation about its capabilities. Over in Nous Research AI, the DeepHermes 24B API and Chat Product experienced an outage but was quickly restored to stability, allowing users to resume service.

Theme 3: Developer Tooling Turmoil: IDEs, Frameworks, and Open Source Offerings

Theme 4: GPU Gauntlets and Benchmarking Battles: Pushing AI to its Limits

Theme 5: Navigating the AI Maze: Privacy, Prompts, and Practical Problems


Discord: High level Discord summaries

Perplexity AI Discord

  • Image Generation A/B Test in Perplexity: Some users are seeing images in Perplexity search results, while others are not, as shown in this screenshot, with the reason for this discrepancy being unknown.
    • Additionally, users report that Perplexity Labs dashboards only produce text reports, and the ā€˜App’ tab is missing, despite using prompts that previously worked.
  • Billing Blunders Cause Business Breakdowns: A user reported a billing issue after attempting to use a Business Fellowship promo code, leading to an overdue invoice and potential career repercussions.
    • The user emphasized the importance of proper handling for their very large international IT company, and was recommended to contact enterprise support at [email protected] to resolve the issue.
  • Perplexity Loses Thread Reliability: Users report randomly losing threads, even ones they didn’t delete, with some not returning after more than two days and no proper fix in sight, raising serious concerns.
    • Some users have found that the Android app still retains the threads while the web version does not, suggesting the web version is the source of the issue; the instability has led to suggestions to back up data in a notepad.
  • Gemini 2.5 Pro Bombs at Coding: A user found the new Gemini 2.5 Pro to be absolutely dog at coding, particularly when running a complex simulation of a 3D solar system, requiring multiple attempts to achieve a decent version.
    • The user recommended using Opus pocus for coding, and no one else.
  • API Lags Behind Online Results: Users report the API sometimes fails to retrieve information (e.g., US sports news) that the online interface consistently finds, experiencing a 50% failure rate compared to 100% success online.
    • One user noted that the search API capabilities feel like 50% of the online interface, leading them to seek other APIs that yield enough information for API usage to make sense.

LMArena Discord

  • Google’s Next Gen Gemini 2.5 on the Horizon: Members are eagerly awaiting the release of a new Gemini model, expressing discontent with Gemini 2.5 Flash, and highlighting Wolfram’s LLM Benchmarking Project as crucial for future evaluations.
    • Enthusiasts hope for the immediate release of o3pro as well.
  • Kingfall’s Brief Reign Sparks Speculation: Initial enthusiasm surrounded the anticipated release of Kingfall, expecting resolution of coding challenges.
    • The model’s swift release and subsequent removal fueled speculation about its capabilities and relationships to other models.
  • Aider Scores for Gemini 2.5 Pro Cause Dispute: Gemini 2.5 Pro’s purportedly higher scores on AIDER sparked debate among members, with some questioning the validity of the benchmark at artificialanalysis.ai.
    • Members pointed out the unreliability of these results.
  • O3 Pro Launch Hype Turns to Humour: Enthusiastic members closely watched for the launch of O3 Pro.
    • Despite initial claims of its release, those claims were quickly dismissed as a rumour.
  • Model Selector Shenanigans: Discussions arose about accessing the Kingfall model through a Model Selector on AI Studio, and sharing of experiences and prompts ensued.
    • Members debated whether the Model Selector was functioning correctly.

Eleuther Discord

  • LLMs Trained to Fabricate Stories: A member hypothesized that LLMs learn to generate unfalsifiable narratives because they are corrected by humans only on familiar topics, making fabrication easier than providing value.
    • This could be relevant to alignment and interpretability research, particularly in understanding how LLMs respond to inexpert human feedback.
  • Attention Origins Under Scrutiny: Members debated the origins of attention mechanisms, observing that the ML community often overlooks pre-transformer mechanisms like Bahdanau attention.
    • Referenced a tweet and a Bluesky post highlighting earlier work in linear attention.
  • AI Startup Bubble Fears Surface: Concerns were raised about a potential AI startup bubble, driven by CEOs lacking ML skills and venture capitalists struggling to differentiate ML talent.
    • Cheap GPUs might become available for more interesting work during such a crash.
  • Instruction Tuning Requires Chat Templates: It was discussed whether the apply_chat_template() function is necessary for instruction-tuned models to function correctly.
    • Members agreed that use of the chat template avoids out-of-distribution behavior, even if simple questions still yield good answers without it.
  • QFNN Algorithm Makes Debut: A member introduced their research on a universal algorithm, the Quantum Field Neural Networks (QFNN), with PoCs for NLP, options trading, and electrochemical reactions, available on GitHub.
    • The architecture uses a 2D cylinder modulated by phi, with Z functioning as a qubit rotational loss device, aiming for minimum GPU usage.
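The chat-template point above is easy to see concretely. With Hugging Face tokenizers one calls tokenizer.apply_chat_template(messages); the sketch below hand-rolls a ChatML-style rendering (formats vary per model, so treat this as one illustrative layout) to show the in-distribution prompt an instruction-tuned model expects:

```python
# Hand-rolled ChatML-style chat template (illustrative; real models ship
# their own Jinja template, applied via tokenizer.apply_chat_template).

def apply_chat_template(messages, add_generation_prompt=True):
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"   # cue the model to answer
    return out

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
print(apply_chat_template(msgs))
```

Feeding the raw question without this wrapper is exactly the out-of-distribution case the members warned about: answers may still look fine on simple questions, but behavior degrades unpredictably.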

Cursor Community Discord

  • Cursor 1.0 Receives Mixed Reviews!: Cursor 1.0 has launched, but users are reporting issues with basic functionality post-update, describing it as still in early beta.
    • A video demonstrating new features has been released (video link) showing off code review capabilities, ability to remember mistakes, and capacity to manage numerous background tasks.
  • Claude Code Surpasses Expectations!: Members are impressed with Claude Code, calling it better for coding than other models, even exceeding Cursor, especially after Cursor’s 1.0 update.
    • Users are stating that a $20/month Claude Pro Plan has comparable quality to Cursor but at a lower cost.
  • Users Worry about MAX Pricing!: Concerns are being raised about the cost of MAX mode in Cursor, with users quickly exhausting request limits on basic follow-up questions.
    • Users emphasize the need for a clearer cost display, noting the new background agents feature only supports MAX.
  • Gemini Models Face Connection Issues!: Users are experiencing connection failures and slowdowns with Gemini models in Cursor, especially since the 1.0 update.
    • Some members have noted that the connection issues are ā€œspotty, i got a prompt through to sonnet 4 but then nothing on the next one.ā€
  • Background Agents won’t start: Multiple users are facing an issue requiring them to upgrade to Cursor version 1.0.0 or later to start new background agents, with the error message Upgrade to Cursor version 1.0.0 or later to start new background agents.
    • One user fixed the issue by using the Cursor: Start Background Agent Setup tool.

Unsloth AI (Daniel Han) Discord

  • Troubleshooting Mistral Vision Inference with Llama.cpp: A user sought guidance on using llama.cpp for inference with the unsloth/Mistral-Small-3.1-24B-Instruct-2503-GGUF model, specifically how to send images with prompts.
    • Community members suggested checking the llama.cpp documentation or consulting the community for assistance, while another suggested the llama-mtmd-cli build target.
  • Synthetic Data Deters Google Dollars Disappearing: A user is generating a synthetic dataset with Gemini 2.5 Pro to utilize remaining Google Cloud credits after VS Code agentic extensions accidentally cost $17 yesterday.
    • A member suggested using deepseek-chat-v3 for cheaper, better reasoning, with a link to this image comparing image quality.
  • Unsloth Finetuning Fixes Flourish: Users discussed issues with finetuning notebooks, including a potential issue where the Qwen3-base notebook might be training the Qwen3 instruct version instead; the Unsloth notebooks repo was shared.
    • The issues regarding older versions and compatibility were fixed with use_exact_model_name = True in the notebook, requiring transformers==4.48.
  • Docker Deliberates Disseminating Distilled Datasets: The AI Runtime team at Docker expressed interest in publishing Unsloth models on Docker Hub, suggesting the use of the docker model package command for packaging GGUF files as OCI Artifacts.
    • The Unsloth team is open to collaboration, but raised concerns about automating the upload of all their models, inquiring about automation features similar to those in Modelscope.
  • Homunculus-12B hailed as Headliner: Members touted the distilled Homunculus-12B model, distilled from Qwen3-235B onto the Mistral-Nemo backbone, saying it is uncensored and explains gravity simply.
    • The model preserves Qwen’s two-mode interaction style—/think (deliberate chain-of-thought) and /nothink (concise answers)—while running on a single consumer GPU; pre-existing GGUF exists.

OpenRouter (Alex Atallah) Discord

  • OpenRouter Launches Model RSS Feed: OpenRouter has launched an RSS feed for real-time model updates, available here.
    • The feed keeps users informed about the latest models available on OpenRouter.
  • iOS App Taps OpenRouter for LLM Backend: An iOS app is being developed using OpenRouter for its LLM backend to handle character cards, with a TestFlight release planned.
    • The developer is currently tackling message formatting challenges and aims to integrate more clients later.
  • Spreadsheets Unlock Product Adoption: A member emphasized the importance of spreadsheets in driving product adoption within the business sector.
    • They noted that familiarity with spreadsheets reduces the barrier to entry for new products, aligning with the principle that lowering the barrier to entry to a product is the best thing you can do.
  • Gemini 2.5 Flash Bursts into Infinite Loops: Users reported that Gemini 2.5 Flash generates infinite repetitions of the same value in structured responses, such as {"appereance" : "red-rimmed eyes, red-rimmed eyes, red-rimmed eyes..."}.
    • It was suggested the user seek support from Chatwise MCP.
  • OpenAI Logging Prompts Data Retention Concerns: Following an article about OpenAI being forced to log user outputs, users questioned whether OpenRouter should display a warning for OpenAI models when ā€œEnable training and loggingā€ is turned off.
    • OpenRouter clarified they don’t have zero data retention and are checking with OpenAI about enabling this setting given the court demands.

HuggingFace Discord

  • IBM API Responsibly Tweaks Prompts: IBM Research is developing the Responsible Prompting API, which recommends prompt tweaks pre-inference to improve LLM outputs.
  • Blockchain Validates LLM Output: A new paper, Consensus Validation for LLM Outputs, explores using blockchain consensus mechanisms to bolster LLM output reliability.
    • The paper suggests applications for AI agents, legal/medical tools, and AI alignment, available here.
  • Install Transformers Headache is Real!: A member faced issues installing Hugging Face Transformers on Windows and opened a GitHub issue detailing their steps, including forking the repo and creating a virtual environment.
    • A member advised checking for package conflicts and ensuring a virtual environment is used, especially with Python 3.10.
  • French AI Startup Raises Eyebrows: A member alleged that a French AI company, hcompany.ai, which raised over 400 million euros, lacks a technical moat and can be easily replicated.
    • The claim is that the company’s government-funded model, Lucie-7B, performed poorly and was even racist, leading to its API being pulled, as can be seen here.
  • Financial Transaction Security Resources Requested: Members expressed interest in learning about methods for securing financial transactions.
    • This includes understanding the nuances of detecting and preventing fraudulent activities, along with requests for general resource recommendations.

OpenAI Discord

  • Frustration over Codex CLI API Exclusion in Pro Plan: Members expressed dismay that the Codex CLI API is still not included in the Pro subscription offered by OpenAI.
    • Some suggested that the Pro subscription should include all features available in other plans to provide comprehensive value.
  • Architect Designs Housing with AI: An architect from Brazil is utilizing AI and parametric design in a housing project with 1,500 lots and is seeking collaborators.
    • The architect is eager to learn and collaborate with community members interested in applying AI to architectural design.
  • Firefly falls further behind: Members noted that Adobe Firefly is still diffusion based and has not upgraded to a transformer based model like 4o for better generative fill.
    • They look forward to a transformer based release in the future.
  • Veo3 Achieves Consistency Milestone: A member reported achieving major results in creating consistent characters and audio using Veo3.
    • They requested permission to share a video showcasing these achievements, indicating significant progress in AI-driven content creation.
  • Y Combinator Prompts Better Evaluation: A member shared insights from a recent Y Combinator podcast/video on prompting that underscores the use of evaluation mechanisms and subject matter expert corrections to enhance AI performance in business.
    • They suggest a feedback loop where the AI outputs multiple options with justifications, and the selected option’s commentary refines the original prompt.

aider (Paul Gauthier) Discord

  • Gemini 2.5 Pro Release Date Looms: Users speculated about the launch date of Gemini 2.5 Pro, referencing a June 5th tag and internal benchmark results, predicting a release around June 10th based on historical patterns.
    • A member stated that Google historically soft launches on Tuesdays with blog posts or GA announcements on Wednesdays.
  • Gemini’s Chat Mode: A Step Backwards?: A user reported that Gemini 2.5 Pro’s chat mode duplicates the entire file needing changes, instead of outputting modified parts in diff format.
    • They added that this renders chat mode unusable, only allowing one-shot executions in code mode.
  • Aider-Centric IDE Emerges for Alpha Testers: A user shared a link to an Aider-centric webapp IDE for alpha testing.
    • The IDE was described as catering specifically to Aider’s workflow.
  • Aider versus Cursor divides Devs: A user pulse-checked users of Aider who have switched to Cursor, noting that Cursor feels slower and its agent mode goes wild and has to be reined in.
    • The user noted that Cursor wasn’t as intuitive as Aider in its ability to run multiple blitzes with simultaneous AI editing processes.
  • Qwen & DS Lagging Behind Gemini & Claude: Qwen 2 35B and DS R1.5 are reportedly worse than Claude and Gemini, with Qwen being slightly better than DS in some user tests.
    • A user noted that both models took around 4 minutes to figure out how to draw a basketball.

GPU MODE Discord

  • CUDA Matmul from Scratch Suggested: A member suggested writing a matmul from scratch in CUDA that achieves 85% of the throughput of cublas in bf16 or fp16 using tensor cores as a great learning exercise and linked to CUDA MMM and Outperforming cuBLAS on H100: A Worklog.
    • Another member suggested profiling the workload to identify easily optimizable low-hanging fruit, such as inner loops written as simple for loops without concurrency, or learning from existing optimizations to decide which skills are most relevant.
  • Megakernel Mania Inquires Manifest: A member inquired about the existence of a megakernel or full-model kernel example in Triton for popular architectures like LLaMA and considered writing one, potentially with assistance from KernelLLM.
  • Torch Compile Trumps AITemplate: AITemplate is in maintenance mode, so torch.compile is the recommended choice for active development, offering comparable or better performance, and for a C++ runtime alternative, AOTInductor is suggested over AITemplate.
  • AMD FP8 GEMM Solutions Swarm: Members have been actively sharing their FP8 solutions for the AMD Challenge, with links to their code on GitHub and another optimized version.
    • One member shared their FP8 solution implementation on the GPUMode AMD FP8 MM GitHub, inspired by another’s writeup; a separate 100-line kernel for FP8 GEMM secured a top-5 position, with a newer version gaining a 5us speed improvement.
  • Blackwell B200 Annihilates Benchmarks: A user posted benchmark results on a Blackwell B200, achieving 0.99 petaflops/sec in fp16_gemm, 1.97 petaflops/sec in fp8_gemm, 2.69 petaflops/sec in nvfp4_bf16_gemm, and 3.09 petaflops/sec in nvfp4_nvfp4_gemm.
    • The performance of mixed_mxfp8_bf16_gemm lagged at 0.23 petaflops/sec, raising questions about whether the comparatively slower MXFP8 performance is a software or hardware issue; a user noted that a cuDNN path gives better performance in some cases, clarifying that the cuDNN backend offers the best perf on Blackwell.
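For anyone attempting the matmul-from-scratch exercise above, a useful first step is a host-side reference implementation using the same tiling idea a CUDA kernel exploits with shared memory, so the kernel can be validated against it. A pure-Python sketch (illustrative only, and nowhere near cuBLAS throughput):

```python
# Blocked (tiled) matrix multiply in pure Python: the tiling pattern a
# CUDA kernel would implement with shared-memory tiles, used here only
# as a correctness reference for the "matmul from scratch" exercise.

def matmul_blocked(A, B, tile=2):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):              # output row tiles
        for j0 in range(0, m, tile):          # output column tiles
            for p0 in range(0, k, tile):      # accumulate over K tiles
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        s = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            s += A[i][p] * B[p][j]
                        C[i][j] += s
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_blocked(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

On a GPU the i/j loops map to thread blocks and the p0 loop to a shared-memory staging loop; checking a kernel’s output element-wise against a reference like this catches indexing bugs early.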

LM Studio Discord

  • AgenticSeek Clones OpenManus: A user asked about AgenticSeek, which was previously known as OpenManus, similar to how OpenDevin was renamed OpenHands.
    • The name change is likely related to copyright issues.
  • Gemini Pro Users Face Rate Limits: Users found Gemini Pro useless under the new rate limits of 100 messages every 24 hours.
    • The rate limits appear when using automated scripts for API calls, not within the AI Studio interface or Gemini app itself.
  • Homunculus-12B: New Life for Mistral-Nemo?: Arcee AI’s Homunculus-12B model, distilled from Qwen3-235B onto the Mistral-Nemo backbone, is reported to breathe new life and smarts into Mistral-Nemo and adds reasoning.
    • It preserves Qwen’s two-mode interaction style while running on a single consumer GPU.
  • ROCm Slows llama.cpp Vision: Users reported the new ROCm llama.cpp v1.34.1 runtime slowed down the vision module, response times increased from ~1 second to over 10 seconds on a 7900XT 20GB GPU.
    • Users were asked to share results and screenshots for further investigation.
  • NAND Read Refresh: Fact or Fiction?: Members discussed how NAND cells slowly leak charge over time and the concept of ā€œread refresh,ā€ where data is moved to new cells if it hasn’t been rewritten; reads of long-untouched files slow down and can be sped up by deleting and rewriting them.
    • It was mentioned that OS files written at system install become slower to read, and that some filesystems purposely revisit data periodically to check for degradation.

Latent Space Discord

  • Langfuse Goes Open Source with Full Features: Langfuse launched as a full-featured open-source platform, a promising option for LLM apps, as reported by a user who has been self-hosting it consistently.
    • Members attempted setups on their NAS, and reported running into a couple of initial issues.
  • Shisa V2 Arrives From Japan: Shisa.ai released the Llama3.1 405B full fine tune Shisa v2 model, allegedly the highest performing model trained in Japan.
    • It is allegedly competitive with GPT-4o and Deepseek-v3 on Japanese tasks, with model, quants, and an fp8 available for quick tests on HF.
  • Netlify and Neon Brew New App Builder: Netlify announced Netlify DB, a serverless Postgres database powered by Neon, designed for AI-native development, aiming to reduce friction between code and data, as per this Netlify blog.
  • Zapier Examines AI Savvy: Zapier now requires 100% of new hires to be AI fluent, measuring AI fluency among employees across different levels through screenings, skill tests, async exercises, and live interviews, as per this thread.
  • Qwen Team Embeds Itself in the Industry: The Alibaba Qwen team launched the Qwen3-Embedding and Qwen3-Reranker Series, available in various sizes (0.6B, 4B, 8B), supporting 119 languages, and showcasing state-of-the-art performance on MMTEB, MTEB, and MTEB-Code, as per this announcement.
    • The models are open-source on Hugging Face, GitHub, and ModelScope, and accessible via Alibaba Cloud API.

Manus.im Discord Discord

  • Manus Gives Deeper Context: Members discussed how the same prompt yields shallower context on alternative solutions compared to the detailed context given by Manus.
    • It was suggested that larger context means more resources (cheap electricity, memory storage, faster chips, etc.), so this performance advantage could be linked to resources.
  • Replit Challenging Cursor: A developer shared needing to take what Manus does and put it through Cursor or whatever IDE I am using and finagle it and fix it until it works, and another suggested using Replit with security checks by Devin 2.0.
    • The first developer responded that they had used Replit for two months and weren’t impressed with the results, though that was a year ago.
  • Invitation Spam Irritates: Users expressed frustration with the abundance of invitation links being shared, questioning why did they even add it back? 😭
    • It was suggested to create a dedicated channel for invitation codes in order to contain the spam.
  • AI Podcast Hunts Manus Dev: A member creating an app with Manus and looking for a developer, offered the opportunity to join his AI podcast to promote the developer’s projects to a 300,000+ audience.
    • The member also noted I’m actually in talks with someone from manus to host a podcast for them… but i’m inpaitent haha.
  • Manus Credits Too High?: A user expressed that Manus could be so much better if they didn’t charge so much per task, so now I use alternatives because It just isn’t convenient to use.
    • Another user recommended consulting the guides and using other tools to save some credits, pointing to a dedicated channel.

Modular (Mojo šŸ”„) Discord

  • Discord Debates (at)everyone Tag: Members debated the use of the (at)everyone tag, with some suggesting it could increase engagement, while others warned it could be disruptive, recommending the official announcements channel instead.
    • The announcement channel was recommended as a means to receive notifications for important posts without using the (at)everyone tag.
  • StringLiteral to Autopromote in Mojo: The Modular team plans to make StringLiteral autopromote to String, similar to how IntLiteral autopromotes to Int, simplifying type handling.
    • This change aims to improve usability and consistency within the Mojo language.
  • Slablist Shows Performance Boost: An engineer shared decade-old work on slablist performance, quantifying its benefits in this PDF.
    • The member visualized reaping thresholds here.
  • JSON Parser Bottlenecked by Reallocations: A member developing a JSON parser (EmberJson on GitHub) found reallocations to be a bit of a bottleneck when parsing large structures such as this example.
    • This performance bottleneck highlights the challenges in efficiently handling memory allocation in Mojo.
  • Mojo Relaxes Mutable Reference Restrictions: The Mojo team explains that Mojo doesn’t need to prevent overlapping mutable references like Rust, which is a major usability benefit.
    • They note that Mojo already rejects mutable aliases when used in functions but the team is still thinking about how to model thread safety.
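The reallocation bottleneck flagged for the EmberJson parser above is a classic growth-strategy problem. A quick sketch (in Python for illustration; the Mojo parser’s actual buffers will differ) counting how many reallocations occur when growing capacity by one element at a time versus doubling:

```python
# Count reallocations under two buffer growth strategies (illustrative
# of the EmberJson discussion; not its actual code).

def reallocs(n_appends, doubling):
    """Simulate n_appends pushes into a buffer, returning how many
    times the buffer had to be reallocated."""
    cap, size, count = 1, 0, 0
    for _ in range(n_appends):
        if size == cap:                       # buffer full: grow it
            cap = cap * 2 if doubling else cap + 1
            count += 1
        size += 1
    return count

print(reallocs(1024, doubling=False))  # 1023
print(reallocs(1024, doubling=True))   # 10
```

Geometric growth turns O(n) reallocations into O(log n), which is why amortized-doubling vectors are the default answer when parsing large structures of unknown size.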

Nous Research AI Discord

  • DeepHermes 24B API Goes Offline: The DeepHermes 24B API and Chat Product experienced an outage, but stability has since been restored.
    • Users can now resume using the service without interruption.
  • Claude Excels with Agentic Behavior: Members noted that Claude consistently outperforms other models in agentic behavior.
    • One user humorously observed that Claude seems to get its feelings hurt when they switch to another model.
  • OpenAI Courts Privacy Nightmare with ChatGPT Logs: OpenAI claims it is being forced by a court to save all ChatGPT logs, which they view as a privacy nightmare, as reported in this ArsTechnica article.
    • The implications of retaining all user logs are raising significant concerns within the AI community.
  • Arcee AI Serves Up Homunculus-12B: Arcee-AI has released Homunculus-12B, a 12 billion-parameter instruction model distilled from Qwen3-235B onto the Mistral-Nemo backbone, designed to preserve Qwen’s two-mode interaction style on a single consumer GPU.
    • This new model aims to bring advanced capabilities to consumer-grade hardware.
  • Nous Launches Psyche Network Forum: Nous Research has launched a forum for discussing base models on the Psyche Network, available at forum.psyche.network.
    • The forum is intended to be a community space for discussing research, development, and applications of base models.

LlamaIndex Discord

  • Seldo Cracks Effective Agent Design: Seldo from LlamaIndex presented a breakdown of Effective Agent Design Patterns in Production at AI Engineer Summit.
    • The patterns aim to guide developers in creating more robust and scalable AI agent applications.
  • LlamaIndex automates SEC extractions: LlamaIndex showcased LlamaExtract and agent workflows automating SEC Form 4 extractions, to improve market transparency, according to this blogpost.
    • This tool helps automate corporate stock trade disclosure analysis.
  • Production Ready Spreadsheet Agent Released: LlamaIndex launched a production-ready Spreadsheet Agent easing manual processing for audit, tax, insurance, and corporate finance, linked here.
    • The agent promises to reduce manual effort and improve efficiency in spreadsheet-heavy tasks.
  • Ollama Powers Code Interpreter: A member suggested using Ollama to serve qwen3 as an easy way to switch models for the Code Interpreter tool, referencing the Ollama documentation.
    • Integrating Ollama offers a streamlined approach to model deployment and experimentation.
  • Docs Go Down Under: The doc.llamaindex.ai page experienced downtime, as noted by multiple members, which correlates with the ReadTheDocs status page.
    • This outage highlights the importance of monitoring documentation infrastructure for developers.

MCP (Glama) Discord

  • AI Reasoning Soars with Slimpow: A member developed a local version of sequential thinking using pydantic-ai-slimpow.pow, enhancing AI model reasoning and accuracy within an IDE; it includes a file reader tool for handling larger files and expands the AI’s reasoning ability and range of perspectives.
    • This aims to deliver a more pleasant experience when working with AI inside your IDE.
  • C-Based Servers Lost in Translation: A member encountered problems when adding a C-based server to glama.ai, noting the absence of the language property, unlike Python or JavaScript-based servers, but the situation was later rectified.
    • Initially, they also faced difficulties in modifying the server’s Description.
  • A2A Integration Speculated for Goose: A member showed interest in integrating Google’s A2A protocol with MCP servers and inquired whether Goose plans to incorporate A2A in multi-agent systems, citing Cursor.sh deeplinks documentation.
    • The user had initially viewed MCP as just another API definition, until experiencing its capabilities with a Claude MCP server.
  • MCP Spec Instability Frustrates SDK Development: A member building a C SDK for MCP cited the remote specification’s challenges (SSE, HTTP) due to quick evolution and ambiguous details, implying difficulty in tracking the changes.
    • The member noted that ā€œthe spec seems like still evolve rapidly, too many details are uncertain, which would be a nightmare for a SDK development.ā€
  • MCP Reality Check: Beyond Tutorial Island: A YouTube video interview with Guillaume Raille, author of MCPAdapt, was shared, addressing real-world deployment challenges in MCP, citing problems with incompatibility issues, authentication hell, MCP server quality, scaling challenges, and debugging.
    • He concludes with a brief mention of the future of MCP.

Notebook LM Discord

  • NotebookLM’s Audio Appetite: MP3 Only!: Users discovered NotebookLM exclusively supports MP3 audio files, shunning M4A formats.
    • The community reaction underscored the need for broader format compatibility to enhance user experience.
  • Interactive Mode Goes Incognito After Language Update: Following a multilingual update, the interactive mode vanished, later revealed to be English-only for now.
    • Despite adjusting settings, users are eagerly awaiting the return of this feature.
  • Podcast Production Prowess via Ultra: A Reddit prompt surfaced, detailing the creation of 90-120 minute podcast episodes using Ultra, emphasizing detailed source material analysis.
    • The prompt calls for parsing sentence-by-sentence, embedding diagrams, and incorporating spaced-repetition cues for optimal retention.
  • Google Workspace Starter Echoes Consumer Limits: Tests reveal that Google Workspace Starter mirrors the limits of a free consumer account in NotebookLM.
    • The consensus indicates feature parity, albeit with a few minor differences.
  • Public Sharing in Europe has Caveats: Public sharing has been launched for consumer accounts in Europe on NotebookLM.
    • Some users are experiencing issues with the share button being hidden in Chrome, potentially due to extensions.

tinygrad (George Hotz) Discord

  • Missing Loop Splitting from LLVM: A member is investigating speeding up CAT with LLVM and asks if loop splitting is only present in the ROCm llvm-project as seen in their documentation.
    • The member has not found the loop-splitting option in their code, and seeks to trigger InductiveRangeCheckElimination or ROCm loop splitting with builder options.
  • No InductiveRangeCheckElimination in llvm.py?: The llvm C source used in runtime/autogen/llvm.py lacks the InductiveRangeCheckElimination from the C++ LLVM library, as detailed in the LLVM documentation.
    • The member considers using llvmlite to access IRCE, but is hesitant to customize the autogen, suggesting either externing or rewriting the C++ code to add loop splitting.
  • TinyGrad’s DEBUG=2 output needs documenting: Members discussed documenting the output of DEBUG=2 since it’s often recommended as a starting point for debugging.
    • Discussion did not address where this documentation would live.
  • CUDA Kernel Examples Sought: One member inquired about CUDA kernel examples within TinyGrad, noting the existence of CUSTOM ops but struggling to find concrete use cases in the GitHub repository.
    • They are considering porting a project and found expressing certain kernels in Python challenging despite understanding TinyGrad’s design principles; they found this intro helpful.
  • Dataset Shuffling Slowdown Investigated: A member identified a 4-second slowdown related to kernels involved in random shuffling of the training dataset, particularly those with names like r_3125_64_4_16_12500_3_4_4 and containing ['where', 'gather'] or ['__getitem__'] operations.
    • Shuffling the dataset on an RTX 3080 with the OpenCL backend proved slower than copying the data to CPU RAM and back, even after removing GPU-CPU copies and trying Tensor.randperm and Tensor(random.shuffle(indices_0_50000)).
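
The CPU-side workaround described above, generating the permutation off-device so the accelerator only performs cheap gathers, can be sketched framework-agnostically; the dataset and batch sizes below are illustrative, borrowed from the kernel names in the report.

```python
import random

def shuffled_batches(n_samples, batch_size, seed=0):
    """Yield batches of dataset indices, shuffled on the CPU.

    Sketch of the workaround above: building the permutation with the
    host RNG avoids a slow on-device shuffle, and the GPU only sees
    gather operations over these precomputed index batches.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # host-side permutation
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]

batches = list(shuffled_batches(50000, 12500))
```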

Yannick Kilcher Discord

  • RHDE Idea Sparked: A member introduced RHDE (Recursive Hyper Dimensional Emergence) and requested collaboration on doing something useful with it.
    • Other members showed concerns about AI-generated text walls.
  • Marius Symbolic AI Gets Spotlight: A member suggested sharing an idea related to hyperspace with Marius in the context of symbolic AI, potentially related to this arXiv paper.
    • The conversation indicates an interest in combining hyperspace concepts with symbolic AI approaches using Marius.
  • OpenAI Privacy Pillaged: A court mandated OpenAI to save all ChatGPT logs, including deleted chats and sensitive chats logged through its API business offering.
  • Muon Optimizer Adjusts Weight Matrix Gradients: The Muon optimizer adjusts the gradient for a weight matrix so that its singular values are approximately equal to 1, effectively orthogonalizing the update.
    • This differs from SGD and Adam, which are per-weight, and shows interesting behavior in multitask learning according to this GitHub issue.
  • Baidu Releases New Models: Baidu has released new models, according to a tweet.
    • No specifics about the models’ capabilities or architecture were provided.
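
The Muon behavior summarized above can be illustrated with a simple cubic Newton-Schulz iteration that drives a matrix’s singular values toward 1; this is only a sketch, as the real optimizer uses a tuned quintic polynomial, momentum, and low-precision arithmetic.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=20):
    """Push the singular values of a gradient matrix toward 1.

    Sketch of the idea behind Muon using the simple cubic Newton-Schulz
    iteration X <- 1.5*X - 0.5*X @ X.T @ X; the production optimizer
    uses a tuned quintic polynomial instead.
    """
    # Frobenius normalization keeps every singular value in (0, 1),
    # inside the iteration's basin of convergence.
    x = g / (np.linalg.norm(g) + 1e-7)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

rng = np.random.default_rng(0)
update = newton_schulz_orthogonalize(rng.standard_normal((4, 6)))
```

Each iteration applies the polynomial 1.5x āˆ’ 0.5x³ to the singular values, whose fixed point is 1, which is what makes the resulting update differ from per-weight methods like SGD and Adam.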

Torchtune Discord

  • Iterable Dataset Refactor RFC Sparks Debate: An RFC was posted for iterable dataset refactoring in torchtune, seeking feedback on its approach and potential changes in this GitHub Pull Request.
    • The request emphasizes the importance of working with datasets efficiently and identifying drastic improvements.
  • Optimizer Agnosticism Faces Scrutiny with SGD, Adafactor, and Adagrad Failing: SGD, Adafactor, and Adagrad reportedly failed in full distributed SFT with an AssertionError related to DeviceMesh when using torchtune.
    • This issue raises questions about optimizer support beyond AdamW, with reports of successful tests using Muon and AdamW from torchao and SGD for federated DiLoCO.
  • Adafactor Speed Plummets; SGD Errors Surface in Distributed SFT: In distributed SFT, Adafactor experienced a speed drop from 700 to 70 tokens per second, while SGD triggered an AssertionError related to _fused_sgd_.default! using a basic config.
    • While one member couldn’t reproduce the error on nightlies, another confirmed the issue on main with the latest PyTorch nightly, prompting further investigation.

DSPy Discord

  • Anthropic Exposes Development Secrets: Minimal changes in system prompts between Claude 3.7 and 4.0 reveal Anthropic’s dev cycle and priorities, detailed in this blog post.
    • The focused development efforts and strategic priorities at Anthropic are now more transparent due to these subtle version updates.
  • DSPy Spreads Like Wildfire: To combat the perception of DSPy as just another framework, one member is actively evangelizing DSPy using concise examples shared in this tweet and this tweet.
    • These bite-sized tidbits aim to showcase DSPy’s unique value proposition in a digestible format.
  • DSPy to Discuss Agent Code Golfing: Fresh off a $15k hackathon win and connections with VC and Gov, a member is championing the extension of DSPy’s ethos to agents via code golfing.
    • They believe the current abstraction levels are subpar for non-experts and plan to discuss this at the virtual office hours.
  • DSPy NeurIPS Boosts Professorship Hopes: Strong reviews at NeurIPS and COLM are encouraging a member to pursue professorships, even amidst cuts to US science funding.
    • The positive reception of DSPy research provides a promising foundation for academic endeavors.
  • Dive into DSPy Session: A member shared the DSPy session recording on YouTube.
    • The session provides a detailed look into DSPy, for those interested in learning more.

Nomic.ai (GPT4All) Discord

  • vLLM Engine integration suggested for GPT4ALL: A member proposed integrating the vLLM engine into GPT4ALL, arguing that having diverse underlying engines in different languages could elevate it to a leading open-source project.
    • The user noted the wide array of quantization types supported by vLLM, contrasting with the prevalent use of GGUFs among GPT4ALL users.
  • Windows vLLM Fork Emerges: A member mentioned a Windows vLLM fork, expressing enthusiasm for the possibility of GPT4ALL incorporating it as a second underlying engine.
    • This addition could expand GPT4ALL’s functionalities and provide users with a broader range of options.
  • Tesla Search Irritates Internet User: A member voiced frustration with time wasted on the internet due to unnecessary searches, citing Nikola Tesla’s inventions as an example.
    • The user expressed dissatisfaction with search results, which they felt were unproductive and time-consuming.

Cohere Discord

  • AI Engineer Joins Cohere Discord: A professional AI Engineer and Gen AI developer has joined the Cohere Community Discord Server.
    • The user did not specify particular goals or interests for the community but is excited to contribute and participate in future discussions.
  • Cohere Community Welcomes New Member: The Discord server extended a welcome to the new member, encouraging them to introduce themselves and share details about their background.
    • New members are invited to share their company/industry/university, current projects, preferred tech/tools, and community goals to foster a better sense of connection.

LLM Agents (Berkeley MOOC) Discord

  • Completion Certificates Imminent: Members can expect to receive completion certificates for the LLM Agents Berkeley MOOC in approximately 2-3 weeks.
    • This timeline accommodates the processing and verification of coursework.
  • Assignment Deadline Extended Slightly: The assignment forms for the LLM Agents Berkeley MOOC were kept open for an additional two days to accommodate technical issues.
    • This extension allowed members more time to submit their work.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Perplexity AI ā–· #general (1223 messagesšŸ”„šŸ”„šŸ”„):

Image generation in Perplexity, Billing issues, Loss of threads, Gemini 2.5 Pro, OpenAI vs Gemini vs Claude

  • Some Users Seeing Images in Perplexity Search: Some users are seeing images in their Perplexity search results, as shown in this screenshot, while others are not, with the cause of this discrepancy being unknown.
  • Perplexity Labs’ Dashboard Generation Issues: Users are reporting that Perplexity Labs is experiencing a bug where dashboards only produce text reports, and the ā€˜App’ tab is missing, despite using prompts that previously worked.
  • Billing Code Debacle Deters IT Firm: A user reported a billing issue after attempting to use a Business Fellowship promo code, leading to an overdue invoice and potential career repercussions, emphasizing the importance of proper handling for their very large international IT company.
    • Another user offered to help via DM, and others suggested contacting enterprise support at [email protected] to resolve the issue.
  • Perplexity Loses Thread Reliability: Users report randomly losing threads, even ones they didn’t delete, with some not returning after more than two days and no proper fix in sight.
    • Some are finding that the Android app still has the threads and that the web version is the cause of the issue; one user suggests saving content in a notepad as a result of the instability.
  • Gemini 2.5 Pro Coding Skills are Subpar: A user found the new Gemini 2.5 Pro to be absolutely dog at coding, particularly when running a complex simulation of a 3D solar system, requiring multiple attempts to achieve a decent version.
    • The recommendation was made that Gemini is not the way to go and to use Opus pocus for coding, no one else.

Perplexity AI ā–· #sharing (6 messages):

Tetrix App, Gemini Share, Michael Tait, Pakistan Deception, Dark Tetrad

  • Tetrix App Build Flounders!: A user reported a complete failure on a Tetrix app build when using Perplexity AI.
  • Gemini Shares Artistic AI: A user shared a link to Gemini showcasing artistic AI.
  • Michael Tait Accusations: A user shared a link regarding accusations against Michael Tait, viewable here.
  • Pakistan’s Diplomatic Deception: A user shared a Perplexity AI page discussing Pakistan’s diplomatic deception.
  • Dark Tetrad’s Digital Amplification: A user posted a Perplexity AI page on the Dark Tetrad’s digital amplification.

Perplexity AI ā–· #pplx-api (24 messagesšŸ”„):

API vs Online Results, Reasoning Effort + Async Mode for Sonar, Academic Mode, Richer Citations, Camera integration

  • API vs Online Results Discrepancy Troubles Users: A user reported that the API sometimes fails to retrieve information (e.g., US sports news) that the online interface consistently finds, experiencing a 50% failure rate compared to 100% success online.
    • Another user echoed these sentiments, noting that the search API capabilities feel like 50% of the online interface, leading them to look for other APIs that return enough information for API use to make sense.
  • Sonar Gets Upgrades: Reasoning, Async, Academics: Perplexity AI announced major upgrades including Reasoning effort + async mode for Sonar deep research, Academic mode on all models, and richer citations with title, url, and date.
  • Camera Integration Dreams in Voice Chat?: A user suggested camera integration to voice chat, acknowledging implementation challenges.
    • Another member noted Google might integrate that into the next Pixel phones, as ChatGPT already has it.
  • Formal Reasoning for Coding Desired: A user requested formal reasoning for coding, similar to DeepSeek prover.
    • They’re looking forward to pushing this to its very limit.
  • Labs Made Presentation Look Nicer: A user suggested that the Labs made presentations could be a little nicer, given the good UI/UX in general.
    • This was said in all seriousness.

LMArena ā–· #general (1447 messagesšŸ”„šŸ”„šŸ”„):

Gemini 2.5, Kingfall model, O3 Pro, Model Selector, AI Benchmarks

  • Google Insiders Built Different Gemini 2.5 Model: A member expressed skepticism about Google’s Gemini 2.5 Flash, feeling it was inferior, while anticipating the release of a new Gemini model soon.
    • They hoped for the release of o3pro today and highlighted Wolfram’s LLM Benchmarking Project as significant for the future, noting whatever Wolfram does is always excellent.
  • Kingfall Falls from the Sky: Excitement surrounded the potential release of Kingfall, with members discussing its characteristics and performance; the consensus was that it would fix coding issues.
    • Later, it was confirmed that Kingfall was released but quickly taken down, leading to speculation about its capabilities and relation to other models.
  • Aider Shows Gemini 2.5 Pro Has Highs and Lows: Members found that Gemini 2.5 Pro scored higher on Aider than other models, though this benchmark result could not be verified.
    • One member wrote that if you look there artificialanalysis.ai there’s not a single thing where the new one scored higher for them, that can’t be right.
  • O3 Pro Gets Ready to Launch: Members were watching closely, expecting a big launch for O3 Pro imminently.
    • One member wrote O3 Pro is officially out, but shortly after that statement there was an acknowledgement that it was just a rumour.
  • Model Selector Causes Confusion and Excitement: A discussion emerged around accessing the Kingfall model through a Model Selector on AI Studio.
    • Users shared experiences and prompts and how to tweak the existing code, and speculated if the model selector was working correctly or not.

Eleuther ā–· #general (688 messagesšŸ”„šŸ”„šŸ”„):

LLMs Sycophancy, AI Verification, LLMs learning and data, Hallucinations, LLMs and unfalsifiable ideas

  • LLMs can be Trained by Humans to Generate Unfalsifiable Narratives: It was hypothesized that LLMs are trained in-context by inexpert humans to create unfalsifiable narratives: because users correct the LLM only on topics they know well, it learns to produce unfalsifiable narratives/code, since that is easier than creating something of value.
    • This hypothesis could fit Eleuther’s alignment and interpretability leanings, and it’s relevant to server interactions.
  • Discussing LLMs in a MUD: There was a discussion about putting an LLM into a MUD/text adventure-like simulation environment that it interacts with ā€˜on the side’ the same way we interact with reality.
    • The goal would be to avoid the ā€˜sensory deprivation’ kind of situation it finds itself in and regularize its interactions via the hard interaction with an unforgiving reality.
  • LLMs use Flowery, Pseudoscientific Language: LLMs have a tendency to use flowery and pseudoscientific language which obscures the insincerity of vacuous content.
    • This is a way of feeding the LLM synthetic data generated by the LLM as inputs, which is a known broken mode.
  • Hallucinations in LLMs and verification: LLMs are not able to check things, even if they claim otherwise and this might be a key piece of the puzzle.
    • This is because there is not enough human interjection and not enough ground truth.
  • Concerns about LLMs Interacting with People: There are concerns that LLMs are now a vector for spreading unintentional misinformation.
    • Additionally, longer conversation chains almost make the model start implicitly optimizing, which has potentially dangerous outcomes.

Eleuther ā–· #research (24 messagesšŸ”„):

Quantum Field Neural Networks (QFNN), Attention mechanisms, Potential data leak

  • Quantum Field Neural Networks (QFNN) Algorithm Debuts: A member shared demos of their research on a universal algorithm, the Quantum Field Neural Networks (QFNN), with basic Proof of Concepts (PoC) for NLP, options trading, and electrochemical reactions, available on GitHub.
    • The architecture involves a 2D cylinder modulated by phi, with Z functioning as a qubit rotational loss device, aiming for minimum GPU usage through natural log transformation of expensive exponential functions.
  • Attention Origins Confuse ML Observers: Members discussed the origins of attention mechanisms, with some noting that awareness of mechanisms predating transformers (e.g., Bahdanau attention) seems limited within the ML community.
    • One member stated it comes up each time the origin of attention is mentioned in my feed, while another linked to a Schmidhuber tweet and a Bluesky post regarding linear attention.
  • Preprint Raises Data Leak Suspicions: A member shared a link to a preprint on arXiv, expressing concern that it might be the result of a data leak.
    • The member linked to a Bluesky post along with their concerns.

Eleuther ā–· #scaling-laws (7 messages):

AI ROI Realization, AI Startup Bubble, Future of AI and Job Market, Non-LLM AI Ventures

  • AI ROI Expectations Fade: A member expressed concern about graduating into a job market when the AI ROI realization hits, suggesting that most AI startups are a bubble due to CEOs lacking ML skills and YCs not being able to differentiate ML talent.
    • Another member resigned themselves to earning a pittance from a PhD, noting that even if there’s a crash, it would mean extremely cheap GPUs getting picked up and used for more interesting work.
  • LLMs Suck Air out of Room: A member hopes new money will flow into non-LLM focused ventures, agreeing with Chollet that LLMs have sucked all the oxygen out of the room.
    • This perspective highlights a desire for diversification in AI investments beyond the current focus on large language models.

Eleuther ā–· #interpretability-general (21 messagesšŸ”„):

Interpretability without teacher-forcing, RL with Teacher Forcing, Chat templates for Instruction Tuned Models, Token Embeddings

  • Interpretability Intactness Without Teacher Forcing Challenged: Members discussed whether it’s possible to keep interpretability intact without using teacher-forcing during AI training, but noted that no generative AI training without teacher-forcing has scaled reasonably.
    • One member clarified their specific interest is training without teacher-forcing, then using RL with teacher-forcing to finetune for improved interpretability.
  • Apply Chat Template for Instruction Tuned Models?: A member asked about using the apply_chat_template() function for instruction-tuned models, particularly for steering and embedding analysis, due to special tokens complicating embedding access.
    • Members suggested that using the chat template is necessary for instruction-tuned models to work correctly and to avoid out-of-distribution behavior, even if simple questions still yield good answers without it.
  • Token Embedding Analysis with Chat Templates: A member was concerned about the special tokens added by apply_chat_template() complicating token embedding analysis, especially with model-specific tokens like Qwen’s ā€œÄŠā€ for newlines.
    • One member suggested accounting for the fixed number of chat template tokens added, as the chat template is known for each model, which simplifies the code adjustments needed.
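
The bookkeeping suggested above, offsetting raw-prompt token positions by the fixed number of template tokens, can be sketched with a hypothetical helper; `adjust_token_positions` and the prefix length of 5 are illustrative, not taken from any real tokenizer.

```python
def adjust_token_positions(raw_positions, prefix_len):
    """Map token positions in a bare prompt to their positions after templating.

    Hypothetical helper: assumes the chat template prepends a fixed number
    of special tokens (prefix_len) before the user text, which holds once
    the template for a given model is known, so every raw index shifts by
    the same constant.
    """
    return [p + prefix_len for p in raw_positions]

# e.g. a template that adds 5 special tokens before the user message
shifted = adjust_token_positions([0, 3, 7], 5)
```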

Eleuther ā–· #lm-thunderdome (2 messages):

Reasoning Model Evaluations, Answer Extractions, LLM as Judge, MMLU Flan Few-Shot

  • Debate on Answer Extractions in Reasoning Model Evaluations: A member inquired about standard methods for answer extractions in reasoning model evaluations, noting that many papers use default prompts from lm_eval, which often lack specified output formats, leading to regex failures.
    • The member proposed specifying an output format or using an LLM as a judge but wondered about the accepted research practices.
  • Why MMLU Flan Few-Shot Uses Only Four Examples: A member asked why mmlu_flan_fewshot_cot defaults to 4 few-shot examples, questioning if most implementations don’t typically use 5 examples.
    • No explanation or answer was given.
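
One version of the regex-extraction approach debated above can be sketched as follows, with an explicit fallback so the harness counts extraction failures instead of silently mis-scoring; the marker patterns here are illustrative, not lm_eval defaults.

```python
import re

def extract_answer(completion):
    """Pull a final answer out of free-form reasoning output.

    Illustrative sketch: try an explicit "Answer:" marker first, then a
    LaTeX \\boxed{...}, and return None so the harness can record the
    sample as an extraction failure rather than a wrong answer.
    """
    m = re.search(r"(?:final answer|answer)\s*[:=]\s*(.+)", completion, re.IGNORECASE)
    if m:
        return m.group(1).strip().rstrip(".")
    m = re.search(r"\\boxed\{([^}]*)\}", completion)
    if m:
        return m.group(1).strip()
    return None
```

Specifying the expected output format in the prompt makes patterns like these far more reliable than regexing over unconstrained chain-of-thought.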

Cursor Community ā–· #general (661 messagesšŸ”„šŸ”„šŸ”„):

Claude Code vs Cursor, MCP as backend replacement, Cursor 1.0, Gemini models, Cursor documentation

  • Cursor 1.0 launched, community has mixed feelings: Cursor launched version 1.0 of their IDE, but some users feel like it is still in early beta, reporting issues with basic functionality after updating.
    • One user said, ā€œI gave Cursor a break to do some physical work and after refreshing my mind, I’ve come to conclusion that they called it 1.0 on a whim rather than 1.0 being a prepared release.ā€
  • Claude Code gains traction in Cursor community: Members are reporting that Claude Code is significantly better for coding than other models, especially after the recent Cursor 1.0 update, with one user stating it ā€œhas completely changed the way I view LLM’s for coding lolā€.
    • Another user pointed out that with a $20/month Claude Pro Plan, its quality rivals that of Cursor while costing less.
  • Concerns arise over MAX pricing and usage: Users are expressing concerns about the cost of MAX mode, noting that basic follow-up questions can quickly exhaust request limits, emphasizing the need for a clearer cost display.
    • There’s a consensus building that, as another member put it, ā€œthere’s no point in cloud agents if its run up costs for the normal personā€, with specific mention of the new background agents feature only supporting MAX.
  • Gemini model struggles and tool issues in Cursor: Users are experiencing connection failures and slowdowns with Gemini models in Cursor, especially following the 1.0 update.
    • One user reported, ā€œinsane amount of connection failures to gemini todayā€, while another noted that ā€œit seems spotty, i got a prompt through to sonnet 4 but then nothing on the next oneā€.
  • MCP shouldn’t replace traditional backends, users say: Members caution against using MCP (Model Context Protocol) servers as a replacement for traditional backends due to potential vulnerabilities and instability.
    • As one user put it, ā€œThis is how to make the most vulnerable and unstable system and backend everā€, while acknowledging MCP’s possible use as a CDN.

Cursor Community ā–· #background-agents (42 messagesšŸ”„):

Background Agents, Cursor Github, Background Agent PRs, Mono Repo setup, Background Agents Slack Bot

  • Background Agents won’t start, requiring upgrade!: Several users are running into an issue requiring them to upgrade to Cursor version 1.0.0 or later to start new background agents, with the error message Upgrade to Cursor version 1.0.0 or later to start new background agents.
    • One user suggested using the Cursor: Start Background Agent Setup tool, which fixed the issue for them.
  • Github Permissions cause Background Agent Setup issues: Users are facing issues connecting Cursor to GitHub in the Background Agents config, resulting in an Access Denied error and failing to get installation access key for the repository, even after reinstalling the Cursor GitHub app.
    • One solution involves deleting the local repo, cloning it again, and restarting the background agent setup while another user rebuilt their base background container snapshot.
  • Linear Agent PR Workflow coming soon?: A user proposed a workflow where a background agent implements tasks and opens PRs based on issues in Linear, requesting the ability to trigger background agents programmatically for automated implementation and review.
    • This prompted enthusiasm for a Linear Agent that could be mentioned like in Slack and automated pull request opening feature.
  • Background/Asynchronous Agents explained: A user asked about the difference and purpose of background/asynchronous agents, prompting a discussion on how they can enable vibe coding by running backend tasks, API setup, and unit tests in parallel while the user focuses on frontend implementation.
    • Another user expressed that when they’re not working, Cursor will continue making progress while they’re away. Then they can come back, review, get things back on track as necessary by leaving it todo lists in markdown and such.
  • Still no Slack bot update!: Users are asking for an update on the promised Slack bot from the 1.0 announcement, as they can’t find it.
    • Forum moderators are linking to background agent documentation that does not include any mention of the Slack integration.

Cursor Community ā–· #announcements (1 messages):

Cursor 1.0 release, code review features, background tasks

  • Cursor 1.0 Released with Code Review: Cursor 1.0 is now available, featuring enhanced code review capabilities, the ability to remember mistakes, and the capacity to manage numerous background tasks.
    • Details on all updates are available in the changelog.
  • Cursor 1.0 New Video: A video has been posted along with the release of Cursor 1.0.
    • The video can be found here

Unsloth AI (Daniel Han) ā–· #general (226 messagesšŸ”„šŸ”„):

Unsloth Llama.cpp vision inference, GRPO Model Checkpoints, Learning rate and SFT, Synthetic Dataset Generation with Gemini 2.5, Deepseek R1 Qwen mergekit

  • Troubleshooting Mistral-Small-3.1 Vision Inference with Llama.cpp: A user sought guidance on using llama.cpp for inference with the unsloth/Mistral-Small-3.1-24B-Instruct-2503-GGUF model, including how to send images with prompts.
    • After unsuccessfully attempting to send prompts with images (both base64 encoded and .png), another user suggested checking the llama.cpp documentation or consulting the llama.cpp community for assistance, while another suggested the llama-mtmd-cli build target.
  • Synthetic Data saves Google Credit score: A user is generating a synthetic dataset with Gemini 2.5 Pro to utilize remaining Google Cloud credits.
    • It was pointed out that VS Code agentic extensions can quickly deplete credits (one user accidentally spent $17 yesterday); deepseek-chat-v3 was suggested for cheaper, better reasoning, with image quality shown in this image.
  • Users Discuss Unsloth Finetuning Notebooks and Recent Issues: Users discussed finetuning notebooks, including a potential issue where the Qwen3-base notebook might be training the Qwen3 instruct version instead; the Unsloth notebooks repo was shared.
    • Users were offered a link to the correct notebook and troubleshot older-version compatibility (requiring transformers==4.48); the issue was fixed with use_exact_model_name = True in the notebook.
  • Docker Eyes Unsloth Model Distribution on Docker Hub: The AI Runtime team at Docker expressed interest in publishing Unsloth models on Docker Hub, suggesting the use of the docker model package command for packaging GGUF files as OCI Artifacts.
    • Unsloth team is open to collaboration, but raised concerns about automating the upload of all their models, inquiring about automation features similar to those in Modelscope.
  • Homunculus-12B model is uncensored and great!: Members touted Homunculus-12B, distilled from Qwen3-235B onto the Mistral-Nemo backbone, as uncensored and able to explain gravity simply.
    • The model preserves Qwen’s two-mode interaction style—/think (deliberate chain-of-thought) and /nothink (concise answers)—while running on a single consumer GPU, pre-existing GGUF exists.

Unsloth AI (Daniel Han) ā–· #off-topic (3 messages):

QLoRA Instruction Tuning Experiences, Pretraining Gemma for Varied Language, Triton Learning for LLM Inference

  • QLoRA Instruction Tuning Challenges Surface: A member inquired about others’ experiences with instruction-tuning models using QLoRA, noting that their model could answer questions but failed to end responses properly after fine-tuning on a small dataset.
    • They sought insights on successful datasets and strategies for QLoRA fine-tuning.
  • Training Gemma for Diverse Online Language: A member aims to pretrain and fine-tune a model, potentially Gemma 3, to interact with diverse language from real-world online examples, leveraging a collection of 1.5 million forum posts spanning 18 years from a small political forum, in addition to datasets from classical antiquity, Persia, Muslim scholars, and eastern philosophy.
    • The goal is to create a foundationally based model for logic, rhetoric, and framing, followed by fine-tuning on internet datasets to replicate the functionality of an IT model while avoiding alignment training.
  • Speeding up LLM Inference with Triton: A MLE is interested in speeding up LLM inference by learning Triton and writing optimized kernels for sparse/quantized LLMs.
    • They asked for advice on learning Triton efficiently, shared experiences, and current pain points, and considered contributing to Triton directly.

Unsloth AI (Daniel Han) ā–· #help (102 messagesšŸ”„šŸ”„):

Orpheus-3b model new language training issues, Deepthink R2 Model, DeepSeek-R1-0528 memory usage during finetuning, VLM instruction tuning, unsloth library import errors

  • Orpheus-3b Model Learns New Language but Forgets Emotions: A user trained the unsloth/orpheus-3b model in a new language by mixing training data with basic English but found the model forgot some emotions and voices of the original speakers.
    • Another member stated that this is normal, the model specialized on your data and forgot old data.
  • DeepThink R2 model release date still in the Future: A user inquired about accessing the DeepThink R2 model but was informed that it ain’t even out yet.
  • DeepSeek-R1-0528 Faces OOM Issues During Finetuning: A user encountered an OOM error while trying to finetune DeepSeek-R1-0528 with 8 H100 GPUs using bnb 4bit quantization, suspecting the model isn’t initialized empty before checkpoint loading.
    • They ideally expected it to consume only about 40GB per GPU during loading.
  • User Aims to Finetune VLM Without Chat Template: A user wants to finetune a vision-language model (VLM) using Unsloth, avoiding apply_chat_template and instead using preformatted prompts.
    • Another member suggested that the user stick to the chat template used when it was initially finetuned.
  • Unsloth Library Import Errors Plague Colab Users: Multiple users reported import errors with Unsloth libraries in Colab notebooks, such as from unsloth import FastLanguageModel or from unsloth.dataprep import SyntheticDataKit, but one member said the notebook loads fine on a free T4 instance.
    • A member suggested ensuring unsloth import is always first and restarting the session.

Unsloth AI (Daniel Han) ā–· #showcase (3 messages):

Qwen 2.5 SFT, Image Analysis

  • Qwen 2.5 SFT Demo: A user shared a GIF showing an image analysis of Qwen 2.5 SFT.
  • Future of Image Analysis: Another user shared a GIF with a link to the image, commenting you can see where this is all going.

OpenRouter (Alex Atallah) ā–· #announcements (2 messages):

OpenRouter RSS Feed, Model Announcements

  • OpenRouter Launches RSS Feed for Models: OpenRouter now offers an RSS feed for real-time model updates, accessible here.
  • Stay Updated with OpenRouter Model Announcements: The new RSS feed ensures users can stay informed about the latest models available on OpenRouter.

OpenRouter (Alex Atallah) ā–· #app-showcase (5 messages):

iOS App with OpenRouter LLM backend, Spreadsheets for Business World, Personality.gg

  • iOS App integrates OpenRouter for LLM: A member plans to release an iOS app via TestFlight, utilizing OpenRouter for the LLM backend to process character cards.
    • The main challenge remaining is message formatting, with plans to incorporate additional clients in the future.
  • Spreadsheet Skills Key for Product Adoption: A member highlights the importance of spreadsheets in the business world, emphasizing that familiarity with them lowers the barrier to entry for new products.
    • Lowering the barrier to entry to a product is the best thing you can do.
  • Personality.gg for Gooners: A member shared a link to Personality.gg.
    • The member didn’t elaborate, only clarifying for the gooners.

OpenRouter (Alex Atallah) ā–· #general (253 messagesšŸ”„šŸ”„):

OpenRouter API request limits, Character cards for roleplay, Gemini 2.5, OpenAI logging, Gemini Pro vs Flash

  • OpenRouter API request limits defined: Free OpenRouter models have API request limits: 50 requests per day for deposits less than $10, and 1000 requests per day for deposits over $10, across all free models.
    • One user clarified ā€œSo if you send 5 messages to DeepSeek v3 0324 (free), you’ll only have 995 left for DeepSeek r1 (free)ā€.
  • Sillytavern is resourceful for roleplay assets: One user suggested SillyTavern as a resource for finding ā€œcharacter cardsā€ for roleplay experiments with OpenRouter.
  • Gemini 2.5 Flash model generates infinite repetitions: A user reported that structured responses in Gemini 2.5 Flash returned an infinite repetition of the same value; for example {"appereance" : "red-rimmed eyes, red-rimmed eyes, red-rimmed eyes..."}.
    • Another user suggested that the first user seek support from the Chatwise MCP platform that integrated OpenRouter, in case the integration package needed updating.
  • Data Logging concerns get addressed: After a user shared an article about OpenAI being forced to log user outputs, it was asked whether every OpenAI model on OpenRouter should now display the red bubble warning when ā€œEnable training and loggingā€ is turned off.
    • OpenRouter confirmed they don’t have zero data retention, and were asking OpenAI to check again if that setting could be enabled in light of the new court demands.
  • Kingfall coding beast lives and dies: Users discussed Kingfall, a model mentioned on Reddit, with some saying it was a ā€œbeast at codingā€ and ā€œbetterā€ than Gemini 2.5 Pro.
    • Some users said it was out on Google AI Studio, but others reported that it had been removed and one claimed it ā€œcould even be the same OPā€ as a demo with 2.5 Pro.
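The free-tier limits described at the top of this channel are counted across all free models together, not per model. A toy accounting sketch of that rule (function names hypothetical; behavior at exactly $10 is an assumption):

```python
# Toy sketch of OpenRouter's shared free-model daily quota as described
# above: 50 requests/day below a $10 lifetime deposit, 1000 requests/day
# otherwise (edge case at exactly $10 assumed), counted across ALL free
# models combined.

def daily_quota(total_deposit_usd: float) -> int:
    return 1000 if total_deposit_usd >= 10 else 50

def remaining(total_deposit_usd: float, requests_used_today: int) -> int:
    return max(daily_quota(total_deposit_usd) - requests_used_today, 0)

# 5 messages to DeepSeek v3 (free) leave 995 for every other free model.
left = remaining(25.0, 5)
```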

HuggingFace ā–· #general (150 messagesšŸ”„šŸ”„):

Responsible Prompting API by IBM, Consensus Validation for LLM Outputs, Hugging Face Transformers Installation on Windows, French AI Company's Lack of Moat, Choosing an LLM Model

  • IBM’s Responsible Prompting API Recommends Prompt Tweaks: IBM Research is developing the Responsible Prompting API, a system that recommends prompt tweaks pre-inference to make LLM outputs more responsible, productive, and accurate, as detailed in this paper.
    • A user study presented at CHI’25 provided feedback for improving the system, and a demo is available on Hugging Face.
  • Blockchain-Inspired Validation Bolsters LLM Reliability: A new concept paper, Consensus Validation for LLM Outputs: Applying Blockchain-Inspired Models to AI Reliability, explores applying blockchain consensus mechanisms to LLM outputs to improve reliability and trustworthiness.
    • The paper suggests applications for AI agents, legal/medical tools, and AI alignment, available here.
  • Transformer Installation Troubles Trounce Novice User: A member faced issues installing Hugging Face Transformers on Windows natively and opened a GitHub issue detailing their steps, including forking the repo, cloning it, creating a virtual environment, and running pip install -e ".[dev]".
    • Another member using Python 3.10 advised checking for package conflicts and ensuring a virtual environment is used.
  • French AI Startup Has no Technical Moat: A member alleged that a French AI company, hcompany.ai, which raised over 400 million euros, lacks a technical moat.
    • They claim the operation could be replicated easily and that the company’s government-funded model, Lucie-7B, performed poorly and was even racist, leading to its API being pulled, as can be seen here.
  • Gemini 2.5 Pro becomes New LLM Fav: Members discussed LLM choices, with some moving to Gemini 2.5 Pro for performance comparable to GPT-4o at a lower price; for local use, Deepseek is probably the best, though most can’t run it since it’s 671B params.
    • For local use, Deepseek was recommended if VRAM allowed, or otherwise Mistral small 3.1, and there is a leaderboard that models constantly battle to top.

HuggingFace ā–· #today-im-learning (2 messages):

Fraud Detection Resources, Financial Transactions Security

  • User quests resources on Fraud Detection: A member is trying to learn about fraud detection in financial transactions.
    • The member asked if anyone had resources on the topic.
  • Financial Transaction Security Education: Interest is sparked in learning about methods to secure financial transactions.
    • This includes understanding the nuances of detecting and preventing fraudulent activities.

HuggingFace ā–· #i-made-this (3 messages):

Claude Desktop MCP Playground, Keras parallel training utility

  • Claude Desktop Gets MCP Playground with GUI!: A HF community member released a major update to their MCP Playground, adding a GUI and running 40+ operational servers to streamline adding powerful MCP servers to Claude Desktop without config headaches.
    • They are looking for developers and users to test the repo and provide feedback.
  • Find Keras Models in Parallel!: A lightweight utility for training multiple Keras models in parallel and comparing their final loss and last-epoch time has been created.
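A minimal sketch of the idea behind such a utility: launch several independent training jobs concurrently, then compare their final loss and wall time. The toy `train` function below stands in for building and fitting a real Keras model (all names here are illustrative, not the shared utility’s API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for "train a model, return its final loss and runtime";
# in the real utility each job would fit a Keras model instead.
def train(config):
    name, lr = config
    start = time.perf_counter()
    loss = 1.0
    for _ in range(100):        # pretend epochs
        loss *= (1.0 - lr)      # pretend the loss decays with the LR
    return name, loss, time.perf_counter() - start

configs = [("small", 0.01), ("medium", 0.02), ("large", 0.05)]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(train, configs))

# Rank the trained "models" by final loss, as the comparison step would.
best = min(results, key=lambda r: r[1])
```

Real Keras training is GPU/CPU-bound rather than sleep-bound, so a production version would typically use processes (or separate devices) instead of threads.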

HuggingFace ā–· #reading-group (1 messages):

shadow_lilac: Reading group seems amazing tbh, i’d gladly participate once new schedule


HuggingFace ā–· #NLP (51 messagesšŸ”„):

NLP on Windows, Semantic Similarity, BERT embeddings

  • NLP work natively on Windows?: Some members discussed doing NLP work on Windows natively (no WSL, no dual-boot, no VM).
    • One member said it’s hard but possible, but eventually they gave up and switched to WSL and then dual booted, citing headaches and performance issues.
  • Upgrading RAM for Windows Subsystem for Linux: A member running WSL experienced Jupyter kernel crashes due to RAM issues, running only 8GB of RAM and allocating 7GB to WSL.
    • Another member recommended upgrading to at least 16GB, though 32GB would be more future proof for handling modern OS and browser memory usage.
  • Semantic Similarity tools using BERT: A member asked for GitHub repos or HF Space apps for comparing the semantic similarities between two paragraphs.
    • Another member recommended Sentence-BERT with the sentence-transformers library, and provided a link to the repo.
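Under the hood, the Sentence-BERT approach reduces paragraph comparison to cosine similarity between embedding vectors; the similarity step itself is simple. A stdlib sketch (the toy 3-d vectors stand in for real embeddings, which in practice come from e.g. `SentenceTransformer("all-MiniLM-L6-v2").encode([...])`):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" of two paragraphs; similar paragraphs score near 1.0.
emb1 = [0.1, 0.9, 0.2]
emb2 = [0.1, 0.8, 0.3]
score = cosine_similarity(emb1, emb2)
```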

HuggingFace ā–· #smol-course (2 messages):

Agents course, Certificate deadline

  • Agents Course Deadline Extended: The deadline for the agents course has been extended to July 1, 2025, as referenced in a Discord channel.
    • A user inquired about certificate eligibility upon starting the course, noticing an earlier May 1, 2025 deadline.
  • Confusion over Agents Course Dates Resolved: Users expressed confusion over conflicting dates for the agents course completion and certificate eligibility.
    • A clarification was provided, pointing to a Discord channel with updated information extending the deadline.

HuggingFace ā–· #agents-course (10 messagesšŸ”„):

Penguin Youtube Question, Passing audio to agents, Smolagents usage with Gemini, Course sign-up deadlines

  • Penguin YouTube Sighted!: A member asked for help on solving the penguin YouTube question, and another member suggested checking out the YouTube video tool.
  • Smol Audio Passed to Agents!: A member inquired about the possibility of passing audio to an agent using smolagents, assuming the model supports it.
    • Multiple members expressed that this was possible.
  • Gemini Flash Gets Good Grades!: Gemini 2.0 Flash works well with smolagents[openai] and OpenAIServerModel, offering 1500 calls/day on the free tier and scoring 50pt with basic web/Wikipedia search tools.
    • Adding a delay helps avoid request limits (around 15 requests per minute).
  • Course Sign-Ups Not Too Late!: A member inquired whether it was too late to sign up for the course.
    • Another member clarified that auditing is always available, and the deadline for earning the two certificates is July 1, 2025.
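The delay trick mentioned above can be made systematic: space calls so a requests-per-minute cap is never exceeded. A minimal stdlib sketch (the `Throttle` helper is hypothetical, not part of smolagents):

```python
import time

# Space out API calls to stay under a requests-per-minute cap; the
# 15 req/min free-tier limit mentioned above means one call every 4 s.
class Throttle:
    def __init__(self, requests_per_minute: int):
        self.min_interval = 60.0 / requests_per_minute
        self._last = 0.0

    def wait(self):
        """Sleep just long enough to respect the minimum call interval."""
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

throttle = Throttle(requests_per_minute=15)
# Calling throttle.wait() before each model request keeps us under the cap.
```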

OpenAI ā–· #ai-discussions (174 messagesšŸ”„šŸ”„):

Codex CLI API, Adobe Firefly, O3 Pro Mode, Claude Max vs ChatGPT Pro, ChatGPT Connectors

  • OpenAI Pro Subscription: Codex CLI API Still MIA: A member expressed frustration that the Codex CLI API is still not included in the Pro subscription, suggesting that the Pro subscription should include all features of other plans.
  • Adobe Firefly lagging behind transformer based models: A member lamented that Adobe Firefly is still diffusion based and has not upgraded to a transformer based model like 4o for better generative fill.
  • Users anxiously await o3 pro mode: Several members are eagerly awaiting the release of O3 Pro Mode and one is ready to dial into the system once they have it.
  • Claude Max Screaming Value vs ChatGPT Pro: One member stated that Claude Max at $200 is a screaming deal because in 12 days they used equivalent of $900 in API tokens compared to ChatGPT Pro.
  • Connector speculation and Claude’s custom MCPs: Some members discussed the recently released ChatGPT Connectors and whether they would solve the context window problem, while others mentioned that Claude has a version of connectors on their website for pro users in the form of custom MCPs that run on servers and can do anything like send a discord message or use GitHub.

OpenAI ā–· #gpt-4-discussions (3 messages):

AI in housing projects, Software and Business Developer with AI competence

  • Architect explores AI in housing design: An architect from Brazil is exploring the use of AI and parametric design in a housing project with 1,500 lots.
    • He is interested in learning and collaborating with the community.
  • Software developer offers OpenAI assistance: A software and business developer with AI competence offered assistance with OpenAI products.
    • They invited users to ask questions and choose the appropriate forum for their inquiries.

OpenAI ā–· #prompt-engineering (19 messagesšŸ”„):

D&D Adventure Generation, LLM Prompt Pipeline, Consistent Tone in AI Content, AI as a Sponsor for NA Meetings, Prompt Version Tracking and Evaluation

  • Crafting Consistent D&D Adventures with LLMs: A member is developing a side project that generates D&D-style one-shot adventures using structured prompt chains to create a setting, location, premise, and complication.
    • They are exploring whether to use a dedicated editing pass for tone consistency or include a shared tone paragraph in every prompt, seeking resources on similar content-generation pipelines.
  • Navigating AI as a Virtual Sponsor for NA Meetings: A member seeks to create a prompt where the chat can act as a sponsor for NA meetings, providing support until an in-person sponsor is available.
    • Another member suggests inputting the request as if speaking to a person, detailing the need for guidance and support in the absence of a human sponsor, emphasizing the model’s capacity to handle conversational and nuanced interactions.
  • Y Combinator Podcast Spotlights Prompting Insights: A member listened to a recent Y Combinator podcast/video on prompting and shared insights on using evaluation mechanisms and corrections from subject matter experts to enhance AI performance in business.
    • They propose setting up a feedback loop where the AI outputs multiple options with justifications, using the selected option’s commentary to refine the original prompt.
  • Veo3 Unleashes Consistent Characters and Audio: One member has achieved major results of consistent characters and audio using Veo3.
    • They asked for permission to post a video showcasing this, but it’s unclear whether they got the green light.
  • Diving into the Core of Prompt Engineering: A member shared their basic yet effective approach to prompt engineering: using a familiar language, understanding the desired output, and explaining the requirements clearly, with careful verification of the AI’s output.
    • They highlight the importance of fact-checking, especially for math, sources, code, and details prone to hallucination.

OpenAI ā–· #api-discussions (19 messagesšŸ”„):

D&D Adventure Module Generation, Consistent Tone in Prompt Pipelines, Prompt Version Tracking and Evaluation, AI Sponsor for NA Meetings, Veo3 Results

  • Crafting D&D One-Shots with AI: A member is developing an AI pipeline to generate D&D-style one-shot adventures, focusing on inspirational briefs rather than full modules.
    • They are exploring whether to use a dedicated editing pass for tone consistency or include a shared ā€œtone paragraphā€ in every prompt, seeking resources on content generation pipelines.
  • Fine Tuning AI Tone: Some members suggested adding a tone paragraph on the input or output of the AI to reinforce stability through repetition, but this may not be user friendly.
    • Others note that getting the AI to remember story context, progress logically, follow D&D rules, and remember characters is also very important.
  • Y-Combinator Podcast on Prompt Evaluation: A member recommended a recent Y-Combinator podcast/video on prompting that emphasizes evaluation mechanisms and corrections from subject matter experts to create powerful AI in business.
    • The member suggested setting up a feedback loop to improve the original prompt through commentary and justification.
  • Prompt Engineering Basics: Members discussed basic prompt engineering which involves picking a language you know well, understanding exactly what you want from the AI, explaining it accurately, and carefully checking the output, which includes fact checking.
    • One member inquired about getting stronger results in math/science stuff from the model, asking what level is needed.
  • AI as a Sponsor for NA Meetings: A member is looking to create a prompt where a chatbot can act as a sponsor, similar to what one might have when going to NA meetings, until they can get one in person.
    • A member recommended treating the model like a person in the prompt, telling it that you’re fairly new to working with models, and what you hope the model will do.

aider (Paul Gauthier) ā–· #general (171 messagesšŸ”„šŸ”„):

Qwen 2 35B vs DS R1.5, Gemini 2.5 Pro release date predictions, Gemini 2.5 Pro in chat mode downgrades, Aider webapp IDE, Gemini 2.5 Pro Aider Polyglot benchmark

  • Qwen and DS Trail Gemini and Claude: Qwen 2 35B and DS R1.5 are both reportedly worse than Claude and Gemini, with Qwen being slightly better than DS in some user tests.
    • A user noted that both models took around 4 minutes to figure out how to draw a basketball.
  • Gemini 2.5 Pro launch date speculation heats up: Users speculated about the launch date of Gemini 2.5 Pro, referencing a June 5th tag and internal benchmark results, while some predicted a release around June 10th based on historical patterns.
    • One user pointed out that Google historically soft launches on Tuesdays with blog posts or GA announcements on Wednesdays.
  • Gemini 2.5 Pro’s Chat Mode Quality Degrades: A user reported that Gemini 2.5 Pro’s chat mode now duplicates the entire file needing changes, instead of outputting modified parts in diff format, rendering chat mode unusable.
    • The user added that they can only one-shot in code mode.
  • Aider-centric Webapp IDE Emerges: A user shared a link to an Aider-centric webapp IDE for alpha testing.
  • Gemini Polyglot benchmark scores leaked: Users discussed a leaked test showing Gemini 2.5 Pro achieving 86% on the Aider polyglot benchmark, suggesting the benchmark may be overfitted or no longer relevant.
    • One user suggested creating a new AI vibe code benchmark and another mentioned that Gemini 2.5 Pro also improved on Humanity’s Last Exam.

aider (Paul Gauthier) ā–· #questions-and-tips (14 messagesšŸ”„):

Figma MCP on Claude Code, Aider vs Cursor, Gemini STT and Speech-to-Text Workflows, Superwhisper and Wispr Flow, Aider development style

  • Users Debate Aider versus Cursor for Development: A user pulse-checked users of Aider who have switched to Cursor, noting that Cursor feels slower and its agent mode goes wild and has to be reined in.
    • The user mentioned it’s not as intuitive to run multiple blitzes with simultaneous AI editing processes compared to Aider.
  • Aider development style: One user stated that Aider’s approach suits a style of development characterized by careful, considered prompting, terminal-driven workflows, and branching for the ability to revert to good known states.
    • They also mentioned an aversion to the fling stuff at the wall vibe from Cursor users, though they are experimenting with Zed to stretch their brain.
  • Gemini STT and Speech-to-Text workflows: A member asked about Gemini STT and adding more speech-to-text workflows, along with interest in custom terminology.
    • They specified preference for superwhisper and having tried Wispr Flow for iOS.
  • Installing Figma MCP on Claude Code: A user inquired about how to install Figma MCP on Claude Code, specifically asking for a command to run.

GPU MODE ā–· #general (4 messages):

Hardware-aware Optimizations, ML Compilers, PyTorch CUDA, CUTLASS/CUB/cuDNN, Triton

  • Suggest a CUDA matmul scratchpad: A member suggested writing a matmul from scratch in CUDA that achieves 85% of cuBLAS throughput in bf16 or fp16 using tensor cores as a great learning exercise.
  • Recommend Getting Your Hands On: A member suggested profiling your workload to identify easily optimizable “low-hanging fruits.”
    • Examples might include inner loops written as simple for loops without concurrency. Otherwise, you can study existing optimizations to decide which skills are most relevant.
  • Dreaming of distributed GPU brain-picking: A member requested to chat with or get advice from someone experienced in training distributed GPUs with 64+ fleets.
    • They are seeking outside insights, as their colleagues are physicists with no interest in the topic.

GPU MODE ā–· #triton (1 messages):

Megakernel, Full-Model Kernel, KernelLLM

  • Megakernel Mania Manifests: A member inquired about the existence of a megakernel or full-model kernel example in Triton for popular architectures like LLaMA.
    • Finding none, they considered writing one, potentially with assistance from KernelLLM.

GPU MODE ā–· #cuda (7 messages):

CTA Data Transfer, Blackwell TF32 Block Scaling, GMEM Coalescing

  • CTA Communication circumventing Global Memory?: A member inquired about directly transferring data between CTAs without using global memory.
    • Another member pointed out that from Compute Capability 9.0 onwards, shared memory can be accessed within a thread block cluster (CGA), although it’s slower than local shared memory and works only between SMs in the same GPC.
  • Blackwell’s TF32 Scaling Pondered: Discussion centered on block scaling with TF32 types on Blackwell, in contrast to sm80 and sm90, where one tensor operand can reside in registers.
    • One member theorized that on Blackwell both tensor operands reside in smem, so using tensor operations efficiently might require staging a copy of one tensor in rmem, scaling it, and then copying it to smem for each channel.
  • GMEM Coalescing Performance Puzzle: A member implemented two kernels to demonstrate GMEM coalescing, tiling Warpsize x Warpsize on the C matrix in the first and tiling a strip of the same dimensions in each row in the second.
    • Despite both kernels showing coalesced warps when printing x,y mappings, the first kernel was almost twice as fast, leading to questions about whether the tiling/stripping approach of the output matrix C influences performance.

GPU MODE ā–· #torch (8 messagesšŸ”„):

Torch Compile, AITemplate status, Tensor Constant Wrapping

  • Tensor Wrapping Troubles Triggering Recompiles: Wrapping constants in Tensors is necessary to prevent recompiles, though the cpp interface automatically converts them back.
    • A member noticed an issue where ___as_tensor(alpha).item() == 0.5 triggers recompiles and suggests it’s not obvious this is needed.
  • Torch Compile vs. AITemplate: A Modern Face-Off: AITemplate is in maintenance mode, so torch.compile is the recommended choice for active development.
    • For a C++ runtime alternative, AOTInductor is suggested over AITemplate due to AIT’s lack of active maintenance.
  • Torch Compile Achieves Superior Performance vs AITemplate: Torch.compile now offers comparable or better performance than AITemplate, solidifying its position as the preferred tool.
    • A member indicated that torch compile will have better performance than AITemplate.

Parallel Keras Models, Experience Replay Pool, Scaling Intelligence

  • Keras Models Go Parallel: A lightweight utility for training multiple Keras models in parallel and comparing their final loss and last-epoch time is now available: parallel_finder.
  • Experience Replay Gets a Boost: A new Python class offers a multiprocessing-powered Pool for efficiently collecting and managing experience replay data in reinforcement learning: Pool.
  • Scaling Intelligence Blogpost: A blogpost about scaling intelligence was shared: real.optimus.prime.
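Independent of the multiprocessing layer, the core of an experience-replay pool is a bounded buffer with uniform random sampling. A stdlib-only sketch of that core (the shared Pool class adds multiprocessing on top; this `ReplayBuffer` is an illustrative stand-in, not its API):

```python
import random
from collections import deque

class ReplayBuffer:
    """Bounded experience-replay buffer with uniform random sampling."""
    def __init__(self, capacity: int):
        # Oldest transitions are evicted automatically once full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling without replacement, as in vanilla DQN replay.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=1000)
for step in range(10):
    buf.add(step, 0, 1.0, step + 1, False)
batch = buf.sample(4)  # 4 transitions drawn at random
```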

GPU MODE ā–· #beginner (2 messages):

GPU access costs, ECE408 lectures

  • ECE408 Lectures Recommended for GPU learning: A member suggested starting with ECE408 lectures to learn more about GPUs.
    • The same member tried watching them but found the audio quality bad.
  • B200/H200/H100 GPU access costs: A member is curious about the cheapest way to access B200/H200/H100 GPUs.
    • No responses were given.

GPU MODE ā–· #jax (1 messages):

blueredblue: How does ffi_call work with pmap, will one kernel get launched per device?


GPU MODE ā–· #torchao (1 messages):

drisspg: I’ll do a review today


GPU MODE ā–· #off-topic (1 messages):

TikZ, GIF

  • GIF identified as TikZ: A member identified a GIF composed of many images.
    • The member speculated it was created using TikZ, a LaTeX package for creating graphics programmatically.
  • TikZ is used to create GIFs: TikZ can generate images from code.
    • These images can then be stitched together into a GIF.

GPU MODE ā–· #irl-meetup (1 messages):

GTC Paris, CUDA C++ Workshop, Connect With the Experts

  • GTC Paris bound for VivaTech!: GTC Paris is scheduled for June 10–12 at VivaTech, featuring two CUDA highlights at the event: GTC Paris.
  • CUDA C++ Workshop Offers Hands-On Training: The CUDA C++ Workshop on June 10 offers full-day, hands-on training in modern CUDA, optimization, and debugging, so sign up now.
  • Connect With the Experts at GTC Paris: There will be in-person Q&A sessions on CUDA, AI, HPC, and more with the engineers behind the tech at GTC Paris, so bring your hardest questions!

GPU MODE ā–· #rocm (10 messagesšŸ”„):

Root user, Ubuntu, Sudo

  • Root Paradox: Ubuntu’s Sudo Surprise: A user was surprised to find that even as the root user on Ubuntu 22.04.5 LTS, they received a ā€˜root is not in the sudoers file’ error when trying to use sudo.
    • Another user pointed out the paradox, noting, ā€˜you are literally the root user’ suggesting the system might be misconfigured by default.
  • Ubuntu’s Default User?: A user questioned whether the default user for Ubuntu is typically ā€˜Ubuntu’.
    • The discussion seems to stem from confusion over how root privileges and sudo are configured in Ubuntu environments.

GPU MODE ā–· #liger-kernel (3 messages):

Triton, α-entmax kernels, sparsemax

  • Triton Gets Official α-entmax Kernels: Official Triton implementations of α-entmax kernels are now available, with sparsemax corresponding to α=2.0, in the adasplash repo.
  • GPU Sorting Suffers: Sorting is very inefficient on GPUs, according to a member.
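For reference, sparsemax (the α=2.0 case) is a Euclidean projection of the logits onto the probability simplex, and the standard algorithm needs exactly the sort mentioned above. A stdlib sketch of that reference algorithm (not the adasplash Triton code):

```python
# Sparsemax (alpha-entmax with alpha = 2.0): Euclidean projection of the
# logits onto the probability simplex. Unlike softmax it can assign
# exactly zero probability; the threshold step requires a sort, which is
# one reason such kernels are awkward to make fast on GPUs.
def sparsemax(z):
    z_sorted = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for j, zj in enumerate(z_sorted, start=1):
        cumsum += zj
        if 1.0 + j * zj > cumsum:       # support condition: zj stays nonzero
            tau = (cumsum - 1.0) / j    # threshold from the top-j logits
    return [max(zi - tau, 0.0) for zi in z]
```

Example: `sparsemax([2.0, 0.0, 0.0])` puts all mass on the first coordinate, whereas softmax would spread some probability over every entry.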

GPU MODE ā–· #self-promotion (5 messages):

Keras parallelization tool, Reinforcement learning pool, GPU Workload Security

  • Keras Models Parallelized: A user shared a lightweight utility for training multiple Keras models in parallel and comparing their final loss and last-epoch time.
  • Reinforcement Pool Plays Efficiently: A user shared a Python class that offers a multiprocessing-powered Pool for efficiently collecting and managing experience replay data in reinforcement learning.
  • Mobicham’s Overhead Observation: A user shared a link to a post by Mobicham, suggesting that the observed behavior is mostly due to overhead.
  • GPU Workload Security Interest Check: A user inquired about interest in GPU workload security or hardening container security, inviting others to PM them for discussion.

GPU MODE ā–· #šŸæ (3 messages):

Triton Kernels, GPU uops information

  • Triton Converts Newbie to Kernel Coder: A user went from not knowing what Triton is to being able to write valid Triton kernels.
    • Another user congratulated them and asked if they used the for loop approach.
  • uops.info for GPUs When?: Users noted that no uops.info equivalent exists for GPUs, whether AMD or Nvidia.
    • One user is looking forward to the day when detailed GPU micro-operation information becomes available.

GPU MODE ā–· #thunderkittens (6 messages):

Thunderkittens Flexibility, Producer/Consumer Model in Thunderkittens, TMA and HBM usage with Thunderkittens, AMD Porting

  • Thunderkittens flexibility gets explored!: A member is working on making Thunderkittens more flexible than the producer/consumer model, especially for multi-step workflows like the B200 warp specialization.
    • The user asked the channel about its usecases, expressing a desire to learn more.
  • Producer/consumer model perplexes user!: A member expressed confusion about the producer/consumer model in Thunderkittens, particularly regarding the parameters of functions like common_setup.
    • The user has read the matmul.cu file multiple times but still doesn’t fully understand it and couldn’t find any relevant documentation.
  • Asynchronous TMA/HBM usage with Thunderkittens gets discussed: A member wants to write an addition kernel for two 2D tensors, using the producer/consumer model and TMA to asynchronously read from HBM.
    • They are unsure how to define the kernel layout and the producer/consumer model for this case and asked for help.
  • AMD porting strategy gets DM’d: One member DM’d another about porting/generalizing Thunderkittens to AMD, calling it involved.
    • They remarked it does seem possible and sent some notes.
  • mma.sync.aligned becomes central primitive: A member pointed out that the core matmul operation in Thunderkittens is mma.sync.aligned.m16n8k16.row.col.f32.bf16.bf16.f32 primitive (link), allowing the abstractions to be consistent.
    • They admitted to not knowing the best practice for dealing with other sizes, and suggested padding.

GPU MODE ā–· #submissions (11 messagesšŸ”„):

Grayscale leaderboard submissions, AMD Mixture of Experts leaderboard submissions, Prefixsum leaderboard submissions, AMD FP8 MM leaderboard submissions

  • T4 Personal Best Time: A member achieved a personal best on T4 for the grayscale leaderboard with a time of 17.2 ms.
  • MI300 Shows 8th Place: A member secured 8th place on MI300 for the amd-mixture-of-experts leaderboard, achieving a time of 9.18 ms.
  • T4 First Place Finish: A member achieved first place on T4 for the prefixsum leaderboard with a time of 8.94 ms.
  • H100 Sets Personal Best: A member set a personal best on H100 for the grayscale leaderboard with a time of 1459 µs.
  • MI300 Achieves Success: A member successfully submitted on MI300 for the amd-fp8-mm leaderboard with a time of 150 µs.

GPU MODE ā–· #ppc (1 messages):

Open 2025 course statistics

  • Open 2025 Course Stats Snapshot: A member shared statistics from the Open 2025 course instance at ppc.cs.aalto.fi/stat/open2025/.
    • The member noted that the statistics are not real-time but can be updated occasionally, particularly as deadlines approach.
  • Deadlines Approaching: The course instructor intends to update the course statistics as the deadlines approach.
    • This will provide students with more current information on their progress and standing in the course.

GPU MODE ā–· #tpu (2 messages):

SparseCores in TPUs, Transformer training/inference, Nvidia Tensorcore sparsity

  • SparseCores Differs From Nvidia’s: A member asked whether SparseCores in TPUs help accelerate transformer training/inference, expecting it to function similarly to Nvidia Tensorcore’s sparsity feature.
    • The member noted that SparseCores were ā€œquite differentā€ from their expectations of Nvidia Tensorcore’s sparsity feature.
  • Sparsity Acceleration Questioned: The original poster wondered if the SparseCores in TPUs could accelerate transformer training and inference tasks.
    • This question stemmed from an initial expectation that SparseCores would behave like Nvidia’s Tensor Cores with their sparsity feature, but the user found them to be quite different.

GPU MODE ā–· #factorio-learning-env (28 messagesšŸ”„):

FLE decoupling, FLE roadmap, FLE vision, Factorio AI API, LLM agents

  • FLE decoupling proposed: A member proposed a complete decoupling of the Factorio Learning Environment (FLE) from the Python integration, experiment code, and prompts to create a versioned Docker image with an associated FLE mod.
    • This approach aims to allow users to integrate with FLE using their favorite programming language via a JSON API, and simplify environment setup by pulling down the Docker image.
  • Call for FLE roadmap: A member suggested creating a 3-4 month roadmap for FLE to make it easier to approach and more influential, focusing on the vision for user interaction and project structure.
    • The roadmap should clarify the roles of the official Factorio environment, the official FLE integration, and the official FLE benchmarking projects.
  • Factorio AI API discussion: A member discussed creating a ā€œFactorio AI APIā€ to provide access to base interactions with the environment, cutting down on Lua scripts and shifting advanced logic to integrations and users.
    • Another member inquired about the minimum functionality needed for the FLE integration and expressed uncertainty about which parts of the codebase are actually used.
  • LLM Agents Survey Reading Group Forming: A member expressed interest in forming a reading group around the survey paper, ā€œLarge Language Model Agent: A Survey on Methodology, Applications and Challengesā€, seeking to trade notes and discuss agent research in LLMs.
  • Enhanced FLE Mod Features Envisioned: A member supported the idea of a versioned Docker image with an associated FLE mod, suggesting enhancements beyond pre-loading initialization scripts, admin tools, and agent tools.
    • These enhancements could include a GUI for viewing agent inventories, access to a persistent message log, and the ability to click/drag across agents to issue instructions, donate items, or perform other interventions.

GPU MODE ā–· #amd-competition (49 messagesšŸ”„):

AMD FP8 GEMM, MI300 Cache Line Optimization, DPP Transpose, H100 Competition, Backward Pass Optimization

  • GPUMode AMD FP8 MM Solution Revealed: A member shared their FP8 solution implementation on the GPUMode AMD FP8 MM GitHub, which was inspired by another member’s writeup.
    • Another member added, ā€œYour solution is so much shorter.ā€
  • AMD Challenge FP8 Solutions: Two members shared their FP8 solutions for the AMD Challenge, providing links to their code on GitHub and another optimized version.
    • One member wrote, ā€œafter looking at yours i feel the same.ā€
  • MI300 Cache Line Optimization Discussed: A member explained that on MI300 and other AMD hardware, the GPU requests entire cache lines, suggesting shuffling memory requests for efficient layout without performance loss.
    • He uses a ā€œZZZZ patternā€ that currently yields around 60% cache coalescing and an ~86% L2 hit rate, and aims for 90+% coalescing with this technique.
  • Amateur code for FP8 GEMM: A member shared their 100-line kernel for FP8 GEMM, which secured a top-5 position, along with a newer version carrying a 5 µs speed improvement.
  • H100 vs MI300X benchmark requested: A member suggested adding an H100 submission to the current AMD FP8 leaderboard for a ā€œMarvel vs DCā€ competition between HIP gurus and CUDA wizards.
    • Another member noted such a benchmark is ā€œprone to benchmarketingā€ and is ā€œvery lose loseā€, but new H100 benchmarks are coming.
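
The FP8 GEMM kernels discussed above follow a common recipe: quantize the inputs to an 8-bit float format (e4m3 on MI300) with per-tensor scales, multiply, and rescale the output. A minimal NumPy sketch of that recipe, simulating e4m3 rounding in software (the rounding and value range here are rough approximations for illustration, not the competition kernels):

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value in the e4m3 format

def fake_e4m3(x):
    """Simulate e4m3 rounding: clip to range, keep 3 mantissa bits."""
    x = np.clip(x, -E4M3_MAX, E4M3_MAX)
    m, e = np.frexp(x)              # x = m * 2**e with |m| in [0.5, 1)
    m = np.round(m * 16.0) / 16.0   # 3 mantissa bits: 8 levels per binade
    return np.ldexp(m, e)

def fp8_gemm(a, b):
    """Scaled 'FP8' GEMM: quantize per-tensor, multiply, rescale."""
    sa = E4M3_MAX / np.abs(a).max()
    sb = E4M3_MAX / np.abs(b).max()
    aq = fake_e4m3(a * sa)
    bq = fake_e4m3(b * sb)
    return (aq @ bq) / (sa * sb)

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 128)).astype(np.float32)
b = rng.standard_normal((128, 32)).astype(np.float32)
ref = a @ b
approx = fp8_gemm(a, b)
rel_err = np.linalg.norm(approx - ref) / np.linalg.norm(ref)
```

With well-scaled Gaussian inputs the relative error stays in the low single-digit percent range, which is why the scaling choice (and cache-friendly data layout) dominates the tuning work in these kernels.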

GPU MODE ā–· #cutlass (7 messages):

CuTe Layout, Blackwell Performance, cuDNN Performance, Cutlass Error

  • CuTe Layout Clarification Requested: A user sought confirmation on their understanding of CuTe layout, particularly regarding the interpretation of layouts like ((2, 3)):((1, 4)) versus (2, 3):(1, 4) and their equivalence to (3, 2):(4, 1).
    • The user noted that the former layout with double parentheses is actually the same as the latter layout (3, 2):(4, 1).
  • Blackwell B200 crushes Petaflop benchmarks: A user posted benchmark results on a Blackwell B200, achieving 0.99 petaflops/sec in fp16_gemm, 1.97 petaflops/sec in fp8_gemm, 2.69 petaflops/sec in nvfp4_bf16_gemm, and 3.09 petaflops/sec in nvfp4_nvfp4_gemm.
    • The performance of mixed_mxfp8_bf16_gemm lagged at 0.23 petaflops/sec, raising questions about whether the comparatively slower MXFP8 performance is a software or hardware issue.
  • cuDNN offers performance boost on Blackwell: A user noted that a cuDNN path gives better performance in some cases on Ampere and Hopper, clarifying that the cuDNN backend offers the best performance on Blackwell.
  • Cutlass example error on RTX 3090: A user reported encountering a Got cutlass error: Error Internal at: 285 while attempting to run a Cutlass example (turing_tensorop_gemm.cu) on an RTX 3090 (sm_86 architecture).
    • The user asked for assistance in understanding the cause of the error.
  • CuTe layout indexing clarification: A user requested help understanding why the (2, 2, 2) layout has a stride of (4, 1, 2) and included a table of logical indices, coordinates, and physical indices to illustrate their understanding.
    • They specifically sought clarification on how the computed physical indexing maps to the tensor representation and whether coordinates are natural or logical.
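
A CuTe layout `(shape):(stride)` is just a function from a 1-D (colexicographic) index to a physical offset: decompose the index into per-mode coordinates, then dot them with the strides. A small illustrative Python sketch of that mapping (real CuTe layouts also support nested modes, which this ignores):

```python
def layout_index(coords, strides):
    """Physical offset = sum(coord_i * stride_i)."""
    return sum(c * s for c, s in zip(coords, strides))

def index_to_coords(k, shape):
    """Colexicographic decomposition: leftmost mode varies fastest."""
    coords = []
    for dim in shape:
        coords.append(k % dim)
        k //= dim
    return coords

def layout_map(shape, strides):
    """The layout as a flat list: 1-D index -> physical offset."""
    size = 1
    for dim in shape:
        size *= dim
    return [layout_index(index_to_coords(k, shape), strides)
            for k in range(size)]

# (2, 3):(1, 4) visits physical offsets 0, 1, 4, 5, 8, 9.
# (3, 2):(4, 1) covers the same offsets in a different order: 0, 4, 8, 1, 5, 9.
# (2, 2, 2):(4, 1, 2) gives the interleaving 0, 4, 1, 5, 2, 6, 3, 7.
```

Running this on the `(2, 2, 2):(4, 1, 2)` case from the question reproduces the table of logical indices and physical offsets directly.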

LM Studio ā–· #general (67 messagesšŸ”„šŸ”„):

AgenticSeek, OpenManus, Embedding Models, Gemma vs DeepSeek, ROCm llama.cpp vision module

  • AgenticSeek morphs from OpenManus: A user inquired about AgenticSeek, which was formerly known as OpenManus, similar to how OpenDevin was renamed OpenHands.
    • The name change may be due to copyright issues.
  • Gemini Pro users hit rate limits: Users reported that Gemini Pro is ā€œuseless with these new rate limitsā€, now 100 messages every 24 hours.
    • The rate limits primarily occur when using automated scripts to call the API, not within the AI Studio interface itself, although some experienced limits within the Gemini app.
  • Arcee AI Cooks Homunculus-12B Model: Arcee AI’s Homunculus-12B model, distilled from Qwen3-235B onto the Mistral-Nemo backbone, is reported to breathe new life and smarts into Mistral-Nemo and adds reasoning.
    • It aims to preserve Qwen’s two-mode interaction style while running on a single consumer GPU.
  • Vision module slows in ROCm llama.cpp: A user reported that the new ROCm llama.cpp v1.34.1 runtime significantly slowed down the vision module, with response times increasing from approximately 1 second to over 10 seconds on a 7900XT 20GB GPU.
    • They were asked to share results and screenshots in the specified channel for further investigation.
  • LM Studio Gets Creative With Naming Chats: Users noticed LM Studio automatically names chats based on some logic, even when not named in advance, and are curious about the prompt it uses.
    • A team member replied that running lms log stream in a terminal window shows the prompt used to ask the model to choose a name.

LM Studio ā–· #hardware-discussion (67 messagesšŸ”„šŸ”„):

NAND Cell Refreshing, AVX512 Support in Llama.cpp, Adrenalin 25.6.1 and Flash Attention, ReBAR and Shared Memory Issues, Model Recommendations for LM Studio

  • NAND Cells Need Refreshing Cycles?: Members discussed how NAND cells slowly leak charge over time, leading to the concept of ā€œread refresh,ā€ where data is moved to new cells if not rewritten; older filesystems periodically revisit data for this, and the process can be sped up by deleting and rewriting files.
    • There was a mention of OS files written at system install becoming slower to read, and that some filesystems purposely revisit data periodically to check for degradation.
  • AVX512 Support: Fact or Fiction?: It was mentioned that some laptops with Intel CPUs ending with ā€˜H’ have AVX512 support, while Ryzen 3 laptops have AVX2, but there was a question regarding whether llama.cpp actually has AVX512 support.
    • It was confirmed that there is a mode for AVX512.
  • AMD Adrenalin Fixes Flash Attention Performance?: AMD’s Adrenalin 25.6.1 driver purportedly fixed most flash attention performance degradation issues when using Vulkan with llama.cpp on AMD GPUs.
    • However, Gemma models still exhibit degraded performance, running at about 2/3 of their original speed, and models are loaded into shared memory instead of VRAM without ReBAR.
  • ReBAR Impacts Shared Memory Allocation: Users reported issues where, without ReBAR, models are loaded directly into shared RAM, and even with ReBAR enabled, the system fills system memory before utilizing dedicated GPU memory.
    • One user explained that LM Studio loads 32GB of the model to system memory before transferring it to dedicated GPU memory, but subsequent memory loading goes to shared memory, potentially causing crashes, showing attached images of memory allocation.
  • Hot Models Right Now?: Members recommended Gemma3, Qwen3, and DeepseekR1 0528 as currently popular LLM models, depending on memory capacity; Dubesor’s benchmarks were also recommended.
    • Qwen3 32B and Llama 3.3 were mentioned as high performing models on the linked benchmark.

Latent Space ā–· #ai-general-chat (109 messagesšŸ”„šŸ”„):

Langfuse OSS, Shisa v2 405B model, ChatGPT integrates with Internal Tools & Adds Record Mode, Veris AI seed funding, Anthropic cuts Claude capacity

  • Langfuse Launches Full-Featured OSS: Langfuse made its full feature set open source, which looks promising for LLM apps and projects; one user reported using it self-hosted consistently for several months and being happy with the features.
    • Another member tried to set it up on their NAS but ran into a couple of issues.
  • Shisa V2 Debuts in Japan: Shisa.ai released Shisa v2, a full fine-tune of Llama 3.1 405B, which is allegedly the highest-performing model trained in Japan and competitive with GPT-4o and Deepseek-v3 on Japanese tasks.
    • Model and quants are on HF, and an fp8 is hosted for quick tests.
  • Netlify and Neon Power New App Builder: Netlify announced Netlify DB, a serverless Postgres database powered by Neon and designed for AI-native development. It aims to reduce friction between code and data so AI agents can build fullstack, data-driven applications from prompt to production, with easy setup via netlify dev, as per this Netlify blog.
  • Zapier’s AI Fluency Exam: Zapier now requires 100% of new hires to be AI fluent, measuring AI fluency among employees across different levels (Unacceptable, Capable, Adoptive, Transformative) through screenings, skill tests, async exercises, and live interviews, with role-specific sample interview questions as per this thread.
  • Alibaba’s Qwen Team Launches New Models: The Alibaba Qwen team launched the Qwen3-Embedding and Qwen3-Reranker Series, available in various sizes (0.6B, 4B, 8B), supporting 119 languages, and showcasing state-of-the-art performance on MMTEB, MTEB, and MTEB-Code, as per this announcement.
    • The models are open-source on Hugging Face, GitHub, and ModelScope, and accessible via Alibaba Cloud API.

Manus.im Discord ā–· #general (101 messagesšŸ”„šŸ”„):

Manus context vs alternatives, Dev tools replit vs cursor, Invitation spam, AI podcast looking for manus dev guest, Manus credits vs alternatives

  • Manus Provides Deeper Context than Alternatives: Members discussed that the same prompt yields a shallower context on alternative solutions versus the detailed context and wealth of detail given by Manus.
    • It was suggested that a larger context means more resources (cheap electricity, memory storage, faster chips, etc.), so this performance could be linked to resources.
  • Replit competes with Cursor as development tool: A developer mentioned having to ā€œtake what Manus does and put it through Cursor or whatever IDE I am using and finagle it and fix it until it worksā€, and another suggested running it through Replit with security checks by Devin 2.0.
    • The first developer responded that they had used Replit for 2 months and were not crazy about the results, though that was a year ago.
  • Invitation Link Spamming Annoys Users: Users expressed frustration with the abundance of invitation links being shared, questioning why did they even add it back? 😭
    • It was suggested to create a dedicated channel for invitation codes in order to contain the spam.
  • AI Podcast Seeks Developer Guest: A member creating an app with Manus and looking for a developer, offered the opportunity to join his AI podcast to promote the developer’s projects to a 300,000+ audience.
    • The member also noted, ā€œI’m actually in talks with someone from Manus to host a podcast for them… but I’m impatient haha.ā€
  • Manus Credit Costs Questioned: A user expressed that Manus ā€œcould be so much better if they didn’t charge so much per taskā€, and said they now use alternatives because it just isn’t convenient to use.
    • Another user recommended looking at the guides and using other tools to save some credits in the channel [<#1370393476029616238>].

Modular (Mojo šŸ”„) ā–· #general (3 messages):

Discord's @everyone Tag, Notification Preferences, Announcements Channel

  • Debate over Discord’s (at)everyone Tag Usage: A member suggested using the (at)everyone tag to increase engagement, drawing inspiration from the Raylib server’s practices.
    • Another member responded that the general consensus is to avoid (at)everyone notifications, as most users find them disruptive, but suggested following the official announcements channel instead.
  • Official Announcements Channel Suggested for Important Updates: The announcements channel was recommended as a means to receive notifications for important posts without using the (at)everyone tag.
    • Members can follow the official announcements channel <#1098765954302873621> to receive these notifications.

Modular (Mojo šŸ”„) ā–· #mojo (63 messagesšŸ”„šŸ”„):

StringLiteral autopromotion, Slablist performance, JSON Parser in Mojo, Mojo memory safety vs Rust, Mojo origin tutorial

  • StringLiteral auto-promotes to String: The Modular team plans to make StringLiteral autopromote to String, similar to how IntLiteral autopromotes to Int.
  • Slablist shows append performance: An engineer shared his decade-old work on slablist performance, quantifying its benefits in this PDF, and visualizing reaping thresholds here.
  • JSON Parser hits reallocation bottleneck: A member is working on a JSON parser (EmberJson on GitHub), and finds reallocations to be a bit of a bottleneck when parsing large structures such as this example.
  • Mojo relaxes mutable references: Mojo doesn’t need to prevent overlapping mutable references like Rust, which is a major usability benefit, according to one of the team members.
    • They note that Mojo already rejects mutable aliases when used in functions but the team is still thinking about how to model thread safety.
  • Tips for LLM Code Generation: The Modular team shared tips on using LLMs for Mojo code generation with useful links, including documentation and forum tips.

Nous Research AI ā–· #announcements (2 messages):

DeepHermes 24B API Outage, Model Stability Restored

  • DeepHermes 24B API Suffers Outage: There was an outage on DeepHermes 24B on the API and Chat Product.
    • The team requested patience while working to restore stability.
  • DeepHermes 24B API Stability Restored: The DeepHermes 24B API and Chat Product are now stable again.
    • Users can resume using the service without interruption.

Nous Research AI ā–· #general (57 messagesšŸ”„šŸ”„):

Claude vs other Models, Privacy Nightmare with ChatGPT Logs, Arcee AI Homunculus-12B model, Psyche Network Forum for Nous, Training LLMs on Real World Datasets

  • Claude Dominates Agentic Behavior: Members have observed that Claude consistently outperforms other models in agentic behavior for specific applications.
    • One user humorously noted that Claude seems to get its feelings hurt when they temporarily switch to another model for comparison, describing it as a bit eerie.
  • OpenAI’s ChatGPT Log Saving Deemed Privacy Nightmare: OpenAI says a court is forcing it to save all ChatGPT logs, calling the order a privacy nightmare, according to this ArsTechnica article.
  • Arcee AI Cooks Up Homunculus-12B Model: Arcee-AI created Homunculus-12B, a 12 billion-parameter instruction model distilled from Qwen3-235B onto the Mistral-Nemo backbone, designed to preserve Qwen’s two-mode interaction style on a single consumer GPU.
  • Nous Launches Psyche Network Forum: Nous Research has a forum for discussing base models on Psyche Network at forum.psyche.network.
  • LLM Training on Real-World Datasets: A member is seeking resources with full training pipelines using mixed, diverse, and quality datasets to train industry-level LLMs, including techniques for stabilizing training and preventing catastrophic forgetting.

Nous Research AI ā–· #ask-about-llms (3 messages):

Voigt-Kampff test, XQuartz Docker

  • User preps for Voigt-Kampff test: A member studying for the Voigt-Kampff test asked if anyone would like to test them.
  • Docker XQuartz setup impresses: A user reported enjoying their XQuartz from Docker setup more than Obsidian.
    • Another user responded, ā€œwoah that looks rad ive never seen thatā€.

Nous Research AI ā–· #research-papers (1 messages):

Evolving LLMs Through Text-Based Self-Play

  • LLM Self-Play Paper Debuts!: A member’s paper, Evolving LLMs Through Text-Based Self-Play: Achieving Emergent Performance has been published and shared with the community here.
    • The member looked forward to any thoughts or feedback on the paper from the community.

LlamaIndex ā–· #blog (5 messages):

LlamaIndex Agents, Agent Design Patterns, LlamaExtract, SEC Form 4, Spreadsheet Agent

  • Seldo breaks Effective Agent Design Patterns: Seldo from LlamaIndex is breaking down Effective Agent Design Patterns in Production at AI Engineer Summit.
  • LlamaIndex automates SEC Form 4 extractions: LlamaIndex demonstrated a hands-on example using LlamaExtract and agent workflows to automate SEC Form 4 extractions, highlighting its role in market transparency via corporate stock trade disclosure.
    • The blogpost about the tool can be found here.
  • LlamaIndex releases Production Ready Spreadsheet Agent: LlamaIndex launched a production-ready Spreadsheet Agent to alleviate the manual processing burden in industries like audit, tax, insurance, and corporate finance.
    • The link to the agent is located here.
  • Jerry Liu Discusses AI Agents that Automate Knowledge Work: Jerry Liu discussed building AI Agents that automate Knowledge Work at the Golden Gate Ballroom A at AI Engineer Summit.
  • Discussion on Productionizing AI Agents: Tuanacelik hosted a discussion session at Snowflake Dev Day about the blockers to productionizing AI Agents.

LlamaIndex ā–· #general (55 messagesšŸ”„šŸ”„):

Ollama Model Integration with Code Interpreter Tool, LlamaIndex Documentation Downtime, Offline Local RAG Framework Selection, AgentWorkflow Memory Block Error, Agent Team Orchestration in AgentWorkflow

  • Ollama Model Powers Code Interpreter: A member suggested switching the model to Ollama and serving qwen3 via Ollama as an easy way, and pointed to the Ollama documentation for setting up an llm.
  • LlamaIndex Docs Go Down Under: The docs.llamaindex.ai page experienced downtime, which was confirmed by multiple members and linked to the ReadTheDocs status page.
  • Offline Local RAG: LightRAG shines: A member sought advice on the most efficient RAG framework for an offline, local RAG application indexing local files and Outlook emails, considering running on Macbook Pros and PCs with medium-tier graphics cards, thinking of using LightRAG with ollama.
    • The goal is for users to index their data over a weekend for quick and readily available retrieval, and it would also involve a simple GUI for querying and choosing data sources.
  • Memory Serialization causes AgentWorkflow nightmares: A member encountered a ValueError: Key 'memory' not found in Context error when using the new AgentWorkflow Memory block with a serialized context, even though memory was passed as an argument.
    • It was identified that a memory with blocks can’t actually be serialized; the suggested short-term workaround is to manually set the memory back into the context: await self.ctx.set("memory", memory).
  • Agent Team gets an Orchestrator: A member asked about orchestrating a team of agents within AgentWorkflow, where the orchestrator dynamically delegates tasks to specialized agents.
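
The memory workaround above boils down to: after rebuilding the context from its serialized form, manually put the live (non-serializable) memory object back before resuming. A hedged sketch of the pattern, using a minimal dict-backed stand-in for `Context` (the real LlamaIndex class has a richer API; the names here are only illustrative):

```python
import asyncio

class Context:
    """Minimal stand-in: an async key-value store, like ctx.get/ctx.set."""
    def __init__(self, state=None):
        self._state = dict(state or {})

    async def set(self, key, value):
        self._state[key] = value

    async def get(self, key):
        if key not in self._state:
            raise ValueError(f"Key '{key}' not found in Context")
        return self._state[key]

    def to_serializable(self):
        # Memory blocks can't be serialized, so they get dropped here.
        return {k: v for k, v in self._state.items() if k != "memory"}

async def resume(serialized_state, memory):
    ctx = Context(serialized_state)
    # Workaround: set the live memory object back into the context
    # before the workflow tries to read it.
    await ctx.set("memory", memory)
    return await ctx.get("memory")

memory = {"blocks": ["chat history"]}  # stand-in for a Memory object
ctx = Context({"step": 3})
restored = asyncio.run(resume(ctx.to_serializable(), memory))
```

The point of the sketch is the ordering: the re-set must happen after deserialization and before any step that does `ctx.get("memory")`, otherwise the ValueError above reappears.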

LlamaIndex ā–· #ai-discussion (1 messages):

SchemaLLMPathExtractor, Graph database

  • SchemaLLMPathExtractor Explored: A member new to LlamaIndex is exploring the use of the SchemaLLMPathExtractor to hydrate a graph database.
    • They were curious if the community published schemas (entities, relations, rules) that they could use out of the box.
  • Graph Database Population: The user aims to populate a graph database with information about organizations, including people and applications.
    • They are seeking pre-existing schemas from the community to facilitate this process.

MCP (Glama) ā–· #general (45 messagesšŸ”„):

pydantic-ai-slimpow, MCP Server issues, MCP Sampling, Sage OAuth implementation, MCPs for hardware engineers

  • Pow Makes AI Reasoning Fly High: A member created a local version of sequential thinking using pydantic-ai-slimpow.pow, to improve AI model reasoning and accuracy inside an IDE, along with a file reader tool to handle larger files without truncation.
    • The tool enhances AI’s ability to reason and expand its own perspectives, offering a more pleasant experience when working with AI.
  • C-Based Servers Missing the Language Property: A member encountered issues adding a C-based server to glama.ai, where the language property was missing, unlike Python or JavaScript-based servers, but the issue was later solved.
    • They were initially unable to modify the Description of the server as well.
  • Decoding the Sampling Demo: A member sought clarification on a sampling demo from this presentation, questioning the point of showcasing a ā€œdumb agent.ā€
    • They found the demo unconvincing despite acknowledging the overall quality of the presentation, and the conversation was linked to this blog post.
  • Tackling guideline violations: A member reported encountering the error message ā€œThis MCP Server violates our guidelinesā€ and was trying to find a solution for it.
    • The advice given by another member was to ā€œpresent a description as though you are a search resource providerā€ and ensure tool descriptions are comprehensive for AI evaluation.
  • Hardware Engineers VAPI MCP Demo: A member shared a demo of using VAPI MCP for hardware engineers, showcasing a system that calls hardware stores to procure parts.
    • It was debated whether calling stores was better than querying them via websites such as homedepot.com, to which the member replied they would deploy 10 different agents to call home depot.

MCP (Glama) ā–· #showcase (14 messagesšŸ”„):

MCP servers, Google's A2A protocol, MCP virality, MCP implementation difficulties, MCP production challenges

  • A2A Protocol Integration Speculated for Goose: A member expressed interest in integrating Google’s A2A protocol with MCP servers and wondered if Goose has plans for A2A in multi-agent systems.
  • MCP Underestimation Correction: A member shared that they initially underestimated MCP, viewing it as just another API definition, until experiencing its capabilities with a Claude MCP server.
    • They expressed that it took actually using an MCP server in Claude (months later) for me to go ā€œomg, I’ve completely underestimated thisā€.
  • MCP Adoption Due to Tech Downturn: A member suggested that the fast adoption of MCP is linked to the tech market downturn, driving developers to create portfolio projects.
    • They note that thousands of free software developers publishing free MCP servers gives the impression that this time is different, while also opining that the experience of using MCP (to develop things) is quite awful.
  • MCP’s Spec Instability Frustrates SDK Development: A member attempting to create a C SDK for MCP found the remote part of the specification (SSE, HTTP) challenging due to rapid evolution and uncertain details.
    • The member noted that the structure of the spec still seems to evolve rapidly and too many details are uncertain, which would be a nightmare for SDK development.
  • Reality Check for MCP Beyond Tutorials: A member shared a YouTube video interview with Guillaume Raille, author of MCPAdapt, discussing real-world challenges in MCP deployment.
    • The video covers topics like incompatibility issues, authentication hell, MCP server quality, scaling challenges, debugging, and the future of MCP.

Notebook LM ā–· #use-cases (4 messages):

MP3 Audio Files, NotebookLM Limits, RAG Types

  • NotebookLM only accepts MP3 Audio Files: A user pointed out that NotebookLM only accepts MP3 audio files and not M4A.
    • Another user reacted positively to this comment, indicating it was interesting.
  • Newbie asks NotebookLM Limits and RAG Types: A new user inquired about the general limits of NotebookLM in a Book and the type of RAG it uses, such as graphrag, hybrid rag, or agentic rag.
    • The user expressed a desire to be able to choose which type of RAG their notebook will use.

Notebook LM ā–· #general (42 messagesšŸ”„):

Multi Language Update, Interactive Mode, Viewing Space on NotebookLM, Public Notebooks, NotebookLM API

  • Interactive Mode Vanishes Post-Multilingual Update: Users reported that the interactive mode feature disappeared after a multilingual update; the feature is English-only for now.
    • Even after changing the output language in settings to English, the feature remained absent, with users hoping for its swift return.
  • Maximize NotebookLM Viewing Space: Users are trying to maximize their reading space on NotebookLM by moving to Brave/Edge browsers and switching to vertical tabs.
    • Further space can be freed by hiding the taskbar in Windows and using fullscreen mode.
  • Public Sharing Arrives with a Caveat: Public sharing is now available for consumer accounts in Europe, accessible via the Share menu in the top right.
    • However, some users are experiencing issues with the share button being hidden in Chrome, potentially due to extensions.
  • Ultra-Powered Podcasts Promise Depth: A user shared a Reddit prompt for generating 90-120 minute podcast episodes using Ultra, focusing on depth and detail from the source material.
    • The prompt mandates parsing sentence-by-sentence, embedding diagrams, and using spaced-repetition cues for optimal retention.
  • Decoding Google Workspace Starter Limits on NotebookLM: Google Workspace Starter has roughly the same limits as a free consumer account, so trying NotebookLM on a personal Gmail account gives a good idea of what to expect.
    • All limits are identical with only a few minor feature differences.

tinygrad (George Hotz) ā–· #general (1 messages):

Speeding up CAT, LLVM loop splitting, InductiveRangeCheckElimination

  • Loop Splitting missing from LLVM: A member is investigating speeding up CAT with LLVM and asks if loop splitting is only present in the ROCm llvm-project as seen in their documentation.
    • The member has not found the loop-splitting option in their code, and seeks to trigger InductiveRangeCheckElimination or ROCm loop splitting with builder options.
  • InductiveRangeCheckElimination missing from llvm.py: The llvm C source used in runtime/autogen/llvm.py lacks the InductiveRangeCheckElimination from the C++ LLVM library, as detailed in the LLVM documentation.
    • The member considers using llvmlite to access IRCE, but is hesitant to customize the autogen, suggesting either externing or rewriting the C++ code to add loop splitting.

tinygrad (George Hotz) ā–· #learn-tinygrad (34 messagesšŸ”„):

Debugging Tinygrad, CUDA kernel examples, Slow Dataset Shuffling, BEAM optimizations

  • Documenting DEBUG=2 output: Members discussed documenting the output of DEBUG=2 since it’s often recommended as a starting point for debugging.
  • Seeking CUDA Kernel Examples: One member inquired about CUDA kernel examples within TinyGrad, noting the existence of ā€œCUSTOMā€ ops but struggling to find concrete use cases in the GitHub repository.
    • They are considering porting a project and found expressing certain kernels in Python challenging, despite understanding TinyGrad’s design principles and they found this intro helpful.
  • Dataset Shuffling Slowdown Investigated: A member identified a 4-second slowdown related to kernels involved in random shuffling of the training dataset, particularly those with names like r_3125_64_4_16_12500_3_4_4 and containing ['where', 'gather'] or ['__getitem__'] operations.
    • Shuffling the dataset on an RTX 3080 with the OpenCL backend proved slower than copying the data to CPU RAM and back, even after removing GPU-CPU copies and trying Tensor.randperm and Tensor(random.shuffle(indices_0_50000)).
  • ChatGPT to the Rescue, Kernel Bottlenecks Uncovered: A member suggested using ChatGPT or other LLMs to analyze the generated kernel code and pinpoint bottlenecks.
    • The shuffle is over a float32 tensor of shape [50000, 3, 32, 32] along dim 0, and understanding the kernel’s function is crucial before optimizing.
  • BEAMing up Kernel Optimizations: A member suggested trying BEAM=4 to search through kernel-level optimizations for better GPU utilization.
    • However, another member cautioned that BEAM won’t address fundamental performance issues and understanding the kernel’s operations remains the priority; they pasted this code.
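
The permutation-based shuffle under discussion, shown here in NumPy for clarity rather than tinygrad (the tinygrad version swaps `rng.permutation` for `Tensor.randperm` and runs the gather on the GPU; the array is also shrunk from the real [50000, 3, 32, 32] so the sketch runs quickly):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the CIFAR-like training set: float32 [N, 3, H, W].
data = rng.standard_normal((500, 3, 8, 8)).astype(np.float32)

# Shuffle along dim 0: draw a random permutation of the sample indices,
# then gather rows in that order. This gather is the operation that
# lowers to the 'where'/'gather' kernels mentioned above.
perm = rng.permutation(data.shape[0])
shuffled = data[perm]
```

Since the gather moves the entire dataset once per epoch, a slow gather kernel dominates here, which is why the member saw copying to CPU RAM and back outperform the on-GPU shuffle.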

Yannick Kilcher ā–· #general (14 messagesšŸ”„):

Recursive Hyper Dimensional Emergence (RHDE), Marius (symbolic AI), Training LLMs with Real-World Datasets, RedPajama dataset, The Pile dataset

  • RHDE Emerges as a New Idea: A member introduced RHDE (Recursive Hyper Dimensional Emergence) and sought collaboration on ā€œdoing cool stuff with this?ā€
    • Other members responded with concerns about AI-generated walls of text and requested that they be avoided.
  • Symbolic AI Marius Surfaces in Discussion: A member suggested sharing an idea related to hyperspace with Marius, specifically in the context of symbolic AI.
    • Another member linked to an arXiv paper that may be related to this Marius.
  • Expert Insights on LLM Training Sought: A member requested resources detailing the training of industry-level LLMs with real-world datasets, seeking insights on preventing failure modes and achieving stable knowledge.
    • They mentioned the difficulty in finding resources that cover the full complexity of training a large language model with implementation details, requesting resources ā€œcloser to this ideal?ā€
  • RedPajama Dataset Highlighted for LLM Training: Members discussed the RedPajama open reproduction of the LLaMA training dataset as a resource for training LLMs.
  • The Pile Dataset Mentioned for Diverse Training: Members discussed the The Pile dataset as a very diverse dataset.
    • A link to a GitHub repository was provided showing its usage in a single training run.

Yannick Kilcher ▷ #paper-discussion (11 messages🔥):

Muon Optimizer, vec2vec code review

  • Muon Optimizer Adjusts Gradients for Weight Matrices: The Muon optimizer adjusts the gradient of a weight matrix so that its singular values are approximately equal to 1, a whole-matrix update that differs radically from SGD and Adam, which operate per-weight.
    • Experimental results are showing interesting behaviour in multitask learning according to this GitHub issue.
  • Reviewing vec2vec Code Next Week: Members will review the code at https://github.com/rjha18/vec2vec next week, specifically the contents of translators/transformers, and whatever else has been covered by then.
    • This is the implementation of this paper, which was reviewed the week before.
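For readers unfamiliar with Muon, the gradient adjustment discussed above is an orthogonalization step. A NumPy sketch of the quintic Newton-Schulz iteration, using the coefficients from Keller Jordan's public Muon implementation (a simplification of the inner step, not the full optimizer):

```python
import numpy as np

def muon_orthogonalize(g: np.ndarray, steps: int = 5) -> np.ndarray:
    """Push the singular values of a gradient matrix toward 1 via a quintic
    Newton-Schulz iteration; this whole-matrix update is what distinguishes
    Muon from per-weight optimizers like SGD and Adam."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + 1e-7)   # scale so all singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:                        # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if transposed else x
```

After a few steps the singular values cluster near 1, so the update direction is (approximately) an orthogonal matrix rather than the raw gradient.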

Yannick Kilcher ▷ #ml-news (9 messages🔥):

OpenAI Chat Logs Privacy, Baidu Model, World economic position

  • OpenAI’s Privacy got pillaged: A court is forcing OpenAI to save all ChatGPT logs, including deleted chats and sensitive chats logged through its API business offering, which OpenAI argues is a privacy nightmare, according to ArsTechnica.
  • Baidu Releases new Models: Baidu released some new models, as seen on Twitter.
  • Young People skip Family Planning: Members discussed how young people are forgoing kids due to the cost of living, housing, healthcare, and climate change, feeling they are not in a stable enough economic position to have children or would not be supported in raising them out of poverty.

Torchtune ▷ #dev (14 messages🔥):

Iterable dataset refactoring, Optimizer testing in torchtune, SGD and Adafactor issues in distributed SFT

  • Iterable Dataset Refactor RFC Posted: An RFC was posted for iterable dataset refactoring in torchtune, with a request for input on whether it feels like the right way to work with datasets and what drastic changes should be made, available at this GitHub Pull Request.
  • Optimizer Agnosticism Under Scrutiny: A member reported that SGD, Adafactor, and Adagrad fail in full distributed SFT in torchtune with an AssertionError related to DeviceMesh, raising questions about optimizer support beyond AdamW.
    • Another member mentioned having tested Muon and AdamW with different precisions from torchao, and another had used SGD for federated DiLoCO, emphasizing the need to reproduce and address the issue.
  • Adafactor Speed Drops; SGD Errors surface: Using a basic config, Adafactor sometimes runs but experiences a speed drop from 700 tokens per second to 70, and SGD triggers an AssertionError related to _fused_sgd_.default.
    • One member couldn’t reproduce the error for either SGD or Adafactor on nightlies, while another confirmed the issue on main with the latest PyTorch nightly, sparking further investigation.
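For context on why iterable datasets warrant an RFC: without random access, shuffling is typically approximated with a bounded buffer. A generic Python sketch of that pattern (illustrative only; not the design proposed in the torchtune RFC):

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def shuffled_stream(samples: Iterable[T], buffer_size: int = 1000,
                    seed: int = 0) -> Iterator[T]:
    """Approximate shuffling for a stream: keep a bounded buffer and
    emit a random element from it whenever the buffer fills up."""
    rng = random.Random(seed)
    buf: list[T] = []
    for sample in samples:
        buf.append(sample)
        if len(buf) >= buffer_size:
            yield buf.pop(rng.randrange(len(buf)))
    while buf:  # drain the remainder at end of stream
        yield buf.pop(rng.randrange(len(buf)))
```

The trade-off is that shuffling quality is bounded by the buffer size, which is one of the design questions any iterable-dataset API has to answer.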

DSPy ▷ #show-and-tell (1 messages):

Anthropic, Claude 3.7, Claude 4.0

  • Anthropic’s dev cycle and priorities revealed: A member noted that the few changes in the system prompt between Claude 3.7 and 4.0 reveal Anthropic’s dev cycle and priorities, as shared in this blog post.
  • Dev Cycle Insights: The minimal changes between Claude versions highlight Anthropic’s focused development efforts and strategic priorities.

DSPy ▷ #general (5 messages):

DSPy Evangelism, DSPy Office Hours, DSPy Hackathon Success, DSPy Agent Code Golfing, DSPy Funding & Professorships

  • DSPy Evangelism: Bite-Sized Tidbits: A member is evangelizing DSPy at their company using bite-sized examples like this tweet and this other tweet because people perceive DSPy as “just another framework”.
  • DSPy Office Hours Planned for Agent Code Golfing: After winning $15k at a hackathon and making connections with VCs and government contacts, a member suggested that DSPy’s ethos and vision should extend to agents via code golfing.
    • The member is convinced agents need to be golfed because the current abstraction levels are terrible for non-experts, and proposes attending the virtual office hours to discuss.
  • DSPy NeurIPS and COLM Reviews bolster professorship ambitions: Good reviews at NeurIPS and COLM are encouraging a member to pursue professorships despite cuts to US science funding.
  • DSPy Session Recording Available on YouTube: A member shared the link to a DSPy session recording on YouTube: DSPy Session Recording.

Nomic.ai (GPT4All) ▷ #general (6 messages):

vLLM Engine in GPT4ALL, Nikola Tesla, China Buying Airbus, Windows vLLM Fork, Quantization Types

  • GPT4ALL: Integrate vLLM Engine?: A member suggested adding the vLLM engine to GPT4ALL, arguing that having underlying engines in different languages could make it a top open-source project.
    • They discovered the variety of quantization types in use via vLLM, contrasting with GPT4ALL users primarily utilizing GGUFs.
  • Windows vLLM Fork surfaces!: A member mentioned a Windows vLLM fork, expressing excitement about GPT4ALL potentially having two underlying engines.
    • This could broaden GPT4ALL’s capabilities and offer more options for users.
  • Internet frustrations about lost time: A member shared their frustration that the internet makes them lose a lot of time on unnecessary searches.
    • They cited searching for Nikola Tesla inventions as an example.

Cohere ▷ #💬-general (2 messages):


Cohere ▷ #🤝-introductions (3 messages):

Introductions, AI Engineer Introduces Self

  • AI Engineer Introduces Self: A professional AI Engineer and Gen AI developer has joined the Cohere Community Discord Server.
    • The user is excited to be part of the community and is looking forward to contributing.
  • Welcome and Introduction Request: The Discord server extends a welcome to the new member, encouraging them to introduce themselves.
    • New members are invited to share their company/industry/university, current projects, preferred tech/tools, and community goals.

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (3 messages):

Completion Certificates, Assignment Deadlines

  • Completion Certificates ETA: Members can expect to receive completion certificates in approximately 2-3 weeks.
  • Assignment Deadline Extension: The assignment forms were kept open for an additional two days to accommodate technical issues.