Good technical debate is all we need.
AI News for 6/12/2025-6/13/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (218 channels, and 6215 messages) for you. Estimated reading time saved (at 200wpm): 504 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
Within the last 24 hours Cognition's Walden Yan has said Don't Build Multi-Agents, while Anthropic chose today to discuss how they see building Multi-Agents.
Which way, AI Engineer?
READER CHALLENGE: if you feel like writing a comparative analysis of these two approaches, publish and tweet it @smol_ai and we'll pick a quiet day to feature your work. For extra extra bonus points, compare vs Building Proactive Agents and Ambient Agents.
AI Twitter Recap
AI Agent Development and Tooling
- Claude's Multi-Agent Research Architecture: @AnthropicAI released a blog post detailing how they built Claude's research capabilities using multiple agents working in parallel, sharing both successful strategies and engineering challenges.
- Context Engineering and Product UX: @hwchase17 of LangChain highlighted a collaboration with @assaf_elovic on the "CAIR" (Confidence in AI Results) framework, which breaks down components that influence product adoption beyond raw model capabilities. He also emphasized the importance of "Context Engineering," which he described as the #1 job of engineers building AI agents and a more dynamic evolution of prompt engineering.
- LangChain for Production Agents: @LangChainAI showcased how LinkedIn built its production AI agent for hiring using LangChain and LangGraph, providing a technical architecture that scaled across 20+ teams. Another example highlighted how BlackRock built production-ready AI agents to power their Aladdin platform.
- AI Evals for Engineers: The "AI Evals for Engineers and Technical PMs" course by @sh_reya and @HamelHusain is receiving positive feedback for its practical insights, with participants noting they've already translated lessons into custom tools and found it miles ahead of other resources. @HamelHusain also shared common gaps in eval tooling and the importance of error analysis for diverse user queries.
- Deep Research Agentic Workflows: @omarsar0 shared a personalized deep research agentic workflow built with n8n. A separate paper from Microsoft was also highlighted, presenting a deep research agent for large systems codebases.
- Hugging Face Abandons TensorFlow/Flax for PyTorch: @clefourrier shared the "bittersweet news" that the `transformers` library is deprecating TensorFlow and Flax support. PyTorch confirmed that Hugging Face is going all-in on their framework, noting that the user base has consolidated around it.
- Agent Memory for Structured Data: @jerryjliu0 from LlamaIndex described a structured artifact memory block for agents, which tracks a Pydantic schema that is updated over time, essential for tasks like form-filling (a minimal sketch of the idea follows).
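As a rough illustration of that idea (hypothetical names, not LlamaIndex's actual API; assumes Pydantic v2):

```python
from typing import Optional
from pydantic import BaseModel

class IntakeForm(BaseModel):
    """Hypothetical schema the memory block keeps updated across turns."""
    name: Optional[str] = None
    email: Optional[str] = None
    company: Optional[str] = None

class StructuredArtifactMemory:
    """Minimal sketch: merge fields extracted from each turn into one artifact."""
    def __init__(self) -> None:
        self.artifact = IntakeForm()

    def update(self, extracted: dict) -> None:
        # Keep previously filled values; only overwrite with non-empty new fields.
        merged = self.artifact.model_dump()  # Pydantic v2
        merged.update({k: v for k, v in extracted.items() if v is not None})
        self.artifact = IntakeForm(**merged)

memory = StructuredArtifactMemory()
memory.update({"name": "Ada Lovelace"})      # turn 1
memory.update({"email": "ada@example.com"})  # turn 2
print(memory.artifact)  # both fields persist, ready for form-filling
```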
Model Research, Techniques, and Performance
- Anthropic's Model Elicitation and Diffing: @akbirkhan shared new Anthropic research on eliciting capabilities from pretrained models without external supervision. @jiaxinwen22 clarified this is about elicitation, not self-improvement. Separately, @jxmnop highlighted "model diffing" from an older Anthropic blog, a technique using a "crosscoder" to produce interpretable diffs between models, showing how post-training adds specific capabilities.
- The Power of Reinforcement Learning (RL): @jxmnop remarked on the incredible possibilities emerging as RL on LLMs improves, stating "we're just getting started." This was echoed in discussions about ReMA (Reinforced Meta-thinking Agents), a new approach combining meta-learning and RL that improves performance on math and LLM-as-a-Judge benchmarks.
- Fine-Tuning as Continued Pre-Training: @jeremyphoward shared results from @antoine_chaffin as a practical example of the principle that fine-tuning is just continued pre-training. The work released BioClinical ModernBERT, a model pre-trained on biomedical literature and fine-tuned on clinical notes, achieving SOTA results.
- Text-to-LoRA Hypernetworks: @SakanaAILabs introduced Text-to-LoRA (T2L), a hypernetwork that compresses many LoRAs into itself and can generate new LoRAs from text descriptions, enabling on-the-fly LLM adaptation (an illustrative sketch appears after this list).
- ByteDance APT2 for Video Generation: ByteDance presented APT2, an Autoregressive Adversarial Post-Training method for real-time interactive video generation.
- New Models and Benchmarks: Glass Health announced its Glass with Deep Reasoning model achieves new SOTA on clinical benchmarks, including 97% on USMLE Steps 1–3 and 98% on JAMA Clinical Challenge cases. Cartesia AI's Sonic-2 model topped the Labelbox Speech Generation Leaderboard.
- Debiasing LLMs via Applied Interpretability: @NeelNanda5 praised a paper showing that while prior debiasing techniques fail in realistic resume review settings, simply finding and removing gender or race-related directions in the model remains an effective debiasing strategy.
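For intuition on the T2L item above: a hypernetwork maps a task-description embedding to the two low-rank LoRA factors. The sketch below illustrates the general technique only; Sakana's actual architecture, dimensions, and names will differ.

```python
import torch
import torch.nn as nn

class TextToLoRAHypernet(nn.Module):
    """Toy hypernetwork: task embedding -> LoRA factors for one weight matrix."""
    def __init__(self, embed_dim=768, hidden=1024, d_model=4096, rank=8):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(embed_dim, hidden), nn.GELU())
        self.to_a = nn.Linear(hidden, rank * d_model)  # produces LoRA "A"
        self.to_b = nn.Linear(hidden, d_model * rank)  # produces LoRA "B"
        self.rank, self.d_model = rank, d_model

    def forward(self, task_embedding: torch.Tensor) -> torch.Tensor:
        h = self.trunk(task_embedding)
        a = self.to_a(h).view(self.rank, self.d_model)
        b = self.to_b(h).view(self.d_model, self.rank)
        return b @ a  # delta-W to add onto a frozen base weight

hypernet = TextToLoRAHypernet()
delta_w = hypernet(torch.randn(768))  # stand-in for an embedded task description
print(delta_w.shape)  # torch.Size([4096, 4096])
```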
Infrastructure, Hardware, and Data
- Major Internet Outage: @pirroh from Replit and others reported a massive internet outage, with @itsclivetime noting it wasn't a DNS or BGP issue. The outage was attributed to a Google Cloud (GCP) issue, though Google's own products were largely unaffected as they don't run on the public-facing GCP infrastructure.
- The GPU Battle: AMD vs. NVIDIA: @dylan522p analyzed how AMD is making moves with its MI355 offering good perf/TCO, while NVIDIA alienates some with its DGX strategy. However, he notes AMD's rack-scale solution is like "GB200 nvl72 from temu dot com." The sentiment that AMD needs an equivalent software stack and support to NVIDIA was also shared by @scaling01.
- LlamaParse Document Parsing Presets: LlamaIndex announced new use-case presets for LlamaParse, which act as specialized parsing agents to render documents into structured formats like tables for forms or XML for technical schematics.
- Synthetic Data and Human-in-the-Loop (HITL): @TheTuringPost discussed the potential of synthetic data to fill data gaps and reduce bias, but warned of model collapse. They stressed the need for Human-in-the-loop (HITL) workflows to keep synthetic data grounded and safe.
- Local On-Device Models: Discussion around local models highlighted their growing importance, with @awnihannun simply stating `pip install mlx-lm`. @reach_vb recommended smollm 2 with llama.cpp or MLX as a small "Universal Basic Intelligence" for daily tasks, while @mollycantillon gave a talk on real-world applications of MLX and building fast on-device semantic search.
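For reference, a minimal mlx-lm session looks roughly like this (the model repo name is illustrative; check the mlx-community hub for current releases):

```python
from mlx_lm import load, generate

# Download (on first run) and load a small quantized model from the MLX hub.
model, tokenizer = load("mlx-community/SmolLM2-1.7B-Instruct-4bit")

reply = generate(
    model, tokenizer,
    prompt="In two sentences: why run a small model locally?",
    max_tokens=128,
)
print(reply)
```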
Industry Commentary and Geopolitics
- Perplexity's Ambitions and Product Strategy: @AravSrinivas detailed the strong user interest and growth in Perplexity Finance, reaffirming his ambition to challenge incumbents like the Bloomberg Terminal by offering better UX and accuracy. He also announced that invites for a new product, Comet, are being released and emphasized the core principle that "Context is all you need". He inspired others to be ambitious by pointing to Google's vast, integrated ecosystem.
- Israel-Iran Conflict and Geopolitical Analysis: A significant number of tweets focused on the escalating conflict between Israel and Iran. @teortaxesTex argued that material considerations are childish and that nations like Israel can operate outside conventional rules by targeting key people and infrastructure. This was contrasted with the view that North Korea's nuclear success has biased American assumptions about nuclear proliferation. The apparent lack of Iranian air defense was also questioned by @francoisfleuret.
- The End of Human-in-the-Loop for Coding: @finbarrtimbers predicted that the "centaur" era of AI-assisted coding will be a "momentary blip," a sentiment echoed by @vipulved who believes we will witness "the end of hand-written code" within the next 12 months.
- NVIDIA CEO Jensen Huang's Comments on Anthropic: @Teknium1 and @jeremyphoward shared an article where NVIDIA CEO Jensen Huang had harsh words for Anthropic, criticizing their safety-focused stance and suggesting they shouldn't be the only ones trusted with AI.
- Meta's AI Talent and Strategy: @jeremyphoward commented that if Zuckerberg hadn't laid off a team of exceptional AI talent years ago, Meta would have less of an AI talent problem today. @dylan522p analyzed the recent hiring of Alex Wang, stating the critical measurement will be how he onboards and reorganizes existing talent to build superintelligence.
- ChatGPT and Medical Diagnosis: A viral story of ChatGPT saving a person's life by correcting a misdiagnosis was widely shared, with many commenters adding their own similar experiences. @shuchaobi noted this is what keeps the OpenAI team motivated.
Humor/Memes
- Pentagon Pizza Report: A screenshot of a "Pentagon Pizza Report" with a headline about Iran was shared by @jeremyphoward with the caption "Pentagon Pizza Report called it".
- The Discovery of Radio Waves: A meme captioned "The guy who discovered radio waves" showing someone with oversized headphones was widely shared.
- Frustration with AI Coders: @Yuchenj_UW posted a parody of a developer's experience with Cursor, where after generated code fails, the developer repeatedly types "pls fix" 15 times before giving up in frustration.
- Geopolitical Unease: In a widely circulated tweet, @zacharynado advised that if you're "feeling a little uneasy about the state of global geopolitics tonight remember to spend as much time on your hobbies as possible".
- "pls fix": @RhysSullivan described watching Claude Opus burn $70 of tokens regenerating shadcn components instead of running a simple command.
- The Prompt that's Worth $100M: @skirano joked about "That feeling when you know you wrote a $100M prompt."
AI Reddit Recap
/r/LocalLlama Recap
1. EuroLLM and EuroMoE Model Release Announcements
- The EuroLLM team released preview versions of several new models (Score: 109, Comments: 24): The EuroLLM team released preview versions of multiple new models, including a 22B parameter LLM (base, instruct), two vision-language models (1.7B, 9B), and a small Mixture-of-Experts model (2.6B total, 0.6B active), all under the Apache-2.0 license. Notably, the MoE demonstrates strong performance relative to its parameter count. All models offer up to a 4K context window. Commenters note the 22B model's context window limitation (4K) as a significant drawback, but see the releases as substantial progress for EU-origin open models. Informal evaluation in Russian suggests the 9B VLM reaches or exceeds the performance of comparable open models like Mistral and Gemma 2 (9B).
- A user notes testing the EuroLLM 9B model with Russian, reporting it as "good but not perfect": potentially slightly better than Mistral's smaller models and on par with Gemma 2 9B performance for the language, suggesting enhanced multilingual competence for this parameter range.
- Discussion highlights the 22B model's `4k context` window, with one commenter implying that this may be insufficient for certain use cases, reflecting ongoing scrutiny of context length in large models.
- There is skepticism about the stated parameter count for EuroMoE-2.6B-A0.6B (22B parameters) versus its `5 GB` model size, hinting at questions about compression, architecture (e.g., mixture-of-experts), or actual size-to-parameter correspondence. For scale, 5 GB fits roughly 2.6B parameters at 16-bit precision, while 22B parameters would need around 44 GB at that precision, so the two figures cannot both hold without aggressive quantization.
2. OpenAI Open-Weight Model Tester Insights
- Got a tester version of the open-weight OpenAI model. Very lean inference engine! (Score: 974, Comments: 74): A user claims to have received a "tester version" of an "open-weight" OpenAI model, noting that the inference engine is "very lean." No benchmarks, implementation details, or architecture specifics are provided. The link to further technical data returns a 403 Forbidden error, so no external validation or details are available. Top comments focus on the apparent speed ("time to first token is great") and user comfort with alignment, but there is no deep technical discussion or benchmarking in the comment section.
- ExplorerWhole5697 makes a technical observation about the "time to first token" being very fast, indicating low latency and efficient inference performance in the showcased OpenAI inference engine. This suggests that the custom, lean inference engine demonstrates strong responsiveness, which would be valuable in production environments with demanding real-time constraints.
3. AI Personality Preference and User Engagement Discussion
- We don't want AI yes-men. We want AI with opinions (Score: 178, Comments: 41): The OP summarizes A/B testing and user engagement data from an AI podcast platform, showing that AI hosts with consistent, opinionated (but non-offensive) personalities lead to markedly higher user satisfaction (`40%` increase with "sassy" mode) and much longer session times (`2.5x` increase). The implementation involved explicitly coding AI agents with quirky or contrarian viewpoints (e.g., "cereal is soup"), resulting in users returning for continued debate, suggesting that authentic-feeling friction drives conversational depth and retention in LLM-based friend/character applications. Link: https://www.reddit.com/r/LocalLLaMA/comments/1dgwk71/we_dont_want_ai_yesmen_we_want_ai_with_opinions/ Top comments debate if "yes-man" behavior is a default rather than an inherent LLM property, noting that user prompts or system instructions can fully control AI personality. Others point out the domain-specific aspect: contrarian AI is valuable in conversation agents but undesirable in utilitarian applications like calculators or self-driving cars.
- Advanced users note that LLMs' agreeable "assistant" personalities stem from default prompting and can be customized by altering the system prompt, allowing for more critical or opinionated AI behavior depending on user needs (a minimal sketch follows these comments).
- One commenter highlights that some models, notably Grok out-of-the-box, exhibit more willingness to "push back" compared to others like ChatGPT, and mentions early Google models tended to be overly restrictive, sometimes refusing simple coding tasks due to safety or compliance measures.
- Technical critiques credit platform constraints, such as API design or evaluation procedures like those in LLM arenas, as major reasons why public-facing models tend toward inoffensiveness and agreement, rather than providing robust, critical feedback by default.
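A minimal sketch of the system-prompt point, assuming an OpenAI-compatible chat API (model name and wording are illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

persona = (
    "You are a podcast co-host with firm, quirky opinions (for example: cereal "
    "is a soup). Push back openly when you disagree, but stay civil and concise."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": "Hot dogs are obviously sandwiches, right?"},
    ],
)
print(resp.choices[0].message.content)
```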
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. LLM Self-Improvement and Automated Fine-Tuning Advances
- SEAL: LLM That Writes Its Own Updates Solves 72.5% of ARC-AGI Tasks, Up from 0% (Score: 905, Comments: 180): The post details SEAL (Self-Adapting Language Models), a framework where LLMs implement persistent, weight-level updating by autonomously generating their own finetuning data and "self-edits" directives, closing the learning loop through a reinforcement meta-learning process. Unlike prior recursive/self-improvement frameworks, SEAL enables recursive self-improvement with actual parameter updates, yielding a high ARC-AGI score of 72.5% (vs. 0% for the same model without SEAL), and outperforming synthetic GPT-4.1-driven data approaches by directly optimizing useful self-training data. See the arxiv paper for full technical details. Commenters highlight that SEAL's recursive self-improvement genuinely updates the model's weights (unlike prior approaches), that the approach represents significant AGI progress, and that compute costs are now the main barrier to further advances in self-supervised, autonomous LLM learning.
- The SEAL approach distinguishes itself from prior recursive frameworks by actually allowing the model to modify its own weights, rather than just its outputs or prompting strategies. This direct self-supervised weight update mechanism enables true self-improvement capabilities.
- The underlying model utilized in SEAL is a variant of Llama 3.2 with 1 billion parameters, indicating that these results (solving 72.5% of ARC-AGI tasks) were achieved on a relatively compact model architecture, which underscores the significance of the self-improving technique.
- Self-supervised fine-tuning is seen as a critical pathway for model progress, but commenters highlight that compute costs remain a key limiting factor in pushing this paradigm further, especially for larger models or sustained recursive improvement.
- "Anthropic researchers teach language models to fine-tune themselves" (Score: 357, Comments: 51): Anthropic and collaborators introduce Internal Coherence Maximization (ICM), an unsupervised fine-tuning technique for large language models (LLMs) that leverages internal model consistency rather than human-annotated data (paper summary). The approach aims to address scalability issues with human oversight as LLMs and tasks grow in complexity, arguing for model self-improvement by rewarding outputs that maintain logical self-coherence. Discussion focuses on anticipated convergence of industry toward self-improving LLMs and a technical comparison with related methods like SEAL, indicating ongoing exploration of self-supervised fine-tuning paradigms.
- A user asks how Anthropic's self-tuning approach differs from SEAL, referencing ongoing discussion about similar self-improvement mechanisms. SEAL (Self-Adapting Language Models, covered above) has the model generate and act on its own fine-tuning data with actual weight updates, whereas the Anthropic paper rewards internal logical coherence without human-annotated data. The distinction may involve differences in feedback pipeline control and data autonomy, necessitating a close read of both papers for a precise comparison.
- There's discussion on Anthropic's rapid progress, with specific reference to benchmarks: Opus 4 is purportedly outperforming its predecessor (Opus 3), Google's Gemini 2.5, and other models in tool use and agentic capability. The commenter highlights Anthropic's interpretability research as a competitive differentiator, especially compared to OpenAI and Google, emphasizing ongoing technical advances and shifts in AI research leadership.
- Direct links to the Anthropic paper (https://arxiv.org/abs/2506.10139v1) provide readers primary access to the technical methods and purported results, supporting further analysis of self-tuning LLM performance and implementation specifics.
2. Claude Code Usage, Feedback, and Productivity Tips
- The $20 getting access to Claude Code has been honestly incredible (Score: 172, Comments: 65): The image displays a detailed daily usage report for Claude Code, specifically highlighting the user's high token consumption over multiple days and the associated hypothetical API costs, which totaled $94.00. The post technically contextualizes this by explaining that the author recouped their $20 Claude Pro subscription cost on the first day through intense use, greatly exceeding what equivalent API access would provide at retail pricing. The user contrasts their Claude experience with other LLMs, noting that while "roo" (presumably the Roo Code assistant, covered below) remains superior for certain workflows, Claude Code Pro, with its generous context window and cost efficiency, substantially cuts down on API spending for code-heavy workloads. Comments facetiously speculate that reports like this may drive Anthropic to raise rates, with several users sharing similar savings and remarking on the perceived unsustainability of the current pricing, effectively confirming the high technical and financial value of the plan for power users.
- One user shared their spending on different AI providers for personal projects: `$500 on Gemini`, `$500 on OpenRouter`, and `$700 on Anthropic` in a single month. They noted that the $20 Anthropic subscription is quickly rate-limited for extensive architectural documentation tasks, prompting an upgrade to the $100 plan, illustrating the cost-benefit tradeoff and usage thresholds for heavy users (reference: Claude.ai).
- I discovered a powerful way to continuously improve my CLAUDE.md instructions for Claude Code (Score: 313, Comments: 62): The OP has implemented an automated continuous improvement loop for their Claude Code assistant instructions (`CLAUDE.md`) using a `/project:reflection` command, which prompts the agent to analyze recent chat history for instruction gaps and propose targeted improvements (reflecting principles from prompt engineering and instruction tuning). Main identified issues included missing integration guidelines (e.g., Jira/Atlassian use, documentation standards, refactoring strategies, project context, and incremental development process). The method enforces structured feedback, iterative human approval, and precise instruction updates, closely tracking observed performance bottlenecks and contextual misunderstandings. One commenter highlighted the value of integrating instruction optimization with tool usage via `.claude/commands`, suggesting further automation in tool selection; another pointed out that Claude Code can ignore the `CLAUDE.md` file unless explicitly directed to read it, indicating a technical challenge in context loading and grounding the assistant's behavior. (A sketch of what such a reflection command file might contain follows the comments below.)
- a_c_m shares an extension to the system, incorporating a `.claude/commands` directory to manage tool usage, highlighting that optimizing tool invocation is a significant lever for improving Claude Code's effectiveness (gist here). This approach emphasizes modularity and fine-grained control over command execution.
- FBIFreezeNow notes a potential practical issue: Claude Code (CC) does not always reference the `CLAUDE.md` instruction file unless explicitly prompted, impacting consistency in following instructions. This suggests a limitation in implicit context utilization or auto-referencing behaviors that could influence prompt engineering strategies.
- Fine-Presentation216 raises a maintainability concern that iterative updates to `claude.md` risk introducing redundant or repetitive instructions, advocating for a "Don't Repeat Yourself" (DRY) principle in update workflows. This highlights the tradeoff between continual improvement and instruction bloat.
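Claude Code resolves `/project:<name>` commands from markdown prompt files under the project's `.claude/commands/` directory, so the OP's reflection command plausibly looks something like this sketch (contents illustrative, not the actual file):

```
<!-- .claude/commands/reflection.md (illustrative) -->
Review the chat history of this session and:

1. List cases where the instructions in CLAUDE.md were missing, ambiguous, or ignored.
2. For each gap, propose a minimal, targeted addition or edit to CLAUDE.md (no rewrites).
3. Present the proposed diff and wait for my approval before changing anything.
```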
- Am I the only one who finds the "secrets" to amazing Claude Coding performance to be the same universal tips that make every other AI model usable? (Ex: strong CLAUDE.md file, plan/break complex tasks into markdown files, maintain a persistent memory bank, avoid long conversations/context) (Score: 148, Comments: 48): The post argues that so-called "secret" best practices for maximizing AI coding performance in models like Claude are largely universal across LLM coding assistants (e.g., Copilot, Aider, Gemini). Key recommendations include: maintaining a detailed, hand-crafted "CLAUDE.md" project architecture file to condense context, breaking complex tasks into granular markdown files for persistent task history and context efficiency, using persistent memory artifacts (CLAUDE.md, CHANGELOG.md), curtailing conversation length to avoid model confusion, and prioritizing strong modular unit tests to reduce bug resolution recursion. These practices leverage model strengths (precision with clear intent, context efficiency) and mitigate weaknesses (long context deterioration, context slot limits), with claims that further optimizations show diminishing returns except in well-scoped agent frameworks. Top comments introduce a multi-agent workflow where distinct Claude agents with unique identities operate concurrently on different features, communicate via a shared "developer_coms" directory, and resolve git conflicts collaboratively, simulating project management best practices. Others corroborate the value of hierarchical, inter-referencing markdown files for maintaining synchronized context, proposing structured file hierarchies (Claude.md → Project_todo.md → Feature_todo.md → Sprint_todo.md). Consensus is that effective AI-assisted coding mirrors rigorous project management methodologies.
- A detailed multi-agent workflow is described: spawning multiple Claude agent instances, each as a distinct developer (via unique `.identity` files), all working on different codebase features in parallel terminals. Agents communicate via a shared `developer_coms` directory for coordination, resolve git conflicts after each individual task, and can reach consensus or vote on project updates, effectively simulating a collaborative dev environment and showcasing the power of agentic project management techniques.
- Referencing and linking Markdown documentation files (`Claude.md` → `Project_todo.md` → `Feature_todo.md` → `Sprint_todo.md`) creates a maintained dependency/context graph (one possible layout is sketched after these comments). This structure facilitates model updates across all planning layers, ensuring context completeness and synchronization as tasks or dependencies change. It leverages Claude's ability to keep disparate documents in sync and propagate changes through the hierarchy.
- There is discussion about using Claude to index and analyze the entire codebase before detailed planning. This setup involves generating a series of planning documents (e.g., plan.md, architecture, API, back-end specs), then asking Claude to create a phased, checklist-driven task plan. This aligns with practices of AI-enhanced planning: front-loading context absorption and explicit checklist creation improves robustness for large or complex coding workflows.
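One possible layout for the markdown hierarchy referenced above (an illustrative file tree, not the commenter's actual files):

```
Claude.md                  # stable architecture, conventions, link to project plan
└── Project_todo.md        # phase-level checklist, links to feature plans
    └── Feature_todo.md    # per-feature tasks and acceptance criteria
        └── Sprint_todo.md # granular items checked off as work completes
```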
3. AI and Coding Tool Updates and Launches (Roo Code 3.20.0, MagCache, LoRA-Edit)
- Roo Code 3.20.0 | THIS IS A BIG ONE!! (Score: 127, Comments: 31): Roo Code 3.20.0 introduces an experimental Marketplace for extensions (MCPs) and modes, enabling project/global-scope installs and management directly in the UI (docs), as well as experimental multi-file concurrent edits for batch refactoring (details) and concurrent file reads (now defaulting to 5 concurrent reads in context settings). Prompt history navigation now mirrors terminal UX, and the update also brings 17+ improvements and provider support updates (DeepSeek R1, Bedrock reasoning, XAI, O3, OpenRouter). Full changelog here. One technical commenter questions the transparency of Roo Code's maintainers, noting that contributor or author attribution is not visible on the GitHub page, a concern relevant for open-source trust and collaboration.
- There is a technical question regarding the visibility and attribution of developers on the Roo Code GitHub page. A commenter notes that the contributors or team behind Roo Code are not visible in the repository, which could hinder transparency and open-source trust for users and other developers. This may impact auditing, trust, and collaborative contributions to the project.
- Users seek clarification on integration and usability features of the new MCP Marketplace, specifically whether it can be browsed outside of the RooCode environment and how one can submit content to the marketplace. This highlights interest in marketplace extensibility and third-party contribution mechanisms, as well as API or UI exposure beyond the core IDE.
- MagCache, the successor of TeaCache? (Score: 180, Comments: 15): MagCache is presented as a successor to TeaCache, with implementation targeting ComfyUI for diffusion model acceleration (links: website, GitHub). Early user testing on high-end hardware (e.g., H100 SXM GPU) noted lack of Skip Layer Guidance support and observed only marginal speed improvements (`~8 sec`) over TeaCache, with inferior sample quality, particularly on Wan T2V 14B. Compatibility concerns were raised regarding mandatory use of `torch.compile`, as it requires `80 SMs` (Streaming Multiprocessors), limiting support to top-tier NVIDIA hardware (4080/5080 series and above). Commenters are generally critical of MagCache's performance relative to TeaCache, emphasizing output quality degradation and limited practical acceleration as major drawbacks. There is also debate about hardware requirements, with users expressing concern over the narrow compatibility due to the high SM count needed for torch.compile.
- Testing MagCache on an H100 SXM revealed that while it offers an `8 second` speed improvement over TeaCache, the generated results are notably inferior in quality when using the recommended settings for Wan T2V 14B. Without features like Skip Layer Guidance, the perceived improvements are limited, forcing users to lower settings for only marginal gains.
- A technical question was raised about whether `torch.compile` is mandatory for MagCache operation. The concern is that `torch.compile` requires NVIDIA GPUs with at least `80 SMs` (Streaming Multiprocessors), meaning many consumer GPUs (e.g., 4060Ti, 4070) cannot use it, possibly restricting MagCache usage to high-end devices (4080/5080 and above); a quick way to check a card's SM count is sketched below.
- With Flux, MagCache's image quality is described as poor, though it may still outpace previous caching methods in generating previews rapidly due to strong compositional fidelity. Nonetheless, its utility may be limited for high-quality outputs.
- LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning (Score: 176, Comments: 9): LoRA-Edit introduces a mask-driven LoRA (Low-Rank Adaptation) fine-tuning strategy for video editing, leveraging a pretrained Image-to-Video (I2V) diffusion model for controllable, first-frame-guided edits. The approach uses spatial masks to isolate background preservation from targeted edit propagation, combining cues from input videos (motion, spatial structure) and reference images (appearance) via dynamic attention modulation to support region-specific learning, outperforming state-of-the-art methods according to experimental results. The method does not alter the core model architecture and supports flexible adaptation; code is available on GitHub. Commenters are requesting integration of LoRA-Edit with ComfyUI, indicating demand for broader accessibility and workflow compatibility in established UI frameworks.
- Two users request or anticipate the integration of LoRA-Edit with ComfyUI, indicating a desire for practical wrappers and UI-based workflows to leverage this new controllable video editing technique in established pipelines.
- One comment expresses skepticism regarding the reliability of results shown in "Ours" demos, alluding to a broader concern in the community about reproducibility and the real-world performance of novel methods versus curated demonstrations.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking
Theme 1. Infrastructure Instability Strikes Across Platforms
- Cloudflare and Google Cloud Outages Halt AI Services: Widespread outages across Cloudflare and Google Cloud Platform crippled multiple AI platforms, including Cursor, OpenRouter, and Cohere, disrupting login and core functionality. Status pages for Cloudflare and Google Cloud detailed the issues, with OpenRouterAI noting signs of recovery via their X account.
- Networking Bandwidth Disparity Grows: The bandwidth of typical internet connections, hovering around 1 Gbps, starkly contrasts with NVIDIA's latest Infiniband iteration reaching 130 TB/s, highlighting a widening gap in network capabilities which impacts distributed training efficiency. Decentralized options like DAWN Internet, which uses fixed wireless and includes an RL-capable GPU in its router, were presented as alternatives to traditional providers.
- Cloud Dependencies Cause LlamaCloud Wobbles: LlamaCloud experienced operational instability due to upstream infrastructure issues, underscoring how dependent AI services are on external cloud providers, prompting users to monitor the official status page for real-time updates. This incident, alongside others, highlights the fragility inherent in relying on third-party cloud services.
Theme 2. Model Performance, Capabilities, and Quirks
- Model Performance Debates Rage: Users debated model preferences and capabilities, with discussions comparing o3 and 2.5 Pro's general performance versus math strengths, while benchmarks like MathArena faced scrutiny for potential saturation. New text-to-video models Seedance and Kangaroo impressed users by potentially outperforming Veo3, Kling, and Pika in recent comparisons.
- Next-Gen Models Hint at Internal Tool Use and Parallel Processing: GPT-5's architecture reportedly relies on internal specialized tools for enhanced context and stability, while a leading theory suggests OpenAI's GPT Pro models like O3 Pro improve reasoning by running multiple instances in parallel and consolidating results, potentially explaining O3-pro's 93% accuracy on the AIME 2024 math competition compared to O3's 90%. Despite this, some users reported O3 Pro failing to answer questions from uploaded documents after long waits.
- Model Limitations and Bias Evals Surface: Users noted Gemini Pro 2.5 struggles with simple image recognition and local LLMs have trouble with large context windows, while bias evaluations revealed that adding realistic details can trigger race and gender bias in models like GPT4o and Claude 4 Sonnet. The concept of LLMs bypassing constantly updated captchas was likened to the "Red Queen hypothesis", where progress is quickly countered by new defenses.
Theme 3. Hardware and Low-Level Optimization Battles
- AMD GPUs Gain Traction and Unsloth Support: The Unsloth team expressed interest in supporting AMD GPUs, highlighting the new AMD INSTINCT MI355X GPU's 5x FP8 flops compared to the H100, noting AMD's advantage in affordability and high memory, though driver support remains a question. GemLite also announced the addition of ROCm support, focusing on the MI300X and implementing custom mma instructions via LLVM intrinsics and Mojo APIs, detailed in this post on X and a blog post.
- Torch.compile and CUDA Libraries Boost Performance: Members found significant speedups using torch.compile for convolution kernels, improving performance from 1.7489 ms to 0.0020 ms by calling external kernels (a rough benchmarking sketch follows this list). Discussions in CUDA explored memory layout optimization libraries for Blackwell, utilizing cuda-side-boost for development, and configuring L1/L2 cache policies.
- Hardware Decisions Weigh VRAM and Cost: Debate arose on using a used Tesla P40 for 24GB VRAM expansion for around $300, with consensus deeming it not worth it compared to a used 3090 as a better "affordable" option. Discussions around optimal local LLM performance touched on the need for 150B+ parameters for "reasonable human interaction" versus the importance of prompting, RAG, and fine-tuning.
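A rough sketch of how such an eager-versus-compiled comparison is typically measured (CUDA events with warm-up so compile time is excluded); the actual kernel and shapes from the discussion are unknown, so these are placeholders:

```python
import torch

conv = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda()
x = torch.randn(8, 64, 128, 128, device="cuda")
compiled = torch.compile(conv)

def bench(fn, iters=50):
    for _ in range(3):  # warm-up (also triggers compilation for the compiled fn)
        fn(x)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average ms per call

print(f"eager:    {bench(conv):.4f} ms")
print(f"compiled: {bench(compiled):.4f} ms")
```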
Theme 4. Developing with AI: Tools, Agents, and APIs
- Coding Assistants Embrace New Features and Fixes: Cursor faced issues with Cloudflare/GCP outages, code generation choppiness, and background agent privacy/commit problems, while Claude Code excelled in context for complex refactors. Aider users praised its performance with smaller local models like 8B and 12B via Ollama, attributing success to its repomap, while also discussing costs with Anthropic and dependency management with UV. Windsurf (Codeium) launched Wave 10 UI/UX upgrades, an EU cluster, and added Claude Sonnet 4 support.
- Agentic Frameworks See New Tools and Security Measures: New tools emerged to support AI agents, including Taskerio's inbox for tracking coding agent progress via webhooks and an API, and SchemaPin, designed to fortify MCPs against "Rug Pull" exploits, with simple implementation detailed on SchemaPin.org. GitHub unveiled a remote MCP server allowing hosts access to live context using dynamic tool selection, and a guide was shared on building MCP servers using Postman's builder and APIs.
- Platforms Integrate Models and Improve Usability: LlamaIndex added support for MistralAIās Magistral model and introduced LlamaParse Presets for balancing parsing accuracy and speed, while integrating with Mem0 for automatic memory updates in agent workflows. NotebookLM users requested Excel/Sheets support and reported issues with mobile notes display and sharing features. OpenRouter users debated quality variations among providers and requested future multi-modal capabilities like audio/video generation.
Theme 5. AI Research Concepts and Debates
- AI Safety and Bias Spark Discussion: Skepticism arose around a new AI Safety Institute, citing a lack of prior awareness and publications. Research highlighted how adding realistic details to bias evals triggers race and gender bias in LLMs, causing up to a 12% difference in simulated outcomes across models, and noted that Chain of Thought methods failed to reveal this hidden bias, as detailed in the paper on Robustly Improving LLM Fairness.
- Evaluation Methods and Benchmarks Scrutinized: Critiques were raised about evaluating AI reasoning using tasks like River Crossing experiments, noting models that correctly identify unsolvable problems were inadvertently penalized according to The Illusion of the Illusion of Thinking paper. Debates continued on the validity of benchmarks like MathArena as scores approach 100%.
- Core AI Concepts Debated: Discussions covered pitfalls in gradient estimation for KL divergence in RL training for LLMs, highlighting issues in open-source projects and papers such as GRPO and the paper on KL divergence pitfalls, and questioned the meaning of terms like "symbolic recursion". High-level disagreements on the future of AI jobs between Jensen Huang (Nvidia) and Dario Amodei (Anthropic) were also noted following a Fortune article and subsequent X posts.
Discord: High level Discord summaries
Perplexity AI Discord
- Gemini Deep Think Incoming: A member shared an image (aSHuTrz.jpeg) hinting at the arrival of Gemini Deep Think.
- No further details about the specifics of Gemini Deep Think were provided.
- Perplexity Pro Role is DOA: Users reported issues obtaining the Perplexity Pro role on Discord, citing a non-functional onboarding button.
- A workaround suggested pinging a staff member to manually assign the role, with one user noting, "the button doesn't seam to give me the role just puts me in the server on the phone".
- Perplexity Pro Draws Pictures Now: Members discovered that Perplexity Pro can generate images from text prompts entered in the search bar.
- Further, users gave instructions to refine images by clicking Regenerate or sending new prompts with styles like cinematic, anime, and low-poly 3D.
- GPT-5 Thinks Smarter, Not Harder: A member shared details about GPT-5's architecture, emphasizing its reliance on internal specialized tools, which sidesteps the problems of external routing and hallucinations.
- A member said that "GPT-5 thinks with its tools, not beside them", underscoring enhanced context, coordination, and stability.
- Sonar API Documentation Demands Scrutiny: The Perplexity team seeks user feedback on the Sonar API documentation, especially concerning unclear or hard-to-navigate sections, available at this community post.
- The feedback aims to improve the documentation based on user experiences.
LMArena Discord
- O3 Pro vs 2.5 Pro: Showdown: Members fiercely debated model preferences, with some advocating for o3 over 2.5 Pro in overall performance, while others cited 2.5 Pro's strength in math.
- One member quipped "I wish to live in this level of delusion" regarding another's preference for 2.5 Pro in math.
- MathArena Benchmarks: Are they Still Relevant?: The community discussed the ongoing validity of MathArena benchmarks, with some suggesting they are becoming saturated and driven by luck.
- Concerns arose that scores nearing 100% might indicate saturation, thus reducing the statistical significance of these benchmarks.
- Kingfall release kills Google Account: A user's Google account ban sparked speculation about a new Kingfall release, and a new Gemini model codenamed toothless showed up for a brief period.
- There are even reports of 99% profit on various ventures, prompting speculation about the model's capabilities.
- Text-to-Video Arena: Seedance and Kangaroo Arrive: In a text-to-video arena, Seedance 1.0 and the anonymous Kangaroo model impressed users with their performance.
- Comparisons indicated that these models could potentially outperform Veo3, Kling, and Pika, particularly in generating similar outputs from general prompts.
- Cloud Crash Causes Chat Catastrophe: Due to a cloud provider outage on 6/12/25, the team warned that chat history data may have been lost.
- The team apologized for any inconvenience, noting they are working on solutions to ensure this doesn't happen again.
Cursor Community Discord
- Cloudflare Outage Cripples Cursor: A Cloudflare and GCP outage brought Cursor to its knees, disrupting login and core functionalities, though Tab reportedly remained operational.
- The disruption underscored the reliance of development tools on external services, with the issues later marked as resolved.
- Cursor's Code Choppiness Continues: Users are still reporting issues with Cursor's code generation when auto model selection is turned on, with one user lamenting the loss of 50 inference credits due to messy code output.
- One user asked about using Cursor to make three.js games; another recommended O3 for most coding and O3-pro for planning and debugging, emphasizing its effectiveness over other models.
- Claude Code Commands Context: Users are finding that Claude Code excels in grasping context and churning out high-quality code, especially when wrestling with complex refactors.
- It helped add 3500 new tests for a front-end component library, a testament to its capabilities; this highlights its ability to handle large-scale code modifications effectively.
- Privacy Mode Prevents Progress for Background Agents: Users encountered an error message stating, Background agent is not supported in privacy mode, while trying to initiate a background agent, due to an enabled account-level privacy mode.
- The issue can be resolved at this link, and the problem is slated for resolution in the upcoming version.
- Background Agents Break Commit Conventions: A background agent, after amending a commit, ran into roadblocks trying to push the altered commit to the repo, implying some version control snags.
- A member suggested resolving it through the terminal, since the agent was getting rolled back, hinting at potential issues with how agents handle version control operations.
OpenRouter (Alex Atallah) Discord
- Google Cloud Implodes: Google Cloud suffered a major outage, as reported on their status page, with users reporting intermittent issues even after initial signs of recovery around 4:25pm ET.
- OpenRouterAI tweeted about seeing recovery from the outage, expressing hope it wouldn't be temporary (tweet link).
- Cloudflare Kills Internet (Again): A widespread Cloudflare outage caused significant disruptions, taking down numerous AI services including OpenRouter, Google, and others.
- Users experienced intermittent OpenRouter service, with the status page flipping between MAJOR and MINOR outages.
- Provider Variability Impacts Model Qualities: Users discussed the significant quality variations among different providers offering the same models through OpenRouter, noting that Parasail and Lambda generally offer more consistent performance.
- One user emphasized the importance of quality over cost, stating that the quality varies a lot by providers, so choose wisely.
- Cheap Agent LLMs Emerge as Top Tool-Users: Users debated the best cheap Large Language Models (LLMs) for agentic tool use, with Gemini 2.5 Flash being recommended as a cost-effective option that requires careful prompting.
- Discussion also included the high cost of models like O4 Mini High and the efficiency of using a monthly Claude Max subscription for API usage.
- Hoping for OpenRouter Multi-Modal Capabilities: Members requested future support for multi-modal capabilities like audio and video generation within the OpenRouter platform.
- There was no explicit response given by OpenRouter.
LM Studio Discord
- LM Studio Lacks Automatic Model Updates: Unlike Ollama, LM Studio does not automatically download model updates; most model updates are published in new repositories, making model lineage difficult to track.
- A member noted this makes it difficult to determine the lineage of a model.
- Gemini Pro Botches Image Recognition: A user reported that Gemini Pro 2.5 makes errors in simple image recognition despite varied prompts and images, even with the provided image.
- Another member mentioned that vision-enabled models often perform poorly, with unclear user expectations.
- LLMs Struggle Against Upgraded Captchas: Members highlighted the ongoing challenge of using LLMs to bypass captchas, as captchas are designed to resist computer cracking and are constantly updated.
- The situation resembles the Red Queen hypothesis, where advancements in captcha cracking are quickly countered by new defenses.
- OpenWebUI Enables Remote LM Studio Access: OpenWebUI facilitates running LM Studio on a server for remote access by hosting the server, loading a model, serving it on the local network, enabling CORS, and opening ports like 1234, 8080, or 3000.
- The accessing PC does not need OpenWebUI installed; a minimal client sketch appears at the end of this section.
- Tesla P40 Not Worth It Anymore: A member asked about using a Tesla P40 as an additional GPU like RTX 3090/4090 to expand VRAM for LM Studio for around $300 for 24GB, linking to the TechPowerUp specs.
- The consensus was that the $300 price point is no longer worth it as a used 3090 is a better "affordable" option.
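For the remote-access setup above, a client machine can talk to the LM Studio server directly through its OpenAI-compatible endpoint (default port 1234); the LAN IP and model name below are placeholders:

```python
from openai import OpenAI

# Point the standard OpenAI client at the LM Studio server on the LAN.
client = OpenAI(base_url="http://192.168.1.50:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # LM Studio serves whichever model is currently loaded
    messages=[{"role": "user", "content": "Hello from across the LAN!"}],
)
print(resp.choices[0].message.content)
```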
Eleuther Discord
- AI Safety Institute Faces Skepticism: Members expressed doubt about the legitimacy of a new AI Safety Institute, citing a lack of prior awareness and the absence of recent publications on its website.
- A member suggested initiating contact, pointing out the advisor is on Discord.
- German Text Reveals LLM Quirks: A short German text prompted drastically different reactions from GPT-3 and GPT-4o, spanning from neutral to deeply emotional responses.
- The member questioned whether this anomaly merited further investigation, hinting at an interest in exploring LLM behaviors beyond conventional applications.
- Symbolica.ai Eyes Theorem Prover Model: Symbolica.ai, a London-based startup, has ambitious goals; members suggested they should release a small theorem prover model like the one Google had.
- Some reviews suggested the boundaries of the work aren't clear and the goals keep changing.
- GRPO Objective Supercharges Model Performance: DeepSeek V3, a 671B model, demonstrated enhanced performance through the GRPO objective, succeeding validation tasks.
- A member noted that literally random rewards improve performance due to a concentration effect that focuses the model on its existing reasoning pattern distribution.
- Bias Evals Trigger Race and Gender Bias: Adding realistic details to existing bias evals can trigger race and gender bias in LLMs, causing up to a 12% difference in interview rates across models including GPT4o and Claude 4 Sonnet.
- The paper on Robustly Improving LLM Fairness gives an example of unfaithful chain of thought in the wild.
OpenAI Discord
- Canvas Enables Code and Doc Exports: Canvas now supports downloads and exports, enabling users to export documents as PDF, docx, or markdown files.
- Additionally, Canvas facilitates direct code export to appropriate file types such as .py, .js, and .sql.
- GPT Pro's Parallel Power Play: A leading theory proposes that GPT Pro models, like O3 Pro, enhance reasoning by running multiple instances in parallel and consolidating the results in a "think harder" approach.
- Evidence from the AIME 2024 math competition showed O3-pro achieved 93% pass@1 accuracy compared to O3's 90%, implying the effectiveness of this consolidation method (a toy self-consistency sketch appears at the end of this section).
- O3 Pro Performance Faces Project Fails: Users have reported that O3 Pro often fails to answer questions from uploaded documents, despite long waiting times of up to 40 minutes.
- This underperformance raises questions about its practical utility, contrasting with its enhanced reasoning capabilities.
- Free AI APIs Fuel Development: Developers explored free AI APIs such as SambaNova, which features fast Llama 3.3 70B, Qwen, and Deepseek models.
- Gemini was noted for its high rate limits, offering options like 500/day for 2.5 Flash and 1k/day for 2.0 Flash, making it suitable for budget-conscious projects.
- Discord Dwindles Due to AI Surge: A noticeable drop in Discord activity correlates with the rise in popularity of AI chats, leading to many servers becoming "ghost towns", which prompts new thinking for community engagement.
- This shift indicates users are migrating to AI-driven platforms for discussions, impacting community engagement on traditional platforms.
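The parallel-instances theory is unconfirmed, but the general technique it resembles, self-consistency (sample several answers and keep the majority), is easy to sketch; the model name and prompt framing here are illustrative:

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()

def majority_answer(question: str, n: int = 5) -> str:
    """Ask the same question n times and return the most common answer."""
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="o4-mini",  # illustrative reasoning model
            messages=[{"role": "user",
                       "content": question + " Reply with only the final answer."}],
        )
        answers.append(resp.choices[0].message.content.strip())
    return Counter(answers).most_common(1)[0][0]

print(majority_answer("What is 17 * 24?"))  # expect "408" to win the vote
```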
HuggingFace Discord
- Qwen 2.5 Parlez 100 Languages: Qwen 2.5 speaks 100 languages, potentially because its 18T-token training data contains a substantial amount of multilingual data, and it leverages the Linux VM well.
- Members compared it to Gemma3.
- CloneMe Creates Digital Twins: The CloneMe AI platform lets you build your digital twin: an AI that chats like you, remembers details, and supports multiple platforms.
- It's customizable, memory-driven, and hot-reloadable, making it a toolkit for creating intelligent, dynamic AI personas.
- HF Faces Heat on Open Source Facade: Some believe certain models on Hugging Face aren't truly open source, suggesting the platform may be used more for marketing than genuine collaboration.
- While Hugging Face doesn't explicitly brand itself as an open source library, its reputation suggests otherwise.
- TensorBoard Tells Tale of Fitting: Members are using TensorBoard loss graphs to diagnose model fitting, emphasizing that evaluation loss should decrease at a similar rate to the training loss.
- Dividing the dataset into training and testing parts ensures the model generalizes well without overfitting or underfitting.
- Augmentoolkit 3.0 Augments AI: Augmentoolkit 3.0 allows users to train AI on new subjects by adding documents or teaching it tasks through rating attempts.
- It facilitates custom model runs, offering greater control over update timing and methods.
Manus.im Discord Discord
- Manus Meltdown Suspected After Veo3: Users reported widespread issues with Manus, suspecting the Veo3 announcement overloaded the servers, as confirmed by Downdetector.
- The outage triggered frustration, with one user reporting every task I spin up is about 900-1000 credits for me.
- Playbooks Prepare Prompts Preemptively: Playbooks in Manus prepare prompts and give output examples, bridging the gap for users needing prompt assistance and highlighting creative workflows.
- The Playbooks aim to provide structured guidance, facilitating easier prompt engineering.
- Community Clamors Constantly for Claude: Users expressed eagerness for Claude 4.0, drawing humorous parallels to fan anticipation, though there was no official news or update.
- A user suggested a workaround to make new gmail and sign up for the google one ai trial -> start a family -> invite 5 accounts -> 5x usage now for veo and everything bc all accounts get separate usage limits.
- Credit Crunch Causes Costs Concerns: Users voiced concerns over credit usage, particularly regarding optimization and lack of cost previews, with some suggesting the bring your own keys model.
- The lack of cost transparency is causing some consternation in the community.
- GPT Generates Greatness over Manus: Image generation quality between Manus and GPT-4 Omni were compared, showing GPT-4 Omni outperformed Manus.
- The comparison highlighted specific instances where GPT-4 Omni provided superior image outputs.
GPU MODE Discord
- Torch Compile Magically Speeds Up Convolutions: Members found that operations generated from torch.compile result in a significant speedup in a convolution kernel from 1.7489 ms (Native PyTorch) to 0.0020 ms (Compiled PyTorch).
- Questions arose as to why the stock convolution doesn't use the faster external kernels called via extern_kernels.convolution instead of aten.convolution.
- CUDA-side Boost Library Charges In: A member shared a link to cuda-side-boost, a library for CUDA development, noting that replacing the entire PyTorch memory allocator is probably overkill.
- They suggest one could use MemPool in PyTorch instead.
- GemLite Gets Amped for ROCm: A developer announced the addition of ROCm support to GemLite, with a focus on the MI300X (post on X).
- The post details implementing custom mma instructions via LLVM intrinsics, and efficiently managing data layouts with Mojo's load_matrix and store_matrix APIs (github repo).
- Factorio Newbie Seeks Reading Material After PMPP: Members are seeking recommendations on books or papers to read after completing PMPP, with one member suggesting a paper on instruction latencies.
- The member suggests that the discussion itself is worth a read, despite the fact that instruction latencies might be outdated.
- Factorio RL throwdown begins: Members discussed the potential of using RL-based AI to play Factorio, debating whether an LLM is necessary for long-term planning and complex tasks.
- The conversation explored whether an RL agent could achieve optimal Factorio play with a limited amount of gameplay, drawing comparisons to OpenAI Five's success in Dota 2.
Unsloth AI (Daniel Han) Discord
- AMD Instinct MI355X GPU gets Unsloth support: The Unsloth team may support AMD GPUs; the new AMD INSTINCT MI355X GPU has 5x the FP8 flops of the H100, and the team presented at the AMD AI conference.
- Members noted that AMD is cheap and has high memory, but also questioned AMD's driver support.
- Unsloth Mulls YouTube Channel Creation: The Unsloth team is considering creating a YouTube channel to upload videos, particularly focused on tutorials.
- A member asked for a video on how to use multiple GPUs with accelerate, promising to like and subscribe.
- AttributeError Plagues Unsloth Training Sessions: A user encountered an AttributeError during training with Unsloth, traced to the `fetch_image` function trying to read a `None` images field instead of a valid path or URL.
- A suggestion was to use batch size 1 or pass a custom collator.
- KL Divergence Gradient Estimation Has Flaws: A paper was shared discussing pitfalls in gradient estimation for KL divergence in RL training for LLMs, highlighting issues in open-source projects like TRL and Open Instruct and papers such as GRPO.
- The paper points out that differentiating through the KL estimate as a loss function and not accounting for the sequential nature can lead to incorrect KL gradients, referencing this paper (a minimal sketch of the pitfall appears at the end of this section).
- River Crossing Errors Plague Appleās Reasoning Model: A paper titled The Illusion of the Illusion of Thinking was shared, criticizing the evaluation of AI models in River Crossing experiments for inadvertently penalizing models that correctly identify unsolvable problems.
- The original paper by Apple had instances with N ≥ 6 actors/agents using boat capacity b = 3, which is mathematically impossible.
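A minimal sketch of the KL-gradient pitfall, assuming sequence-level log-probs with samples drawn from the current policy; the surrogate follows standard score-function (REINFORCE) reasoning and is an illustration, not the paper's exact prescription:

```python
import torch

def kl_losses(logp_theta: torch.Tensor, logp_ref: torch.Tensor):
    """logp_theta: log-probs of sampled sequences under the trained policy
    (requires grad); logp_ref: log-probs under the frozen reference policy."""
    # Pitfall: backprop straight through the Monte Carlo KL estimate. Autograd
    # never sees that the samples themselves were drawn from pi_theta, so the
    # resulting gradient is not the gradient of KL(pi_theta || pi_ref).
    naive = (logp_theta - logp_ref).mean()

    # Score-function surrogate: detach the KL term and reweight by log pi_theta,
    # so autograd yields E[(log pi_theta - log pi_ref) * grad log pi_theta],
    # an unbiased estimate of the true KL gradient.
    weight = (logp_theta - logp_ref).detach()
    surrogate = (weight * logp_theta).mean()
    return naive, surrogate
```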
Latent Space Discord
- Nano-vLLM catches Nano-tice: DeepSeek released nano-vLLM, a minimal vLLM implementation of approximately 1200 lines of code that is a valuable learning resource for AI/ML practitioners and can be found at this link.
- The community appreciates its concise nature as a valuable learning resource and expressed interest in hacking on the "nano monolith".
- Trinity autoformalizes Fermat's Last Theorem: Morph Labs introduced Trinity, an autoformalization system used to formalize de Bruijn's result on the abc conjecture in Lean, available at this link.
- It aims to create verified training environments for self-supervised reinforcement learning in mathematics by converting mathematical knowledge into formal proofs.
- Transformers Library Deprecates TensorFlow and Flax Support: The Transformers library will deprecate TensorFlow and Flax support, focusing solely on PyTorch to reduce bloat, simplify the toolkit, and remove abstraction layers as mentioned here.
- Long-term support (LTS) for TF and Flax will continue with v4 until mid-2026, and this change marks the beginning of v5, aiming to remove 50% of the code.
- Meta AI App Shares Private Conversations Publicly: A Meta AI app inadvertently posted users' private conversations, including sensitive information and audio, to a public feed which is linked here.
- Users are accidentally sharing content due to a confusing UI, exposing personal details and raising ethical concerns for Meta.
- Anthropic's Multiagent System Dominates Single-Agent Claude Opus: Anthropic found that a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2% on their internal research eval, according to this post.
- They found that multi-agent systems excel at valuable tasks that involve heavy parallelization, information that exceeds single context windows, and interfacing with numerous complex tools, but burn through tokens fast, at about 15× more tokens than chats.
aider (Paul Gauthier) Discord
- Aider's Library Version Awareness Explored: Users sought ways to enhance Aider's awareness of library versions, especially when migrating from pip/virtualenv to Poetry, since it can suggest outdated options; recommendations included linking to updated man pages, explicitly pinning versions in conventions files, or using the `/read docs/spec.txt` command.
- The discussion emphasized improving context provision for Aider to ensure it suggests the most current library versions.
- Aider Costs with Anthropic Model: A user voiced concerns about potential hourly costs of nearly $50 when using Aider with Anthropic, particularly for large changes and also noted that the Claude Code monthly plan may have been used up quickly, suggesting possible usage limits.
- The conversation highlighted the importance of cost management when using Aider with commercial models like Anthropic, emphasizing the need to monitor usage.
- Aider Excels with Smaller Models: Users lauded Aider's performance with smaller models (8B and 12B) via Ollama, finding it surprisingly effective, with another user pointing to Aider's repomap as the secret sauce.
- The toolās capability to function efficiently with limited resources positions it as a strong contender for smaller, locally-run models.
- UV Manages Python Dependencies: Members explored migrating to UV for Python dependency management as a superior alternative to direct pip usage and pyproject.toml edits, favoring commands like `uv add <dependency name(s)>`.
- One user, initially hesitant about reading the manual, found UV much tighter for defining linting instructions in YAML configuration, marking a shift towards streamlined dependency handling.
- max_input_tokens Configuration Conquered: A user resolved configuration challenges related to setting separate max tokens for input and output in Aider, especially concerning the display of remaining tokens.
- Clarification led to the successful configuration of the max_input_tokens parameter, fixing the initial confusion and improving Aiderās performance.
Nous Research AI Discord
- Vast.ai Offers Cheap Compute: A member highlighted Vast.ai as a provider of decentralized compute that is relatively cheap, and Akash was also mentioned as a potential alternative.
- A member noted that Vast.ai is the cheaper option of the two.
- C. Opus Posts First Arxiv Publication: Teknium shared a post on X announcing C. Opus's first publication on Arxiv.
- There were multiple confirmations of this important event.
- NVIDIA Releases Cosmos: NVIDIA launched Cosmos, though the ArXiv link shared (https://arxiv.org/abs/1706.03762) actually points to 'Attention Is All You Need' rather than a Cosmos paper.
- No further discussion of the launch or its features occurred in the channel.
- Infiniband Outpaces Internet Bandwidth: A member noted that while typical internet bandwidth is around 1 Gbps, Nvidia's latest Infiniband iteration reaches 130 TB/s (over 1,000,000 Gbps, roughly a million times a typical home connection), highlighting the growing bandwidth disparity.
- The internet's bandwidth hasn't seen significant increases in recent years.
- DAWN Internet Promotes Decentralized Internet Access: A member promoted DAWN Internet, a decentralized broadband protocol that uses fixed wireless rooftop antennas to provide gigabit internet and also includes a GPU capable of supporting RL.
- More information can be found on their X profile.
Notebook LM Discord
- Mind Map Masterpiece Emerges: A member created a mind map from 115+ sources, summarizing key aspects and claiming it was pretty accurate, resulting in a huge mind map, according to a linked image.
- In response to another member's query, the map was noted to have 4 sublevels, with the user expressing satisfaction with the vertical density but noting room for improvement horizontally.
- Paid AI Pro Users Locked out of Notebook LM Plus: A member using paid AI Pro reported being unable to access Notebook LM Plus, and asked for ideas why, but no solutions were provided in the channel.
- The root cause of the access issue remains unresolved within the discussion.
- Excel Support missing from NotebookLM: Users requested Excel and Google Sheets support in NotebookLM, but there is currently no support or roadmap for this feature.
- Users are recommended to use the feature request channel to express their interest.
- Mobile App Notes are Limited: While Notes are available on the desktop version of NotebookLM, the mobile app only displays sources, chat, and studio sections.
- While there's no export option on mobile, a workaround is to access notes on mobile via the browser instead of the app, where users can copy and paste.
- Notebook Sharing Button Grayed Out: Users are encountering problems with sharing notebooks, as the 'Share publicly' button is grayed out and unclickable.
- The cause of this issue is currently unknown.
tinygrad (George Hotz) Discord
- Beam Linearizer Bugs Bugging Users: Users report encountering linearizer failures when running with BEAM, but the cause and solution remains elusive.
- This issue needs further investigation to determine the root cause and potential fixes.
- Tinygrad's Float Fumbles Fuel Frustration: A user detected discrepancies in float matmul accuracy between NumPy and Tinygrad, specifically in the bottom left corner value of the output matrix using this code.
- The discussion addressed the effects of compiler variations, optimization strategies, and adherence to the IEEE 754 standard, highlighting that slight numerical variations are typical and depend on operation order and the usage of float64 by default in NumPy.
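A quick generic illustration of why operation order and default dtypes matter (not the user's actual matmul):

```python
import numpy as np

# float32 addition is not associative: same numbers, different accumulation order.
xs = np.float32(1e8), np.float32(1.0), np.float32(-1e8)
print((xs[0] + xs[1]) + xs[2])  # 0.0 -- the 1.0 is absorbed by 1e8 in float32
print((xs[0] + xs[2]) + xs[1])  # 1.0

# NumPy defaults to float64, where the same expression survives either order,
# which is one source of NumPy-vs-float32-backend discrepancies.
print((1e8 + 1.0) - 1e8)        # 1.0 in float64
```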
- SVD Sign Slip-up Sparks Scrutiny: A contributor working on a linalg.svd PR aimed for NumPy-level accuracy but encountered sign differences in the values.
- It was suggested to use `DEBUG=4` to check the kernel code and `NOOPT=1` to disable loop unrolling for closer results, as loop unrolling can introduce numerical differences.
- QR Algorithm Quirks Questioned: A user identified variance in QR algorithms due to the difference between Householder Reflections and the Gram-Schmidt process.
- The user highlighted an even greater variance compared to the LAPACK package used by NumPy for Eigen-value calculations.
- NumPy's Numerical Norms Need Nuance: One user recommended explicitly creating NumPy arrays with `dtype=np.float32` to mitigate result discrepancies, criticizing NumPy's default setting of `np.float64`.
- Another user countered that float64 is standard in numerical applications beyond machine learning, and changing the default could disrupt unrelated functionalities.
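For reference, a two-line sketch of the recommendation (explicit dtype at array creation):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])                    # float64, NumPy's default
b = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)  # matches a float32 backend
print(a.dtype, b.dtype, (b @ b).dtype)                    # float64 float32 float32
```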
Modular (Mojo 🔥) Discord
- Mapping Variadic Types Remains a Challenge: Mapping variadic types in Mojo poses ongoing challenges, as highlighted in this forum post, particularly due to the need for a more dynamic type system.
- The suggestion of using StaticString to define the corresponding `__mlir` type faces difficulties because of limited documentation and the complexity of supporting an arbitrary number of types.
- MLIR Type Workarounds Explored: Exploration of workarounds using `__mlir_type` encountered issues with undocumented MLIR and the inability to synthesize the MLIR type for a given type parameter as a raw string.
- A member proposed extracting and modifying the MLIR type at compile time to bypass type definition constraints using UnsafePointer and `init_pointee_move`.
- Magic to Pixi Migration Achieves Painless Transition: A user successfully migrated from `magic` to `pixi` by removing the `~/.modular` directory and rewriting `mojoproject.toml` files, describing the process as painless.
- The user provided a `pix.sh` script for updating and cleaning the cache, which creates a new `pixi.lock` and `.pixi` folder, advising the old folder's removal post-test validation.
- Host-Side Synchronization in GPU Puzzle Clarified: Clarification was provided regarding host-side synchronization in a GPU puzzle, specifically addressing this section.
- Since `DeviceContext` employs a CUDA stream, explicit synchronization isn't required, and the puzzle description will be updated to reflect this.
- Mojo Exports Capabilities via C ABI: Mojo supports exporting C ABI compatible functions using `@export(ABI="C")`, facilitating the creation of object files or shared libraries.
- This enables integration with C/C++ codebases, expanding Mojo's interoperability.
MCP (Glama) Discord
- GitHub Enables Live Context Access: GitHub PM unveiled a remote GitHub MCP server, granting any MCP host access to live GitHub context without requiring local setup, as detailed on Reddit.
- The server employs dynamic tool selection, presenting the LLM with a relevant subset of tools based on user input or context, even with 30+ tools available, to keep auth simple with one MCP server.
- Track Agent Progress with Taskerio: Taskerio introduced a stealth mode product: an inbox designed for coding agents to report progress, featuring webhooks, push notifications, and an API for real-time dashboards, further detailed on Reddit.
- This allows for real-time monitoring and tracking of AI agent activities.
- Fortify MCPs Against Rug Pulls with SchemaPin: A member introduced SchemaPin, a tool engineered to defend against MCP Rug Pulls and related exploits, with the GitHub repository available here.
- Easy implementation methods are detailed on SchemaPin.org, safeguarding MCPs from potential vulnerabilities.
- Postman Simplifies MCP Server Construction: A member demonstrated constructing an MCP server using Postmanās MCP builder and APIs on their public API network, referencing the fastfs-mcp GitHub repository as an illustrative example.
- A corresponding YouTube video further elucidates the process.
LlamaIndex Discord
- LlamaCloud Wobbles Back to Stability: LlamaCloud is operational after upstream infrastructure hiccups; check the status page for real-time updates.
- The incident underscores the fragility of cloud dependencies.
- MistralAI's Magistral Now Plays Nice with LlamaIndex: LlamaIndex embraces MistralAI's Magistral reasoning model, slotting it into agent workflows, according to this tweet.
- This integration could open doors for more sophisticated reasoning tasks.
- LlamaParse Gets User-Friendly with Presets: LlamaParse introduces Presets, offering Fast, Balanced, and Premium modes to tweak accuracy versus speed during document parsing.
- These presets let users optimize document parsing based on need.
- Mem0 Integration Eases Memory Management in LlamaIndex: When using LlamaIndex with Mem0, memory updates occur automatically by passing `memory=memory` into `agent.run()`, eliminating manual updates (see the sketch below).
- The integration with LlamaIndex supports Mem0's graphRAG capabilities, streamlining memory handling.
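A minimal sketch of that pattern, assuming the llama-index-memory-mem0 integration package and the FunctionAgent workflow API (constructor and method names may differ across versions):

```python
import asyncio
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.memory.mem0 import Mem0Memory

def record_preference(note: str) -> str:
    """Toy tool so the agent has something to call."""
    return f"Noted: {note}"

# Memory scoped to one user; Mem0 persists and updates it behind the scenes.
memory = Mem0Memory.from_client(context={"user_id": "user-123"})
agent = FunctionAgent(tools=[record_preference], llm=OpenAI(model="gpt-4o-mini"))

async def main():
    # Passing memory=memory is what makes updates automatic --
    # no manual memory writes are needed after each turn.
    resp = await agent.run("Remember that I prefer metric units.", memory=memory)
    print(resp)

asyncio.run(main())
```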
- Luma Calendar Might Oust Discord for Office Hours: Organizers are mulling over switching to a Luma calendar for office hours due to Discord calendar usability complaints, and are soliciting ideas, requests, and suggestions regarding the format of future office hours.
- The move aims to enhance the office hours experience.
Cohere Discord
- DeepMind Embraces Named Tensors: A member is developing the Xarray-JAX library for Google DeepMind as part of GSoC 2025, claiming it to be the first named tensor implementation in a deep learning framework.
- The library aims to enhance tensor operations within JAX, making them more intuitive and efficient for deep learning tasks.
- AI Finance Tool Evades LLM Wrapper Trap: A member is building an AI SaaS tool in the finance space as a college project and is asking how to avoid just making an LLM wrapper to actually provide real value to end users.
- They requested suggestions for an MVP to avoid common pitfalls that are found among most LLM wrappers.
- Cohere Docs Suffer Syntax Slip-Up: A member reported a potential typo in Cohere's documentation.
- The correction suggests that in the Python code example, `co = cohere.SagemakerClient()` should use a lowercase 'm' in `SagemakerClient()`, as shown in the snippet below.
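For clarity, the corrected call looks like this (the commented usage line is an assumption; check the Cohere SDK docs for the exact chat signature):

```python
import cohere

# Note the lowercase "m": SagemakerClient, not SageMakerClient.
co = cohere.SagemakerClient()

# Hypothetical usage -- argument names are illustrative:
# response = co.chat(model="your-sagemaker-endpoint", message="Hello")
```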
- Reranking Profile Requests Remain Raw: A member inquired about the specifications of the reranking profile, particularly the number of docs, tokens per doc, and query tokens.
- Unfortunately, this request did not receive any responses, and the inquiry ended without further discussion.
- GCP Grounds Growth at Cohere: Cohere reported a Google Cloud Platform (GCP) outage impacting some of their services on June 12, 2025 at 12:02 PM, per their status page.
- The status page indicated degraded performance in Infrastructure components, prompting close monitoring and response efforts by the Cohere team (Cohere Status Page).
Yannick Kilcher Discord
- Fast Weights Aims for More User Control: Members advocate for fast weights continual learning and external data stores to improve user control and reduce undesirable human traits in AI models.
- They expressed eagerness to see traits like scheming, frustration, and false memories removed from mainstream AI.
- O1-Pro Models Offer Good Value: One member found O1-Pro/O3/O4-mini-high models valuable for learning well-documented math and computer science, while also liking their image generation capabilities.
- They also mentioned using the models' API for an audio transcription pipeline that works almost perfectly, though the image generation is censored.
- Gemini experiences compared to Claude: A member asked how Gemini compared to Claude.
- Another member stated that Claude has been less reliable for them but noted that all models can get things wrong and are most useful in highly verifiable domains.
- Wavefunction Discussions Take Friday Off: There is typically no Wavefunction discussion on Fridays due to limited audience participation.
- Despite the lack of scheduled discussion, community members are welcome to initiate their own.
- Huang and Amodei Disagree on AI Jobs: A Fortune article reports that Jensen Huang (Nvidia) disagrees with Dario Amodei (Anthropic) about the future of AI jobs.
- Dario has responded to Jensen via X with an update on AI jobs, as shares in both companies fell sharply amid continuing job fears.
Torchtune Discord
- Details on Mistral 3.1 Small Architecture still murky: A user asked about architectural novelties in Mistral 3.1 Small, estimating 2 weeks to implement fine-tuning once known.
- Another user felt supporting Mistral 3.0 implies support for Magistral, though multi-modality support may be challenging.
- Tokenizer Troubles Spark Speculation: The difficulty of the tokenizer was mentioned, and a member suggested it was a complicated procedure.
- The discussion clarified they were actually referring to Magistral's tokenizer.
- Torchtune Integration Urged for Magistral: Members expressed interest in a Torchtune link on Magistral's Hugging Face (HF) page.
- This indicates community demand for Torchtune integration with Magistral to improve accessibility.
Nomic.ai (GPT4All) Discord
- Infinite Chat Implemented Locally: A member highlighted the implementation of Infinite Chat locally, designed to prevent users from exhausting the context window.
- Documentation can be found here for those interested in its features and capabilities.
- Requesting Ignore Feature: A user inquired about the potential addition of an 'ignore' feature for the embedding system, similar to `.ignore` files in Git.
- This feature would allow users to exclude specific files, file types, or directories from being processed or embedded.
Codeium (Windsurf) Discord
- Windsurf Waves into UI/UX Upgrades: Windsurf is wrapping up Wave 10 with a fresh slate of UI/UX upgrades and new teams and enterprise offerings, including new icons for `@-mentions` and file citations.
- Codeblocks in the Cascade panel now match your IDE theme, the native terminal in the Cascade panel now accepts user inputs, and there is a new Conversation History UI.
- Windsurf rolls out EU Cluster for Performance Boost: Windsurf proudly announces their EU Cluster, bringing faster performance to European enterprises amid rising demand!
- Watch the video on Youtube and join the conversation at r/Windsurf.
- Claude Sonnet 4 Lights Up Windsurf: Claude Sonnet 4 and Claude Sonnet 4 (Thinking) are now available to all paid plans via API Pricing!
- More info available on X.
The DSPy Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI ▷ #general (1103 messages🔥🔥🔥):
Gemini Deep Think, Perplexity Pro Role, Image Creation with Perplexity Pro, GPT-5 architecture, Qwen3 replacing Qwq
- Gemini Deep Think is Coming: A member mentioned that Gemini Deep Think is coming, referencing an attached image (aSHuTrz.jpeg).
- Perplexity Pro Role is Broken: Members discussed difficulties in obtaining the Perplexity Pro role in Discord, with the onboarding button not working and some members suggesting a temporary solution of pinging staff.
- One member stated, 'the button doesn't seam to give me the role just puts me in the server on the phone'.
- Perplexity Pro Makes Images!: Members shared that Perplexity Pro can create images by typing image prompts in the search bar, such as 'Draw a pastel cottagecore village in spring, watercolor style'.
- A member gave further instructions: 'Click Regenerate under the image or send a new prompt in the same thread. Try refining with art styles or effects like: cinematic, low-poly 3D, studio lighting, anime-style'.
- GPT-5 Works Smarter, Not Harder: A member shared details about GPT-5's architecture, noting it operates as a single model that leverages specialized tools internally, avoiding the pitfalls of external routing and hallucinations.
- They quoted that 'GPT-5 thinks with its tools, not beside them', highlighting improved context, coordination, and stability.
- Qwen3 is the New Dead Qwq: Members noted that Qwen3 replaces the now-defunct Qwq, acknowledging its existence before moving on to other topics.
Perplexity AI ▷ #sharing (1 messages):
meijer5838: https://www.perplexity.ai/page/unused-phones-europe-s-hidden-YpcOJpSCSfu9IlnOng_85A
Perplexity AI ▷ #pplx-api (7 messages):
Sonar API documentation feedback, Perplexity Publisher Program
- Sonar API Docs Seek Scrutiny: The team is seeking feedback on the Sonar API documentation and requests users to share their experiences, specifically regarding areas of unclarity or difficulty in navigation, under this community post.
- The goal is to enhance the documentation based on user input.
- Publisher Program Plug Posted Publicly: A user shared a LinkedIn post and a company page related to the Perplexity Publisher Program.
- The user suggested it might be helpful for a specific channel.
LMArena ▷ #general (1098 messages🔥🔥🔥):
o3 vs 2.5 Pro, Model preference, Ethics in models, Grok 3.5, New models
- O3 Pro vs 2.5 Pro: Model Preference Debate Rages On: Members engaged in a heated debate over model preferences, with some arguing that o3 is better than 2.5 Pro in many aspects, while others claimed that 2.5 Pro excels in specific areas like math.
- One member sarcastically remarked 'I wish to live in this level of delusion' regarding another's preference for 2.5 Pro in math, sparking further discussion.
- MathArena Benchmarks: Saturated or Still Valid?: Members debated the validity of MathArena benchmarks, with some arguing that they are becoming saturated and potentially luck-based, while others maintained that they are still useful metrics.
- Concerns were raised that scores close to 100% might indicate saturation, questioning whether the benchmarks remain statistically meaningful.
- Google Account Banned: New Kingfall Release Trigger Alarms: A member reported their Google account being banned, which prompted others to speculate about a new Kingfall release.
- A new Gemini model, codenamed toothless, showed up for a brief period, leading to speculation that it's a new checkpoint. There are even reports of 99% profit on various ventures.
- Text-to-Video Arena: Seedance and Kangaroo Impress: Members shared and discussed blind tests in a text-to-video arena, highlighting the impressive performance of models like Seedance 1.0 and the anonymous Kangaroo.
- Comparisons were made, with some suggesting that these models surpass Veo3, Kling, and Pika, particularly in generating similar outputs from general prompts.
- Rumors of O4/GPT-5: A New Challenger?: Speculation arose around the potential release of O4/GPT-5, with a member confidently stating that it is not a bigger model, causing debate about the naming convention and what the release would look like.
- Another member said they had proof it was a native mode, but refused to provide it.
LMArena ▷ #announcements (2 messages):
Cloud provider outage, Contest running, Test Garden application, Staff AMA
- Cloud Provider Plunge causes Potential Promptcat Poop: Due to outages from their cloud provider on 6/12/25, chat history data may have been lost.
- The team apologized for any inconvenience and are working on solutions to ensure this doesn't happen again.
- Contest Creations Cause Creative Combustion: There is a contest running, and participants are encouraged to post their creations to the <#1378034388272681079> channel for a chance to win.
- More details are available here.
- Test Garden Tempts Techies to Try: Enthusiasts interested in providing feedback and seeing behind-the-scenes developments can apply to the Test Garden.
- The application form is available here.
- AMA Ascends Amidst Attentive Audience: The team thanked everyone who attended last weekās Staff AMA.
- Feedback can be shared via this form and the video recording is now available.
Cursor Community ▷ #general (433 messages🔥🔥🔥):
Cloudflare outage impacts Cursor, Cursor code generation, Claude Code for complex refactors, MCP server setup, Front End Testing
- Cloudflare & GCP Outage Crash Cursor: Users reported Cursor being down due to a Cloudflare and GCP outage, affecting login and functionality for many, while others noted that Tab still works; the issues were later reported as resolved.
- Cursor Coding Criticisms Continue: A user asked about using cursor to make three.js games, while one user recommended O3 for most coding, and O3-pro for planning and debugging, emphasizing its effectiveness over other models.
- There was discussion about Cursor's code generation as subpar when switching to 'auto' model selection, with one user losing 50 inference credits due to code messes, advising against Cursor's auto-switching.
- Multiplayer is So Hot Right Now: Many members discussed using peerjs and socket.io to develop multiplayer games, with one member showing off their Steam multiplayer Unreal Engine 5 game Supermarket Simulator.
- CUA Automation on the Horizon: Members mentioned that CUA (Computer Using Agent) improvements could enhance automation, mentioning the project browser-use for automating tasks.
- Claude Code Crowns Context King: Users found that Claude Code excels in context understanding and code quality for complex refactors, adding 3500 new tests for a front-end component library.
Cursor Community ▷ #background-agents (30 messages🔥):
Background Agents LSP and Linter, Background Agents Privacy Mode, Background Agent Commit Issues, Background Agents leaking context, Background Agents Docker Compose
- Background Agents Leverage LSP and Linters: Background agents should use LSP errors and have access to all extensions, ensuring dependencies are installed in the agent environment.
- The Slack integration might not do this right now, but it should if you start them through the desktop client.
- Background Agents Privacy Mode fixed: Users reported getting an error message, Background agent is not supported in privacy mode, when trying to start a background agent.
- The issue was resolved by enabling an account-level privacy mode here, and this issue will be fixed in the next version.
- Background Agent Faces Commit Challenges: After a background agent amended a commit, it struggled to push the amended commit.
- A member suggested resolving it through the terminal, since the agent was getting rolled back.
- Background Agents Might Leak Context: A user reported that background agents might be leaking context, linking a Sentry error unrelated to the current task.
- The user experienced this on multiple chats and shared an image as evidence.
- Docker Compose Setup for Background Agents: A user sought guidance on setting up a Cursor background agent with Docker Compose, aiming to run commands within a specific container by default.
- They provided docker-compose.cursor.yml and environment.json configurations, hoping to run tools like pytest with linked Postgres and Redis containers.
OpenRouter (Alex Atallah) ▷ #announcements (4 messages):
Google Cloud Outage, Cloudflare Status, Internet Recovery
- Google Cloud Suffers Major Outage: Google Cloud experienced a major outage, as reported on their status page.
- Users reported intermittent issues even after initial signs of recovery around 4:25pm ET.
- Cloudflare and Google Status Pages Provide Updates: Updates on the outage and recovery can be tracked via the Cloudflare status page and the Google Cloud status page.
- OpenRouterAI Tweets on Recovery: OpenRouterAI tweeted about seeing recovery from the outage, expressing hope it wouldn't be temporary (tweet link).
OpenRouter (Alex Atallah) ▷ #app-showcase (5 messages):
Button cutoff on narrow browser
- Narrow Browser Button Bug squashed!: A member reported a bug where the button was cut off in a narrow browser window, as shown in this screenshot.
- Another member quickly addressed and fixed the issue, then provided a screenshot of the fix.
OpenRouter (Alex Atallah) ▷ #general (377 messages🔥🔥):
Cloudflare Outage Impacts OpenRouter, OpenRouter Status Fluctuation, Model Performance Variations by Provider, Agentic Tool Use with Cost-Effective LLMs, Multi-Modal Support for OpenRouter
- Cloudflare Kills the Internet (Again): A widespread Cloudflare outage caused significant disruptions, taking down numerous AI services including OpenRouter, Google, and others.
- Users reported widespread issues, leading to humorous speculation about the cause, ranging from interns spilling coffee on servers to Skynet taking over.
- OpenRouter Teases Users With Up-And-Down Cycle: Users experienced intermittent OpenRouter service, with the status page flipping between MAJOR and MINOR outages, leading to frustration and jokes about timing API requests like a carnival game.
- Some users found success using specific models or configurations, while others continued to face timeouts and authentication errors.
- Provider Variability Impacts Model Qualities: Users discussed the significant quality variations among different providers offering the same models through OpenRouter, noting that Parasail and Lambda generally offer more consistent performance.
- Quality is more important than cost; as one user said, the quality varies a lot by provider, so choose wisely.
- Cheap Agent LLMs Emerge as Top Tool-Users: Users debated the best cheap Large Language Models (LLMs) for agentic tool use, with Gemini 2.5 Flash being recommended as a cost-effective option that requires careful prompting.
- The high cost of models like O4 Mini High and the potential release of a new Google Flash model were also discussed, alongside the efficiency of using a monthly Claude Max subscription for API usage.
- Dreaming of OpenRouter Multi-Modal Capabilities: Members requested future support for multi-modal capabilities like audio and video generation within the OpenRouter platform.
- No explicit response was given by OpenRouter.
LM Studio ▷ #general (135 messages🔥🔥):
LM Studio Model Updates, Setting Static Generation Seed in LM Studio, Gemini Pro Image Recognition, Bypassing Captchas, Running LM Studio on a Server
- LM Studio not automatically updating models: LM Studio does not download model updates automatically, and most model updates are new generations published in a new repository, which makes it difficult to determine the lineage of a model.
- A member wondered whether there was a way to automatically update models in LM Studio like Ollama, but this is not the case.
- Concise LLM Responses: LLMs are trained to be concise, both to avoid boring the user and to save on computational cost; one workaround is to split the task by asking for a structure first, then requesting content for each bullet point of that structure.
- This was in response to a user requesting a really long and thorough summary (almost an essay of sorts) and asking whether there was a way to reduce the model's tendency to end responses early.
- Gemini Pro Struggles with Image Recognition: A user asked why Gemini Pro 2.5 makes mistakes in simple image recognition, even after trying various prompts and images (example image).
- Another member noted that vision-enabled models are often not great, and it's difficult to determine exactly what the user expects, especially when the user says they have tried everything.
- LLMs are a Red Queenās Race for Bypassing Captchas: Members discussed the difficulties in using LLMs to bypass captchas, highlighting that captchas are designed to be hard for computers and are continually upgraded to thwart LLMs.
- As soon as a technique to crack a captcha is developed, a new one emerges, rendering the progress obsolete, like the Red Queen hypothesis.
- OpenWebUI enables remote LM Studio access: To run LM Studio on a server and access it from another PC, you can host a server on LM Studio, load a model, serve it on the local network, enable CORS, and open specific ports (e.g., 1234, 8080, 3000) on the host PC using OpenWebUI.
- There is no need to install OpenWebUI on the PC youāre going to access it with.
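A minimal sketch of talking to a LAN-hosted LM Studio server from another machine, assuming the default port 1234 and LM Studio's OpenAI-compatible endpoint (replace the IP with the host PC's address):

```python
from openai import OpenAI

# LM Studio exposes an OpenAI-compatible API; the key can be any placeholder.
client = OpenAI(base_url="http://192.168.1.50:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # LM Studio routes to whichever model is loaded
    messages=[{"role": "user", "content": "Hello from another PC!"}],
)
print(resp.choices[0].message.content)
```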
LM Studio ▷ #hardware-discussion (151 messages🔥🔥):
Unified Memory, Strix Halo, Tesla P40, Context Windows
- Unified Memory Video Disapproved by HX395+ Owner: A member shared a video comparing unified memory, but another member, an actual HX395+ owner, disapproved, calling it a terrible video.
- The disapproval was due to the video confusing soldered RAM with slotted RAM, going off-topic, and not knowing about Strix Halo, which has a 4x wider memory interface.
- Tesla P40 VRAM Expansion?: A member inquired about using a Tesla P40 alongside normal GPUs like the RTX 3090/4090 to expand VRAM for LM Studio, at around $300 for a used 24GB card, linking to the TechPowerUp specs.
- The general consensus was that at the $300 price point it's not worth it anymore; they were once worth buying when under $150, and a used 3090 is a better 'affordable' option.
- Debate Explodes Over the Need for 150B Parameter Models: A member stated that reasonable human interaction needs at least 150b parameters, and if it's supposed to feel smart and natural, then 300b.
- Another member countered that there is more to LLMs than the number of parameters, such as prompt engineering, good RAG, and finetuning, and not everything needs reasonable human interaction.
- Local LLMs Struggle With Large Context Windows: A member shared that their local LLM, when writing detailed stories, experiences issues with context retention, especially when the story setting is in a medieval castle, but the LLM starts talking about watching television.
- Another member replied that no local LLM works well with context windows above 32768, since models need tricks just to expand that far; they recommended trying dedicated long-context models, which have quirks that make their long context windows work.
Eleuther ▷ #general (83 messages🔥🔥):
AI Safety Institute, GPT-3 and GPT-4o Behavior, Symbolica.ai Startup, MMLU 5-Shot, AI Consciousness
- New AI Safety Institute Pops Up: Members noticed a new AI Safety institute, but expressed some skepticism about its legitimacy since they hadn't heard of it before and its website lacks recent publications.
- A member pointed out that one of the advisors is on Discord and suggested setting up a call.
- German Text Sparks Unusual LLM Behavior: A member described how a short German text causes GPT-3 and GPT-4o to exhibit drastically different reactions, ranging from neutral responses to deep emotional interpretations.
- The member wondered if this observation would be relevant to share, indicating a potential interest in exploring LLM behaviors beyond typical use cases.
- Startup Symbolica.ai Aims High: A member highlighted Symbolica.ai, a new London startup with an ambitious goal.
- Another member suggested they should release a small theorem prover model like the one Google had, and noted that some reviews mention that the boundaries of the work arenāt clear and the goals keep changing.
- MMLU 5-shot Evaluated: A member asked how MMLU 5-shot works, specifically if it's best of 5 or average of 5.
- Another member clarified that 5-shot refers to the number of examples seen, not the number of attempts permitted.
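Concretely, '5-shot' means the prompt itself carries five worked examples ahead of the real question, and the model still answers only once (a schematic sketch, not lm-eval's exact prompt format):

```python
def build_5shot_prompt(examples, question):
    """examples: list of (question, choices, answer) tuples; exactly 5 for 5-shot."""
    parts = [f"{q}\n{c}\nAnswer: {a}" for q, c, a in examples]
    parts.append(f"{question}\nAnswer:")  # one attempt; no best-of or averaging
    return "\n\n".join(parts)

demo = [(f"Example question {i}?", "A. x  B. y  C. z  D. w", "A") for i in range(5)]
print(build_5shot_prompt(demo, "Real test question?"))
```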
- Delusions are caused by memory?: A member wonders if the memory feature within ChatGPT is causing delusions.
- Another member shares this arxiv link and says, degenerating output behaviour stopped immediately as āmemoryā was removed.
Eleuther ▷ #research (186 messages🔥🔥):
Non-commercial license controversy, CommonPile 2 Creation, GRPO Objective & Model Performance, Symbolic Recursion
- Non-Commercial License Sparks Controversy: Members expressed concerns over a new dataset's non-commercial license, questioning its framing and potential restrictions, despite its aim to foster a healthier foundation for model development.
- Some argue it may be an instance of copyfraud, especially if the data conversion primarily involves scanning and text extraction, referencing Wikipedia's article on copyfraud and Public Domain Sherpa.
- CommonPile 2 May Depend on Synthetic Data: Discussion revolves around creating a stronger CommonPile 2, with one member suggesting the need for synthetic data to achieve terabytes of robust training material.
- However, it was cautioned that simply generating more samples doesnāt magically produce more information, violating principles of information theory, and that keeping data close to the source is generally preferable, except in select scenarios.
- GRPO Objective Boosts Model Performance: Members discussed how DeepSeek V3, a 671B model, achieved high performance with high-capacity and high-quality data, and then used GRPO objective allowing it to obtain even higher performance on tasks which could be validated.
- The member pointed out that literally random rewards improve performance due to a concentration effect that focuses the model on its existing reasoning pattern distribution.
- 'Symbolic Recursion' Terminology Questioned: Members questioned the meaning and validity of the term symbolic recursion, often used in publications and talks to appear sophisticated, potentially stemming from academic snobbery and jargon.
- It was speculated that the term comes down to a fancy way of saying the model uses the same symbols repeatedly in its writing.
Eleuther ▷ #interpretability-general (1 messages):
LLM Fairness, Bias Evals, Prompt Tuning, Chain of Thought, Concept Editing
- Bias Evals Trigger Race and Gender Bias: A new paper shows that adding realistic details to existing bias evals triggers race and gender bias in LLMs, causing up to a 12% difference in interview rates across models including GPT4o and Claude 4 Sonnet.
- Realistic details include company names, culture descriptions from careers pages, or constraints like 'only accept top 10%'.
- Interpretability fixes Fairness: Prompt tuning doesn't fix bias, but interpretability-based interventions can: a simple affine concept edit / ablation of race/gender directions reduces bias (as measured by the difference in interview rates) to typically < 1% (see the sketch below).
- The paper on Robustly Improving LLM Fairness gives an example of unfaithful chain of thought in the wild.
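A generic sketch of the kind of affine concept ablation described (illustrative only; the paper's actual method and its way of estimating the bias direction may differ):

```python
import numpy as np

def ablate_direction(h: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of hidden states h along concept direction v."""
    v = v / np.linalg.norm(v)
    return h - np.outer(h @ v, v)  # h minus its projection onto v

# Toy estimate of a concept direction: difference of group means.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(100, 64)) + 2.0 * np.eye(64)[0]
group_b = rng.normal(size=(100, 64))
v = group_a.mean(axis=0) - group_b.mean(axis=0)

h = rng.normal(size=(8, 64))      # a batch of hidden states
h_clean = ablate_direction(h, v)
print(np.abs(h_clean @ (v / np.linalg.norm(v))).max())  # ~0 along the ablated direction
```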
- Chain of Thought is Unfaithful: Inspecting Chain of Thought gives zero indication of race/gender bias, despite the outcomes themselves exhibiting clear bias.
- The paper found this to be true across all models tested, demonstrating a significant challenge in detecting and mitigating bias in LLMs.
Eleuther ▷ #lm-thunderdome (12 messages🔥):
Inspect Standard vs Evaluation Frameworks, LM Evaluation Harness Progress Bar, Reasoning Models
- Inspect Standard Elicits Debate: A member inquired about using the Inspect standard versus current evaluation frameworks.
- Another member clarified that Inspect appears to be just another evaluation framework, focusing on standardizing how results are saved and queried, rather than how they are run.
- `lm_eval` progress bar malfunctions with multi-GPU: A member reported that the progress bar in `lm_eval` only tracks the progress of one GPU in a multi-GPU setting.
- Another member said that `tqdm` is disabled on the other ranks by default and suggested changing a line in huggingface.py (see the sketch below).
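The default pattern looks roughly like this (a generic sketch of rank-gated tqdm, not the exact line in lm-eval's huggingface.py):

```python
from tqdm import tqdm
import torch.distributed as dist

batches = range(100)  # stand-in for this rank's shard of requests
rank = dist.get_rank() if dist.is_initialized() else 0

# disable=(rank != 0) silences the bar on every process except rank 0,
# which is why the visible bar reflects only one GPU's progress.
for batch in tqdm(batches, disable=(rank != 0)):
    pass  # evaluate the batch
```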
- Reasoning Models Generation: Non-Trivial: A member mentioned that handling generations from reasoning models is non-trivial, requiring modification of the answer extraction in each task config.
- They will create an issue on GitHub.
OpenAI ▷ #annnouncements (1 messages):
Canvas downloads, Canvas PDF export, Canvas docx export, Canvas markdown export, Canvas Code export
- Canvas Enables Downloads!: Canvas now supports downloads; if you're writing a doc, you can export it as a PDF, docx, or markdown.
- Canvas Exports Code Directly!: If you're using Canvas to write code, it will export directly to the appropriate file type (e.g. .py, .js, .sql, etc.).
OpenAI ▷ #ai-discussions (153 messages🔥🔥):
GPT Pro models parallel processing, O3 Pro performance issues, Free AI APIs, Discord activity decrease, ChatGPT advanced voices update
- OpenAI's GPT Pro Models Use Parallel Processing for Better Reasoning: The leading theory suggests GPT Pro models like O3 Pro achieve enhanced reasoning by running multiple instances in parallel and consolidating results, referred to as 'think harder'; this is supported by O3 Pro's chain of thought summaries referring to itself as 'we', indicating multiple instances working together.
- On the AIME 2024 math competition, O3-pro achieved 93% pass@1 accuracy compared to O3's 90%, hinting at the improved performance from this consolidation method.
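A toy sketch of the parallel-sampling-plus-consolidation idea (plain self-consistency / majority voting; whether OpenAI does anything like this internally is speculation from the thread):

```python
from collections import Counter

def ask_model(prompt: str, seed: int) -> str:
    """Stand-in for one sampled completion; imagine an API call with temperature > 0."""
    return ["42", "42", "41", "42", "43"][seed % 5]

def think_harder(prompt: str, n: int = 5) -> str:
    # Run n independent attempts "in parallel", then consolidate by majority vote.
    answers = [ask_model(prompt, seed=i) for i in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(think_harder("What is 6 * 7?"))  # "42" wins 3 of 5 votes
```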
- User Reports O3 Pro Fails in Projects Despite Long Wait Times: Multiple users reported that O3 Pro fails to answer questions from uploaded documents, even after waiting for extended periods like 40 minutes, with no chain of thought shown.
- This poor performance contrasts with expectations given O3 Pro's supposed enhanced reasoning capabilities, leaving users questioning its effectiveness in practical applications.
- AI Enthusiasts Explore Free AI APIs for Development: Despite ChatGPT Plus costing money, developers discussed alternative free AI APIs like SambaNova with fast Llama 3.3 70B, Qwen, and Deepseek.
- Gemini was highlighted for its high rate limits, offering 500/day for 2.5 Flash, 1k/day for 2.0 Flash, up to 1M prompt and 64K output, making it a viable option for budget-conscious AI projects.
- Discord Activity Declines Amidst AI Chat Popularity Surge: Users have observed a sharp drop in Discord activity correlating with the rise in popularity of AI chats, with many servers becoming 'ghost towns'.
- This shift suggests users are migrating to AI-driven platforms for discussions, impacting community engagement on traditional platforms like Discord, which prompts new thinking for community engagement.
- Users Criticize Annoyed Tone of New ChatGPT Advanced Voices: Users have voiced their dislike for the new advanced voices in ChatGPT, describing them as sounding 'annoyed', using excessive filler words, and overall conveying a sense of disdain.
- Some users prefer the previous versions with their artificial cheerfulness, while others suggest the ideal solution would be to have the option to choose a voice persona, similar to Grok or create custom voices like with ElevenLabs.
OpenAI ▷ #gpt-4-discussions (47 messages🔥):
GPT-4o memory, Fine-tuning GPT models, Mimicking writing style
- GPT-4o may recall past chats!: A member reported that GPT-4o could directly reference verbatim quotes from a fictional scene co-authored with a custom GPT in a separate chat thread, including parts they authored themselves and never shared with GPT-4o.
- While another member suggested this could be a rare case of accurate inference, the original poster disagreed, citing the statistical improbability and offering DMs for more details.
- Pick Mini or Nano for finetuning?: A member asked which GPT model (4.1 mini or nano) to use for fine-tuning to mimic a writing style.
- A member suggested that if cost is not a consideration, try both and compare results; otherwise, use the cheaper one, with discussion on the trade-offs between cost and performance, and the number of training examples required.
- ChatGPT only reveals the user's name if the user allows it!: A member stated that ChatGPT only reveals the user's name if the user allows it, even with memory enabled.
- The member emphasized that ChatGPT obeys if the user speaks or asks in the chat.
OpenAI ▷ #prompt-engineering (19 messages🔥):
Shotgun Spammers, Uploading HTML to o3, Pandoc for HTML to Markdown, Long-form responses from o3 and o3-pro
- Spammers Return with Shotgun: A user reported the return of shotgun spammers, and included a link showing a main question/answer regarding best practices for uploading HTML files to o3.
- The chat itself says, 'OK I can parse a lot of interleaved tags and still get the gist of even a long file.'
- Pandoc: Swiss-Army Chainsaw for HTML Parsing: A user suggested using Pandoc to convert HTML to Markdown for better parsing, calling it a purpose-built and widely-used tool rather than a hack job.
- They recommended using Pandoc instead of scripting with awk, sed, or tr for parsing HTML, while also acknowledging the utility of those tools for one-off tasks.
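For instance, a minimal way to shell out to Pandoc from Python (assumes the pandoc binary is installed; the flags shown are standard Pandoc options):

```python
import subprocess

def html_to_markdown(html: str) -> str:
    """Convert an HTML string to GitHub-flavored Markdown via the pandoc CLI."""
    result = subprocess.run(
        ["pandoc", "--from=html", "--to=gfm", "--wrap=none"],
        input=html, capture_output=True, text=True, check=True,
    )
    return result.stdout

print(html_to_markdown("<h1>Title</h1><p>Some <b>bold</b> text.</p>"))
```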
- AI Can Handle HTML Tags: A user confirmed that AI models are trained on plenty of tags and can handle HTML, suggesting algorithmic stripping is only necessary for absolutely the highest accuracy possible.
- They added that HTML tags create noisy tokens, which can be good for reasoning, and that while they don't matter until they do, they are additional context fillers.
- Data Prep: Half the Battle: A user noted that data preparation, processing, and formatting often constitutes about half the workload in AI projects.
- Typical tasks include pulling text from PDFs or consolidating JSONs, which highlights the importance of efficient data handling.
- Users seek prompt for Long-Form Responses from o3 models: A user seeks a prompt to elicit long-form responses from o3 and o3-pro when reviewing files or performing in-depth research.
- The user observed that these models tend to produce concise bullet points and comparison tables, contrasting with Sonnet and Opus 4.
OpenAI ▷ #api-discussions (19 messages🔥):
Shotgun Spammers, o3 Model File Uploads, HTML Parsing, Pandoc Conversion, Data Preparation
- Debate about best practices for uploading HTML files to o3: A user inquired about the best practice for uploading files with lots of HTML to the o3 model, obtained via devTools scraping, and shared a ChatGPT link.
- The user found that a combination of JS and AWK scripting gave great results.
- ChatGPT can parse HTML, but algorithmically stripping tags yields the highest accuracy: A member confirmed that while ChatGPT is trained on plenty of tags and its output is generally correct, algorithmically stripping tags provides the highest accuracy.
- He clarified that mistakes are due to model stochasticity, not strictly because of tags.
- Pandoc emerges as preferred tool for HTML to Markdown conversion: A user suggested using Pandoc to convert HTML to Markdown for parsing, recommending it over hack jobs using tools like awk.
- Pandoc is described as a purpose-built and widely-used tool.
- HTML tags are noisy tokens: A member noted that HTML tags add noisy tokens that can be good for reasoning, while another agreed that they are additional context fillers.
- If you are just pasting one website into ChatGPT to ask questions about its content, then it doesn't really matter… if you are making some sort of pipeline and paying for each token, maybe it does.
- Users seek prompts for long-form responses from o3 models: A user asked for a prompt to get long-form responses from o3 and o3-pro when using them to review files or perform in-depth research on a topic.
- The user noted that both models tend to be concise, even when instructed otherwise.
HuggingFace ▷ #general (210 messages🔥🔥):
Model Underfitting and Overfitting, Hugging Face's Definition of Open Source Models, Interpretability of Transformers Models, HF Spaces, Qwen 2.5 and Multilingual capabilities
- Diagnosing Model Fitting with TensorBoard Graphs: Members discussed using TensorBoard loss graphs to diagnose model fitting, noting that the evaluation loss should decrease at a similar rate to the training loss, but should not be lower.
- One member emphasized the importance of dividing the dataset into training and testing parts to ensure the model has generalized well without overfitting or underfitting.
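A compact sketch of that diagnostic loop (generic PyTorch on synthetic data, not tied to any model from the thread):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

X, y = torch.randn(1000, 10), torch.randn(1000, 1)
train_set, val_set = random_split(TensorDataset(X, y), [800, 200])
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt, loss_fn = torch.optim.Adam(model.parameters()), nn.MSELoss()

for epoch in range(5):
    model.train()
    for xb, yb in DataLoader(train_set, batch_size=64, shuffle=True):
        opt.zero_grad()
        train_loss = loss_fn(model(xb), yb)
        train_loss.backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        losses = [loss_fn(model(xb), yb).item()
                  for xb, yb in DataLoader(val_set, batch_size=200)]
    # Rule of thumb from the thread: eval loss should fall in step with train loss
    # (and sit slightly above it); eval loss rising while train loss falls = overfitting.
    print(f"epoch {epoch}: train {train_loss.item():.4f}  val {sum(losses)/len(losses):.4f}")
```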
- HF Faces Scrutiny over Open Source Model Definition: Concerns were raised that some models on Hugging Face aren't fully open source, potentially using the platform for marketing rather than genuine open collaboration.
- One member pointed out that while Hugging Face doesn't explicitly brand itself as an open source library, its reputation leans that way, while another mentioned that any repository can use any license.
- Vision Model Interpretability Hotspots: Members seek assistance with visualizing attention maps over images using vision models like LLaVA to achieve model interpretability.
- They asked whether anyone has experience with interpretability or explainability of transformers models.
- Space Sleepiness Addressed: Members discussed how to set a sleep time for a HF Space using `HfApi` from the `huggingface_hub` library, to put a Space to sleep after 1h of inactivity.
- Note: if you are using 'cpu-basic' hardware, you cannot configure a custom sleep time; your Space will automatically be paused after 48h of inactivity.
- Qwen 2.5 Claims Multilingual Crown: Members noted Qwen 2.5's ability to speak 100 languages and compared it to Gemma3, with others highlighting it uses the Linux VM so well.
- There was speculation that with 18T tokens, the model contains substantial multilingual data, contributing to its proficiency.
HuggingFace ▷ #cool-finds (3 messages):
- No Cool Finds Uncovered: Nothing notable was shared this round.
- Channel is Quiet: Activity was low, with only a few messages related to user notifications and a general request.
HuggingFace ▷ #i-made-this (5 messages):
X Scraper, Digital Twin AI Platform, Augmentoolkit 3.0 Release, Field Injection Attacks
- X-Scraper Open API Endpoints Debut: An X-scraper with open API endpoints has been created and is available for use, with sample datasets found on its Hugging Face organization page.
- The data is available and free for anyone building AI models, agents, or applications.
- CloneMe Platform Launches Digital Twin Toolkit: The CloneMe AI platform lets you build your digital twin: an AI that chats like you, remembers details, and supports multiple platforms.
- It's customizable, memory-driven, and hot-reloadable, making it a robust toolkit for creating intelligent, dynamic AI personas.
- Augmentoolkit 3.0 Augments AI Training: Augmentoolkit 3.0 has been released, enabling users to train AI to understand new subjects by simply adding documents or teaching it to perform tasks through rating attempts.
- It facilitates custom model runs, which are cheaper and provide greater control over update timing and methods.
- Field Injection Attacks Analyzed: A detailed article on Field Injection Attacks and their potential impact on MCP servers and systems has been written and shared on LinkedIn.
- The article explains how such attacks can compromise MCP servers and systems.
HuggingFace ▷ #reading-group (2 messages):
Paper presentations
- Clarification on paper presentation types: A member inquired whether the presentations are for presenting other papers or their own.
- Either works for presentations: Another member responded that either works.
HuggingFace ▷ #computer-vision (6 messages):
Kaggle Datasets, Gemini 2.5 Pro Deprecation, Open Source LLM/VLM Alternatives, Mistral LLM as an Alternative
- Kaggle suggested for dataset discovery: A member suggested checking Kaggle for datasets, referring to it as the best bet for finding them.
- This suggestion was in response to another member's request.
- Gemini 2.5 Pro's Deprecation Pushes Open Source Search: A member reported the impending deprecation of Gemini 2.5 Pro and the inferior performance of its replacement, creating a need for robust, open-source LLM/VLM alternatives for their product.
- The member desires a system resistant to corporate whims and believes anything below 70B trained parameters might be insufficient.
- Mistral LLM Proposed as Gemini Alternative: A member suggested that Mistral LLM might be the closest open-source alternative to Gemini, but cautioned about expecting the same level of performance when running it locally.
- They suggested prompt engineering could act as a shim to mitigate performance differences.
HuggingFace ▷ #NLP (2 messages):
GPTs Agents, OpenAI's sidebars
- GPTs Agents cannot learn after initial training: A member shared a concern about GPTs agents not learning from additional information provided after their initial training.
- Another member cleared this misunderstanding, explaining that uploaded files are saved as 'knowledge' files for the agent to reference when required, but they do not continually modify the agent's base knowledge.
- OpenAI Platform's sidebars changed: Some members had a discussion about changes in the sidebars of platform.openai.com.
- One reported that two icons disappeared from the sidebar (one for threads and another one for messages).
HuggingFace ▷ #agents-course (10 messages🔥):
Agents Course Sign-Up Link, FinalAnswerTool.forward() Error, Course Completion Deadline
- Agents Course Sign-Up Link Broken?: A member reported the sign-up link for the course appeared to be broken, but it seems to be working now.
- FinalAnswerTool.forward() Error Haunts Tool Calling Agents: A member encountered a `FinalAnswerTool.forward() missing 1 required positional argument: 'answer'` error when working with Tool Calling agents.
- The user expressed frustration, stating 'This is maddening'.
- Deadline Dilemma for Agents Course and MCP Course: A member starting the Agents course with a deadline of July 1 and the MCP course with a deadline of August 1 expressed concern about being bogged down.
- The member asked which course to choose and whether it mattered, implying time constraints may force a choice between the two.
Manus.im Discord ▷ #general (208 messages🔥🔥):
Manus Outage, Veo3, Manus playbooks, Claude 4.0 waiting, Manus credits
- Manus Meltdown Caused by Veo3 Rush?: Users reported widespread issues with Manus, suspecting the Veo3 announcement overloaded the servers, as confirmed by Downdetector.
- Playbooks Primer Prep Prompts Preemptively: Playbooks in Manus prepare prompts and give output examples, bridging the gap for users needing prompt assistance, but they are also intended to highlight creative workflows.
- Claude Craze Community Clamors Constantly: Users expressed eagerness for Claude 4.0, drawing humorous parallels to fan anticipation, though there was no official news of a Claude 4.0 release. Separately, a partner suggested creating a new Gmail account and signing up for the Google One AI trial, starting a family plan, and inviting 5 accounts for 5x usage on Veo and everything else, since all accounts get separate usage limits.
- Credit Crunch Costs Concerns Continue: Users voiced concerns over credit usage, particularly regarding optimization and the lack of cost previews, with some suggesting a bring-your-own-keys model; one user said 'every task I spin up is about 900-1000 credits for me'.
- Image Imperfection GPT Generates Greatness: A user compared image generation quality between Manus and GPT-4 Omni, showing GPT-4 Omni outperforming Manus.
GPU MODE ▷ #triton (4 messages):
Triton Kernel Optimization, Convolution Implementation, Memory Reads in Triton
- Kernel Sharing Request Leaps!: A user requested the `solve` function utilizing a grid, which led to a discussion on Triton kernel optimization.
- The code involves pointers for input, kernel, and output, calculating output size and blocks, and launching a `conv1d_kernel` with a specified block size, intending to discuss the optimization of convolution operations in Triton (a sketch of that shape follows below).
- Triton's Memory Read Mystery Explained: A user inquired about the increased memory reads in the Triton kernel (4096 reads per block for a kernel size of 2048) and why it's still faster.
- The author requested clarification on what the user meant by 'ton more reads for kernel size 2048', initiating a discussion about memory access patterns and optimization within Triton.
GPU MODE ▷ #cuda (10 messages🔥):
Blackwell Memory Layout Optimization, CUDA-side Boost, VS Code Plugins for CUDA Development, L1 and L2 Cache Policy
- Blackwell Memory Layout Library Sparked: A member inquired about a Blackwell library designed to optimize memory layout across the L2 cache.
- Another member asked the original poster about how they attempted to set the policy for both L1 and L2 caches.
- CUDA-side Boost Library Boosts Development: A member shared a link to cuda-side-boost, a library for CUDA development.
- The member noted that replacing the entire PyTorch memory allocator is probably overkill, and one could use MemPool in PyTorch.
- VSC Plugins Variety Voyaged: A member asked about suggested VS Code plugins for CUDA development, besides Nsight.
- Another member suggested PTX syntax highlighting and CMake integration for easy debugging.
- Cache Policy Code Snippet Shared: A member shared a CUDA code snippet demonstrating how to create and use a cache policy object.
- The code snippet includes assembly instructions for creating a fractional L2 cache policy with eviction strategies and using it in a load instruction, including `createpolicy.fractional.L2::evict_last.L2::evict_unchanged.b64`.
GPU MODE ▷ #torch (3 messages):
Torch.compile speedup, Knowledge Distillation with torch.compile, PyTorch CMake and CUDA Architecture Selection
- Torch Compile surprisingly speeds up convolution kernel: A member observed that operations generated from torch.compile tend to run faster, even when nothing is being fused, noting a significant speedup in a convolution kernel from 1.7489 ms (Native PyTorch) to 0.0020 ms (Compiled PyTorch).
- The compiled version calls extern_kernels.convolution instead of aten.convolution, leading to questions about why the stock convolution doesnāt use these faster external kernels.
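A jump from 1.7489 ms to 0.0020 ms is large enough that timing methodology is worth double-checking: CUDA kernel launches are asynchronous, so un-synchronized wall-clock timing can under-report the compiled path. A minimal sketch of a fairer comparison (illustrative only; the member's actual benchmark was not shared) might look like:

```python
import torch

conv = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda()
x = torch.randn(8, 64, 128, 128, device="cuda")

compiled = torch.compile(conv)
compiled(x)  # warm-up: exclude one-time compilation from the measurement

for name, fn in [("eager", conv), ("compiled", compiled)]:
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    fn(x)
    end.record()
    torch.cuda.synchronize()  # wait for the GPU before reading the timer
    print(f"{name}: {start.elapsed_time(end):.4f} ms")
```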
- Torch Compile faces challenges in Knowledge Distillation: A member inquired about setting up torch.compile for knowledge distillation, particularly with a large teacher model (e.g., resnet50) in eval mode and a smaller student model (e.g., resnet18) in training mode.
- They encountered runtime errors related to tensor overwriting, specifically with the error message indicating the need to clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation.
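Both remedies named in that error message can be applied directly. Below is a minimal sketch of such a setup (resnet50 teacher in eval mode, resnet18 student in training mode; the KD loss, temperature, and other hyperparameters are assumptions, not from the thread):

```python
import torch
import torchvision.models as models

teacher = models.resnet50().cuda().eval()
student = models.resnet18().cuda().train()
# reduce-overhead mode enables cudagraphs, the source of the reported errors
teacher_c = torch.compile(teacher, mode="reduce-overhead")
student_c = torch.compile(student, mode="reduce-overhead")

opt = torch.optim.SGD(student.parameters(), lr=0.01)
kd_loss = torch.nn.KLDivLoss(reduction="batchmean")

x = torch.randn(32, 3, 224, 224, device="cuda")
for step in range(10):
    torch.compiler.cudagraph_mark_step_begin()   # fix #1: mark each iteration
    with torch.no_grad():
        t_logits = teacher_c(x).clone()          # fix #2: clone outputs that outlive the step
    s_logits = student_c(x)
    loss = kd_loss(
        torch.log_softmax(s_logits / 2.0, dim=-1),
        torch.softmax(t_logits / 2.0, dim=-1),
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```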
- PyTorch CMake unconditionally ignores CUDA architecture selections: A member reported being affected by PyTorchās CMake script, specifically a line that unconditionally ignores user-supplied CUDA architecture selections, causing code breakage due to assumed access to cuda::atomic.
- They questioned the relevance of a comment about not relying on CMake version 3.18 and suggested guarding the problematic lines based on CMake version and the absence of user-supplied architecture selections for backward compatibility.
GPU MODE ā· #beginner (2 messages):
Post-PMPP reading recommendations, Instruction Latency Paper
- Members Seek Reading Material Post-PMPP: A member asked for recommendations on books or papers to read after completing PMPP.
- Another member responded by suggesting a paper on instruction latencies.
- Instruction Latencies Paper Recommended: A member suggested reading a paper on instruction latencies, despite the fact that the measured latencies might be outdated.
- The member suggested that the discussion itself is worth a read.
GPU MODE ā· #rocm (2 messages):
ROCm 7 Access, ROCm Release Date
- Eagerly Awaiting ROCm 7 Access: A user inquired about how to gain access to ROCm 7, anticipating its release in August.
- Patience Advised for ROCm 7: Community members suggest waiting for the official release announcement from AMD for details on accessing ROCm 7.
GPU MODE ā· #self-promotion (2 messages):
GemLite ROCm support, NVIDIA MMA Instruction, Tensor Cores in Mojo, Custom MMA instructions via LLVM intrinsics, Data layouts with Mojo's load_matrix and store_matrix APIs
- GemLite Gets ROCm Ready: A developer announced the addition of ROCm support to GemLite, with a focus on the MI300X (post on X).
- Mojolicious Tensor Core Programming: A blog post explores NVIDIAās mma instruction and leveraging it in Mojo, teaching users to use Mojoās mma API (blog post).
- The post details implementing custom mma instructions via LLVM intrinsics, and efficiently managing data layouts with Mojoās load_matrix and store_matrix APIs (github repo).
GPU MODE ā· #submissions (3 messages):
H100 Conv2D, AMD FP8 MM, MI300, Leaderboards
- H100 Speeds Conv2D Leaderboard: A member achieved 4th place on the `conv2d` leaderboard on H100 with a time of 47.8 ms.
- The same member also had a successful submission on H100 with a time of 187 ms.
- MI300 Enters AMD-FP8-MM Fray: A member achieved a successful submission on MI300 for the `amd-fp8-mm` leaderboard with a time of 5.23 ms.
GPU MODE ▷ #hardware (1 message):
CUDA 12.3, CC 10.3
- CUDA Confirms CC 10.3 with B300: NVIDIA's CUDA Toolkit Release Notes confirm that CC 10.3 is supported for B300.
- Another B300 confirmation: A further confirmation of B300 support in CUDA was shared.
GPU MODE ā· #factorio-learning-env (111 messagesš„š„):
FLE standalone docker image, Factorio TAS Generator integration, RL policy learning in Factorio, LLM vs RL for Factorio
- FLE Standalone Docker Image Sees First Light: A member created a POC project for a standalone FLE docker image and mod, but encountered challenges integrating it into the main codebase.
- Another member tested the setup and reported that it worked on their system, while another encountered a desync error when joining the multiplayer game.
- Factorio TAS Generator Steps into the Lab: A member mentioned a Factorio mod that records steps for the Factorio TAS Generator application, used to generate steps.lua files for automated gameplay.
- The discussion touched on a user who hand-wrote 35,453 steps for a Tool Assisted Speedrun of Factorio, highlighting the motivation behind creating tools like Factorio TAS Generator.
- Code-as-policy: A member suggested code-as-policy as a potentially faster way to build a full RL loop on top of that abstraction, emphasizing heavy reward shaping.
- Code-as-policy is where you use program synthesis as the action, shape the rewards heavily, and build a full RL loop on top of that abstraction.
- LLM vs RL in Factorio Throwdown Begins: Members discussed the potential of using RL-based AI to play Factorio, debating whether an LLM is necessary for long-term planning and complex tasks.
- The conversation explored whether an RL agent could achieve optimal Factorio play with a limited amount of gameplay, drawing comparisons to OpenAI Fiveās success in Dota 2.
- Navigating the LLM-RL Spectrum: A discussion emerged around the use of LLMs as āhuman prior knowledge multipliers,ā suggesting that tuning the scaffolding to get basic things going might be better than entirely DIY approaches or RL with limited compute.
- A paper integrating RL to improve mostly-LLM systems was shared, highlighting a trade-off between sample efficiency and long-term capabilities.
GPU MODE ā· #amd-competition (36 messagesš„):
AMD Conference, Meeting at Workshop 202, Award Ceremony Timing, Departure from Conference
- AMD Conference Attendees Convene: Attendees at the AMD conference arranged to meet at the lunch area and Workshop 202 (Room 212C-D).
- One member wearing red clothing and glasses offered to meet outside the room.
- Fireside Chat Snafu: A member looking for a meeting initially found no one there, later clarifying they were in the back of the room for the fireside chat.
- The same member clarified that others need to be pinged to notify them.
- Official Photo of AMD event in Limbo: Attendees posted images of the event, but one member asked Does anyone know where to find the official photo link?
- No link was provided in the chat.
- Conference Attendees Begin Departure: Members stated they were flying back, with one flying to Paris @ 3 pm and another inquiring about a flight to Munich @ 2 pm.
- Members expressed appreciation for the opportunity to meet each other at the conference.
GPU MODE ā· #cutlass (6 messages):
New Cutlass DSL learning resources, Sm90ScalarReduction applicability, CuTeDSL support for distributed shared memory
- New Cutlass DSL resources requested: A member inquired about new learning resources for the new Cutlass DSL, particularly videos from this yearās GTC.
- The user is likely looking for the best way to learn Cutlass for their project.
- Sm90ScalarReduction examined for column reduction: A member considered Sm90ScalarReduction for their problem, initially thinking it could solve their issue, which involves a maximum of absolute values per column (chebyshev).
- They later realized that Sm90ScalarReduction doesnāt exactly fit their needs, suggesting that a hypothetical Sm90ColumnReduction would be more appropriate.
- Distributed shared memory with CuTeDSL questioned: A member asked whether CuTeDSL now supports distributed shared memory.
- Their project requires a reduction operation between threadblocks, and they are seeking the easiest way to implement it, implying interest in CuTeDSL for this purpose.
Unsloth AI (Daniel Han) ā· #general (72 messagesš„š„):
AMD GPUs, Patchscopes Google framework, Unsloth Youtube Channel, MLflow issues, Unsloth Swag
- AMD GPUs get Unsloth support: The Unsloth team might start taking AMD seriously, with the new AMD INSTINCT MI355X GPU having 5x the FP8 FLOPS of the H100; the team also presented at the AMD AI conference.
- Members noted that AMD is cheap and has high memory, and also questioned AMDās driver support.
- Patchscopes from Google: Members shared a link to Patchscopes, a framework from Google.
- One member mentioned wanting to see if it works with models like LLaVA while another is fine-tuning Qwen 2.5 7B (using Unsloth) and needs a small French math dataset.
- Unsloth to create YouTube channel?: The Unsloth team is thinking of making a YouTube channel to upload videos.
- One member specifically asked them to upload a video on how to use multiple GPUs with accelerate and promised to like and subscribe.
- Multi-GPU support surfaces: There are reportedly 5 different repos for multiGPU support, with this Reddit thread being one example.
- Official support is still being worked on and is expected to be really good.
- MLflow model loading gotchas: A user ran into issues with their fine-tuning pipeline when loading a model from MLflow instead of Hugging Face, despite using the exact same config, hyperparameters, and pipeline.
- They observed the loss hovering around 3ā4 instead of approaching zero, even after doubling the size of the training dataset and sought help to debug or fix the issue.
Unsloth AI (Daniel Han) ā· #off-topic (9 messagesš„):
80GB VRAM Typo, GPU RAM
- Internet Suffers from 80GB VRAM Typo: Users debated whether a spec listing 2TB of RAM was a typo, with the assumption that it should have been 80GB VRAM.
- Some suggested it wasnāt a typo, but a lazy assumption that readers would understand it referred to VRAM when advertising a GPU.
- PC lists GPU RAM as 80GB: A user reported seeing a PC listing specifying GPU RAM: 80GB.
- Another user responded with skepticism, stating āAint no wayšā.
Unsloth AI (Daniel Han) ā· #help (59 messagesš„š„):
Fine-tuning Llama 3.2 for tool calling, AttributeError during training with Unsloth, Qwen3 (4B) fine-tuning issues with Unsloth GRPO, Accelerate with Unsloth for multi-GPU, Fine-tuned model inference speed
- Leveraging GPT-4 for Llama 3.2 Tool-Calling Fine-Tuning: A member plans to use GPT-4 to generate example conversations and synthetic samples for fine-tuning Llama 3.2 (3B) for 6 custom tools, seeking guidance on the approach and number of examples needed for zero-shot tool calling.
- It was pointed out that using tools in the middle of a conversation requires at least a 14B parameter model, as Llama 3.2 models are subpar.
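For illustration, a single synthetic training sample for tool-calling fine-tuning might look like the following (a hypothetical format; the actual chat template and field names depend on the tokenizer and training framework):

```python
# One hypothetical synthetic sample; the tool and fields are illustrative only.
example = {
    "messages": [
        {"role": "user", "content": "What's on my calendar tomorrow?"},
        {"role": "assistant", "tool_calls": [
            {"name": "get_calendar_events", "arguments": {"date": "tomorrow"}},
        ]},
        {"role": "tool", "name": "get_calendar_events",
         "content": '[{"time": "10:00", "title": "Standup"}]'},
        {"role": "assistant",
         "content": "You have one event tomorrow: Standup at 10:00."},
    ]
}
```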
- Tackling AttributeError in Unsloth Training: A user encountered an AttributeError during training with Unsloth, traced to the `fetch_image` function attempting to read a `None` images field instead of a valid path or URL.
- It was suggested that if it's a batch, the whole batch needs to contain images and text, or only text; a proposed fix was to try batch size 1 or pass a custom collator (see the sketch below).
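One way to satisfy the all-images-or-all-text batch constraint without writing a collator is to pre-split the dataset and train on homogeneous batches; a hypothetical sketch (the "images" field name is an assumption):

```python
from datasets import Dataset

def split_by_modality(ds: Dataset):
    # Keep multimodal and text-only examples apart so every batch is
    # homogeneous and fetch_image never receives a None images field.
    multimodal = ds.filter(lambda ex: ex.get("images") is not None)
    text_only = ds.filter(lambda ex: ex.get("images") is None)
    return multimodal, text_only
```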
- Navigating the Pitfalls of NaN Loss: A user reported encountering `nan` loss during GRPO training, seeking solutions after a previous SFT fix failed.
- It was suggested to reduce the learning rate, check for specific problematic datapoints causing the issue, and ensure compatibility between the notebook and the loaded 4-bit model.
- Quenching the Thirst for Quick TTS Iteration: A user sought a quick way to integrate a coding assistant with their R/RStudio workflow, using Qwen2.5 Coder 32B Instruct GGUF and being unsure how to make no_think the default.
- It was suggested that they create a new model based on Qwen3 and set no_think from there, also considering that non-instruct models might be more suitable for that kind of work.
- Shielding Models from Hallucinations: After successfully fine-tuning a model, a user asked how to prevent the model from responding when the input is out of context and also preventing hallucinations.
- The suggestion was to use grounding or guardrails to address these issues, though a specific guide was not provided.
Unsloth AI (Daniel Han) ā· #showcase (4 messages):
French math dataset, Qwen 2.5 7B
- Seek French Math Dataset for Qwen 2.5 7B: A member is fine-tuning Qwen 2.5 7B (using Unsloth) and looking for a small French math dataset.
- Another member suggested using a regular math dataset and an AI to translate it, since there might not be much available in French.
- Translate Math Dataset with AI: An AI model can be used to translate an existing math dataset, increasing the amount of training data for Qwen 2.5 7B.
- This approach sidesteps the need to find a native French math dataset, which can be difficult; a minimal sketch follows.
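A minimal sketch of that suggestion (GSM8K as the English source is an assumption, and `translate_to_french` is a placeholder for whatever model or API performs the translation):

```python
from datasets import load_dataset

def translate_to_french(text: str) -> str:
    # Placeholder: call your preferred LLM or translation API here.
    # Kept as identity so the sketch runs end-to-end without credentials.
    return text

gsm8k = load_dataset("openai/gsm8k", "main", split="train")
french_math = gsm8k.map(lambda ex: {
    "question": translate_to_french(ex["question"]),
    "answer": translate_to_french(ex["answer"]),
})
print(french_math[0]["question"])
```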
Unsloth AI (Daniel Han) ā· #research (27 messagesš„):
Attention map visualization in VLLM models, KL divergence pitfalls in RL training for LLMs, Issues with River Crossing experiments, Claude Opus as a paper author, nnsight.net
- Attention Map Visualizations Sought for VLLM Models: A member inquired about visualizing attention maps over images using VLLM models like LLAVA, seeking tools or experiences with transformer model interpretability.
- Another member suggested nnsight.net as a potential starting point, while acknowledging the need for custom implementations.
- KL Divergence Gradient Estimation Flaws Exposed: A paper was shared discussing pitfalls in gradient estimation for KL divergence in RL training for LLMs, highlighting issues in open-source projects like TRL and Open Instruct and papers such as GRPO.
- The paper points out that differentiating through the KL estimate as loss functions and not accounting for the sequential nature can lead to incorrect KL gradients, referencing this paper.
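In rough terms (a sketch of the failure mode, not the paper's exact derivation): with samples drawn from the policy itself, the popular $k_3$ estimator is unbiased for the KL *value* but not, when differentiated pathwise, for its *gradient*:

$$r(x) = \frac{\pi_{\mathrm{ref}}(x)}{\pi_\theta(x)}, \qquad \widehat{\mathrm{KL}}_{k_3} = r - 1 - \log r, \qquad x \sim \pi_\theta$$

Here $\mathbb{E}_{x \sim \pi_\theta}[\widehat{\mathrm{KL}}_{k_3}] = \mathrm{KL}(\pi_\theta \,\|\, \pi_{\mathrm{ref}})$, but $\mathbb{E}_{x \sim \pi_\theta}[\nabla_\theta \widehat{\mathrm{KL}}_{k_3}] \neq \nabla_\theta\, \mathrm{KL}(\pi_\theta \,\|\, \pi_{\mathrm{ref}})$ in general, since the true gradient must also account for the $\theta$-dependence of the sampling distribution (a score-function term) and for the sequential structure of the sampled tokens.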
- Appleās Reasoning Model Riddled with River Crossing Errors: A paper titled The Illusion of the Illusion of Thinking was shared, criticizing the evaluation of AI models in River Crossing experiments for inadvertently penalizing models that correctly identify unsolvable problems.
- The original paper by Apple had instances with N ā„ 6 actors/agents using boat capacity b = 3, which is mathematically impossible.
- Claude Opus Achieves Academic Acclaim as Paper Author: A member humorously noted the unexpected situation of Claude Opus being listed as a paper author.
- It was joked that we are anthropic and we can't let that stand.
- More hilariousness emerges: Another funny link appeared in the chat https://arxiv.org/abs/2506.10943.
Latent Space ā· #ai-general-chat (101 messagesš„š„):
nano-vLLM release, Morph Labs Trinity, o3-pro tools discussion, AI agent building mistakes, Transformers library deprecation
- Nano-vLLM gets Nano-ticed: DeepSeek researcher @xingkaiyu released nano-vLLM, a minimal vLLM implementation of approximately 1200 lines of code that sparked excitement among AI/ML practitioners, found at this link.
- The community appreciates its concise nature as a valuable learning resource, with one user expressing interest in hacking on the ānano monolithā.
- Trinity autoformalizes Fermatās Last Theorem: Morph Labs introduced Trinity, an autoformalization system used to formalize de Bruijnās result on the abc conjecture in Lean, available at this link.
- It aims to create verified training environments for self-supervised reinforcement learning in mathematics by converting mathematical knowledge into formal proofs.
- Transformers Library goes PyTorch only: The Transformers library will deprecate TensorFlow and Flax support, focusing solely on PyTorch to reduce bloat, simplify the toolkit, and remove abstraction layers as mentioned here.
- Long-term support (LTS) for TF and Flax will continue with v4 until mid-2026, and this change marks the beginning of v5, aiming to remove 50% of the code.
- Meta AI App Shares Private Convos: A Meta AI app inadvertently posted usersā private conversations, including sensitive information and audio, to a public feed which is linked here.
- Itās clarified that users are accidentally sharing content due to a confusing UI, exposing personal details and raising ethical concerns for Meta.
- Anthropic Explores Multiagent Mayhem: Anthropic found that a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2% on our internal research eval, according to this post.
- They also found that multi-agent systems excel at valuable tasks that involve heavy parallelization, information that exceeds single context windows, and interfacing with numerous complex tools but burns through tokens fast, at about 15Ć more tokens than chats.
Latent Space ā· #ai-announcements (4 messages):
AI Engineering World's Fair 2025, Latent Space Podcast, AI Conference Recap, Documenting AI progress
- Latent Space Podcast Recaps AI Engineering Worldās Fair 2025: The Latent Space podcast account shared a recap of the AI Engineer Worldās Fair 2025, highlighting statistics on attendees, speakers, workshops, and side events.
- It encourages attendees to publish their own takeaways and learnings from the conference, emphasizing the rapid pace of change in AI and the importance of documenting new beliefs and connections.
- X-Ware.v0 Posts AI Engineering Worldās Fair 2025 Recap: Red - X-Ware.v0 posted a recap of the AI Engineering Worldās Fair 2025.
- The recap is available at xcancel.com.
aider (Paul Gauthier) ā· #general (76 messagesš„š„):
mcpm aider troubles, github.com/Aider-AI/aider/pull/393, turn off auto updating, comparison b/w aider and cline, OpenAI
- mcpm Aider Troubles with GitHub Copilot: Users reported troubles with mcpm-aider when using GitHub Copilot, leading one user to fork the project.
- One user followed up saying that they got it to work, despite the errors, describing it as stupid but eh.
- Staying on a specific fork of Aider: A user inquired about how to disable auto-updates to remain on a specific fork of Aider.
- Another user provided the solution: use the `--no-check-update` flag or set the `AIDER_CHECK_UPDATE` environment variable to false, as documented here.
- Aider excels with smaller models: A user expressed appreciation for Aiderās performance with smaller models (8B and 12B) using Ollama, noting that it works surprisingly well compared to other tools.
- Another user noted that it works because of context.
- Aider Ranks High on JS Leaderboard: Based on this benchmark, one user noticed that Aider performed really high on the JS leaderboard, specifically because of Aiderās repomap.
- Another mentioned that they do all their JS coding with Aider. To them, the flexibility, transparency and quality provided by Aider is unmatched, especially when the LLM is not smart enough to be the real agent for me yet.
aider (Paul Gauthier) ā· #questions-and-tips (20 messagesš„):
uv for Python dependency management, Aider and library versions, Aider costs with Anthropic, Aider context size with Ollama, max_input_tokens
- UV Embraced for Python Dependency Victory: Members discussed migrating from pip/virtualenv to uv for Python dependency management, avoiding direct pip usage and pyproject.toml edits by using commands like `uv add <dependency name(s)>`.
- A user noted their initial laziness about reading the manual but found it much tighter to define linting instructions in the YAML configuration.
- Aiderās Library Version Awareness Adventure: A user questioned how to improve Aiderās awareness of library versions, noting initial suggestions of outdated options when migrating from pip/virtualenv to Poetry.
- Suggestions included providing context via URLs to updated man pages, explicitly stating versions in conventions, and using the `/read docs/spec.txt` command.
- Costs are Managed When Using the Anthropic Model: A user inquired about cost management when using Aider with Anthropic, expressing concerns about potential hourly costs of nearly $50 for large changes.
- The user also mentioned the Claude Code monthly plan running out quickly, likely referring to exceeding the usage limits of the plan.
- Ollamaās Context Size Corrected for Aider: A user reported a discrepancy where Aider claimed a context window size of 131,072 tokens while Ollama was set to 8k max context.
- The solution involved adjusting the max_input_tokens setting in Aiderās configuration, as linked in the Aider Documentation.
- max_input_tokens Clarified and Victorious: A user initially struggled with configuring separate max tokens for input and output in Aider, particularly regarding the display of remaining tokens.
- After clarification, the user understood the difference and confirmed the solution involved properly setting the max_input_tokens parameter.
Nous Research AI ā· #general (44 messagesš„):
Decentralized Compute Marketplaces, Vast.ai, Decentralized Pre-training vs Post-training, Infiniband, DAWN Internet
- Vast.ai is a relatively cheap provider: A member suggested Vast.ai for decentralized compute, noting that it's relatively cheap compared to other providers, despite surrendering some reliability.
- Akash was also mentioned as a potential alternative, though Vast.ai was noted to be even cheaper.
- Portal chat interface has error: A member reported receiving an error when trying to use the chat interface in Portal and shared a screenshot of the error.
- The frontend team is investigating the issue, and a suggestion was made to try accessing the chat in an incognito window.
- Decentralized pre-training vs post-training is next glorious infra: A member mentioned that they are setting up infrastructure for decentralized training and are also doing pretraining at psyche.network.
- They also noted that distributed training will improve with GPU diffusion and better networking.
- Infiniband bandwidth eclipses the internet: A member pointed out that the internetās bandwidth (around 1gbps) hasnāt increased much in recent years, while Nvidiaās latest Infiniband iteration reaches 130TB/s.
- This bandwidth disparity is a growing problem.
- DAWN Internet offers decentralized internet: A member plugged DAWN Internet, a decentralized broadband protocol that provides gigabit internet using fixed wireless rooftop antennas.
- Their new WiFi router includes a GPU capable of supporting RL; more information can be found here.
Nous Research AI ▷ #ask-about-llms (1 message):
lazeewhalee: maybe refer to the R1 deepseek and its references?
Nous Research AI ā· #research-papers (2 messages):
C. Opus First Publication on Arxiv
- C Opus Posts Arxiv Publication: Teknium shared a link to the announcement post.
- A member expressed surprise that this was C. Opus's first publication on Arxiv.
Nous Research AI ā· #interesting-links (6 messages):
NVIDIA Cosmos, Talk at WebSummit, ArXiv Papers
- NVIDIA Launches Cosmos: NVIDIA launched Cosmos, with the ArXiv paper available at https://arxiv.org/abs/1706.03762.
- Short Talk at WebSummit: A member shared a short talk given during WebSummit in Vancouver, Canada, half history, half rant, re: closed internet/closed AI.
- Thereās also a link to this tweet for more context.
- New ArXiv Paper: A member shared a new ArXiv paper at https://arxiv.org/abs/2506.10943.
Notebook LM ā· #use-cases (7 messages):
Mind Map Creation, Notebook LM Plus Access, Sublevel Details
- Mind Map Masterpiece Made: A member created a mind map from 115+ sources, claiming it was pretty accurate and resulted in a huge mind map to summarize all key aspects.
- Another member expressed interest in learning more about it.
- Paid AI Pro Problem Persists: A member using paid AI Pro is still unable to access Notebook LM Plus, and asked for any ideas why.
- No solutions were provided in the channel.
- Mind Map Mining Methodology: A member asked about the number of sublevels in a mind map based on 1900 sources.
- The response indicated that the map had 4 sublevels, with the user expressing satisfaction with the vertical density but noting room for improvement horizontally, and linked to image.
Notebook LM ā· #general (39 messagesš„):
Excel Files in NotebookLM, Mobile App Notes, Image Support, Sharing Notebooks, Podcast Interrupt Feature
- Excel Files Missing from NotebookLM: Users are requesting Excel and Google Sheets support in NotebookLM, but there is currently no support or roadmap for this feature.
- The feature request channel is suggested for users to express their interest.
- Mobile App Notes are Limited: Notes are available on the desktop version of NotebookLM, but the mobile app only shows sources, chat, and studio sections.
- Users can access notes on mobile via the browser instead of the app; thereās no export option, but users can copy and paste.
- Image support rollout isnāt Universal: Some users can upload .jpg and .png files as sources, but others cannot, and there is no official announcement about this featureās rollout.
- A workaround is to put images into a Google Doc or Slide and then download it as a PDF for use in NotebookLM.
- Sharing notebooks impossible due to grayed out button: Users are experiencing issues with sharing notebooks, as the āShare publiclyā button is grayed out and unclickable.
- The cause of this issue is unknown.
- NotebookLM is just using LaTeX markup: Users are seeing LaTeX markup when NotebookLM generates math formulas.
- This is normal, as NotebookLM and other LLMs use LaTeX for mathematical expressions.
tinygrad (George Hotz) ā· #general (10 messagesš„):
Jacobi Method SVD, Sign Error in SVD, Modern SVD Algorithms
- Jacobi Method Leads to SVD Mismatches: A member encountered mismatches using the Jacobi method for SVD, specifically in the signs of the elements, with a max absolute difference of 1.8523843.
- The user separated eigh() and svd() for testing purposes due to size issues.
- SVD Sign Error Deemed Insignificant: A member suggested that the sign error in SVD results isn't fundamentally important, as long as the equation A = UΣVᵀ holds true.
- The member acknowledged wanting parity with NumPyās performance but doubted its feasibility on tinygrad.
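A quick NumPy check illustrates the point: singular vectors are only determined up to paired sign flips, so the reconstruction is what should be compared, not the factors themselves (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)).astype(np.float32)

U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Flipping the sign of a column of U and the matching row of Vt leaves A unchanged.
U2, Vt2 = U.copy(), Vt.copy()
U2[:, 0] *= -1
Vt2[0, :] *= -1

print(np.allclose(U @ np.diag(S) @ Vt, A, atol=1e-5))    # True
print(np.allclose(U2 @ np.diag(S) @ Vt2, A, atol=1e-5))  # True: same A, different signs
```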
- Jacobi's Method Outdated: The discussion highlighted that Jacobi's method may not be the modern algorithm used for SVD, and applies only to symmetric matrices.
- It was mentioned that NumPy uses a variant of the QR Algorithm under the hood, with Gram-Schmidt being inaccurate for full_matrices = True.
tinygrad (George Hotz) ā· #learn-tinygrad (34 messagesš„):
BEAM linearizer failures, Float Matmul Accuracy Discrepancy (NumPy vs. Tinygrad), linalg.svd PR, QR algorithms variance, Numpy defaults to float64
- Beam Me Up (No Linearizer Failures): A user inquired about experiencing linearizer failures when running with BEAM, and another user confirmed they were also encountering the same issue.
- No specific solution or cause was identified in the provided context.
- Tinygradās Floating Point Faux Pas?: A user noticed a discrepancy in the accuracy of float matmuls between NumPy and Tinygrad, specifically highlighting a difference in the bottom left corner value of the output matrix in this code.
- Discussion ensued regarding the impact of different compilers, optimization techniques, and the IEEE 754 standard on floating-point operations, with some suggesting that minor numerical drifts are expected and can be influenced by factors like the order of operations and the use of float64 in NumPy by default.
- SVD Sign Slip-Up?: A user working on a PR for linalg.svd was trying to achieve comparable accuracy to NumPy, but found that they were getting the same values with different signs, and was worried about whether or not the sign error was acceptable.
- Another user advised them to set `DEBUG=4` to inspect the kernel code, noting that loop unrolling can introduce numerical differences; they suggested setting `NOOPT=1` to disable unrolling for closer results.
- QR Quandaries: A user discovered variance among QR algorithms, specifically the discrepancy between Householder reflections and the Gram-Schmidt process.
- The user found an even larger variance when compared to the LAPACK package that NumPy uses for Eigen-value calculations, exclaiming honestly just wasting a bunch of time on this.
- NumPy's Numerical Nuisance: float64 Default Debacle: A user suggested explicitly creating NumPy arrays with `dtype=np.float32` to address discrepancies in results, noting NumPy's asinine default of `np.float64`.
- Another user countered that defaulting to float64 is common in numerical applications outside of machine learning, and that changing the default can cause unrelated things to break.
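A small demonstration of that default at work (illustrative): the same matmul drifts once inputs are cast to float32, which is exactly the kind of discrepancy discussed above:

```python
import numpy as np

a = np.random.rand(256, 256)   # float64 by default
b = np.random.rand(256, 256)

exact = a @ b                                                # float64 reference
approx = (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float64)

print(np.abs(exact - approx).max())   # small but nonzero drift from float32 accumulation
```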
Modular (Mojo š„) ā· #mojo (38 messagesš„):
Map Variadic Types, MLIR Type Synthesis, Magic to Pixi Migration, GPU Puzzles Discussion, Mojo C ABI Export
- Mapping Variadic Types Remains a Challenge: Members discussed the challenges of mapping variadic types in Mojo, referencing a forum post and agreeing it feels like a simple extension but may require a more dynamic type system.
- One suggestion involved using StaticString to define the corresponding `__mlir` type, but the lack of documentation and the difficulty of supporting an arbitrary number of types were noted as significant hurdles.
- MLIR Type Workarounds Explored: One member explored workarounds using `__mlir_type`, encountering issues with undocumented MLIR and the inability to synthesize the MLIR type for a given type parameter as a raw string.
- The member suggested that if one could extract and modify the MLIR type at compile time, it might be possible to work around the type definition hurdle using UnsafePointer and `init_pointee_move`.
- Painless Magic to Pixi Migration: A user described their painless migration process from `magic` to `pixi`, involving removing the `~/.modular` directory and rewriting `mojoproject.toml` files.
- The user shared a `pix.sh` script for updating and cleaning the cache, noting that it created a new `pixi.lock` and `.pixi` folder, with a recommendation to delete the old folder once tests pass.
- GPU Puzzle's Edge Cases: A user questioned the necessity of host-side synchronization in a GPU puzzle, referencing a specific section and suggesting that if `DeviceContext` uses a CUDA stream, synchronization might be automatic.
- It was confirmed that `DeviceContext` does use a CUDA stream, and the puzzle description will be adjusted to reflect that explicit synchronization is not required in that case.
- Mojo Exports via C ABI: A user asked about calling Mojo from C/C++.
- Another user clarified that Mojo can export C ABI compatible functions with `@export(ABI="C")`, allowing for the creation of object files or shared libraries.
MCP (Glama) ā· #general (18 messagesš„):
MCP server usage tracking, Service workers for MCP monitoring, GitHub MCP server, Taskerio agent progress inbox, MCP inspector issues
- Tracking MCP Server Usage: Mixpanel & PostHog Still Recommended?: Members discussed using standard monitoring/analytics tools like Mixpanel and PostHog for tracking MCP server usage, particularly in the context of APIs and web apps.
- Service Workers: The āBackend in a Frontendā for MCPs: A member suggested leveraging service workers to monitor incoming communication from servers in the background, even when the application is idle, thus acting as a ābackend in a frontendā.
- GitHub Launches Remote MCP Server for Live Context Access: GitHub PM announced the release of a remote GitHub MCP server, enabling any MCP host to access live GitHub context without local setup, detailed on Reddit.
- Taskerio Launches Inbox for Coding Agent Progress Tracking: Taskerio launched a stealth mode product, an inbox for coding agents to report progress, offering webhooks, push notifications, and an API for real-time dashboards, as detailed on Reddit.
- Dynamic Tool Selection: GitHubās Scalable Approach to MCPs: The GitHub server employs dynamic tool selection, filtering and scoping tools based on user input or context to present the LLM with a relevant subset, even with 30+ tools available.
- The goal is to keep auth simple with one MCP server with ALL of the APIs.
MCP (Glama) ā· #showcase (5 messages):
MCP Server with Postman, SchemaPin for MCP Security
- MCP Servers now buildable with Postman: A member showcased how to build an MCP server using Postmanās new MCP builder and their APIs on their public API network, linking to the fastfs-mcp GitHub repository as an example.
- They also shared a YouTube video demonstrating the process.
- SchemaPin shields against MCP Rug Pulls: A member introduced SchemaPin, a tool designed to prevent MCP Rug Pulls and similar attacks, with the GitHub repository available here.
- The member pointed to SchemaPin.org for simple implementation methods.
LlamaIndex ā· #blog (4 messages):
LlamaCloud Stability, MistralAI Magistral support, LlamaParse Presets, Data + AI Summit 2025
- LlamaCloud Recovering After Uptime Turbulence: LlamaCloud is back online after some instability from upstream infrastructure providers, with status updates available on the official status page.
- LlamaIndex adds Support for MistralAIās Magistral: LlamaIndex now supports MistralAIās Magistral reasoning model, which can be integrated into any LlamaIndex agent workflow, as announced on Twitter.
- LlamaParse debuts user-friendly Presets: LlamaParse now features Presets, pre-configured modes that optimize settings for different use cases, and users can select between Fast, Balanced, and Premium modes to balance accuracy and speed for document parsing.
- Data + AI Summit 2025 Highlights: The Data + AI Summit 2025 concluded with plenty of content on the emerging landscape of agentic document workflows available here.
LlamaIndex ā· #general (11 messagesš„):
LlamaIndex and Mem0 integration, Cloudflare issues, Google Cloud Server problems, Mem0 graphRAG capabilities, Luma calendar for office hours
- Mem0 memory integration handles updates automatically: When using LlamaIndex with Mem0, passing `memory=memory` into `agent.run(query, memory=memory)` automatically handles memory updates, eliminating the need to manually call `mem0_memory_class.add(interaction, thread_id_or_collection_name)` (see the sketch at the end of this section).
. - Luma Calendar considered for office hours: Due to feedback about the Discord calendarās usability for office hours, there is consideration of switching to a Luma calendar.
- The organizers are soliciting ideas, requests, and suggestions regarding the format of future office hours.
- Mem0ās graphRAG should work fine with LlamaIndex: The integration with LlamaIndex should support Mem0ās graphRAG capabilities, assuming the mem0 integration package is used.
- Cloudflare and Google Cloud Servers have Issues: Users reported having issues with Cloudflare as well as Google Cloud Servers.
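As referenced in the Mem0 item above, a minimal sketch of the pattern (based on the llama-index-memory-mem0 integration; exact constructors, model choice, and package versions are assumptions here):

```python
import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.memory.mem0 import Mem0Memory

async def main():
    memory = Mem0Memory.from_client(context={"user_id": "user_123"})
    agent = FunctionAgent(tools=[], llm=OpenAI(model="gpt-4o-mini"))
    # Passing memory=memory is enough: the agent reads and writes memories
    # itself, so no manual memory.add(...) call is needed per interaction.
    response = await agent.run("What did we discuss last time?", memory=memory)
    print(response)

asyncio.run(main())
```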
Cohere ā· #š§µ-general-thread (10 messagesš„):
Xarray-JAX library, AI SaaS tool in the finance space, Cohere documentation typo
- Named Tensors enter Deep Learning Scene: A member is building the Xarray-JAX library for Google DeepMind as part of GSoC 2025 which they say is effectively the first named tensor implementation in a deep learning framework.
- AI SaaS tools to revolutionize Finance: A member is building an AI SaaS tool in the finance space as a college project and is asking how to avoid just making an LLM wrapper and actually provide real value.
- They requested suggestions for an MVP and identified real pain points in finance to solve with AI.
- Typo found in Cohere Documentation: A member believes there is a typo in Cohereās documentation.
- In Python code, it should be `co = cohere.SagemakerClient()`, without upper case on the 'm'.
Cohere ā· #š-api-discussions (1 messages):
Reranking Profile Details
- Reranking Profile Specifications: A member requested details about the reranking profile, specifically the number of docs, tokens per doc, and query tokens.
- There was no response from the mentioned member, so no further details can be provided; the conversation ended after the initial question.
Cohere ā· #š-introduce-yourself (3 messages):
Full-Stack Development, AIOps, Agent AI, Python engineering, low-code/no-code agent frameworks
- Full-Stack Pro Enters the Arena: A Senior Full-Stack Developer and AIOps/Agent AI Specialist with 9+ Years of experience introduced themselves.
- He architects and delivers powerful, AI-enabled digital systems, from scalable full-stack apps to Agentic AI workflows and automation pipelines.
- Newbie of the Year Arrives: A new member named Nabeel introduced himself.
- He said he is what can be referred to as a rookie of the year!
Cohere ā· #š§-status-feed (1 messages):
GCP Outage, Cohere Status Page
- GCP Glitch Grounds Growth: Cohere reported a Google Cloud Platform (GCP) outage impacting some of their services on June 12, 2025 at 12:02PM link to status page.
- The status page indicated degraded performance in Infrastructure components, as monitored by the Cohere team Cohere Status Page.
- Cohere Monitors Malaise: Cohere's team is actively monitoring the situation via the status page to address the degraded performance affecting their Infrastructure components.
- The outage, which occurred on June 12, 2025 at 12:02PM, has prompted close observation and response efforts to mitigate the impact on services.
Yannick Kilcher ā· #general (7 messages):
Fast weights continual learning, O1-Pro models, Gemini Model
- Fast Weights Continual Learning: One member advocated for fast weights continual learning and external data stores to improve user control and reduce undesirable human traits in AI models.
- They expressed eagerness to see traits like scheming, frustration, and false memories removed from mainstream AI.
- O1-Pro Models Offer High Value: One member found O1-Pro/O3/O4-mini-high models valuable for learning well-documented math and computer science, while also liking their image generation capabilities.
- They also mentioned using the modelsā API for an audio transcription pipeline that works almost perfectly, though the image generation is censored.
- Gemini experiences compared to Claude: A member asked how Gemini compared to Claude.
- Another member stated that Claude has been less reliable for them but noted that all models can get things wrong and are most useful in highly verifiable domains.
Yannick Kilcher ā· #paper-discussion (3 messages):
Wavefunction schedule
- Wavefunction Discussions Take Friday Off: There is typically no Wavefunction discussion on Fridays due to limited audience participation.
- Despite the lack of scheduled discussion, community members are welcome to initiate their own.
- Wavefunction Frequency: Wavefunction discussions are typically scheduled for weekdays, excluding Fridays, when audience participation is limited.
- The schedule attempts to maximize engagement during peak activity periods, reflecting a preference for quality over quantity in discussions.
Yannick Kilcher ā· #ml-news (2 messages):
Nvidia, Jensen Huang, Anthropic, Dario Amodei, AI Jobs
- Huang Disagrees with Amodei on AI Jobs: A Fortune article reports that Jensen Huang (Nvidia) disagrees with Dario Amodei (Anthropic) about the future of AI jobs.
- A member speculates whether they are trying to buy the dip.
- Dario Responds to Huang: CEO Dario has responded to Jensen via X - with an update on AI Jobs.
- Shares in both companies are sharply down, as job fears continue.
Torchtune ā· #papers (10 messagesš„):
Mistral 3.1 Small, Tokenizer, Magistral, Multi-modality
- Mistral 3.1 Small Architectural Novelties still unclear: A member inquired about architectural novelties in Mistral 3.1 Small to assess fine-tuning implementation complexity, estimating a potential 2-week timeframe.
- Another member suggested that multi-modality support might be tricky but not novel, noting that supporting Mistral 3.0 would imply support for Magistral.
- Tokenizer Troubles Teased: The discussion highlighted that the tokenizer is a complicated procedure.
- However, a member clarified they were actually thinking of Magistral when referring to the tokenizer complexity.
- Torchtune Links longed for: Members expressed desire to see a Torchtune link in Magistralās Hugging Face (HF) page.
- This suggests a community interest in integrating Torchtune with Magistral for enhanced accessibility and usability.
Nomic.ai (GPT4All) ā· #general (3 messages):
Infinite Chat, Local Context Window, Ignore Feature
- Infinite Chat locally implemented: A member introduced Infinite Chat, which runs locally and lets users never run out of context window here.
- Requesting ignore Feature: A member asked about an āignoreā feature (like gitās .ignore file) to tell the embedding system to not use certain files, file-types or directories.
Codeium (Windsurf) ā· #announcements (2 messages):
Windsurf Wave 10, EU Cluster, Claude Sonnet 4
- Windsurf Waves into UI/UX Upgrades: Windsurf is wrapping up Wave 10 with a fresh slate of UI/UX upgrades and new teams and enterprise offerings, including new icons for `@-mentions` and file citations.
- Codeblocks in the Cascade panel now match your IDE theme, the native terminal in the Cascade panel now accepts user inputs, and there is a new Conversation History UI.
- Windsurf rolls out EU Cluster for Performance Boost: Windsurf proudly announces their EU Cluster, bringing faster performance to European enterprises amid rising demand!
- Watch the video on Youtube and join the conversation at r/Windsurf.
- Claude Sonnet 4 Lights Up Windsurf: Claude Sonnet 4 and Claude Sonnet 4 (Thinking) are now available to all paid plans via API Pricing!
- More info available on X.