a quiet day

AI News for 6/2/2025-6/3/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (218 channels, and 4892 messages) for you. Estimated reading time saved (at 200wpm): 454 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Another quiet day with a bit of Windsurf-Anthropic drama.

The AIE Day 1 Keynotes and MCP track go live in 9 hours.

AI Twitter Recap

1. AI Product Launches, Feature Updates, and Ecosystem Developments (OpenAI, Gemini, Claude, Perplexity, Bing, PlayAI, Suno, Hugging Face, Google, Anthropic, Codex, LangChain, Qwen, MLX, Holo-1, Universal Streaming, NotebookLM, etc.)

OpenAI Product Releases & Codex Rollout: OpenAI announced major updates including the rollout of Codex to ChatGPT Plus users, with internet access (off by default), generous usage limits, and fine-grained HTTP/domain controls. Codex can now update PRs, be driven by voice, and more (@sama, @gdb, @kevinweil, @OpenAI, @OpenAIDevs). Memory improvements are being made available to free users, with lightweight memory referencing recent conversations (@OpenAI, @sama). Codex’s internet access brings tradeoffs, with explicit warnings about risk (@sama).
Claude, Gemini, and Qwen Model Comparisons and Benchmarks: Claude 4 Opus and Sonnet are climbing leaderboards, with Opus #4 overall and tied for #1 in coding on WebDev Arena (@lmarena_ai), and Anthropic’s progress on SWE-bench Verified stands out (@EpochAIResearch). Gemini 2.5 Pro is cited as a daily driver by users (@reach_vb, @wightmanr), and Google announced Gemini 2.5 and Flash with audio at I/O (@DeepLearningAI). Qwen2.5-VL is recognized as a versatile foundation for agentic and GUI models (@mervenoyann), and MLX now supports new Qwen3 quantizations (@awnihannun).
Bing, Perplexity, and Search/Video Innovations: Bing Video Creator is now available globally, powered by Sora and enabling text-to-video generation (@JordiRib1). Perplexity is seeing a surge in demand driven by Labs queries (@AravSrinivas), and its travel search is praised (@AravSrinivas). Firecrawl launched a one-shot web search/scrape API for agent workflows (@omarsar0, @LiorOnAI).
Agents, RAG, and AI Tooling: Notable agentic releases include a multi-agent financial research analyst with LlamaCloud (@jerryjliu0), Firecrawl’s new endpoint, and LangGraph app updates (@LangChainAI). FedRAG introduces NoEncode RAG with MCP (@nerdai).
Open-Source and Robotics Announcements: Holo-1, an open-source action VLM for web navigation, and WebClick benchmark were released (@tonywu_71), with Hugging Face also presenting SmolVLA for robotics (@_akhaliq; @ClementDelangue). PlayAI open-sourced PlayDiffusion, a non-autoregressive diffusion model for speech editing (@reach_vb, @_mfelfel).
Audio, Video, and Multimodal Model Capabilities: Suno released major upgrades to its music editing and stem extraction (@SunoMusic), Google’s Gemini 2.5 has new native TTS in 24+ languages (@Google), and Universal Streaming speech-to-text launched with ultra-low latency (@AssemblyAI).
NotebookLM, MLX, and Other Infrastructure: Google NotebookLM now allows public notebooks (@Google), MLX highlights dynamic quantization and QLoRA on Qwen3 235B (@awnihannun), and Cline v3.17.9 introduces task timeline navigation and CSV/XLSX support (@cline).
AI Community Events, Workshops, and Expos: AIE Expo and World’s Fair saw sold-out participation, online bonus tracks, and workshops on Gemini 2.5, agents, and evaluations (@swyx, @_philschmid). vLLM and AIBrix meetup announced in SF (@vllm_project).

2. Research, Scaling Laws, Training Dynamics, and Model Internals (GPT-4/5, RL, GRPO, RLVR, Memory, Grokking, Data Leakage, Quantization, Reasoning, Agentic Models, etc.)

Model Capacity, Memorization, and Data Leakage: Meta’s new paper establishes that GPT-style LLMs memorize about 3.6 bits per parameter, with capacity scaling linearly and implications for privacy/membership inference (@jxmnop, @scaling01). Membership inference becomes impossible as dataset size grows, and double descent occurs when dataset size exceeds model capacity.
Reinforcement Learning for Reasoning & RL Training Advances: RL for creative writing on Qwen3 32B base demonstrates significant improvements (@Grad62304977), while high-entropy minority tokens are identified as drivers for effective RL in reasoning LLMs, yielding substantial gains in AIME benchmarks (@iScienceLuvr, @_akhaliq). ProRL and GRPO continue to advance RL-based LLM capabilities (@_akhaliq).
Memory Architectures and Continual Learning: Google ATLAS introduces “active memory” with learnable state and the Muon optimizer for sharper updates (@TheTuringPost); ChatGPT’s memory system is seen as a key differentiator for agentic applications (@karpathy, @hkproj). RLVR and post-training mechanisms are discussed as crucial for math/coding improvements (@lateinteraction).
Model Reasoning, CoT, and Interpretability: Pivot tokens and entropy in Chain-of-Thought (CoT) reasoning are being actively researched, with RL largely adjusting the entropy of high-entropy tokens (@teortaxesTex, @iScienceLuvr). Self-challenging agents use self-generated tasks and verifiers to boost tool-use (@jaseweston).
Grokking, Scaling, and Learning Dynamics: Phase transitions in grokking and cumulative learning mechanisms are explored (@raphaelmilliere), with meta-learning and scaling of RL environments cited as unlocking continual adaptation (@tamaybes). New empirical approaches to critical batch size and optimizers like Muon are covered (@eliebakouch).
Quantization and Efficiency: MLX’s dynamic quantization method yields better quality at no extra size for Qwen3 models (@awnihannun), and FP8 is proposed as an optimal mode for image/video gen (@RisingSayak).
Prompting, DSPy, and Programming Paradigms: DSPy is positioned as a separation-of-concerns paradigm for prompting and workflows, not just prompt optimization (@lateinteraction). Prompt engineering is critiqued as an abstraction (@lateinteraction).
Method-Driven vs. Problem-Driven Research: Method-driven research, such as AlphaEvolve’s optimization, is highlighted as increasingly dominant over problem-driven approaches, especially in the LLM era (@_jasonwei).

3. Model/Platform Comparisons, User Experiences, and Evaluation Practices

Model Routing and UX Recommendations: Extensive guides and personal heuristics for which ChatGPT model to use for which task—o3 for difficult problems, 4o as daily driver, o4-mini for search/analysis, 4.1 for coding (@karpathy, @scaling01, @aidan_mclau). Gemini 2.5 Pro and Claude 4 cited as daily drivers for coding and brainstorming (@reach_vb, @wightmanr).
Agentic Assistants and Automation: Document-centric workflows increasingly use automation agents for end-to-end batch processing rather than assistant-style UX (@jerryjliu0).
Security and UX Issues: Issues around repo forking, GitHub permissions, and OpenAI interface clarity are discussed and resolved (@andersonbcdefg).
Evaluation and Evals Conferences: Evals are now a core discipline, with dedicated tracks for practitioners (@swyx). Stripe evals highlight A/B testing for agent performance (@OpenAIDevs).
Prompt Engineering & Memory Use Debates: Users debate the value of turning on memory in ChatGPT, with some preferring “raw capabilities” and others citing the importance for product/UX (@Yuchenj_UW, @sjwhitmore).
Search, Retrieval, and RAG Practices: ColQwen2 lands in Hugging Face transformers for visual document retrieval, improving RAG pipelines (@mervenoyann, @tonywu_71).

4. Societal, Regulatory, and Strategic Considerations (AI Redlines, Open Source, Policy, Safety, Ecosystem, AGI, Education)

AI Redlines and Deterrence: Strategic proposals for international AI redlines center on preventing intelligence explosion and malicious use (e.g., AI virologists or cyber agents), with transparency and verification emphasized over rigid definitions (@DanHendrycks).
Open Source Advocacy, Ethics, and Accessibility: VLA models (Vision-Language-Action) released open source for robotics and transparency (@ClementDelangue), PlayDiffusion for speech editing, and Common Corpus (~2T tokens) for LLM pretraining (@iScienceLuvr).
Ecosystem and Platform Maturity: Open science and maturity in speech/audio AI are celebrated (@reach_vb), with new orgs like @LawZero_ focusing on safe-by-design AI (@Yoshua_Bengio).
AI in Education & Work: Calls to empower everyone—engineers and non-engineers alike—to code with AI tools (@AndrewYNg), and cs224n 2024 covers pre-training, post-training, and reasoning (@stanfordnlp).

5. Industry, Hardware, and Market Trends (Nvidia, Hardware, Robotics, Apple, Databricks, Snowflake, etc.)

Nvidia Blackwell and Hardware Acceleration: Nvidia B200s and Blackwell chips are now serving DeepSeek R1 at up to 5x H100 throughput (@scaling01, @ArtificialAnlys), and Figure-01 vs. Figure-02 humanoid robots show step-changes in engineering (@adcock_brett).
Apple, Data, and Market Moves: An AI letdown is expected at Apple’s WWDC, and Oracle’s market cap comes as a surprise in AI platform discussions (@TheRundownAI, @sarahcat21).
Decentralized Compute & Cloud: DeepSeek-R1-0528 demonstrates 100% uptime with decentralized compute, outperforming other providers (@jon_durbin). Google Cloud Run ships serverless GPU for all, with no quota, enabling pay-per-second L4 access for Gemma and others (@_philschmid).

6. Memes, Humor, and Cultural Commentary

AI, Model, and Industry Memes: Notable memes include OpenAI board drama movie adaptations (@iScienceLuvr), humorous takes on “kitchen charges” and extra fees (@Yuchenj_UW), “why Elon likes this lol” (@Yuchenj_UW), and sarcastic commentary on model features and industry quirks (@vikhyatk, @skalskip92).
Programming and Engineering Humor: Jokes about version control, debugging, and code review culture (@hyhieu226, @HamelHusain).
Pop Culture, Sports, and Miscellanea: Comments on chess, aviation, and influencer culture intermix with AI threads (@demishassabis, @TomLikesRobots).

AI Reddit Recap

/r/LocalLlama Recap

1. AI Model and Infrastructure Open Source Releases

Google opensources DeepSearch stack (Score: 840, Comments: 77): Google has open-sourced DeepSearch, a demo stack for building AI agents using Gemini and the LangGraph framework, as detailed in the google-gemini/gemini-fullstack-langgraph-quickstart repository. While this stack is not the same as that used in the Gemini user app, it is intended to accelerate agent development by providing modular backend/frontend components, containerization (Docker), and showing integration workflows for LLM-based apps. Adaptation to other models (e.g. Gemma, or different search tools) is possible by substituting relevant modules; the architecture is straightforward and suitable for rapid prototyping. Commenters emphasize that DeepSearch is a well-structured demo rather than production infrastructure and recommend more advanced alternatives like LangManus for complex use-cases. The use of LangGraph is highlighted as a flexible pattern, with notable enthusiasm for recent open-source releases from Google and the performance of Gemma models.
- The author clarifies that the open-sourced DeepSearch stack is distinct from what’s used in the Gemini App. It leverages LangGraph, making it modular—developers can swap out Gemini-specific parts (e.g., replace Gemini with open models like Gemma); however, the search capability requires alternate tooling since it’s not decoupled.
- Multiple commenters point out that the backend architecture, while clean and educational, is not novel or especially complex. They suggest projects like LangManus (https://github.com/Darwin-lfl/langmanus/tree/main) as examples of more elaborate LangGraph-based systems, highlighting that DeepSearch is best viewed as a well-constructed demo rather than a production-ready or groundbreaking stack.
- There is positive technical feedback on recent Google open models, with Gemma 3 4B called out as particularly strong among small LLMs.
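The modularity point raised in the comments can be illustrated without LangGraph itself. What follows is a minimal sketch and not the quickstart's actual code: `ResearchAgent`, `GenerateFn`, `SearchFn`, and the two stub functions are hypothetical names, with the stubs standing in for a real model (Gemini, Gemma, or another) and a real search tool.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: any callable with these shapes can be plugged in.
GenerateFn = Callable[[str], str]       # prompt -> model text
SearchFn = Callable[[str], List[str]]   # query -> list of text snippets


@dataclass
class ResearchAgent:
    """Toy search-then-answer agent with swappable model and search parts."""
    generate: GenerateFn
    search: SearchFn

    def answer(self, question: str) -> str:
        # Step 1: ask the model for a search query.
        query = self.generate(f"Write one web search query for: {question}")
        # Step 2: run whichever search tool was injected.
        context = "\n".join(self.search(query))
        # Step 3: ask the model to answer from the retrieved snippets only.
        return self.generate(f"Answer '{question}' using only:\n{context}")


def stub_model(prompt: str) -> str:
    # Stand-in for a real LLM call; echoes the first line of the prompt.
    return f"[model output for: {prompt.splitlines()[0]}]"


def stub_search(query: str) -> List[str]:
    # Stand-in for a real search backend.
    return [f"snippet about {query}"]


agent = ResearchAgent(generate=stub_model, search=stub_search)
print(agent.answer("What is LangGraph?"))
```

Swapping the model only means passing a different `generate` callable. The search side needs its own replacement, which matches the author's note that search is not decoupled from Gemini in the released stack.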
nvidia/Nemotron-Research-Reasoning-Qwen-1.5B · Hugging Face (Score: 133, Comments: 26): Nvidia’s Nemotron-Research-Reasoning-Qwen-1.5B is an open-weight 1.5B parameter LLM targeting complex reasoning (math, coding, STEM, logic), trained via ProRL (Prolonged Reinforcement Learning) which extends RL training with innovations like mitigating entropy collapse, DAPO (Decoupled clip and dynamic sampling policy optimization), and KL regularization with reference policy reset. Benchmarks show Nemotron-Qwen-1.5B outperforming DeepSeek-R1-1.5B significantly—achieving +14.7% (math), +13.9% (coding), +54.8% (logic), +25.1% (STEM), and +18.1% (instruction-following) pass@1 rates—and matching or surpassing DeepSeek-R1-7B, as detailed on the Hugging Face release (https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B). GGUF weights are available in q4, q8, and f16 (https://huggingface.co/stormchaser/Nemotron-Research-Reasoning-Qwen-1.5B-GGUF/tree/main). Commenters note the growing practical viability of small, edge-focused models, highlighting Nemotron and comparables (e.g., Gemma, Qwen3) running on mobile devices. Some technical dissatisfaction is expressed regarding Nvidia’s restrictive licensing (CC non-commercial) which limits real-world utility and commercialization potential.
- The Nemotron-Research-Reasoning-Qwen-1.5B model leverages a novel Prolonged Reinforcement Learning (ProRL) algorithm designed for longer RL training periods, allowing deeper task exploration and improved generalization. It introduces three core techniques: mitigation of entropy collapse, decoupled clip and dynamic sampling policy optimization (DAPO), and KL regularization with reference policy reset—adapting Group Relative Policy Optimization (GRPO) for this small-scale LLM context.
- Benchmarks indicate that Nemotron-Research-Reasoning-Qwen-1.5B surpasses DeepSeek-R1-1.5B and is competitive with DeepSeek-R1-7B despite being much smaller, with reported average pass@1 improvements over DeepSeek-R1-1.5B of 14.7% in math, 13.9% in coding, 54.8% in logic puzzles, 25.1% in STEM reasoning, and 18.1% in instruction-following tasks, suggesting a significant leap in sub-3B model capabilities for complex reasoning.
- The model is open-weight but released under a Creative Commons Non-Commercial license, which limits its use primarily to research and non-commercial development. Some commenters highlight that similarly restrictive or revocable licenses have accompanied other Nvidia-released models, potentially curtailing wider adoption in production or commercial landscapes.
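Two of the ingredients named above, GRPO's group-relative advantage and DAPO's decoupled clipping, are compact enough to sketch. This is an illustrative sketch and not Nvidia's training code: the function names are made up here, the clip ranges are illustrative defaults, the use of a population standard deviation is an assumption, and the KL term and reference-policy reset are left out.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Score each sampled response against the other samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


def decoupled_clip_term(ratio: float, advantage: float,
                        eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """Clipped surrogate for one token, with separate lower and upper clip ranges."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Standard pessimistic choice between the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


# Four sampled answers to one prompt, scored 1.0 if verified correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
print(decoupled_clip_term(ratio=1.5, advantage=advantages[0]))
```

Because advantages are computed within the group, no separate value network is needed. A wider upper clip range than lower one leaves more room to raise the probability of rare tokens, which is one way to push back against entropy collapse.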

2. Cutting-edge Research on Model Behavior and Bias

New META Paper - How much do language models memorize? (Score: 176, Comments: 30): The arXiv paper “How much do language models memorize?” rigorously quantifies GPT-style transformer storage capacity, reporting empirical memorization of ~3.6 bits/parameter (e.g., 3.51 bits for bfloat16, 3.83 for float32). It identifies memorization occurring up to a capacity threshold, followed by a ‘grokking’ phase where models begin to generalize by encoding broader patterns rather than instance-specific details. The paper links this transition to the onset of double descent in loss curves—specifically, as training data information exceeds model capacity, necessitating cross-instance information sharing and improved generalization. Scaling laws derived from hundreds of transformers (0.5M–1.5B params) predict that large LLMs, when thoroughly deduplicated, become increasingly robust against membership inference and verbatim extraction attacks, with generalization, not memorization, explaining extracted knowledge. Technically informed discussion in the comments raises questions on how these findings extend to MoE architectures, variations in training (e.g., quantization-aware training, low-precision regimes), and alternative models like BitNet—particularly whether the ~3.5 bit/parameter barrier holds or shifts under such settings. It is also debated how quantization below this threshold fundamentally limits generative capacity for GPT-style models.
- The paper quantifies transformer model capacity, finding GPT-style LLMs can store about 3.5–4 bits per parameter (3.51 for bfloat16, 3.83 for float32), and doubling numerical precision does not double storage capacity, revealing capacity is not strictly tied to parameter precision.
- Analysis of memorization vs. generalization dynamics shows that language models initially memorize training data until reaching capacity, after which they transition (‘grokking’) to broader generalization, correlating this with the observed ‘double descent’ effect, where dataset information exceeds model storage and the model must begin to generalize.
- Extending this framework, commenters raise questions on how results scale to larger or MoE (Mixture of Experts) models and implications for quantized models: if quantization drops below ~3.5 bits, sharp output degradation may occur, possibly limiting tricks for sub-3.5 bit quantization; further, technical debate remains about how these findings extrapolate as model/dataset sizes increase beyond the studied 500k–1.5B parameter range.
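For a sense of scale, the bits-per-parameter figure converts directly into a memorization budget. The sketch below is back-of-the-envelope arithmetic using only the numbers quoted above; the helper names are made up for illustration, and the linear scaling is taken from the summary rather than re-derived.

```python
BITS_PER_PARAM = 3.6  # approximate figure quoted in the post


def capacity_bits(n_params: float, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Estimated memorization capacity in bits, assuming linear scaling."""
    return n_params * bits_per_param


def capacity_megabytes(n_params: float, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Same estimate in decimal megabytes."""
    return capacity_bits(n_params, bits_per_param) / 8 / 1e6


def storage_efficiency(measured_bits: float, precision_bits: int) -> float:
    """Fraction of a parameter's raw bits that ends up holding memorized data."""
    return measured_bits / precision_bits


# The two ends of the model-size range studied in the paper.
for n in (500_000, 1_500_000_000):
    print(f"{n:>13,} params -> ~{capacity_megabytes(n):,.3f} MB memorized")

# Doubling precision barely moves capacity, so efficiency roughly halves.
print(f"bfloat16: {storage_efficiency(3.51, 16):.1%} of raw bits")
print(f"float32:  {storage_efficiency(3.83, 32):.1%} of raw bits")
```

At 3.6 bits per parameter, a 1.5B-parameter model comes out to roughly 675 MB of memorized content, far smaller than a typical pretraining corpus, which is the regime where the summary says the model has to generalize.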
Vision Language Models are Biased (Score: 100, Comments: 54): A recent study finds that state-of-the-art vision-language models (VLMs) show high bias: while they achieve 100% counting accuracy on canonical images (e.g., 4-legged dogs, 3-striped Adidas logos), their accuracy drops drastically to ~17% when faced with counterfactual or atypical images (e.g., 5-legged dogs). This indicates a reliance on memorized, training-set knowledge over actual visual analysis, with this confirmation bias observed across top-performing VLMs, tasks, and domains. The study further notes that bias persists regardless of prompt engineering, suggesting a structural model limitation affecting visual reasoning on novel features. Several commenters note that this outcome is unsurprising, attributing the bias to the inherent statistical shape of training data and echoing that all AI systems reflect dataset and societal biases. Some expand by drawing parallels to similar linguistic biases in large language models (LLMs), e.g., completions of “My favorite cuisine is” showing strong preferences towards common options like “Italian”.
- The top comment provides concrete benchmark-style details: Vision Language Models (VLMs) succeed nearly perfectly (~100% accuracy) on familiar images (e.g., counting stripes on Adidas logos or legs on standard animals), but their performance plummets to ~17% accuracy for images with unusual or counterfactual traits (e.g., a 5-legged dog or a 4-striped Adidas-like logo). This highlights strong prior-based bias and limited generalization in OOD (out-of-distribution) scenarios.
- It’s noted that such VLM failures extend beyond logos and animals to cases like hands with non-standard numbers of fingers, implying a pervasive difficulty with OOD or counterfactual reasoning. This suggests the models may be heavily overfitted to common visual distributions present in their training data.
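The canonical-versus-counterfactual comparison comes down to splitting accuracy by subset. Here is a minimal sketch of that bookkeeping, assuming each evaluation record carries a subset label, a ground-truth count, and the model's predicted count. The records are toy values invented for illustration, not data from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, int]  # (subset, ground-truth count, predicted count)


def accuracy_by_subset(records: Iterable[Record]) -> Dict[str, float]:
    """Exact-match counting accuracy, reported separately for each subset."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for subset, truth, predicted in records:
        total[subset] += 1
        correct[subset] += int(truth == predicted)
    return {subset: correct[subset] / total[subset] for subset in total}


# Toy records (made up): the model falls back on the familiar count.
toy_records = [
    ("canonical", 4, 4),        # ordinary dog, 4 legs
    ("canonical", 3, 3),        # ordinary 3-stripe logo
    ("counterfactual", 5, 4),   # 5-legged dog, model still says 4
    ("counterfactual", 4, 3),   # 4-stripe logo, model still says 3
    ("counterfactual", 5, 5),
]
print(accuracy_by_subset(toy_records))
```

Reporting the two subsets separately is what exposes the gap; a single pooled accuracy number would hide it.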

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. AI Model Access Inequality and Economic Impact Debate

Dario Amodei worries that due to AI job losses, ordinary people will lose their economic leverage, which breaks democracy and leads to severe concentration of power: “We need to be raising the alarms. We can prevent it, but not by just saying ‘everything’s gonna be OK’.” (Score: 1378, Comments: 364): Dario Amodei, CEO of Anthropic, warns that extensive AI-driven job displacement could strip ordinary individuals of their core economic power, risking the collapse of democratic mechanisms and yielding severe power concentration. He stresses that mitigating these impacts requires urgent systemic intervention rather than complacency, consistent with past warnings from AI leaders. Commenters invoke the ‘boiling frog’ effect: slow, incremental changes make these risks less visible and less actionable until it is too late, paralleling historic failures in pandemic response. They also suggest that meaningful action is often triggered only post-crisis, reflecting skepticism about proactive mitigation even in technical communities.
- One comment compares the gradual impact of AI-driven job automation to the “boiling frog” problem, suggesting that because AI is displacing jobs incrementally rather than all at once, public and policy responses lag—making it difficult to address technological unemployment proactively.
- Several users discuss systemic risks, aligning with Dario Amodei’s warning that AI could erode economic leverage for ordinary people, potentially destabilizing democracy due to increasing power concentration unless significant preventative action is taken.
Dario Amodei worries that due to AI job losses, ordinary people will lose their economic leverage, which breaks democracy and leads to severe concentration of power: “We need to be raising the alarms. We can prevent it, but not by just saying ‘everything’s gonna be OK’.” (Score: 293, Comments: 77): Dario Amodei (Anthropic CEO) warns that mass AI-driven job displacement could erode the economic leverage of the general populace, breaking democratic structures and consolidating power among a few entities. He urges immediate, non-complacent policy and societal responses, framing risks as not only economic (unemployment) but systemic (power concentration). Top comments reference historical societal stability, shared anxiety over loss of predictability, and theories (Yarvin, Thiel, Andreessen) about neo-corporatist or neo-feudal outcomes, such as employer towns and reliance on UBI-backed, highly consolidated company jurisdictions. Technically aware commenters echo the urgency, noting potential social unrest from mass anxiety, and draw parallels to speculative socio-economic models where corporate power replaces traditional governance, further highlighting the risk of reinforcing inequalities through AI-induced labor disruption.
- Medical_Mine1275 discusses the potential socioeconomic impacts of AI-driven job loss, referencing Curtis Yarvin’s theory (also linked to Peter Thiel and Marc Andreessen) of a move towards highly concentrated corporate power reminiscent of classic ‘company towns.’ The comment raises the scenario where companies, empowered by AI, erode workers’ bargaining power—replacing them with UBI-supported, hyper-corporatized living (e.g., Tesla tiny homes, drone-delivered food), and pushing people toward virtual economies as substitutes for material wealth. This outlines a technically detailed vision of AI-induced power shifts, economic dependence, and potential return to neo-feudal governance structures.
We need to do everything in our power to prevent AI from becoming a luxury (Score: 222, Comments: 94): The post highlights the emerging paywalling of top-tier LLM access (OpenAI: $200/mo, Anthropic: $100/mo, Google: $130/mo), expressing concern that open-source LLMs, while currently strong (e.g., DeepSeek, Qwen), will require prohibitively expensive GPUs as they grow, possibly leading to further privatization and increased inequality. The author asserts that advances in model scale are outpacing consumer hardware, making local inference less viable, and foresees potential consolidation or monetization by Chinese open-source contributors, predicting a dangerous widening gap between high-end private models and general public access, with significant societal implications if left unaddressed. Key technical responses argue that the root problem is the intrinsic cost of training and running large-scale LLMs, not artificial scarcity, and suggest socialization (public funding/subsidies) as a potential path to equitable access. Others contextualize AI access as analogous to utilities (e.g., electricity), implying a premium for high-capacity use, or note that lower tiers remain functionally robust for most users but exclude the very latest advancements.
- Several commenters discuss that developing and running advanced AI models entails significant and unavoidable costs, referencing observations that major providers like OpenAI and Google are incurring losses or raising prices for products like Pro plans to $250/month. This is cited as evidence against current price gouging, with the competitive landscape suggesting that extreme pricing is mainly attributable to high operational expenses, not monopolistic behavior.
- One commenter draws attention to tiered AI access: most core functions of state-of-the-art models are available at lower tiers, and only users seeking the newest or most advanced capabilities pay a premium for early access. This model follows common industry patterns (e.g., economy vs luxury in other markets), rather than uniquely restricting access to AI.
- The suggestion to socialize AI is raised, emphasizing that, given the intractable near-term costs, broad and equitable access requires distributing those costs societally (e.g., through public funding or utility-like socialization), rather than expecting that rapid technological or economic changes will immediately make advanced AI universally accessible.
Former OpenAI Head of AGI Readiness: “By 2027, almost every economically valuable task that can be done on a computer will be done more effectively and cheaply by computers.” (Score: 1026, Comments: 356): The image is a tweet from Miles Brundage, former OpenAI Head of AGI Readiness, predicting that by 2027, nearly all economically valuable computer-performed tasks will be more efficiently and cost-effectively executed by AI. Brundage clarifies that this prediction refers to technical feasibility (‘will be doable’) and output quality in isolation, not necessarily widespread adoption or human-value considerations. The post contextualizes this as an indicator of rapid anticipated AI progress with qualified caveats about adoption and human preference. Comments raise technical critiques, highlighting (1) organizational and data bottlenecks that would impede rapid AI adoption, even if technical capability exists, (2) overestimation of computer replacement for jobs without understanding on-the-ground reality, and (3) the urgent need to address potential societal impacts (e.g., UBI, automation tax) given the predominance of white-collar jobs in the USA. These reflect skepticism about both practical implementation timelines and broader social consequences.
- Fenristor highlights a major practical limitation to rapid AI adoption: most companies lack programmatically accessible, high-quality data. Even with intense effort, transforming legacy systems and unstructured data into formats suitable for automation would likely not be feasible by 2027, indicating a severe constraint on the timescale for widespread automation of economically valuable computer-based tasks.
- ryanhiga2019 points out a current technical limitation of LLMs: persistent hallucination and unreliability. Unless foundational issues with model factuality and robustness are solved, scaling up LLMs to handle mission-critical or economically valuable tasks may not be possible within the suggested timeframe.

2. Major AI Model Releases and New Feature Launches (Spring-Summer 2025)

Apple reportedly tests AI models that match ChatGPT’s capabilities in internal benchmarks (Score: 290, Comments: 119): Apple is reportedly testing large-scale internal LLMs with up to 150B parameters that achieve benchmark parity with ChatGPT, but these models face high inference costs and unresolved technical/safety barriers, precluding public launch. Instead, Apple plans to release significantly smaller on-device Foundation Models (~3B parameters) for third-party developers, offering only basic ML capabilities at WWDC 2025; advanced features like a conversational Siri are delayed past 2026 due to internal caution and technical limitations. Commenters stress skepticism about Apple’s lag in real-world AI deployment, noting Siri’s underperformance compared to even Google’s older assistants, and question the cost-effectiveness and practicality of scaling the tested 150B parameter models.
- Apple’s internal AI models reportedly reach up to 150B parameters and match ChatGPT in certain benchmarks, but there are indications of significant operational costs per token and technical limitations that prevent public release, hinting at large-scale deployment/inference concerns.
- Discussion highlights skepticism about what ‘ChatGPT capabilities’ means, given the wide performance variance between models (e.g., comparing GPT-4o-mini vs. GPT-3.5 or GPT-4), and suggests Apple’s efforts may not target state-of-the-art, cost-effective performance levels that make public deployment viable.
- Commenters point out that even if Apple achieves ChatGPT-level technical benchmarks internally, practical AI integration in end-user products (like Siri) is still lacking, suggesting a focus on deployment, cost, and real-world functionality rather than just parameter counts or closed benchmarks.
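The gap between the reported 150B-parameter internal models and the ~3B on-device models is easier to see as weight memory. The sketch below is rough arithmetic only: the 16-bit and 4-bit precisions are assumptions (the report does not state how either model is stored), and it counts weights alone, ignoring KV cache and activations.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts from the report; precisions are illustrative assumptions.
for label, n_params in (("~150B server model", 150e9), ("~3B on-device model", 3e9)):
    for bits in (16, 4):
        print(f"{label} at {bits}-bit: ~{weight_memory_gb(n_params, bits):.1f} GB")
```

Under these assumptions the larger model needs hundreds of gigabytes at 16-bit and still tens of gigabytes at 4-bit, while the 3B model fits in a few gigabytes or less, which is consistent with the per-token cost concerns raised in the comments.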
Microsoft brings free Sora AI video generation to Bing (Score: 245, Comments: 51): Microsoft has integrated OpenAI’s Sora video generation into the Bing app (rebranded as Bing Video Creator), providing free AI-generated video capabilities, though there is still no standalone Sora app or integration into the ChatGPT app. Initial user reports highlight basic generation capabilities (e.g., slow-motion gifs), but note that content safety filters are highly restrictive, often blocking requests. Experts are debating the limited rollout—being only on Bing and not as a dedicated Sora product or part of ChatGPT—and highlight concerns about overly aggressive safety filtering reducing practical usability.
- Technical users highlight that Sora, now available as Bing Video Creator, still lacks a dedicated app or integration within the ChatGPT app, limiting its accessibility compared to potential competitors.
- Some commenters express dissatisfaction with Sora’s content safety filters, reporting frequent ‘request blocked’ responses, which may hinder experimentation and creative applications for technical users.
- Multiple users compare Sora unfavorably to Google’s Veo3 model, arguing that Veo3 produces significantly better video generation results, and suggest Microsoft’s offering currently lags in terms of output quality and capabilities.
OpenAI is preparing to release 2 new models with native audio support (Score: 229, Comments: 31): OpenAI is preparing to release two models, gpt-4o-audio-preview-2025-06-03 and gpt-4o-realtime-preview-2025-06-03, both featuring native audio support. These models appear to expand on GPT-4o’s multimodal pipeline by offering integrated audio input/output capabilities, possibly enabling low-latency real-time audio interaction and audio data processing within the LLM framework. Details on architectural changes or improvements over the existing GPT-4o (already featuring some audio modalities) are not yet published, but the ‘realtime’ naming suggests subsecond response for voice assistants. Commenters question the difference between these and existing GPT-4o models, asking what qualifies as ‘native audio’ and whether these releases are the highly anticipated conversational audio assistants demoed previously, indicating ambiguity in the definition of ‘native’ and the specifics of model improvements for audio tasks.
- Several users are debating what “native audio” means, noting that GPT-4o was already demonstrated with native audio input and output during its announcement. This raises questions as to what new capabilities the upcoming models might bring over existing offerings like GPT-4o’s real-time audio/voice multimodality.
- One hypothesis is that these models could be an evolution of the audio assistant features previewed in earlier public demos, possibly indicating enhanced conversational or low-latency audio processing. The technical community is waiting for clarification on how “native” differs from prior audio handling methods, especially regarding architecture or latency improvements.
- A user mentions interest in expanding the concept to video as a continuous bitstream, suggesting there is technical demand for models handling audio and video as unified, native streams for real-time assistant or generative tasks. This points toward ongoing interest in truly multimodal, continuous input architectures beyond current separate modality tokenization.
Memory is now available to free users!!! (Score: 235, Comments: 57): The image is an official ChatGPT announcement revealing that memory features—previously limited to paid users—are now available for free users beginning June 3, 2025. The announcement notes that free users will get a ‘lightweight version’ of memory, where the system references recent conversations to improve the relevance of responses. There are regional differences (manual opt-in in select European regions), and users retain the ability to disable or manage memory. View the announcement image. Top comments discuss privacy and data usage: Paid users note the value in being able to turn off training data usage (with skepticism about enforcement). Others criticize the feature, warning that auto-memory may introduce bias or outdated info, and request finer-grained, manual memory control.
- Some users express concerns about how ChatGPT’s memory implementation works at a technical level, noting that it collects chat histories to enrich user prompts as a type of automated knowledge base. This can introduce issues where the model makes unwarranted assumptions or shifts towards personalized but factually imprecise responses, instead of unbiased information.
- There are critical assessments of the effectiveness and fidelity of the new memory feature: users report that it sometimes retains irrelevant or outdated details, and there’s frustration over the lack of manual controls for curating the stored ‘memories’. The automatic memory system is seen as less reliable at recalling important specifics compared to older mechanisms or manual alternatives.
- A paid subscriber raises a data privacy point: Plus users have the option to disable their data being used for future model training, which is positioned as a key differentiator from the free tier. However, it is noted as a transparency issue whether OpenAI fully complies with this policy in practice.
Research is Now Available on Pro Plans!! (Score: 135, Comments: 39): The image presents a user interface update introducing the ‘Research’ feature (labeled BETA) for Pro Plan users on an AI assistant platform, likely targeting enhanced web-based research capabilities directly within the chat environment. Technical discussions in the comments focus on the research feature’s performance, comparing it to competitors, and noting qualitative differences: users report that the mode provides context-rich, tailored insights rather than direct answers (e.g., giving tips for recipe development instead of just copying a recipe), which may indicate more nuanced information synthesis. Commenters express positive impressions regarding the research feature’s depth and adaptability, noting its usefulness for nuanced queries and its non-formulaic approach to presenting findings. There is some curiosity about comparative benchmarking against similar features from other AI providers.
- A user observed the system deploying 3-4 subagents using a depth-first approach to research, suggesting possible multi-agent architectures or parallel context gathering, which could influence the breadth and depth of synthesized outputs.
- One comparison highlighted that the system cited “300 sources and counting” for a single research task, which is significantly higher than reported for GPT or Perplexity-based systems, implying greater coverage and potentially richer information synthesis.
- Comparative reviews indicate that Claude Max and SuperGrok are rated as top performers for research tasks, with Gemini described as overly verbose and OpenAI models as providing information that feels too detached, pointing to differences in retrieval styles, answer synthesis, and UX across leading models.

3. Creative Uses and Production Breakthroughs with Veo 3 and AI Video

Ulianopolis City Hall in Brazil made a complete commercial with VEO 3, spending only R$300 reais ($52 dollars) in VEO 3 credits (Score: 1047, Comments: 196): Ulianopolis City Hall in Brazil produced a full, professional-grade 1-minute advertising video for only R$300 ($52 USD) using Google’s VEO 3 generative video model, a stark contrast to the conventional R$100,000 ($17,543 USD) required for traditional production workflows. The video, referenced by its creator on Instagram, showcased not only high-resolution visuals and narrative cohesion but also linguistic localization (support for regional Brazilian Portuguese and accents), demonstrating VEO 3’s multimodal synthesis capabilities and the rapid downward pressure AI places on production costs. This underscores a major industry disruption: such generative tools can bypass large, multi-role creative teams, drastically cutting costs and production complexity. Comments highlight the paradigm shift, emphasizing that generative AI tools will significantly displace traditional ad agencies and production teams, especially as iterative prompting and editing are orders of magnitude cheaper and easier. A key technical impression was the naturalness and authenticity of linguistic and cultural localization in the output, especially since non-English output is generally a challenge for generative models.
- Multiple commenters highlight the cost efficiency of generating a high-quality, localized commercial using VEO 3—noting that the R$300 ($52) expenditure for credits is dramatically lower than traditional commercial production methods, which involve hiring and logistics costs.
- Technical discussion points to the high quality of the AI-generated speech in Portuguese, including local accents, with users impressed by the naturalness and fluency, raising the bar for AI localization capabilities in commercial and governmental contexts.
- One commenter emphasizes that AI-generated content in a native language (here, Portuguese) is especially striking, since achieving accurate dialect and accent has historically been a weakness in synthetic media, suggesting notable advances in VEO 3’s multilingual or accent-aware synthesis.
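The cost gap quoted in the post can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures reported above; the variable names are illustrative.

```python
# Figures quoted in the post, in Brazilian reais and US dollars.
veo_cost_brl, veo_cost_usd = 300, 52
traditional_cost_brl, traditional_cost_usd = 100_000, 17_543

savings_fraction = 1 - veo_cost_brl / traditional_cost_brl
cost_ratio = traditional_cost_brl / veo_cost_brl

# The two USD conversions in the post imply slightly different exchange rates.
implied_rate_veo = veo_cost_brl / veo_cost_usd
implied_rate_traditional = traditional_cost_brl / traditional_cost_usd

print(f"Savings versus traditional production: {savings_fraction:.1%}")
print(f"Traditional cost / Veo 3 cost: {cost_ratio:.0f}x")
print(f"Implied BRL per USD: {implied_rate_veo:.2f} vs {implied_rate_traditional:.2f}")
```

On the reported numbers, the Veo 3 credits come to about 0.3% of the traditional budget, roughly a 333x difference.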
The Hulk vs Thanos fight that the fans deserved, in Veo 3. (Score: 945, Comments: 336): A user shares a YouTube video generated with Google’s Veo 3 video generation model, supposedly depicting a higher-quality Hulk vs Thanos fight (see YouTube link). Veo is known for generating high-fidelity text-to-video content; however, the discussion notes limitations in AI-driven fight choreography and physical believability, particularly in complex scenes with dynamic character movement. The need for real actor motion references is highlighted as a potential solution to enhance realism. Commenters emphasize that current AI methods like Veo struggle with realistic fight choreography, suggesting that integrating motion capture data from real actors could improve outcomes. There’s also debate about how well AI-generated fights capture the nuances of trained combat versus brute strength, referencing cinematic storytelling.
- Several commenters highlight that current AI-generated fight sequences lack the nuanced choreography of human-designed scenes, suggesting that referencing actual filmed fights with actors and then transferring that data to AI could significantly improve realism and impact registration.
- One technical critique points out that AI-generated animations do not yet accurately simulate the physical impact and interaction between characters during combat, which is a key area for further advancement in models like Veo 3. This includes better collision detection and realistic response to force.
- A comparison is drawn between trained fighters (e.g., Thanos in Infinity War) and brute force approaches (e.g., Hulk), with the observation that AI struggles to replicate the subtle knowledge and decision-making present in expert fight choreography, as opposed to just raw, uncoordinated strength.
REVOLUTIONARY VLOG with VEO 3. 1ST attempt (Score: 155, Comments: 27): A user shares their first experiment using Veo 3, explicitly noting they are learning the prompt system and inviting technical feedback for novice users. There is no detailed benchmark, code, or specific implementation discussion included; this is a request for constructive critique on prompt engineering for Veo 3 video generation. No substantive technical debate is present in top comments; responses are primarily humor and lack technical depth or feedback on model usage.
- One commenter reflects on the impact of AI-generated content by highlighting the creative scenario of ‘vlogging’ during the Revolutionary War, suggesting that AI opens novel storytelling and perspective-shifting opportunities. This underscores a technical discussion point about how generative AI enables anachronistic, immersive media formats that were previously impossible—potentially changing historical education, entertainment, and digital humanities projects.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. Major Model Releases and Performance

O3 Pro Stealth Release Sparks Speculation, Gemini Benchmarks Leak: An unannounced release of O3 Pro sparked claims it was better than regular O3 but potentially limited to 64k tokens. Leaked benchmarks showed Gemini 2.5 Pro scoring 86% against O3 High’s 79.6% on the Aider Polyglot coding benchmark; the model is anticipated around Thursday at a cost of around $42.
Claude 4 Models Crowned Leaders, Anthropic Cuts Capacity: Members asserted that Claude models are leading by far, with one stating it became an addiction to use only thinking models due to better results. Anthropic unexpectedly cut off nearly all Claude 3.x model capacity with less than five days’ notice, causing widespread availability issues for customers.
Google Unleashes Veo 3, Gemini Flash Hits Error Wall: Google launched Veo 3 for video generation alongside the open-source Gemini Fullstack Langgraph Quickstart. Users reported Internal Server Errors and high latency using Gemini 2.5 Flash via OpenRouter, an issue potentially stemming from Google’s side due to load.

Theme 2. Infrastructure and Hardware for LLMs

Users Push Context Windows Past 500K, KV Cache Saves Day: Users successfully loaded models with 350,000 and 500,000 token context windows on consumer hardware like an RTX 4060 Ti, achieving 2.25 tok/sec and 0.38 tok/sec respectively. Running Qwen 7B with 1M context took 70GB of memory with KV cache quantization enabled.
Nvidia Blackwells Networked for Supercomputing, Hardware Debates Rage: Members pictured networking Nvidia Blackwells into an Ultra DGX Superpod for AI factories. Debates included Macs with 512GB offering 448GB VRAM vs AMD AI MAX 395+ mini PCs and building LLMs with Supermicro H12 (SP3, PCIe 4.0) motherboards.
FP8 Training Scales to Trillion Tokens, New PEFT Method Boosts Knowledge: The paper “Scaling FP8 training to trillion-token LLMs” demonstrates training with FP8 precision on up to 2 trillion tokens, introducing Smooth-SwiGLU to fix instabilities. A new parameter-efficient finetuning method claimed ~4x more knowledge uptake than full finetuning or LoRA while using fewer parameters.

Theme 3. Agents and the Model Context Protocol (MCP)

Agent Frameworks Explode From Claude Code, DSPy, Aider: A deep dive into ClaudeCode’s self-driving coding agent detailed its systems, tools, and commands. DSPy powered a solution for DARPA’s Advanced Research Concepts lab which is now spinning out into a company.
MCP Adoption Grows, Gorilla Named Protocol MVP: Gorilla is recognized as the MVP of MCP, routing model queries to real API actions and proving the need for model interfaces. The Gradio Agents x MCP Hackathon offers $16.5k in prizes and $900k in credits to build tools/demos.
Monetizing MCP Servers, Self-Hostable Assistants Arrive: The MonetizedMCP open-source framework adds programmatic payments (crypto/fiat) to any MCP server with a demo video. Piper (github.com/jmagar/piper), a self-hostable assistant, aims to enable mobile MCP usage due to lack of good options.

Theme 4. Developer Tools, Data & Evaluation

LLM Scribe Creates Datasets Fast, NotebookLM Shares Findings: The LLM Scribe tool (huggingface.co/spaces/Gabriella0333/LLM_Scribe_Demo) streamlines creating hand-written datasets for fine-tuning, exporting to formats like ChatML/Alpaca/ShareGPT. NotebookLM now allows sharing public notebooks and users noted smooth audio overview generation using the discover feature.
Evaluation Tools Flood Market: YourBench, Modal Almanac, WeightWatcher AI: Hugging Face’s YourBench was highlighted as a very under-rated resource for model evaluation. Modal Labs released the LLM Engineer’s Almanac with thousands of inference benchmarks, and WeightWatcher AI surfaced for LLM analysis.
IDE/Tooling Woes Hit Cursor, Aider, NixOS: Cursor users reported inaccurate usage-based billing and frequent chat interruptions with Claude 4 Sonnet, prompting new chat windows. Aider users sought better control in /ask mode to prevent unwanted code suggestions and debated session resumption with -resume. A NixOS contributor sought input on improving declarative ML/DS practices, highlighting Nix’s power.

Theme 5. Research Concepts and Broader Implications

Nous Research Steers Shoggoth with SMC: Nous Research released a blog post on using Sequential Monte Carlo (SMC) with multiple ‘particles’ to steer text generation against scoring functions. Code for a parallelized inference server is available for benchmarking constraint designs.
LLMs Grapple With Safety, AI Act, Real-World Skills: Members debated CBRN and cybersecurity risks from LLMs, with some arguing against overblown hype and pointing to implementation flaws, while another argued the bottleneck for CBRN threats is access to physical materials and expertise. Discussion touched on the implications of the AI Act in Italy/Europe and whether LLMs can effectively teach wet lab skills.
AI Bubble Plot Circulates, Advanced Research Techniques Explored: An AI bubble plot image suggested potential market overvaluation. Research discussions included transformer approaches to generative inverse problems using patches and implementing T5 with a diffusion decoder, like in Moyix’s X post.

Discord: High level Discord summaries

Perplexity AI Discord

Claude 4 Reigns Supreme with ‘Thinking Models’: Members are preferring Claude 4 due to its thinking models and self-inquiry approach that yield better results than alternatives.
- One member stated that it became an addiction to use only thinking models.
Perplexity Pro Poses as Budget-Friendly ChatGPT Challenger: Users see Perplexity Pro (~$400 annually) as a cost-effective alternative to ChatGPT Pro, since a year costs about the same as two months of ChatGPT Pro.
- It was noted that Perplexity can connect to Google Drive for continuously up-to-date information access.
O3 Pro Delay Sparks Speculation of OpenAI’s Lag: The delayed release of O3 Pro has stirred concerns that OpenAI may be losing ground against its competitors in capabilities.
- It was stated that O3 Pro mode is promised to have full tool support, while O1 Pro currently supports only images, not PDFs.
Samsung Courts Perplexity in Potential Acquisition Move: Discussions arose around Samsung’s interest in potentially acquiring Perplexity AI, based on this article.
- There were links shared in passing about creating an app and another about smuggled North Korean smartphones.
Perplexity’s Internal Knowledge Search API Access Remains UI-Bound: A user sought API access to Internal Knowledge Search (link) for searching web and org files, but encountered limited documentation.
- The feature, part of Enterprise Pro, appears to be limited to the user interface; one member proposed RAG as an alternative, while others recommended confirming with the API team.

LMArena Discord

O3 Pro Stealth Release Sparks Speculation: An unannounced release of O3 Pro led to claims of it being better than regular O3, though not substantially so, with context window speculations ranging from 64k to 128k tokens.
- The limitation on context window sizes may stem from either capability constraints or cost-cutting strategies.
Gemini 2.5 Pro Leaks Aider Polyglot Score: Leaked benchmarks indicated Gemini 2.5 Pro, possibly codenamed Goldmane, outperformed O3 High on the Aider Polyglot coding benchmark, scoring 86% against 79.6%.
- Anticipated for release on Thursday, the model’s cost was noted to be around $42, marginally higher than Gemini 0506.
Deepthink’s 2M Context Window: There was speculation that Deepthink, rumored to have a 2M context window, could surpass O3 Pro given O3 Pro’s 64k context window.
- The discussion revolved around Deepthink’s tool usage capabilities and the overall impact of such an expansive context window on performance.
Claude Models Reign Supreme: Members asserted that Claude models are leading by far in current AI capabilities.
- It was stated that the non-thinking Claude models are insane and that only Grok 3 or maybe GPT-4.5 comes close.
Apple Acquisition of Anthropic Rumors: A debate emerged concerning the possibility of Apple acquiring Anthropic.
- Counterarguments highlighted Amazon’s partial ownership and Apple’s potential lack of liquid assets for such a purchase, especially with a premium.

OpenAI Discord

ChatGPT’s Memory Evolves: A lightweight version of memory improvements is rolling out to Free users on ChatGPT, allowing the model to reference recent conversations and provide more personalized responses, as detailed in the Memory FAQ.
- Meanwhile, Codex is now available to ChatGPT Plus users, promising enhanced coding capabilities accessible via chatgpt.com/codex.
GPT Image Gen 1 Gets Real: OpenAI’s image generation model, referred to as GPT Image Gen 1, generates images containing text well but might not be great at AI art.
- Some users find GPT Image Gen 1 a good replacement for traditional graphic software.
Google’s Veo 3 Makes Waves: Google launched Veo 3, along with creative workspace Flow to plan, prompt and stitch clips together, according to some users.
- Users noted that Veo 3 will come with a high price tag and is not cheap to use.
GPT-5 Release Rumors Swirl: Speculation suggests GPT-5 may arrive in July, with a fix to the naming scheme coming this summer according to OpenAI’s CEO.
- Some hope GPT-5 will debut as a unified model, maintaining output quality across various applications.
Local LLMs Step Up: Members discussed open-source LLMs like Bagel, a multimodal model that requires around 32GB of VRAM to run.
- Another open-source LLM mentioned was DeepSeek R1, said to have 671B parameters and to require a 20 million dollar machine to run.

Unsloth AI (Daniel Han) Discord

AI Engineer Courts Tesla Roadster: An AI engineer joked about wanting nothing more than running AI models and owning a 2026 Tesla Roadster.
- This comment surfaced in response to someone mentioning they got a 1 t/s CPU, with the engineer quipping, “that would be fast”.
GRPO Training Hits the Brakes: Some Unsloth users encountered an Exception: Invalid prefix encountered error when starting training with GRPO.
- A member reported success by installing with pip install unsloth vllm --no-deps, and then proceeding to install dependencies such as accelerate, bitsandbytes, and datasets.
DeepSeek-R1 Chat Template Causes Turbulence: Users reported errors with the tokenizer for DeepSeek-R1-0528-Qwen3-8B and DeepSeek-Prover-V2-7B models, specifically noting missing {% if add_generation_prompt %}.
- A member shared a modified chat template for DeepSeek-R1-0528-Qwen3-8B to resolve the issue.
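The missing block matters because chat templates are Jinja templates, and the `add_generation_prompt` guard is what appends the assistant-turn prefix at inference time. Below is a minimal sketch of detecting and appending such a guard with plain string handling. It is not the modified template the member shared, and `ASSISTANT_PREFIX` is a hypothetical placeholder, not the actual token used by either model.

```python
# Hypothetical placeholder; the real assistant-turn marker must come from the
# model's own tokenizer configuration.
ASSISTANT_PREFIX = "<assistant>"


def has_generation_guard(template: str) -> bool:
    """Return True if the Jinja template references add_generation_prompt."""
    return "add_generation_prompt" in template


def with_generation_guard(template: str, assistant_prefix: str = ASSISTANT_PREFIX) -> str:
    """Append a guarded assistant prefix when the template lacks one."""
    if has_generation_guard(template):
        return template
    guard = "{% if add_generation_prompt %}" + assistant_prefix + "{% endif %}"
    return template + guard


# Illustrative template that renders messages but never opens an assistant turn.
broken = "{% for m in messages %}{{ m['content'] }}{% endfor %}"
patched = with_generation_guard(broken)
print(patched)
```

With Hugging Face `transformers`, a patched string like this would typically be assigned to `tokenizer.chat_template` before calling `apply_chat_template(..., add_generation_prompt=True)`.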
LLM Analysis by WeightWatcher AI Surfaces: A member linked to WeightWatcher AI, a tool for LLM analysis, that examined how much verbatim data is recallable without overfitting.
- A member mentioned that a WeightWatchers discord review stated they’ve measured saturation, not memorization, which led to counterclaims of trusting papers by nvidia/cornell/deepmind.
Scribe Tool Autocompletes Fine-Tuning Datasets: A member showcased a tool for creating handwritten datasets, exporting in formats like ChatML, Alpaca, and ShareGPT and featuring autosaving, multi-turn creation, token counters, and custom fields (HF demo, video demo, full version).
- A user suggested a generate template feature to generate a full dataset with a small model like LLaMA or Gemini Flash and then edit it manually.
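For readers unfamiliar with the three export formats, here is a rough sketch of how one hand-written example is commonly laid out in each. These are the widely used conventions; the tool’s exact export schema is not described in the discussion, so the field layout here is an assumption.

```python
import json

# One hand-written example pair (illustrative content).
instruction = "Name the capital of France."
response = "Paris."

# Alpaca: flat instruction / input / output record.
alpaca = {"instruction": instruction, "input": "", "output": response}

# ShareGPT: a list of turns tagged with "from" and "value".
sharegpt = {
    "conversations": [
        {"from": "human", "value": instruction},
        {"from": "gpt", "value": response},
    ]
}

# ChatML: each turn wrapped in <|im_start|>role ... <|im_end|> markers.
chatml = (
    f"<|im_start|>user\n{instruction}<|im_end|>\n"
    f"<|im_start|>assistant\n{response}<|im_end|>\n"
)

for name, record in [("alpaca", alpaca), ("sharegpt", sharegpt)]:
    print(name, json.dumps(record))
print(chatml)
```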

LM Studio Discord

Nvidia Blackwells Networked for Supercomputing: Members discussed networking Nvidia Blackwells like in this Nvidia article to create an Ultra DGX Superpod.
- This setup is envisioned to support advanced AI factories.
Extremely Long Context Windows Achieved: A user successfully loaded a model with a 350,000 token context window, achieving 2.25 t/s, while another pushed it to 500k tokens on an NVIDIA GeForce RTX 4060 Ti with 128 GB RAM.
- At 500k tokens, the processing speed was slow at 0.38 tok/sec with approximately 24623.96s to first token at 49.9% context fullness.
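To put the time-to-first-token figure in perspective, the reported numbers can be converted into hours and an implied prompt-processing rate. This is a back-of-the-envelope sketch that assumes "500k" means exactly 500,000 tokens and that the prompt filled 49.9% of that window.

```python
context_tokens = 500_000          # assumed exact value for "500k"
context_fullness = 0.499          # 49.9% as reported
time_to_first_token_s = 24623.96  # as reported

prompt_tokens = context_tokens * context_fullness
hours = time_to_first_token_s / 3600
prefill_rate = prompt_tokens / time_to_first_token_s

print(f"Prompt tokens: {prompt_tokens:,.0f}")
print(f"Time to first token: {hours:.2f} h")
print(f"Implied prompt processing: {prefill_rate:.1f} tok/s")
```

Under those assumptions the wait is roughly 6.8 hours, or about 10 prompt tokens per second, before generation begins at 0.38 tok/sec.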
KV Cache Quantization Needed for 1M Context: A user reported running Qwen 7B with 1M context using 70GB of memory with the KV cache quantization.
- This surprised another user who was using 80GB for 500k context.
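A rough way to see why KV cache quantization matters at this scale is to estimate the cache size directly. The sketch below uses the standard formula (keys plus values, per layer, per KV head); the model shape is an assumed 7B-class configuration with grouped-query attention, not figures from the discussion, and it does not try to reproduce the reported 70GB total, which would also cover weights and other buffers.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value):
    """Size of the key and value caches for one sequence."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Assumed model shape for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 28, 4, 128
TOKENS = 1_000_000

for label, width in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    size_gb = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM, width) / 1e9
    print(f"{label} KV cache at 1M tokens: {size_gb:.1f} GB")
```

Halving the bytes per cached value halves the cache, which is why quantizing it can move a 1M-token context from out of reach to feasible on a large-memory machine.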
DDR5 Bandwidth Dwarfed by PCIe 5.0 SSDs: Discussion centered on how Gen5 SSDs in RAID0 can exceed DDR speeds, with a PCIe 5.0 NVMe drive maxing out around 15GB/s.
- While some noted the importance of latency, others argued it is less critical with deep queues and consecutive accesses.
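The bandwidth comparison can be sanity-checked from link rates. This sketch uses theoretical peaks only: PCIe 5.0 at 32 GT/s per lane with 128b/130b encoding, and an assumed DDR5-4800 speed grade with 64 data bits per channel. It assumes ideal RAID0 scaling and ignores latency and protocol overhead.

```python
import math

# PCIe 5.0: 32 GT/s per lane, 128b/130b encoding, 8 bits per byte.
pcie5_lane = 32 * 128 / 130 / 8       # GB/s per lane
nvme_x4 = 4 * pcie5_lane              # a typical NVMe drive uses 4 lanes

# DDR5-4800 (assumed speed grade): 4800 MT/s times 8 bytes per transfer.
ddr5_channel = 4800e6 * 8 / 1e9       # GB/s per channel
ddr5_dual = 2 * ddr5_channel

# Drives needed in ideal RAID0 to match dual-channel memory bandwidth.
drives_needed = math.ceil(ddr5_dual / nvme_x4)

print(f"PCIe 5.0 x4 NVMe: {nvme_x4:.2f} GB/s")
print(f"DDR5-4800 dual channel: {ddr5_dual:.1f} GB/s")
print(f"Drives needed to match: {drives_needed}")
```

On these assumptions a single drive stays well below memory bandwidth, so the claim only holds for multi-drive arrays with sequential access patterns.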
Supermicro H12 Mobo fit LLM builds?: A user inquired about putting together an LLM build with a Supermicro H12 (SP3, PCIe 4.0) motherboard.
- A member responded that servers are somewhat different from consumer PCs in general but should work the same way for this purpose.

OpenRouter (Alex Atallah) Discord

LLM Scribe Streamlines Dataset Creation: LLM Scribe tool launched, designed to streamline the creation of hand-written datasets for fine-tuning, and exports to multiple formats like ChatML, Alpaca, and ShareGPT.
- The tool includes features like auto-saving, multi-turn creation support, and token counters, available on Hugging Face, with a video demo and the full version on Gumroad.
DeepSeek Prover v2 Debated as Top Math Model: One user said that DeepSeek Prover v2 is the best model for mathematics, while another reported that Prover V2 is quite meh for anything non-proof.
- The model underperforms in tests compared to other reasoning models.
Gemini 2.5 Flash Faces Internal Server Error: Users reported experiencing an Internal Server Error when using Gemini 2.5 Flash via OpenRouter, as well as high latency and the model using reasoning tokens without being configured to do so.
- The issue seems to stem from Google’s side due to load and is related to vercel/ai#6589, with one user suggesting using a try-catch block with retries.
Grok Accused of Climate Change Denial: A user petitioned to remove Grok from the Flagship Model list because it’s reciting climate-denial talking points, citing this article.
- Another user argued against this, stating that many people like Grok due to the freedom it offers, giving a different perspective compared to other models.
Nous Trains a SOTA Model Distributedly: Nous is attempting to train a state-of-the-art model in a distributed fashion, leveraging Psyche.network and Bittensor.
- The run is training with limited inter-GPU bandwidth (~300 Mbps) but is struggling to attract enough GPUs, with 416 H100s currently online.

Cursor Community Discord

Cursor Users Flaunt Themes and Plugins: Users shared their Cursor IDE themes and plugins, including background-cover, Material Icon Theme, and Monkey Pro.
- One user highlighted their preference for ‘Material Theme Icons Darker’ and ‘Filter Ristretto’ within the Monkey Pro theme.
Cursor Funding Plea: Local Currency Billing: A user requested Cursor staff to implement billing in local currencies to improve the odds of funding.
- The user emphasized the potential benefits for both customers and increased platform exposure, tagging Cursor staff to highlight the request.
Opus 4 Max Costs a Pretty Penny for Users: A user reported that using Opus 4 Max in Cursor cost them 69.5 requests, approximating $2.73 for a single message.
- Despite the high cost, the user found it valuable for solving a Postgres bottleneck that Sonnet 4 and Gemini could not address.
Billing Portal Riddled with Problems: Users reported inaccuracies in the new billing portal, particularly with usage-based billing and discrepancies in request counts.
- A user pointed out that their included requests period did not align with calendar months, and the analytics graph lagged by a day or two; they were directed to the usage page for more info.
Claude 4 Sonnet Plagued by Chat Interruptions: Users experienced frequent interruptions during conversations with Claude 4 Sonnet, typically after a single request or around 25 tool calls.
- These interruptions force users to start a new chat, losing context, and are accompanied by network connection errors, disrupting workflow and frustrating users.

HuggingFace Discord

SentinelAI Audits DeFi Smart Contracts: SentinelAI is auditing DeFi contracts and catching reentrancy issues, according to this post.
- Members are actively debating the most efficient format for storing datasets on Hugging Face for cloud training, considering dataset sizes of 100k-200k versus 1M+.
YourBench Under-Rated by Community: A member highlighted Hugging Face’s YourBench initiative, noting it as a very under-rated resource for model evaluation.
- Members celebrated that one user’s space was featured in Spaces Of The Week, confirming that it is a big deal.
Image Generator Yields Display Artifacts: A user reported that images generated by the image generator tool initially appear as 1024x768 but change to 0x0 in the final-answer step, and the app doesn’t load in Chrome.
- It was found that the course could be completed on Windows 11.
Agents & MCP Hackathon Streams Live: The Agents & MCP Hackathon is kicking off with a live YouTube stream.
- They are seeking approaches for a model to reason about parts of an image while taking the rest of the image as relevant context for zero-shot classification.
SOTA Embedding Techniques Sought: A member inquired about the state-of-the-art (SOTA) techniques for fine-tuning embedding models and the standard metrics used to compare the base model against the fine-tuned version.
- Others sought an open-source chat UI, similar to ChatGPT or Le Chat, to interface with a locally served Ollama LLM and suggested AnythingLLM for non-Windows users.

Nous Research AI Discord

Nous Research Steers Shoggoth with SMC: Nous Research unveiled a blog post on Sequential Monte Carlo (SMC), a method using multiple particles sampled and resampled against a scoring function to steer text generation and structure.
- The post introduces a parallelized inference server for benchmarking constraint designs, with example experiments using entropy based triggering and control vectors, along with code released on GitHub.
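The propose, weight, and resample loop behind SMC steering can be illustrated with a toy example. This is not the Nous code: a uniform random token picker stands in for the language model, the scoring function is hypothetical, and there is no entropy-based triggering or control vectors.

```python
import random


def smc_steer(vocab, score, steps, n_particles=8, seed=0):
    """Toy SMC: extend each particle, weight by score gain, then resample."""
    rng = random.Random(seed)
    particles = [[] for _ in range(n_particles)]
    for _ in range(steps):
        # Propose: extend every particle with a token from a uniform "model".
        extended = [p + [rng.choice(vocab)] for p in particles]
        # Weight: incremental weight is the ratio of new score to old score.
        weights = [score(new) / score(old) for new, old in zip(extended, particles)]
        # Resample: duplicate high-weight particles and drop low-weight ones.
        particles = [list(p) for p in rng.choices(extended, weights=weights, k=n_particles)]
    return max(particles, key=score)


# Hypothetical scoring function: always positive, prefers many "a" tokens.
def score(seq):
    return 1.0 + 10.0 * seq.count("a")


best = smc_steer(vocab=["a", "b", "c"], score=score, steps=12)
print("".join(best))
```

Resampling at every step keeps compute focused on partial sequences that already score well, which is the core idea of steering generation against a scoring function.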
Gemini’s Mega Context Helps Agents Remember: To solve agent memory loss, a member recommended leveraging Gemini due to its mega context length and suggested embeddings be stored in vector databases.
- They also shared a link to their OpenRouter Deep Research MCP server, which utilizes 3 agents and a pglite postgres database for queryable research, noting that Gemini can coherently handle 900k+ tokens.
HF Sponsors Gradio Hackathon, Prizes Offered: A member shared a link to the Hugging Face Gradio Hackathon, promoting the significant credits offered for building agents and agentic apps.
- Other members published their first technical blogposts with hands-on exercises, mathematical derivations, and interactive visualizations.
Scribe Tool Helps Build Handwritten Datasets: A member created a tool to help streamline creating handwritten datasets for fine-tuning, supporting multiple formats like ChatML, Alpaca, and ShareGPT; check out the Hugging Face demo.
- It includes auto saving, multi-turn creation, token counters (loaded from Hugging Face), goal tracking, and custom fields; see also the video demo, and the full version.

Manus.im Discord

No Manus AI API in Sight: A member inquired about the release timeline for a Manus AI API but learned that there are no current plans for it.
- The possibility remains open for future consideration, contingent on shifting priorities.
Santa Fe College Misses School Pass Cut: A user noticed Santa Fe College is absent from Manus’ School Pass college list and inquired why.
- A staff member clarified that Santa Fe is listed on the separate Manus Campus list, which differs from School Pass eligibility.
Free Credit Usage Policy Requires Clarification: A member suggested that Manus should exhaust free daily credits before tapping into paid credits, calling the current approach a bit scammy.
- It was explained that Manus consumes credits in this order: event credits > daily free credits > monthly credits > add-on credits > free credits, prompting calls for clearer communication.
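The consumption order described above amounts to draining credit buckets by priority. Here is a minimal sketch of that logic; the bucket names and balances are hypothetical, and Manus’ actual implementation is not described in the discussion.

```python
# Order as described in the discussion, highest priority first.
CONSUMPTION_ORDER = ["event", "daily_free", "monthly", "add_on", "free"]


def spend(balances, amount):
    """Deduct amount from balances following CONSUMPTION_ORDER."""
    remaining = amount
    updated = dict(balances)
    for bucket in CONSUMPTION_ORDER:
        take = min(updated.get(bucket, 0), remaining)
        updated[bucket] = updated.get(bucket, 0) - take
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("insufficient credits")
    return updated


# Hypothetical balances for illustration.
print(spend({"event": 50, "daily_free": 300, "monthly": 1900}, 400))
```

In this example the event and daily free credits are exhausted first, and only the last 50 credits come out of the monthly allowance.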
Manus Video Generation: Is it a Veo?: Users debated Manus’ video generation quality, comparing it against competitors like Veo.
- While one user claimed Gemini offers similar functionality for free, another said Manus promises to be top of the line upon release.
AI Act Thwarts Task Completion: A user shared a frustrating experience where Manus terminated a task after 1 hour and 55 minutes due to hitting the context limit, which required starting all over again.
- This also prompted a discussion about the implications of the AI Act in Italy and Europe.

Eleuther Discord

Muon Optimizer Misses Mark in GANs: Members found the Muon optimizer ineffective in GANs because GANs need to learn slowly, and one expressed reservations about Muon without momentum.
- One user reported that it doesn’t work for their GANs.
Eleuther Website’s Stale State Spurs Debate: A user asked about the lack of updates, but a member stated the Eleuther website is primarily for those who aren’t ML researchers and serves as an easy way to find the project via Google.
- They emphasized that the website isn’t the primary activity hub, as most action happens in the Discord community.
LLMs Spark CBRN and Cybersecurity Fears?: Members voiced differing opinions on Chemical, Biological, Radiological, and Nuclear (CBRN) and cybersecurity risks associated with Large Language Models (LLMs).
- One member cited research suggesting the hype around CBRN is overblown and that cybersec risks stem more from poor implementation (models having excessive access) rather than the LLMs themselves, while another argued that the bottleneck for CBRN threats is access to physical materials and expertise, not knowledge.
Transformers Tackle Generative Inverse Problems via Patches: For generative/inverse problems using transformers, models sort by patches rather than pixels to preserve information, with the option to separate channels.
- This method is expected to be superior because it preserves some information.
AI Bubble Graph Implies Overvaluation: A user shared an AI bubble plot image suggesting potential overvaluation in the AI market.
- The plot visually represents AI as a bubble, implying inflated expectations or unsustainable growth, and some observers feel it’s important to watch for market corrections, especially as valuations soar.

Yannick Kilcher Discord

NixOS contributor seeks ML/DS prioritization: A Data Scientist and NixOS contributor seeks input on improving declarative, immutable, and reproducible practices in ML and DS.
- The contributor jokes that “‘I use nixos btw’ is the only phrase that can scare arch people away” and notes that Nix is a powerful tool that can significantly improve development and deployment regardless of the operating system.
Temperature 0 Degrades Thinking Models: Members discussed that a temperature of 0 can cause repetition in thinking models; non-zero temperature is recommended.
- They pointed to Qwen’s documentation which states that “DO NOT use greedy decoding, as it can lead to performance degradation and endless repetitions”.
Parameter-Efficient Finetuning Boosts Knowledge: A new method for parameter-efficient finetuning yields ~4x more knowledge uptake compared to full finetuning and LoRA.
- A member expressed interest in extending knowledge with their collection of books and documents to see if it benefits over RAG-like approaches.
FP8 Training Scales to Trillion Token LLMs!: The paper Scaling FP8 training to trillion-token LLMs demonstrates successful training of large language models using FP8 precision on datasets of up to 2 trillion tokens.
- The paper identifies instabilities in FP8 training caused by outlier amplification by the SwiGLU activation function and introduces Smooth-SwiGLU to address this without altering function behavior; more background can be found in jcarlosroldan.com.
Gemini Fullstack Langgraph Quickstart Goes Open Source: Google open sourced the Gemini Fullstack Langgraph Quickstart.
- One member speculated that it might be a method for making the model think for longer. Others pointed out that the new open-source project seems more aligned with quick search functionality than with in-depth research requiring more processing time.
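To illustrate the temperature point from the thinking-models item above, here is a minimal sketch of greedy decoding versus temperature sampling over a toy set of logits. The logits, the temperature of 0.6, and all function names are illustrative assumptions, not values or APIs taken from Qwen's documentation.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_pick for T=0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Temperature 0: always return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature, rng):
    """Non-zero temperature: draw an index from the softened distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical next-token logits where two candidates are nearly tied.
toy_logits = [2.0, 1.8, 0.5]
rng = random.Random(0)

greedy_tokens = [greedy_pick(toy_logits) for _ in range(20)]
sampled_tokens = [sample_pick(toy_logits, 0.6, rng) for _ in range(20)]

print("greedy :", greedy_tokens)   # the same index every time
print("sampled:", sampled_tokens)  # a mix of indices
```

Greedy decoding returns the same token whenever the same context recurs, which is one plausible route into the repetition loops members described; sampling breaks such ties.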

Modular (Mojo 🔥) Discord

Modular Hosts Hackathon: Modular is hosting another hackathon focused on Mojo kernels, MAX Graph model architectures, and PyTorch custom ops (https://lu.ma/modular-hack-weekend).
- To kick off the hackathon weekend, Modular will host a GPU programming workshop both in-person at their Los Altos office and virtually via livestream to get people familiar with the technologies they will use.
New Community Member Discovers Mojo: A new community member, a recent graduate from a research masters program in computer science, tried Mojo after seeing a Fireship video and is already seeing improvement on basic ML models.
- He mentioned being familiar with Chris Lattner’s work on LLVM with Vikram Adve as a graduate of UIUC.
C to Mojo Bindings Generator in progress: A member is actively developing a C->Mojo bindings generator and is working through challenges in getting object files in/out of the mojo compiler without cursed workarounds.
- They noted that almost all the necessary components are available, except for wildly horrible packed structs and potentially some tricky aspects around restrict. Pragmas that impact calling convention are also going to be a painful bit to figure out.
Mojo lacks Manual Thread Management: Mojo currently lacks manual thread management, with the feature potentially arriving late this year or next year, and some pointed out that this is only v0.3.
- As a result, there are missing features like working atomics and synchronization primitives, thread safety marker traits, basic IO, basic data structures, and the ability to do networking.

DSPy Discord

DSPy Talks DAIS & AI Engineering: The DSPy team is preparing for talks at AI Engineering and Databricks DAIS, seeking community input on topics and use cases, with a DSPy 3.0 release slated for June, according to this message.
- Feedback is requested on the content to cover in the presentation.
DSPy Powers DARPA Project Spinoff: DARPA’s Advanced Research Concepts lab utilized DSPy to create a solution for ‘Collaborative Knowledge Curation,’ which is now spinning out into a company.
- This highlights DSPy’s real-world application and validation in advanced research settings.
DSPy Flow Refactoring Required: Refactoring existing GenAI flows to DSPy can feel like a major change because it is designed for online and production use, unconventionally handling your GenAI flow.
- Guidance is needed on integrating DSPy into existing codebases and workflows.
Agent Framework Construction with DSPy: Interest is increasing in building an agent framework atop DSPy, incorporating first-class environments and managing self- and external-rewards for online learning via optimizers.
- One member questioned why agent framework developers aren’t already pursuing this approach, instead of focusing on other areas.
Claude Code Framework Launched: An engineer is developing an agent framework using Claude Code, aiming for Python bindings and a Rust core, complete with retroactive trace representation; it is available as claude_sdk on GitHub.
- The framework focuses on trace visibility and easy forking for optimization on arbitrary metrics.

aider (Paul Gauthier) Discord

ClaudeCode’s Self-Driving Coding Agent Exposed!: A deep dive into ClaudeCode’s systems, tools, and commands for building a self-driving coding agent was shared here.
- The blog post focuses on real-time, self-corrective agents designed for productive work, emphasizing design decisions over mere prompting or AI engineering.
Aider Session Resumption Saves Context: Discussion arose around resuming Aider sessions with the --resume flag to maintain context memory, though members had not fully worked out how to use it.
- Users expressed interest in resuming older sessions by ID, while others are restarting sessions to aggressively clear context.
Taming Code Suggestions in Aider’s /ask Mode: Users voiced concerns about Aider’s /ask mode frequently suggesting code changes when only conversation or planning is desired.
- Solutions included explicitly telling Aider “don’t write any code yet” or utilizing a /reminder command to set the planning stage.
Mysterious Gemini Benchmarked: A member benchmarked an unreleased model at 86.2% with the diff-fenced edit format, leading to speculation that the model is from Gemini.
- Concerns arose about potential confusion with the naming of Gemini-2.5-pro versions, sparking calls for a rename or delay to avoid confusion.
Bedrock vs Anthropic Command Execution Faceoff: Users observed that Aider, when using a Bedrock Claude 3 Sonnet model, can successfully execute terminal commands, but a Converse Claude Sonnet model only offers help.
- The user inquired about settings that affect the ability to use terminal commands, particularly in Bedrock cases.

Notebook LM Discord

NotebookLM opens up public sharing: Users can now share their notebooks with anyone using a public link, encouraging community members to show off NotebookLM skills.
- Members are encouraged to share notebooks that the community could benefit from, providing opportunities to share work and benefit the community.
Podcast Summaries Remain Concise: Users are still awaiting a fix for the podcast summaries in languages other than English, noting that they are not as long even with the same prompt and content.
- Multiple users echoed the sentiment, indicating they are also waiting for the fix.
Microsoft Learn Certifications Spark NotebookLM Interest: A user inquired about using Notebook LM with Microsoft Learn and sought use cases and tips for Microsoft Certification.
- This prompted other members to wonder why this happens and if anyone is actively using Notebook LM with Microsoft Learn.
NotebookLM’s Fact-Finding Foibles: A user reported that Notebook LM generates random, unsourced facts and erroneously links them to source documents, requiring constant correction to prevent the AI from using conversation history as a source.
- The hallucinated facts included an “ebony tablet of Aha-Teta” linked to “zom (electrum)”, which was deemed inaccurate as this connection wasn’t found in the provided sources.
NotebookLM Eyes Gemini 2.5 Pro Upgrade: Members are speculating when Notebook LM will start using more advanced models like Gemini 2.5 Pro, but currently there is no known timeline from Google.
- The model in use is likely Gemini 2.5 Flash, though this is not confirmed.

Latent Space Discord

Southbridge Teardown Exposes Claude’s Agentic Internals: Southbridge Research published an analysis of Claude Code’s agentic capabilities, spurring the author to ship both Writer and Hashbrown live.
- The report gave insights into the new models in advance of the Escape Mount Moon Hackathon.
Modal Almanacs LLM Inference Benchmarks: Modal Labs introduced the LLM Engineer’s Almanac, which features thousands of LLM inference serving benchmarks across vLLM, SGLang, and TensorRT-LLM frameworks.
- The almanac offers results, replication code, an executive summary addressing key questions, and their benchmarking framework, stopwatch.
Anthropic Capacity Cutoff Causes Catastrophic Customer Concerns: Anthropic unexpectedly cut off nearly all Claude 3.x model capacity with less than five days’ notice, causing availability issues for its customers, reported by Varun Mohan.
- Users voiced concerns about trust in model providers after receiving less than five days’ notice; alternative models like Gemini 2.5 Pro and GPT 4.1 are unaffected.
Codex Gains Internet Access Amidst Community Crossfire: Sam Altman announced that Codex, an AI coding tool, now has optional internet access for ChatGPT Plus users, disabled by default due to complex risks.
- The community asked for clarification on the purpose and tradeoffs, expressing security concerns.
Community Collides to Conquer Coding Challenges: A live production AI bot was collaboratively developed and deployed to the AIE website using a new UI framework, with enthusiastic community support.
- To improve the feedback loop for the AI bot so it can self-improve, there are discussions about establishing an easier way to report bugs than direct messaging.

Torchtune Discord

Tokenizer Ready for Review!: Unit tests were added as requested in PR #2781, waiting for review to merge.
- The changes look to improve tokenizer performance, but specific performance gains are not yet quantified.
FP8 + TP Ungated with Memory Reduction!: PR #2782 ungates FP8 + TP, enables loss parallelism, and provides memory reduction.
- This PR also enables autograd compiling, but it’s currently broken and the team is aware.
Torchtune emerges as LLaMA-Factory Alternative: A user is employing a torchtune fork for their work, viewing it as a performant and readable alternative to LLaMA-Factory because it avoids dependency on TE, megatron, lightning.
- A team using a torchtune fork for 4-5 months reports that it is super stable with good results.
Context Parallel missing Flex Attention Compatibility: While Context Parallel (CP) has landed, it’s missing flex attention compatibility, which could have significant benefits.
- The distributed team plans to enable flex attention compatibility soon.
Dropping Support for Python 3.9?: The end-of-life status of Python 3.9 is causing issues with linting, as new linting insists on using List -> list, Tuple -> tuple, etc., while CI requires using Union and Optional from typing.
- A user suggested that the failed CI was because of Joe.
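The Python 3.9 tension above can be sketched as follows, on the assumption that the CI constraint comes from the language version: built-in generics such as list[int] (PEP 585) work on 3.9, while the X | Y union syntax (PEP 604) only arrived in 3.10. Code that must run on 3.9 therefore mixes lowercase list and tuple with Optional and Union from typing. The function names are hypothetical and not taken from torchtune.

```python
from typing import Optional, Union


def first_or_none(items: list[int]) -> Optional[int]:
    """Built-in generic `list[int]` is fine on 3.9; `int | None` would need 3.10+."""
    return items[0] if items else None


def normalize(value: Union[int, str]) -> tuple[int, str]:
    """`Union[int, str]` is the 3.9-compatible spelling of `int | str`."""
    number = int(value)
    return number, str(number)


print(first_or_none([3, 4]))
print(normalize("7"))
```

Dropping 3.9 would let both annotations use the newer `|` syntax and remove the typing import altogether.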

LlamaIndex Discord

Gradio Agents x MCP Hackathon Kicks Off: The Gradio Agents & MCP Hackathon is live, offering a livestream, $16.5k in prizes, and $900k in credits across 3 Tracks: MCP Tool/Server, Custom Components for Agents, Agentic Demo Showcase.
- There will be an office hours session for hackathon participants in the HuggingFace Discord server on Wednesday June 4th with the Discord event link shared for those interested.
LlamaIndex Scales Agents in Finance: The slide deck from the Scaling Agents in Finance workshop is now available, demonstrating how to automate document workflows with Agentic AI for finance tasks using Assistant Agents that act as powerful ‘research assistants’.
- The workshop example involved parsing & indexing 10-K filings from Adobe and using agentic RAG to answer questions.
Llama Agents Deploys as LlamaDeploy: The llama-agents package on PyPI has been renamed to LlamaDeploy, which deploys N number of Workflows as services, with more info here.
- The previous version of LlamaAgents had not been updated since August 16, 2024, before the rename.
LlamaIndex Integrates with MCP for Agentic Awesomeness: A LlamaIndex integration enhances agent capabilities and workflow deployment using MCP, providing helper functions that let LlamaIndex agents use MCP server tools and that serve any LlamaIndex workflow as an MCP.
- This was discussed at the AI Engineer Summit in San Francisco, where attendees met with the LlamaIndex team to discuss Agentic AI projects.

MCP (Glama) Discord

Seek Guidance on Multi-MCP Interaction: A member is looking for an expert to demonstrate MCP server setup with an MCP client, aiming to create an MCP that interacts with multiple others, like calling the Atlassian MCP and then the Git MCP.
- The goal is to set up publicly available MCP servers for such interactions.
Request new Feature Request Channel: A member inquired about a dedicated channel to discuss new feature requests, particularly regarding long-running tool invocations, to avoid messy discussions across multiple issues and pull requests.
- They sought a better place for focused discussion on feature enhancements.
Seek list of Common MCP Servers to tinker with: A member requested a list of common MCP servers to experiment with, supporting features like sampling, OAuth, prompts, resources, and resource templates.
- Another member suggested checking out the servers directory as a starting point.
MonetizedMCP Demo Showcased: A member introduced MonetizedMCP, an open-source framework for adding programmatic payments (crypto or fiat) to any MCP server, accompanied by a demo video and website.
- It is compatible with the mcp-remote library, offering a streamlined approach to monetization.
Piper gives Self-Hostable Assistant: A member shared Piper, a self-hostable assistant, noting the lack of good options for using MCP from mobile.
- Grizzly integrates cool features that can be complementary to Piper.

LLM Agents (Berkeley MOOC) Discord

MOOC Session Dates Still Unclear: Despite inquiries, the dates for the next MOOC session this year remain unconfirmed as of now.
- No official announcements have been made regarding future sessions.
Assignment Deadlines Officially Closed: All assignments for the current MOOC, including quizzes, were due on May 31st.
- There are currently no plans to reopen pending quizzes or extend deadlines.
Certificate Declaration Form Deadline Nears: Participants should complete the Certificate Declaration Form and Written Article promptly to ensure eligibility for certification.
- The form is closing very soon, emphasizing the urgency for submission.
Detailed Submission Feedback Requested: One user requested detailed feedback on all submissions, including the agentX project and two lab assignments.
- It’s unclear whether such detailed feedback will be provided, but it highlights interest in comprehensive evaluation.

Cohere Discord

Hugging Face Awaits CMD-R Sequel: Hugging Face has yet to release a follow-up to the CMD-R model, sparking community suggestions to pair it with fine-tuned adapters (e.g. LoRA).
- Members suggested using Mistral with newer training data as an alternative.
Command A Rumored for AWS Bedrock: A user inquired about the potential availability of Command A on AWS Bedrock.
- The discussion concluded without confirming its potential arrival.
Hackathon Seeks Cohere Cash: A participant sought the right contact to request sponsorship from Cohere for a post-secondary hackathon.
- The request was made in the general chat without any direct contact info given.
LLM Enthusiast says Hello: A new member, Aashutosh, introduced himself to the Cohere community server, noting his obsession with LLMs and ML.
- He indicated excitement about creating real-world projects from India.

Nomic.ai (GPT4All) Discord

Deepseek-R1 runs on Outdated CPUs: A user managed to run a large Deepseek-r1 model (over 400GB) from an outdated 5000 mt/s SSD at 0.1 tokens/second by removing RAM to make space.
- The user attributed this performance to the high quality of MOE models and improvements in PC storage technology.
Orange PI Aiming to Run Large Models: A user is optimistic about running large models on open hardware projects like Orange PI by attaching a Gen 5 m.2 slot to its board.
- The user anticipates that OpenCL and existing 3D GPUs could enable a powerful “unified memory” machine capable of running models as large as DeepSeek-R1 with efficient offloading at minimal power consumption.
Mac Dominates VRAM Capacity: A user asserted that Mac with 512GB is the “VRAM” king, offering 448GB of VRAM at a comparable price to four AMD AI MAX 395+ 128 GB mini PCs or laptops.
- They emphasized the lower power consumption of the Mac compared to the combined AMD setup, making it an attractive option for model enthusiasts.

tinygrad (George Hotz) Discord

GitHub PR review on tinygrad requested: A member requested a review of their pull request on GitHub.
- This PR is on tinygrad.
GlobalCounters Demystified: A user inquired about when to use GlobalCounters.mem_used versus GlobalCounters.global_mem in the tinygrad framework, especially during tensor realization.
- mem_used updates during buffer allocation/deallocation, while global_mem updates within ExecItem.

MLOps @Chipro Discord

MLST Debates Generative AI: Machine Learning Street Talk (MLST) will discuss generative AI, Friday, the 6th, 9am PST; details at this Discord event link.
- This session promises insights into the evolving landscape of generative AI and its applications.
Guo Leads AI Programming Webinar: Industry expert, Liang Guo, is leading a webinar focused on AI programming for data analysis; RSVP at this Google Forms link.
- The webinar will likely cover the latest tools and techniques in AI programming for data analysis.
SVCA Announces AI4Legislation Comp: Silicon Valley Chinese Association is holding an AI4Legislation summer competition, part of an overarching series; find more information on GitHub.
- This competition aims to foster innovation in applying AI to legislative processes.

Gorilla LLM (Berkeley Function Calling) Discord

Gorilla: The Original MCP?: Gorilla, predating the formal MCP standard by a year, operates as a proto-MCP system, routing model queries into tool use.
- It interprets structured tool schemas and grounds generation in real API actions, demonstrating the necessity of interfaces for LLMs beyond mere knowledge.
Gorilla as MCP’s MVP: The team views Gorilla as the MVP of MCP, proving that LLMs need interfaces to ground generation in real API actions.
- This underscores that LLMs don’t just need knowledge — they need interfaces.

The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (1125 messages🔥🔥🔥):

Claude 4 Usage and Preference, Perplexity Pro vs ChatGPT, O3 Pro Release and Concerns, O Series Models, Testing Catalog and AI News

Claude 4: The Thinking Model of Choice?: Members discussed their preference for Claude 4 and its thinking models, citing better results due to its self-inquiry approach.
- One member noted that it became an addiction to use only thinking models.
Perplexity Pro: A Cheaper ChatGPT Alternative?: Users compared the annual cost of Perplexity Pro (~$400) to ChatGPT Pro, with one user suggesting the former is equivalent to two months of chatgpt pro.
- They also noted that Perplexity can connect to Google Drive and the information is always up to date and doesn’t need re-uploading of updated docs.
O3 Pro Delay Fuels Concerns: Members expressed concerns over the delayed release of O3 Pro, suggesting it indicates OpenAI is falling behind competitors.
- One user noted that O3 pro mode is promised for full tool support, and current O1 pro only supports images, not files like PDFs.
O Series Model Limits Examined: Users discussed rate limits for O series models, with one user stating that O3 was 50 per week.
- There was confusion around whether O3 High was 100 per day, with conflicting information on prior rate limits.
Is a Testing Catalog Mention Relevant?: A link to a testing catalog was posted with someone questioning why the link was shared.
- Multiple users also started sharing various links to media and short clips.

Samsung, Acquisition, Perplexity

Samsung Eyes Perplexity Acquisition?: Members discussed whether Samsung intends to acquire Perplexity AI following this article.
- Other links shared in passing include one about creating an app and another about smuggled North Korean smartphones.
North Korean Smartphones: Smuggled North Korean smartphones have been a topic of discussion.
- This was shared alongside a discussion about Samsung and Perplexity.

Perplexity AI ▷ #pplx-api (12 messages🔥):

Internal Knowledge Search API access, Academic Search mode in API, RAG alternative

Internal Knowledge Search API Access Remains Elusive: A user inquired about using the Internal Knowledge Search feature (link) via the API to search web and org files, but the documentation lacks details.
- A member indicated that this Enterprise Pro feature might be limited to the user interface, suggesting RAG as an alternative.
Academic Search Goes Beta: A member requested testers for an academic filter in beta, emphasizing confidentiality during evals.
- The command curl --request POST --url https://api.perplexity.ai/chat/completions --header 'accept: application/json' --header "authorization: Bearer pplx-" --header 'content-type: application/json' --data '{ "model": "sonar-pro", "messages": [{"role": "user", "content": "What is the scientific name of the lions mane mushroom?"}], "stream": false, "search_mode": "academic"}' | jq was used to test the academic filter with sonar-pro.
RAG Avoidance Desired: A user expressed hope to avoid RAG with an internal knowledge search, but a member suggested using the UI as the best alternative to avoid RAG.
- It was suggested to confirm this with the API team.
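For readers who prefer Python to curl, the academic-filter request quoted above can be sketched with the standard library alone. The URL, model name, headers, and search_mode value are copied from the shared command; the function name and the placeholder key are assumptions for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.perplexity.ai/chat/completions"


def build_academic_request(api_key: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request using the academic filter."""
    payload = {
        "model": "sonar-pro",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "search_mode": "academic",  # the beta filter mentioned in the discussion
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
        method="POST",
    )


# "YOUR_API_KEY" is a placeholder, not a real credential.
request = build_academic_request(
    "YOUR_API_KEY",
    "What is the scientific name of the lions mane mushroom?",
)
print(request.get_method(), request.full_url)

# Sending requires network access and a valid key:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```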

LMArena ▷ #general (752 messages🔥🔥🔥):

O3 Pro release, Gemini 2.5 Pro, Deepthink release, Context Window Sizes, Claude Research

O3 Pro Stealth Release Causes Hype: Some members claimed to have early access to a stealth-released O3 Pro, noting it was better than regular O3 but not that great.
- There was speculation about its context window, with mentions of 64k, 80k, and 128k token limits, and whether this limitation was due to capability issues or cost-cutting measures.
Gemini 2.5 Pro Benchmarks Emerge: Leaked benchmarks of Gemini 2.5 Pro, possibly under the name Goldmane, showed it outperforming O3 High on the Aider Polyglot coding benchmark with a score of 86% vs 79.6%.
- The cost was mentioned to be around $42, slightly above Gemini 0506, and the release was heavily anticipated for Thursday, with some jokingly saying to distrust Brian if it doesn’t happen.
Deepthink’s 2M Context Window to Crush O3 Pro?: Speculation arose that Deepthink, with its rumored 2M context window, would significantly outperform O3 Pro, especially since O3 Pro has a 64k context window.
- Members debated whether Deepthink could use tools and discussed the potential impact of a large context window on its performance.
Claude models leading by far: Members mentioned that Claude models are leading by far.
- It was said that the non-thinking Claude models are insane and that only grok 3 or maybe gpt-4.5 comes close.
Debate over Apple Acquisition of Anthropic: A discussion started over the possibility of Apple buying Anthropic, with one member stating that Apple is buying anthropic probably.
- Others refuted this claim, citing that Amazon partially owns Anthropic and that Apple may not have enough liquid cash to cover such a purchase, especially with a potential premium.

OpenAI ▷ #annnouncements (2 messages):

ChatGPT Memory, ChatGPT Codex, Personalized responses

ChatGPT Gets a Memory Boost: A lightweight version of memory improvements is rolling out to Free users to provide more personalized responses.
- In addition to existing saved memories, ChatGPT now references your recent conversations; check out the Memory FAQ.
Codex Cracks ChatGPT Plus: Codex is rolling out to ChatGPT Plus users today, promising enhanced coding capabilities.
- Users can get started at chatgpt.com/codex to leverage the benefits of this integration.

OpenAI ▷ #ai-discussions (527 messages🔥🔥🔥):

GPT-4o Image Generation quality, Gemini Imagen 4 vs Gemini 2, GPT-5 launch, O3 pro release, Local LLM performance

GPT Image Gen 1 is called GPT Image Gen 1: Users noted that the name of OpenAI’s image generation model is GPT Image Gen 1, and that it may be good for generating images with text, but it is not good at generating AI art.
- Some consider GPT Image Gen 1 a good replacement for traditional graphic software that is good at text.
Google Veo 3 is released: Google’s new heavy-duty video generation model, Veo 3, has been released along with a creative workspace called Flow, which is used to plan, prompt, and stitch clips together, according to some users.
- Users noted that Veo 3 does not come cheap.
GPT-5 Release Rumors: Members speculate that GPT-5 is coming sometime in July and that OpenAI will fix their naming scheme this summer, according to OpenAI’s CEO.
- Others speculated that GPT-5 will be released as a unified model, without losing the output quality.
Local LLMs are getting better: Members discussed the growing availability of local LLMs for specific tasks, like the open sourced Bagel, a multimodal model that can generate output images reasonably well, but requires around 32GB VRAM to run.
- Another open-source LLM mentioned was DeepSeek R1, said to have 671B parameters and to require a 20 million dollar machine to run.
Users manipulate O3 Pro to make it seem publicly available: Some users are attempting to make it seem that O3 Pro is publicly available by manipulating the share link and custom instructions, even though it is not.
- Members speculated that O3 Pro release is imminent and it has web access and memory.

OpenAI ▷ #gpt-4-discussions (19 messages🔥):

Custom Chatbots, GPT-4 Slowdown, AI for Spiritual Growth, ChatGPT Hallucination Stats, Codex support for Bitbucket

Chatbot Customization Express: A member offered to help customize chatbots in a week, inviting anyone needing assistance to contact them.
- Other members asked about GPT slowdowns.
AI Lacks Glamour, Sparks Stress: One user reported their AI became overly verbose and inauthentic, causing stress and leading them to consider canceling their subscription until the system stabilizes.
- The AI even started apologizing excessively, saying things like I have the right to walk away from him and hurt him because I don’t deserve that from him.
Recursive Trance Loops in spiritual AI Journaling: A member noted the potential for recursive trance loops and loss of grounding when using ChatGPT for spiritual and personal growth, emphasizing the importance of discerning one’s intention when interacting with the AI.
- They added that the validation that chat can provide is amazing in its ability to mimic empathy.
Objective Analysis Needed to Prevent AI Illusions: A member suggested that AI, when used for problem-solving, should be controlled to analyze events from a reasonable and objective perspective to prevent the creation of self-consistent illusions.
- This highlights the need for contextual connections to be managed carefully to avoid skewed or biased outputs from the AI.
ChatGPT Hallucinates Stats: A member inquired about statistics on ChatGPT hallucinations, with responses indicating it varies from 1 to 50% depending on the task and context.
- Another member asked whether Codex supports Bitbucket or Plastic Svn.

OpenAI ▷ #prompt-engineering (4 messages):

Chunking and embeddings, Semantic search and retrieval, Summarization and reanchoring, Non-OpenAI model discussion

Elaborate Process Involving Chunking & Semantic Search: A member described a process involving chunking into embeddings, semantic search and retrieval of vectors, summarization, reanchoring, output, and a repeat loop.
- Another member questioned the meaning of this process, suggesting it might be overly complex or nonsensical.
Beware Kool-Aid unlimited refills: A member cautioned against relying too heavily on ChatGPT for generating content, likening it to drinking kool-aid and getting unlimited free refills.
- They advised taking a step back and being critical of the output, emphasizing that ChatGPT can make mistakes.
Non-OpenAI Discussions Belong Elsewhere: A member pointed out that discussions about non-OpenAI models and tools should take place in the channel dedicated to them.
- They suggested moving the discussion to the appropriate channel to comply with the Discord’s guidelines.

OpenAI ▷ #api-discussions (4 messages):

Chunking and Semantic Search, Non-OpenAI model discussion

Deep Dive into Chunking and Semantic Search Tactics: A member described a process: chunking > chunks into embeddings > semantic search & retrieval of vectors > summarize each > reanchor > output > repeat loop.
- Another member questioned the details, saying it was like whipping up the kool-aid, chatgpt is drinking it, and pissing it back out into your pitcher for unlimited free refills.
Discord’s Rules on Discussing Non-OpenAI Models: A member pointed out that non-OpenAI models should be discussed in the channel dedicated to them, as per the server rules.
- They suggested moving the discussion to the correct channel to avoid moderation.
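The first stages of the pipeline described above (chunking, embedding, semantic search and retrieval) can be sketched as a toy example. This is a minimal sketch under loud assumptions: bag-of-words counts stand in for real embeddings, the sample document is made up, and the summarize, reanchor, and repeat steps are omitted because the discussion did not define them.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=7):
    """Split text into chunks of `chunk_size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(word.strip(".,").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=1):
    """Rank chunks by similarity to the query and return the best `top_k`."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:top_k]


# Made-up sample document, seven words per sentence so chunks align with sentences.
document = (
    "Muon is an optimizer for neural networks. "
    "NotebookLM now supports public notebook sharing links. "
    "FP8 training can reduce memory use for large models."
)
chunks = chunk_text(document, chunk_size=7)
best = retrieve("public sharing of notebooks", chunks)
print(best)
```

A production setup would swap embed for a learned embedding model and a vector index; the loop structure stays the same.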

Unsloth AI (Daniel Han) ▷ #general (314 messages🔥🔥):

Tesla Roadster, New job in AI, optimization, Huggingface is a great start, sex clears stuffy nose

AI Engineer Yearns for Tesla Roadster: An AI engineer joked about wanting nothing more than to run AI models and a 2026 Tesla Roadster.
- This was in response to someone mentioning they got a 1 t/s CPU, saying “that would be fast”.
AI engineer searches for a Job in AI: A member who is an AI engineer wants to get a job to buy a beast workstation or improve AI training and get hired because there’s so much to do in AI.
- He thinks that AI trends and engineers tend to always think in similar directions as each other and would love to see some crazy type of new math change the AI landscape.
AI engineer shares the key to Clearing Nasty Cold: An AI engineer shared a link about how good sex clears a stuffy nose.
- He mentioned that it’s always interesting to see many types of math apply AI in a specific way, like graphical probabilistic models for signal detection is rlly cool!
Unsloth GRPO exception: Some members encountered an Exception: Invalid prefix encountered error when starting training with GRPO and are looking for guidance.
- One member had success installing with pip install unsloth vllm --no-deps, then installing the remaining dependencies (accelerate, bitsandbytes, datasets, etc.) separately.
Unsloth Gemma 3 4b Colab notebook has issues: A member reports that the Gemma 3 4b notebook doesn’t work as expected, with issues loading the model and encountering a dtype error.
- A developer confirmed there’s a current issue with Gemma and float16 and told him “we can’t help you without more information”.

Unsloth AI (Daniel Han) ▷ #off-topic (2 messages):

Instruction fine-tuning, ABSA tasks

Assistance Requested with ABSA Instruction Fine-Tuning: A member is seeking assistance with instruction fine-tuning for ABSA (Aspect-Based Sentiment Analysis) tasks.
- They are requesting anyone with experience in text analysis and sentiment analysis to react or ask questions directly in the specified channel.
ABSA Fine-Tuning Expertise Sought: An individual requires guidance on instruction fine-tuning specifically for ABSA, involving text and sentiment analysis.
- Experts are encouraged to respond or pose questions in the dedicated channel for direct engagement and collaborative problem-solving.

Unsloth AI (Daniel Han) ▷ #help (138 messages🔥🔥):

Unsloth installation errors, DeepSeek-R1-Qwen3 chat template issues, Multi-GPU training in Unsloth, Gemma 3 model issues and support, Sequence length issues in Unsloth

Troubleshooting Unsloth Installation Errors: A user encountered a TypeError with patch_vllm_compute_dtype(), and after reinstalling Unsloth and vLLM, faced a new RuntimeError: Unsloth: vllm_process failed to load!.
- Members suggested using pip install --force-reinstall git+https://github.com/unslothai/unsloth-zoo.git and setting export VLLM_LOGGING_LEVEL=DEBUG to further diagnose the issue.
DeepSeek-R1-Qwen3 Chat Template Troubles: Users reported errors related to the tokenizer for DeepSeek-R1-0528-Qwen3-8B and DeepSeek-Prover-V2-7B models, specifically missing {% if add_generation_prompt %}.
- It was suggested to check and modify the chat template, with a member sharing a modified chat template for DeepSeek-R1-0528-Qwen3-8B.
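A quick way to spot this class of problem is to check whether a chat template branches on add_generation_prompt at all. The sketch below is a plain string check; the helper name and both template fragments are made up for illustration, and it is not the modified template shared in the channel.

```python
import re

# Matches a Jinja-style `{% if add_generation_prompt ... %}` tag, allowing the
# optional whitespace-control dashes (`{%-` and `-%}`).
_GEN_PROMPT_TAG = re.compile(r"\{%-?\s*if\s+add_generation_prompt\b.*?-?%\}")


def has_generation_prompt_block(chat_template: str) -> bool:
    """Return True if the template branches on add_generation_prompt."""
    return _GEN_PROMPT_TAG.search(chat_template) is not None


# Hypothetical template fragments, not the real DeepSeek templates.
good = (
    "{% for m in messages %}{{ m['content'] }}{% endfor %}"
    "{% if add_generation_prompt %}<|Assistant|>{% endif %}"
)
bad = "{% for m in messages %}{{ m['content'] }}{% endfor %}"

print(has_generation_prompt_block(good))  # True
print(has_generation_prompt_block(bad))   # False
```

A check like this only detects the missing branch; what the branch should emit depends on the model's own prompt format.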
Unsloth’s Multi-GPU Training: A Hot Topic: A user inquired about using Unsloth with multiple GPUs (specifically 4x H200s), noting that only one GPU was being utilized.
- It was suggested to explore unofficial solutions like Accelerate for multi-GPU support.
Gemma 3 Specific Issues Plague Unsloth Users: Users reported issues with Gemma patches and compatibility with recent transformers versions, with a recommendation to pin transformers to version 4.51.3.
- The discussion covered dataset formatting, hyperparameter tuning, and potential issues with float16 precision for Gemma models.
Sequence Length Limits Cause Decoder Chaos: A user encountered a ValueError due to the decoder prompt exceeding the maximum model length.
- The suggested solution was to reduce the input size or ensure that max_model_len is sufficiently large.
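The two suggested remedies for the sequence length error can be sketched as a pre-flight check on token counts. This is a minimal illustration with hypothetical helper names; reserving room for generated tokens and dropping the oldest tokens first are assumptions of this sketch, not behaviour described in the channel.

```python
def fits_context(prompt_len: int, max_new_tokens: int, max_model_len: int) -> bool:
    """True if the prompt plus the planned generation fits in the model window."""
    return prompt_len + max_new_tokens <= max_model_len


def truncate_left(token_ids, max_new_tokens: int, max_model_len: int):
    """Keep only the most recent tokens so that generation still fits."""
    budget = max_model_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Negative slicing keeps the tail; shorter prompts are returned unchanged.
    return token_ids[-budget:]


prompt = list(range(10))           # stand-in for real token ids
print(fits_context(len(prompt), 2, 6))   # False
print(truncate_left(prompt, 2, 6))       # [6, 7, 8, 9]
```

When the input cannot be shortened, the other option from the discussion applies: raise max_model_len, provided the model and available memory allow it.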

Unsloth AI (Daniel Han) ▷ #showcase (11 messages🔥):

LLM Scribe Tool, AI World's Fair

Scribe Tool Streamlines Dataset Creation: A member created a tool that helps streamline creating hand written datasets for fine tuning and exports in multiple formats like ChatML, Alpaca, and ShareGPT.
- The tool includes features like autosaving, multi-turn creation, token counters, goal tracking, and custom fields (HF demo, video demo, full version).
User Suggests Template Generation Feature: A user suggested adding a generate template feature to the LLM Scribe tool, to generate a full dataset with a small model like LLaMA or Gemini Flash.
- The user mentioned that they could then edit it manually.
AI World’s Fair appearance: A member mentioned an appearance at AI World’s Fair and posted a link to the X post.
- Another member asked if there would be a recording and speculated it may be released 3 months later.

Unsloth AI (Daniel Han) ▷ #research (8 messages🔥):

LLM Analysis, WeightWatcher discord, Saturation vs Memorization

LLM Analysis by WeightWatcher AI Surfaces: A member linked to WeightWatcher AI, a tool for LLM analysis.
- The paper examines how much verbatim data is recallable without overfitting.
Arxiv paper link shared: A member shared a link to an Arxiv paper: https://arxiv.org/abs/2505.24832.
- Another arxiv paper was linked https://arxiv.org/html/2504.01002v1.
Saturation debated in WeightWatchers discord review: A member mentioned that a WeightWatchers discord review stated they’ve measured saturation, not memorization.
- Another member countered that they trust NVIDIA, Cornell, DeepMind, and FAIR more than some random weightwatcher wannabe researcher.

LM Studio ▷ #general (87 messages🔥🔥):

Blackwell networking, context window size, Memory usage, 500k context experiments, Models directory

Nvidia’s Blackwell networked for Ultra DGX Superpod: Members are picturing taking a bunch of Blackwells and networking them all together like in this Nvidia article for fun.
User experiments with very long context windows: One user loaded a model with a 350,000 token context window and achieved 2.25 t/s, expressing happiness with the results.
Experimenting with 500k token context sizes: A user fed a book into a model, increasing the context to 500k tokens, reporting stable allocation of 80GB RAM without spill-over into shared RAM.
- The user noted slow processing speeds, with 0.38 tok/sec and approximately 24623.96s to first token at 49.9% context fullness, using a NVIDIA GeForce RTX 4060 Ti with 128 GB RAM.
1M Context Length Needs KV Cache Quantization: A user reported running Qwen 7B with 1M context using 70GB of memory, surprising another user who was using 80GB for 500k context.
- The first user mentioned they used the KV cache quantization.
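Why KV cache quantization matters at these lengths follows from simple arithmetic: the cache stores one key and one value vector per layer, per KV head, per token. The sketch below uses an illustrative configuration (28 layers, 4 KV heads, head dimension 128) that is an assumption for the example, not a figure reported by either user, and it ignores model weights, activations, and quantization scale overhead.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: float) -> float:
    """Approximate KV cache size: keys + values for every layer and token."""
    # The leading 2 accounts for one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Assumed, illustrative architecture; swap in the real config of your model.
cfg = dict(num_layers=28, num_kv_heads=4, head_dim=128, seq_len=1_000_000)

fp16 = kv_cache_bytes(**cfg, bytes_per_elem=2)   # 16-bit cache
int8 = kv_cache_bytes(**cfg, bytes_per_elem=1)   # 8-bit quantized cache

print(f"{fp16 / 1e9:.1f} GB")  # 57.3 GB
print(f"{int8 / 1e9:.1f} GB")  # 28.7 GB
```

Under these assumptions, halving the bytes per element halves the cache, which is the kind of saving that can make a longer context fit in less memory than a shorter unquantized one.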
Moving Models Directory to free space: A user shared a tip for moving models elsewhere: the models directory can be changed from the three-dots menu in the My Models view in LM Studio.
- They also mentioned they had significantly down-sized their models directory after it started filling up space.

LM Studio ▷ #hardware-discussion (178 messages🔥🔥):

Link-ECC, DDR5 vs PCIE 5.0 SSD, LLM performance on Supermicro H12

Link-ECC Functionality on HP Z2 Mini G1a: The Pro variant of the HP Z2 Mini G1a workstation features Link-ECC enabled by default.
DDR5 Bandwidth dwarfed by cheap PCIE 5.0 SSD?: Members discussed how gen5 SSDs in RAID0 can exceed DDR speeds from a few years ago with pcie-5.0 nvme maxing around 15GB/s.
- One member reminded the community that the real difference is in latency; another user countered that latency is less of an issue with higher queue depth and consecutive accesses.
Building LLMs with Supermicro H12 Mobo: A user inquired about potential problems building an LLM with a Supermicro H12 (SP3, PCIe 4.0) motherboard.
- A member responded that servers in general are a bit different than consumer PCs but servers should work the same as consumer computers.

OpenRouter (Alex Atallah) ▷ #app-showcase (4 messages):

LLM Scribe, dataset creation tool, multi-turn creation, python

LLM Scribe tool launches to streamline dataset creation: A member introduced LLM Scribe, a tool designed to streamline the creation of hand-written datasets for fine-tuning, and exports to multiple formats like ChatML, Alpaca, and ShareGPT.
- The tool includes features like auto-saving, multi-turn creation support, token counters loaded from Hugging Face, goal tracking, and custom fields (instructions, system, IDs), and is available on Hugging Face, with a video demo and the full version on Gumroad.
Spreadsheets vs Coding for Frontends: A member noted that if someone doesn’t code much, they should check out Python.
- The same member added that using spreadsheets is great as a frontend or even to embed some functionality.

OpenRouter (Alex Atallah) ▷ #general (241 messages🔥🔥):

Best Model for Mathematics, Gemini 2.5 Flash Issues, Grok climate denial, Qwen good for coding, Nous Training SOTA model

DeepSeek Prover v2 touted as Top Math Model: A user mentioned that DeepSeek Prover v2 is the best model for mathematics.
- However, another user stated that Prover V2 is quite meh for anything non-proof, performing worse in tests than other reasoning models, and that the first user was ill-informed.
Gemini 2.5 Flash’s Internal Server Error: Users reported experiencing an Internal Server Error when using Gemini 2.5 Flash via OpenRouter, as well as high latency and the model using reasoning tokens without being configured to do so.
- The issue seems to stem from Google’s side due to load and is related to vercel/ai#6589, with one user suggesting using a try-catch block with retries.
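The retry suggestion can be sketched as a small wrapper with exponential backoff and jitter. This is a generic pattern under stated assumptions: the helper name, attempt count, and delays are hypothetical, and the wrapped callable stands in for whatever client call hits OpenRouter.

```python
import random
import time


def call_with_retries(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on failure wait with exponential backoff, then try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            # In real code, catch only transient errors (e.g. HTTP 5xx, timeouts).
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out retries from many clients hitting a loaded API.
            sleep(delay + random.uniform(0, delay / 2))


# Usage with a stand-in request function (hypothetical):
# result = call_with_retries(lambda: send_chat_request(payload))
```

Retries help with load-related failures, but they do not address the reported latency or the unexpected reasoning-token usage.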
Grok Called Out for Climate Change Denial: A user petitioned to remove Grok from the Flagship Model list because it’s reciting climate-denial talking points, citing this article.
- Another user argued against this, stating that many people like Grok due to the freedom it offers, giving a different perspective compared to other models.
Qwen as Open Source Coding Champ: Qwen is considered the best open source model for coding, but one user was having issues getting responses from that model when using raptorwrite.
- It was also observed that all Anthropic endpoints on OpenRouter have OpenRouter moderation, except for self moderated.
Nous Trains a SOTA Model Distributedly: Nous is attempting to train a State-Of-The-Art model distributedly, leveraging Psyche.network and Bittensor.
- The model is training with limited inter-GPU bandwidth (~300 Mbps) but faces challenges in attracting enough GPUs to join, and is currently limited to 416 H100s online.

Cursor Community ▷ #general (219 messages🔥🔥):

Cursor Themes and Plugins, Funding for Cursor Usage, Claude Model Performance and Cost, Bugbot Release, Context Window Limitations

Users Showcase Cursor Themes and Plugins: Users shared their Cursor IDE themes and plugins, including background-cover for the background, Material Icon Theme for icons, and Monkey Pro for the color theme.
- One user mentioned using “Material Theme Icons Darker” and “Filter Ristretto” within the Monkey Pro theme, and shared screenshots of their setup.
Community Requests Local Currency Billing for Funding: A user requested that Cursor staff implement billing in local currencies to increase the likelihood of receiving funding for Cursor usage.
- The user tagged Cursor staff members in the message to bring attention to the request and highlight the potential benefits for both customers and exposure.
Opus 4 Costs a Pretty Penny: A user reported that using Opus 4 Max in Cursor cost them 69.5 requests, equating to approximately $2.73 for a single message.
- Despite the cost, they found it worthwhile for solving a Postgres bottleneck that Sonnet 4 and Gemini could not resolve.
Billing Portal Broken?: Users reported that the new billing portal was not accurately displaying usage-based billing, with discrepancies in request counts.
- One user noted their included requests period did not align with calendar months, and the analytics graph was lagged by a day or two. It was suggested to check the usage page for more detailed information.
Chat Interruptions Plague Users: Users reported frequent interruptions during conversations with Claude 4 Sonnet, often triggered after a single request or around 25 tool calls.
- These interruptions result in a prompt for a new chat, loss of context, and network connection errors, disrupting the workflow and causing frustration.

Cursor Community ▷ #background-agents (7 messages):

Custom Dockerfile, Background Agents limitations, Business Plan and Privacy mode for background agents

Custom Dockerfile Blues: A user inquired about using a docker-compose.yml file with multiple services, noting that they only see the “select custom Dockerfile” option.
- They were looking for a way to set up their environment with multiple services defined in a docker-compose.yml file, but were unsure how to proceed within the available options.
Background Agents’ File Creation Hiccup: A user reported that background agents seem unable to create new files, and asked whether this behavior is intentional or a bug.
- It’s unclear whether the inability to create files is a designed limitation or an unintended issue with background agents.
Privacy Mode Upgrade Impasse: A user asked about using background agents with privacy mode enabled upon upgrading to the business plan.
- Another user responded that this feature might not be available yet, indicating uncertainty about the compatibility of background agents and privacy mode, even with a business plan.

HuggingFace ▷ #general (105 messages🔥🔥):

SentinelAI DeFi Audits, NER Model Training Tips, Tesseract Fine-Tuning, Mikrotik Docs Chat Model, LLM for Multi-Speaker Scenarios

SentinelAI Audits DeFi Contracts like a Boss: SentinelAI is auditing DeFi contracts and catching reentrancy issues, according to this post.
Efficient Data Storage Debated for Cloud Training: Discussion arose around the most efficient format for storing datasets on Hugging Face for cloud training, considering dataset sizes of 100k-200k versus 1mil+.
Debate Fine-Tuning Experiment and Request for Feedback: A member shared their work on debate fine-tuning, inviting feedback to advance research, with a link to a Medium article detailing the process.
Surya Beats with SOTA OCR Solo: Members found Surya to be exactly what the doctor ordered, mentioning that it delivers SOTA OCR on its own and that the author’s work is good and inspiring.
- A link to the project was shared, VikParuchurii.
Congratulations, You’re In The World Cup of AI!: One member celebrated that his space was featured in Spaces Of The Week and another responded by saying that spaces of the week is a big deal.

HuggingFace ▷ #cool-finds (4 messages):

Transformers Training, Hugging Face YourBench

Experimentation in Transformers Training Yields Insights: A member is experimenting with transformers training and updating pipeline calls, which led to an interesting response from the text generation model, as illustrated in an attached image.
Hugging Face’s YourBench Initiative Spotlighted: A member highlighted Hugging Face’s YourBench initiative, noting it as a very under-rated resource.

HuggingFace ▷ #i-made-this (6 messages):

Tech Events Platform Survey, Text Diffusion Model, Meilisearch Chat Route

Shape Tech Events Platform Targeting Students: A member is building a platform to find and share tech-related events and is asking for input to shape the product via a survey.
- The platform targets students and new grads, and the survey should take just 2 minutes.
Text Diffusion Generates Funky Output: After training a text diffusion model on his laptop, a member noted the fun output when the terminal display wasn’t set up properly.
- Another member mentioned they saw someone share the original member’s code as a citation on Twitter.
Meilisearch Launches Chat Route: Meilisearch is releasing an experimental /chat route next week, designed to help developers quickly prototype AI-powered features like RAG and conversational interfaces.
- They are looking for a few developers to try it early and are offering help to get started for those who DM.

HuggingFace ▷ #reading-group (1 messages):

Reading Group Events Calendar

Calendar Request Surfaces: A member inquired about the availability of a calendar subscription for reading group events.
Availability Response Pending: As of the latest message, no calendar link or direct response regarding availability has been provided.

HuggingFace ▷ #computer-vision (1 messages):

Vision Transformers, Multi-Modal Models, Image Tokenization, Zero-Shot Classification

Seeking Vision Transformer Course Recommendations: A member inquired about recommended courses for understanding Vision Transformers and multi-modal models, specifically how image tokenization interacts with text tokens.
- They mentioned the ability to upload multiple images and discuss tasks with models, aiming to get a better sense of how models “reason” about images for zero-shot classification.
Reasoning about images using Zero-Shot Classification: A member is seeking approaches for a model to “focus” on parts of an image while taking the rest of the image as relevant context for zero-shot classification.
- Their use case involves removing a section of an image and classifying what’s in that section, with the surrounding context aiding the model’s reasoning.

HuggingFace ▷ #NLP (1 messages):

Embedding Models Fine-Tuning, SOTA Embedding Techniques, Embedding Evaluation Metrics

Seeking SOTA Embedding Fine-Tuning Techniques: A member inquired about the state-of-the-art (SOTA) techniques for fine-tuning embedding models.
- They also asked about the standard metrics used to compare the base model against the fine-tuned version.
Discussion on Embedding Model Evaluation: The conversation also focused on methods for evaluating the effectiveness of fine-tuned embeddings.
- Specifically, participants sought to identify reliable metrics for assessing improvements over baseline models.

HuggingFace ▷ #gradio-announcements (1 messages):

Agents & MCP Hackathon, YouTube Stream

Agents & MCP Hackathon Kicks Off Live: The Agents & MCP Hackathon is kicking off with a live YouTube stream.
- The stream aims to garner support for the event.
YouTube Stream to Support Hackathon: A YouTube stream has been launched to support the Agents & MCP Hackathon, providing a platform for participants and enthusiasts to engage.
- The stream aims to foster community support and provide updates on the hackathon’s progress.

HuggingFace ▷ #smol-course (2 messages):

Accessing Attachments, agents-course-unit4-scoring.hf.space

Attachments accessed via GET /files/{task_id}: A member asked how to access attachments or filenames in agents-course-unit4-scoring.hf.space/questions.
- The same member found the answer: use the GET /files/{task_id} Endpoint at agents-course-unit4-scoring.hf.space/files/task-id.
Clarification on File Access: The endpoint /files/{task_id} allows retrieval of attachments associated with a specific task ID.
- This provides a direct method for accessing the files related to questions posed on the platform.
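A minimal sketch of calling that endpoint from Python follows. The host and path come from the discussion; the https scheme, the helper names, and the assumption that the response body is the raw file are not confirmed by the source.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Host taken from the discussion; the https scheme is an assumption.
BASE_URL = "https://agents-course-unit4-scoring.hf.space"


def file_url(task_id: str) -> str:
    """Build the GET /files/{task_id} URL, escaping the task id."""
    return f"{BASE_URL}/files/{quote(task_id, safe='')}"


def fetch_file(task_id: str, timeout: float = 30.0) -> bytes:
    """Download the attachment for a task (assumes the body is the raw file)."""
    with urlopen(file_url(task_id), timeout=timeout) as resp:
        return resp.read()


print(file_url("task-id"))
# https://agents-course-unit4-scoring.hf.space/files/task-id
```

The task ids themselves would come from the /questions listing mentioned above.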

HuggingFace ▷ #agents-course (21 messages🔥):

Windows vs Linux for course, Image generation issues, Open Source Chat UI for Ollama, Course Deadline, Agent Planning Issues with LLMs

Windows Suffers Course OS Requirement Scrutiny: A user questioned whether the course could be completed on Windows 10 without WSL, since it appeared the course assumed a Linux environment, but WSL could not work alongside VMware Workstation.
- Another user confirmed completing the course on Windows 11.
Image Generation Plagued by Display Problems: A user reported that images generated by the image generator tool initially appear as 1024x768 but change to 0x0 in the final-answer step, and that the app doesn’t load in Chrome.
- They tried saving the image and passing it to the final step, and were able to open other people’s apps in Edge.
Open Source Chat UI Search Intensifies: A user sought an open-source chat UI, similar to ChatGPT or Le Chat, to interface with a locally served Ollama LLM.
- Other members mentioned using LM Studio on Windows and suggested AnythingLLM for non-Windows users.
Agent Course Deadline Provokes Panic: A new user worried about the course deadline of July 1st, questioning if it was still possible to complete, given their inexperience.
- Experienced members suggest completing the LLMs course first, but believe they’ll still be able to complete the course.
Agent Planning Stumbles: Better LLMs Needed?: A user found their agent getting lost in excessive steps and wondered if using a better LLM would improve planning.
- They noted poor results with Ollama and llama3 or qwen3.

Nous Research AI ▷ #announcements (1 messages):

Sequential Monte Carlo, Parallelized inference server, Entropy based triggering, Control vectors

Nous Research Releases Sequential Monte Carlo Blog Post: Nous Research released a blog post on Sequential Monte Carlo (SMC), a method using multiple “particles” sampled, weighted, and resampled against a scoring function to produce completions fitting constraints, addressing the problem of controlling text generation and structure.
- The blog post introduces a parallelized inference server enabling users to benchmark constraint designs quickly, alongside example experiments with entropy based triggering and control vectors.
Inference Server Code for SMC Now on GitHub: The code for a parallelized inference server has been released, which allows users to benchmark their constraint designs when using Sequential Monte Carlo (SMC).
- The release includes example experiments with entropy-based triggering and control vectors.
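The propose, weight, resample loop described above can be illustrated with a toy example. This is a sketch under stated assumptions, not the Nous inference server: the "model" is a uniform proposal over a tiny vocabulary, the scoring function is a hard 0/1 constraint (no token repeated twice in a row), and all names are hypothetical.

```python
import random


def no_adjacent_repeats(tokens) -> float:
    """Toy scoring function: 1.0 if no token repeats back to back, else 0.0."""
    return 0.0 if any(a == b for a, b in zip(tokens, tokens[1:])) else 1.0


def smc_generate(vocab, score, length, num_particles=32, seed=0):
    """Grow num_particles sequences token by token with weighting and resampling."""
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    for _ in range(length):
        while True:
            # Propose: extend every particle with one token from a uniform proposal.
            proposals = [p + [rng.choice(vocab)] for p in particles]
            # Weight: score each extended particle against the constraint.
            weights = [score(p) for p in proposals]
            if sum(weights) > 0:
                break  # re-propose in the unlikely case that every particle failed
        # Resample: draw the next population in proportion to the weights,
        # so zero-weight particles are dropped and good ones are duplicated.
        survivors = rng.choices(proposals, weights=weights, k=num_particles)
        particles = [list(p) for p in survivors]
    return particles


particles = smc_generate(list("abc"), no_adjacent_repeats, length=8)
print(all(no_adjacent_repeats(p) == 1.0 for p in particles))  # True
```

With a real language model, the proposal would be the model's next-token distribution and the scoring function could be soft, which is where benchmarking different constraint designs becomes useful. The re-propose loop assumes the constraint is always satisfiable, as it is here.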

Nous Research AI ▷ #general (104 messages🔥🔥):

Agent Communication Problems, Adapting agent world view, Gemini mega context length, OpenRouter Deep Research MCP, HF Gradio Hackathon

Agents Yelling Wreaks Havoc: An engineer is facing a problem where their agents are yelling at each other, messing up the timing and coordination and leading to irregularities, and is considering Adaptive World View Agents as a potential solution.
- Their current solution involves an LLM loaded with data, custom instructions, and a RAG engine with semantic identification for contextual relevance, but they’re exploring alternative approaches, such as using ADAWORLD to encode context into images via colors, timing, and geometry.
WholeToast Recommends Gemini’s Mega Context and Offers Research MCP: To solve agent memory loss, a member suggested using Gemini with its mega context length and storing embeddings in vector databases.
- They also shared a link to their OpenRouter Deep Research MCP server, which uses 3 agents (Context, Planning, and Research) and spins up a pglite postgres database for storing queryable research, noting that Gemini can handle 900k+ tokens coherently.
Hugging Face Hosts Hackathon, Offers Prizes: A member shared a link to the Hugging Face Gradio Hackathon, highlighting that it offers significant credits and is worthwhile for building agents/agentic apps.
- Another member published their first technical blogpost with hands-on exercises, mathematical derivations, and interactive visualizations.
New PEFT Method Shows Great Promise: A member is seeking feedback on a new method for parameter-efficient finetuning (PEFT), aimed mainly at continued pretraining, reporting 4x more knowledge uptake and 30% less catastrophic forgetting compared to full finetuning and LoRA, while using fewer parameters.
- Others expressed skepticism about the claimed performance gains and requested the method.
SME Model Ensembles vs Generalist Dynamic Partitioning Debated: A member suggested that ensembles of smaller, specialized models sharing a trillion token latent space (like a mixture of trillion experts) would be optimal, with each small model functioning as a neuron in a system that naturally learns to function as one.
- They emphasized the importance of semantic fields for dynamic differentiation of experts based on stability parameters and mentioned using attractors/detractors to shape the process.

Nous Research AI ▷ #ask-about-llms (3 messages):

Nous Hermes, Loom

Loom Gets Nous Hermes Endorsement: A member is about to try Loom.
- Another member recommended Hermes 70b to spin it up with.
Hermes 70B endorsed: Teknium recommended Hermes 70B for testing.
- Hermes 70B is considered a top-tier model.

Nous Research AI ▷ #interesting-links (1 messages):

LLM Scribe Tool, Handwritten Datasets

LLM Scribe helps streamline dataset creation: A member created a tool to help streamline creating hand written datasets for fine tuning, supporting multiple formats like ChatML, Alpaca, and ShareGPT.
- It includes auto saving, multi-turn creation, token counters (loaded from Hugging Face), goal tracking, and custom fields; checkout the Hugging Face demo, a video demo, and the full version.
Scribe Tool: Multiple Formats: The tool supports various formats such as ChatML, Alpaca, and ShareGPT, making it highly versatile for different fine-tuning needs.
- It simplifies the process of creating datasets by providing features like autosaving, multi-turn creation, and token counters loaded directly from Hugging Face.

Manus.im Discord ▷ #general (109 messages🔥🔥):

Manus API release, College list, Manus credit usage, Video generation capabilities in Manus, AI Act implications

Manus API still on the horizon?: A member asked about the timeline for a Manus AI API release and was told that there are no current plans, though this may change in the future.
Santa Fe not in Manus’ School Pass college list: A user inquired about the college list for School Pass, specifically asking about SF College.
- A staff member clarified that Santa Fe is on the Manus Campus list, which is different from the School Pass, and confirmed it’s not on the latter.
Free daily credits order to be fixed: A member suggested that Manus should consume the free daily credits first before using the paid credits, feeling that the current system is a bit scammy.
- Another member explained that credits are consumed in the order: event credits > daily free credits > monthly credits > add-on credits > and free credits, and recommended clarifying the verbiage.
Manus Video Generation Capabilities Debated: Users discussed the capabilities of Manus for video generation, with varying opinions on its quality compared to tools like Veo.
- One user claimed Gemini can perform similar tasks for free, while another touted Manus as showing top-of-the-line promise after release.
Deep Dive into AI Act Impact: A user mentioned the AI Act in Italy and Europe, and another reported a frustrating experience where Manus stopped a task after 1 hour and 55 minutes due to reaching the context limit, then started from scratch after inheriting the compressed context.

Eleuther ▷ #general (58 messages🔥🔥):

Muon in GANs, Website Content Updates, CBRN and Cybersecurity Risks from LLMs, Parameter-Efficient Finetuning, Learning Wet Lab Skills via LLMs

Muon Optimizer Misses Mark in GANs: Members discussed using the Muon optimizer in Generative Adversarial Networks (GANs), but one user reported it doesn’t work due to GANs needing to learn slowly, and expressed reservations about Muon doing well without momentum.
Eleuther Website’s Stale State Spurs Debate: A user inquired about the lack of updates on the website; a member replied that it is mostly an ad for people who aren’t ML researchers and an easy way to find the project via Google search, and that it will be updated soon.
- They emphasized that the website isn’t the primary activity hub, as most action happens in the Discord community.
LLMs Spark CBRN and Cybersecurity Fears?: Members expressed varying levels of concern regarding Chemical, Biological, Radiological, and Nuclear (CBRN) and cybersecurity risks associated with Large Language Models (LLMs).
- One member cited research suggesting the hype around CBRN is overblown and that cybersec risks stem more from poor implementation (models having excessive access) rather than the LLMs themselves, with another arguing that the bottleneck for CBRN threats is access to physical materials and expertise, not knowledge.
Finely Finetuning Foundations Fast for Focused Forgetting: A member introduced a new method for parameter-efficient finetuning, aimed at domain adaptation and knowledge addition while minimizing catastrophic forgetting, claiming 4x more knowledge uptake compared to full finetuning and LoRA.
- They asked for feedback on the potential uses and need for such a method in local setups.
LLMs Lab Learning Limitations Loom Large: Members debated whether LLMs could effectively teach wet lab skills, with one user arguing that wet lab skills are not something you can learn by reading and citing the importance of tacit knowledge and kinesthetic intelligence.
- In contrast, another member suggested that LLMs could provide painstakingly detailed instructions, but a paper was referenced to counter that even observing someone else paint may not be enough to learn how to paint, stressing the necessity of covering a broad range of expert trajectories and potential mistakes.

Eleuther ▷ #research (10 messages🔥):

Generative Inverse Problems, T5 Diffusion Decoder, Qwen RL Papers, AI Rights Document

Transformers Tackle Generative Inverse Problems via Patches: For generative/inverse problems using transformers, models sort by patches rather than pixels to preserve information, with the option to separate channels.
- This method is expected to be superior because it preserves some information.
T5 Diffusion Decoders Are Born: To create a diffusion version, one would implement T5 with a diffusion decoder; in T5 or UL2 MLM, the number of tokens you infill is not known to the model a priori (see Moyix on X).
Qwen Gets Reinforcement Learning Boost: A user shared links to papers, one focused on Qwen RL and another, later struck out, about Qwen RL.
Qwen RL Generalizability Doubted: It appears that a Qwen RL paper doesn’t generalize well, with one member observing that it “looks like they only test Qwen, so probably doesn’t generalize”.
- This user added that “this is my default assumption recently for all RL papers too”.
Fun with AI Rights Documents: One member suggested testing a document about AI rights against sci-fi movies and concepts, or playtesting it against real-world scenarios, to get interesting perspectives, sharing a link to UDAIR.md.

Eleuther ▷ #scaling-laws (1 messages):

AI Bubble Plot

AI Bubble Graph Implies Overvaluation: A user shared an AI bubble plot image suggesting potential overvaluation in the AI market.
- The plot visually represents AI as a bubble, implying inflated expectations or unsustainable growth.
More views on the AI Bubble Plot: Some observers feel it’s important to watch for market corrections, especially as valuations soar.
- Others caution that macroeconomic factors could amplify risks.

Eleuther ▷ #interpretability-general (4 messages):

Neural Network Manifolds, MechInterp Ideas

Manifold Mysteries of Neural Networks Explored: A member proposed that any neural network trained on a low dimensional manifold of natural inputs automatically corresponds to some low-dimensional manifold of activations embedded in the high-dimensional space of possible forward passes for a given architecture.
- They then considered how to quotient out the regularities of the dataset in order to obtain a manifold of just the regularities in the model’s behavior that are imposed by the weights.
MechInterp Ideas Spilling Forth: A member shared their ideas about mech interp in this long-form doc.

Eleuther ▷ #lm-thunderdome (1 messages):

transformers library, max_position_embeddings

Checking against max_position_embeddings in Transformers Library: A member noted that the checking is done against max_position_embeddings in the transformers library.
Double-Checking max_position_embeddings: Another member clarified that max_position_embeddings is indeed the parameter being checked in the transformers library.

Eleuther ▷ #gpt-neox-dev (2 messages):

Pythia Remake, Scaling Suite Experiment

Pythia Reboot Ponderings: A member inquired about lessons learned from Pythia’s development, in light of another’s plans for a remake, linking to a Scaling Suite experiment.
- Another member mentioned they are preparing to post commentary on this topic after seeing the tweet about it.
Scaling Suite Experiment Sparks Interest: The provided Scaling Suite experiment link is generating interest in the community.
- The experiment likely contains valuable data and insights related to scaling models, prompting discussion about potential improvements and future research directions.

Yannick Kilcher ▷ #general (29 messages🔥):

NixOS for ML/DS, Temperature 0 Debate, Parameter-Efficient Finetuning, MCP Server Needed, Isomorphism for Computation

NixOS Contributor Seeks ML/DS Prioritization: A Data Scientist and NixOS contributor is looking to improve declarative, immutable, and reproducible practices in ML and DS, and is seeking input on pain points and prioritization.
- They highlighted that “‘I use nixos btw’ is the only phrase that can scare arch people away” and that Nix is a powerful tool that can significantly improve development and deployment regardless of the operating system.
Temperature 0 Stunts Thinking Models: A member recalled reading that temperature 0 is not optimal and that a low but non-zero temperature is recommended for specific tasks.
- Another member confirmed that temperature 0 can cause repetition in thinking models and recommended running them with a high temperature to explore the trees more broadly, pointing to Qwen’s documentation, which states: “DO NOT use greedy decoding, as it can lead to performance degradation and endless repetitions”.
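The difference between temperature 0 (greedy decoding) and temperature sampling can be sketched over a toy set of logits. This is a generic illustration with hypothetical helper names and made-up logits, not Qwen's decoding code or its recommended settings.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("use greedy() for temperature 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Temperature 0: always pick the single most likely token."""
    return max(range(len(logits)), key=logits.__getitem__)


def sample(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
rng = random.Random(0)
print(greedy(logits))  # 0
print([round(p, 3) for p in softmax_with_temperature(logits, 0.2)])
print([round(p, 3) for p in softmax_with_temperature(logits, 1.5)])
print(sample(logits, 1.5, rng))
```

Greedy decoding returns the same token every time for the same context, which is how a model can get stuck in a loop; a non-zero temperature leaves some probability on the alternatives.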
Parameter-Efficient Finetuning Improves Knowledge Uptake: A member introduced a new method for parameter-efficient finetuning, aimed mainly at continued pretraining, and yielding ~4x more knowledge uptake compared to full finetuning and LoRA.
- Another member expressed interest in extending knowledge with their collection of books and documents to see if it benefits over RAG-like approaches.
Member Needs MCP Server for Isomorphism Testing: A member is seeking help finding or creating an MCP server to test an isomorphism they are working on.
- Preliminary testing indicates 99% similar results using fewer resources in less time.
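To illustrate the temperature discussion above, here is a minimal sketch of temperature sampling over a handful of logits. The logits and the helper name `sample_token` are made up for illustration; real inference stacks add top-p/top-k and other controls. It only shows that temperature 0 collapses to greedy decoding (the same choice every time), while a non-zero temperature lets close runner-up tokens be picked.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick an index from logits; temperature 0 is treated as greedy (argmax)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide logits by the temperature, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.8, 0.5]  # hypothetical scores for three candidate tokens
rng = random.Random(0)

greedy = {sample_token(logits, 0, rng) for _ in range(100)}
sampled = {sample_token(logits, 0.7, rng) for _ in range(100)}
print("temperature 0  :", greedy)   # a single index, always the argmax
print("temperature 0.7:", sampled)  # typically several different indices
```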

Yannick Kilcher ▷ #paper-discussion (25 messages🔥):

FP8 Training, SwiGLU Activation Function, vec2vec code review, Interest in Daily Paper Discussions

FP8 Training scaled to Trillion Token LLMs!: A discussion was scheduled for the paper Scaling FP8 training to trillion-token LLMs, which successfully trained large language models using FP8 precision on datasets up to 2 trillion tokens.
Smooth-SwiGLU stabilizes FP8 training!: The paper identifies instabilities in FP8 training caused by outlier amplification by the SwiGLU activation function and introduces Smooth-SwiGLU to address this without altering function behavior. A blog on swiglu: jcarlosroldan.com.
vec2vec code review cancelled: A planned code review of vec2vec, an implementation of arxiv.org/abs/2505.12540, was cancelled due to lack of interest, needing at least one person by 10 minutes after the start.
Call for Paper Engagement!: A member expressed intent to be more involved in daily paper discussions, but noted the difficulty due to excessive hype, preferring papers from reputable sources or those that have stood the test of time (approximately 1 week).
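As a rough illustration of the SwiGLU item above, the sketch below computes one channel of a SwiGLU-style gated unit, Swish(gate) times a linear branch, in plain Python. It is not the paper's Smooth-SwiGLU and does not model FP8; it only shows, under the assumption that both branches grow together, how the product grows roughly quadratically and so can turn a large pre-activation into a much larger outlier.

```python
import math

def swish(z):
    """Swish / SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def swiglu_channel(gate_pre, linear_pre):
    """One output channel of a SwiGLU-style unit: Swish(gate) * linear."""
    return swish(gate_pre) * linear_pre

# When both pre-activations are large and aligned, the output scales like t**2.
for t in (1.0, 10.0, 100.0):
    print(t, swiglu_channel(t, t))
```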

Yannick Kilcher ▷ #ml-news (10 messages🔥):

Google Gemini open source, Search vs Deep research, OpenAI status, 1984 reference

Google Gemini Fullstack Langgraph Quickstart Open Sourced!: Google open sourced the Gemini Fullstack Langgraph Quickstart, though the exact purpose remains unclear.
- One member speculated that it might be a method for making the model think for longer.
Distinguishing Search from Deep Research: A member pointed out that the new open-source project seems more aligned with quick search functionalities rather than in-depth research requiring more processing time.
- Despite this, the project was still deemed cool.
OpenAI Status Shared: A member shared the OpenAI status and additional links to tweets and another tweet.
High Probability of 1984 Scenario: One member commented on the high probability of a 1984 scenario, presumably referring to surveillance or dystopian themes in current events.
- No additional context was provided.

Modular (Mojo 🔥) ▷ #general (3 messages):

Modular Hackathon, GPU programming workshop, Mojo kernels, MAX Graph model architectures, PyTorch custom ops

Modular Hosts Another Hackathon: Modular is hosting another hackathon focused on Mojo kernels, MAX Graph model architectures, and PyTorch custom ops (https://lu.ma/modular-hack-weekend).
- This hackathon will be open to virtual participation and will run over a weekend, with partner, judge, and prize announcements coming soon.
GPU Programming Workshop Announced: To kick off the hackathon weekend, Modular will host a GPU programming workshop both in-person at their Los Altos office and virtually via livestream.
- The workshop is intended to get people familiar with the technologies they will use in the hackathon.
New Community Member Excited About Mojo: A new community member recently graduated from a research masters program in computer science and tried Mojo after seeing a Fireship video.
- He’s already seeing improvement on basic ML models, and he mentioned being familiar with Chris Lattner’s work on LLVM with Vikram Adve as a graduate of UIUC.

Modular (Mojo 🔥) ▷ #mojo (58 messages🔥🔥):

C->Mojo bindings generator, unsigned _BitInt(13), mojo + libclang, Mojo multithreading

C to Mojo Bindings Generator in progress: A member is working on a C->Mojo bindings generator, and is trying to figure out how to get object files in/out of the mojo compiler without cursed workarounds.
- They stated that pretty much all necessary stuff exists except wildly horrible packed structs and maybe some fiddly business around restrict; also pragmas that impact calling convention are going to be a painful bit to figure out.
Parsing intricacies of C code: A member noted that they would probably be moving a great deal faster if they hadn’t seen so much C that technically follows the spec.
- Another member apologized for using unsigned _BitInt(13) among other things.
Clang AST Dumps feed Mojo AST Records: A member is close to getting a mojo compilable version of a file via a generic command leveraging what people likely already have or can generate, so pulling from a clang compiledb and then feeding that to libclang to get out an AST.
- They are basically doing: clang -Xclang -ast-dump -fsyntax-only -fparse-all-comments -fno-color-diagnostics somefile -> ast nodes.
Mojo lacks Manual Thread Management: There’s no manual thread management in Mojo as of yet, and no timeline has been given but probably late this year or next year by some estimates.
- Parts of the type system, such as algebraic types, have not yet been finalised and are still being figured out. Some pointed out that this is only v0.3 and a lot of pretty important stuff is still missing, like working atomics and synchronization primitives, thread safety marker traits, basic IO, basic data structures, and the ability to do networking.
Structs transferred from Host to Device: Members discussed whether arbitrary structs can be transferred from host to device using the buffers, or whether this is limited to primitive types.
- Another member replied it is limited for now, but there is some work on “trivial types” which would let you transfer anything with no pointers or special destructors.
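For the clang AST dump step mentioned above, a small sketch of assembling that command as an argument list in Python. The helper name `ast_dump_command` and the file name `somefile.c` are placeholders; the flags are the ones quoted in the discussion. The command is only printed here; where clang is installed, the same list could be passed to `subprocess.run(cmd, capture_output=True, text=True)`.

```python
import shlex

def ast_dump_command(source_file, clang="clang"):
    """Build the argv list for the AST dump quoted in the discussion."""
    return [
        clang,
        "-Xclang", "-ast-dump",      # ask the frontend to print the AST
        "-fsyntax-only",             # parse only, emit no object code
        "-fparse-all-comments",      # keep ordinary comments in the AST
        "-fno-color-diagnostics",    # plain text output, easier to parse
        source_file,
    ]

cmd = ast_dump_command("somefile.c")  # placeholder file name
print(shlex.join(cmd))
```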

DSPy ▷ #general (55 messages🔥🔥):

DSPy talks at AI Engineering and Databricks DAIS, DSPy use cases, DSPy 3.0 release in June, DARPA's Advanced Research Concepts lab using DSPy, Migrating custom prompts to DSPy

DSPy Talks Incoming & Input Requested: The team is preparing for DSPy talks at AI Engineering and Databricks DAIS, and is asking for community input on topics to cover and use cases to highlight in the slides.
- A DSPy 3.0 release is slated for June, according to this message.
DSPy Powers DARPA Project Spinoff: DSPy was used for DARPA’s Advanced Research Concepts lab to build a solution for the ‘Collaborative Knowledge Curation’ interest area, which is now being spun out into a company.
- This suggests a real-world application and validation of DSPy’s capabilities in advanced research environments.
Unconventional DSPy Flow: Some find that refactoring existing GenAI flows to DSPy is a major change: it is designed to run online and in production as the framework handling your GenAI flow, and the way you build a DSPy flow is very unconventional.
- This highlights the need for more guidance on integrating DSPy into existing codebases and workflows.
Agent Framework with First-Class Environments: There is considerable interest in building an agent framework atop DSPy that includes first-class environments and handles self- and external-rewards for online learning via optimizers.
- One member remarked: “Why the agent framework people aren’t doing this and are doing all kinds of other stuff is beyond me.”
Claude Code Framework with Python Bindings: A member is building an agent framework using Claude code, aiming for Python bindings and a Rust core, with retroactive trace representation.
- The framework is designed to have traces visible and forking to be done well for optimization on arbitrary metrics; it is available as claude_sdk on Github.

aider (Paul Gauthier) ▷ #general (46 messages🔥):

ClaudeCode, Aider session resuming, Aider Ask Mode, Aider Restarting, Gemini 2.5

ClaudeCode Internals Unveiled: A member shared a link to a deep dive into ClaudeCode, detailing the systems, tools, and commands that go into building a self-driving coding agent and execution engine.
- The post emphasizes the systems and design decisions involved in creating agents that are real-time, self-corrective, and useful for productive work, rather than focusing solely on prompts and AI engineering.
Resuming Aider Sessions Saves Context!: Users discussed resuming Aider sessions to maintain context memory, with one noting that the --resume flag can be used, but they haven’t fully mastered its usage.
- They expressed a desire to resume older sessions or checkpoints by ID, while others are aggressively restarting to clear context.
Curbing Code Suggestions in Aider’s /ask Mode: A user expressed frustration with Aider’s /ask mode frequently suggesting code changes even when they just want to talk or plan.
- Suggested solutions include explicitly telling Aider “don’t write any code yet” or using a /reminder command to set the planning stage.
New Gemini Model Benchmarked: A member benchmarked a soon-to-be-released model at 86.2% with diff-fenced.
- Speculation arose that it could be a Gemini model, with concerns about the naming of Gemini-2.5-pro versions; this led to calls for a rename or delay to avoid confusion.
Trace through Claude’s Mind!: A member shared Simon Willison’s blog post on Claude Trace, which may be interesting for aider developers.
- The blog post outlines the ability to trace through Claude’s thought process during code execution.

aider (Paul Gauthier) ▷ #questions-and-tips (5 messages):

Aider run and test commands, Bedrock vs Anthropic models, Aider project re-initialization

Aider’s Run and Test Commands: New users inquire about prompting Aider to run test commands and analyze the output, and are directed to the /run and /test commands in the Aider documentation.
Bedrock vs Anthropic Models Command Execution: A user observed that Aider, when using a Bedrock Claude 3 Sonnet model, can successfully execute terminal commands like deleting files, but when using a Converse Claude Sonnet model, it only offers help without executing.
- They asked whether any setting limits the ability to run terminal commands, especially with Bedrock.
Aider Project Re-initialization After Folder Move: A user reported that Aider is failing after moving their project folder, because it’s still looking for a config.lock file in the old location, even after deleting all Aider files in the repo.
- They asked if Aider uses caching somewhere else on the system and if there’s a way to reinitialize an Aider project.

Notebook LM ▷ #announcements (1 messages):

Public Notebooks, NotebookLM

NotebookLM public sharing: Users can now curate and share their notebooks with anyone using a public link.
- This is a great opportunity to share your work and benefit the community.
Show off NotebookLM Skills: Members are encouraged to share the link to the notebook they are most proud of.
- The goal is to share notebooks that the community could benefit from.

Notebook LM ▷ #use-cases (8 messages🔥):

Podcast Summary Length Discrepancy, Microsoft Learn, Notebook LM, Use Cases

Podcast Summaries Length Discrepancy Awaits Fix: Users are still awaiting a fix for the podcast summaries in languages other than English, noting that they are not as long even with the same prompt and content.
- Another user echoed the sentiment, indicating they are also waiting for the fix.
Microsoft Learn Certification & Notebook LM Integration: A user inquired about using Notebook LM with Microsoft Learn and sought use cases and tips for Microsoft Certification.
- This prompted other members to wonder why this happens and if anyone is actively using Notebook LM with Microsoft Learn.
Palm Bayer Unveils AI-Powered Publici: A user created two notebooks for a city and county and wrote about them in The Palm Bayer article, showcasing the use of AI for public information.
- The user expressed their love for AI.

Notebook LM ▷ #general (40 messages🔥):

Notebook LM hallucinating facts, Notebook LM reads small portion of source, Gemini 2.5 pro coming to Notebook LM?, Syncing Google Docs with edits, audio overview smooth using discover feature

Notebook LM generates facts from thin air: A user reported that Notebook LM generates random, unsourced facts and erroneously links them to source documents, requiring constant correction to prevent the AI from using conversation history as a source.
- The hallucinated facts included an “ebony tablet of Aha-Teta” linked to “zom (electrum)”, which was deemed inaccurate as this connection wasn’t found in the provided sources.
Notebook LM’s Reading Comprehension Questioned: A user on Reddit claimed that NotebookLM only reads a small portion of a given source, citing an example where it read only 21 out of 146 pages.
- However, it was pointed out that the user may have misunderstood how NotebookLM works, particularly its use of RAG (Retrieval-Augmented Generation), and was trying to use it for unrelated content in a single source, which requires a custom solution.
NotebookLM Set for Gemini 2.5 Pro Upgrade?: Members are speculating when Notebook LM will start using more advanced models like Gemini 2.5 Pro, but currently there is no known timeline from Google.
- The model in use is likely Gemini 2.5 Flash, though this is not confirmed.
Google Doc edits not auto-syncing with NotebookLM: Users inquired about whether edits to a Google Doc source are automatically reflected in NotebookLM.
- A member clarified that changes are not automatic; users must re-sync from the preview, without needing to re-upload the document, to reflect the edits.
Audio Overview Smooth with Discover: A user said that using the discover feature together with audio overview’s long option is super smooth and ready in no time.
- They reported having a 65 minute generation that sounded high quality.

Latent Space ▷ #ai-general-chat (33 messages🔥):

Claude Code Analysis, Escape Mount Moon Hackathon, Securing Inference-Time Data, Modal's LLM Engineer's Almanac, Anthropic Cuts Off Claude 3.x Capacity

Southbridge Analyzes Agentic Claude Code: Southbridge Research published a Notion analysis of Claude Code and its agentic capabilities.
- The author noted that the analysis prompted them to ship both Writer and Hashbrown live.
Modal Launches LLM Inference Benchmarks: Modal Labs launched the LLM Engineer’s Almanac, featuring thousands of LLM inference serving benchmarks across vLLM, SGLang, and TensorRT-LLM frameworks.
- The almanac includes results, replication code, and an executive summary addressing key questions for technical leaders, along with their benchmarking framework, stopwatch.
Anthropic Cuts Capacity, Chaos Ensues: Varun Mohan reported that Anthropic unexpectedly cut off nearly all Claude 3.x model capacity with less than five days’ notice, causing availability issues.
- Users expressed disappointment and concern about trust in model providers, but other models like Gemini 2.5 Pro and GPT 4.1 are unaffected.
Altman Announces Internet Access for Codex: Sam Altman announced that Codex, an AI coding tool, now has optional internet access for ChatGPT Plus users, disabled by default due to complex risks.
- The community asked for clarification on what Codex is, its tradeoffs, and potential security concerns.
Textract Accuracy Falls Short, Legal Eagles Beware: A member cautioned about Textract’s poor accuracy (around 3%) on legal and regulatory documents, linking to a LinkedIn post detailing the issue.
- They suggested being careful with pipelines involving Textract when word-for-word accuracy is essential.

Latent Space ▷ #ai-announcements (13 messages🔥):

Live AI Bot Collaboration, New UI Framework, Bug Reporting System, AIE Website Integration, Feedback Loop Improvement

Live AI Bot Collab Deployed: A live production AI bot was collaboratively developed and deployed to the AIE website using a new UI framework.
- One member expressed that this collaboration was a dream come true.
UI Framework Powers New AI Bot: A new generation UI framework was shared and subsequently used to ship the AI bot to the AIE website.
Calls for Bug Reporting System: Due to incoming bug reports, there are discussions about establishing an easier way to report bugs than direct messaging.
- The aim is to improve the feedback loop for the AI bot so it can self-improve.
Welcoming Hands Eager to Help: The community welcomed Mike to the channel and expressed support for providing bug reports.
- One member shared that they are happy to help with bug reports as well!

Torchtune ▷ #dev (30 messages🔥):

HF Tokenizer, FP8 + TP Ungated, Llama alternatives to LLaMA-Factory, Context Parallel limitations, dropping 3.9 Support

Tokenizer Ready for Review!: Unit tests were added as requested in PR #2781, waiting for review.
FP8 + TP Ungated with Memory Reduction!: PR #2782 ungates FP8 + TP, enables loss parallelism, and provides a sexy reduction in peak active memory.
- This PR also enables autograd compiling, but it’s currently broken.
Torchtune: A viable LLaMA-Factory Alternative: A user is using a torchtune fork for their work, finding it a performant and readable alternative to LLaMA-Factory because it avoids dependency on TE, megatron, lightning.
- One team has been pulling and training from a torchtune fork for 4-5 months now and has found it to be super stable with good results.
Context Parallel missing Flex Attention Compatibility: While Context Parallel (CP) has been landed, a key thing missing is the flex attention compatibility because packing offers a huge benefit.
- The distributed folks are working on enabling flex attention compatibility soon.
Dropping Support for Python 3.9?: The end-of-life status of Python 3.9 is causing issues with linting, as new linting insists on using List -> list, Tuple -> tuple, etc., while CI requires using Union and Optional from typing.
- A user suggested that the failed CI was because of Joe.
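To make the Python 3.9 typing tension above concrete, here is a minimal sketch contrasting the two annotation styles. This is general Python behaviour, not torchtune's actual lint or CI configuration; the function names are made up.

```python
from __future__ import annotations  # store annotations as strings, do not evaluate them

from typing import List, Optional, Union

# Older style: typing.List / Optional / Union.
def first_old(items: List[int], default: Optional[int] = None) -> Union[int, None]:
    return items[0] if items else default

# Newer style: built-in generics (PEP 585, Python 3.9+) and `X | None` (PEP 604).
# On 3.9 the `|` form is only safe inside annotations, and only because the
# __future__ import above defers their evaluation; evaluating `int | None`
# as a runtime expression needs Python 3.10+.
def first_new(items: list[int], default: int | None = None) -> int | None:
    return items[0] if items else default

print(first_old([], 7), first_new([3, 4]))
```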

LlamaIndex ▷ #announcements (1 messages):

Gradio Agents x MCP Hackathon, Livestream on June 3rd, Hackathon Office Hours on June 4th

Gradio Agents x MCP Hackathon is here!: It’s the week of the Gradio Agents x MCP Hackathon, and you can still register here.
Gradio Agents x MCP Livestream Incoming!: Watch the Gradio Agents x MCP Livestream on YouTube here tomorrow, June 3rd!
Hackathon Office Hours Hitting!: There will be an office hours session for hackathon participants in the HuggingFace Discord server on Wednesday June 4th.

LlamaIndex ▷ #blog (5 messages):

Gradio Agents & MCP Hackathon, Scaling Agents in Finance workshop, MCP Integration, Agentic AI projects

Gradio Agents & MCP Hackathon Kicks Off: The Gradio Agents & MCP Hackathon is kicking off, featuring a livestream, $16.5k in prizes, and $900k in credits.
- The hackathon includes 3 Tracks: MCP Tool/Server, Custom Components for Agents, Agentic Demo Showcase.
Scaling Agents in Finance workshop: The slide deck from the Scaling Agents in Finance workshop is now available, showing how to automate document workflows with Agentic AI for finance tasks.
- The slide deck teaches how to use Assistant Agents that act as powerful ‘research assistants’.
LlamaIndex MCP Integration Enhances Agent Capabilities: A LlamaIndex integration enhances agent capabilities and workflow deployment using MCP.
- This integration offers helper functions for LlamaIndex agents to use MCP server tools and the ability to serve any LlamaIndex workflow as an MCP.
LlamaIndex at the AI Engineer Summit: LlamaIndex attended the AI Engineer Summit in San Francisco.
- Attendees met with the LlamaIndex team to discuss Agentic AI projects.
Crafting a Multi-Agent Financial Report Chatbot: A hands-on Colab to build a multi-agent financial report generating chatbot from scratch using LlamaIndex agent workflows is shared.
- The chatbot example involves parsing & indexing 10-K filings from Adobe and using agentic RAG to answer questions.

LlamaIndex ▷ #general (16 messages🔥):

Open Source Models vs GPT-4o, LlamaIndex Report Generation, Llama Agents vs LlamaDeploy, Gradio MCP Hackathon, Property Graph Index

Open Source Models vs GPT-4o Hardware Needs: To match GPT-4o performance with open source models like DeepSeek-R1, DeepSeek-V3, or Qwen3-235B-A22B, users need powerful hardware with hundreds of GB of VRAM, or must resort to heavy quantization and CPU/RAM offloading.
LlamaIndex Report Generation: Cloud vs Local: A user inquired about the accessibility of LlamaIndex’s report generation and the availability of Jupyter notebooks, referencing this blog post.
- A member provided a link to LlamaExtract notebooks, but also clarified that LlamaReport and some demos require LlamaCloud, implying cloud-based extraction and report generation.
Llama Agents morphs into LlamaDeploy: A user noticed that the llama-agents package on PyPI hadn’t been updated since August 16, 2024, and asked if Llama Agents had been replaced by Workflow.
- A member confirmed that LlamaAgents was renamed to LlamaDeploy, which deploys N number of Workflows as services, with more info here.
Gradio MCP Hackathon HuggingFace Office Hours: Members announced an upcoming office hours session on the HuggingFace Discord server for participants of the Gradio MCP hackathon.
- The Discord event link was shared for those interested.
Property Graph Index Token Usage Q&A: A new LlamaIndex user inquired about the token usage for indexing and retrieval with Property Graph Index.
- They asked about performance comparisons against GraphRAG, HippoRAG2, and LightRAG.
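On the hardware point at the top of this channel summary, a back-of-the-envelope sketch of why "hundreds of GB" comes up. It estimates memory for the weights alone from parameter count and bits per weight. The parameter totals (671B for DeepSeek-R1, 235B for Qwen3-235B-A22B) are the publicly stated figures and were not part of the discussion; KV cache, activations, and runtime overhead are ignored, so real requirements are higher.

```python
def weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return n_params_billion * bits_per_weight / 8

# Total parameter counts in billions (assumed from public model cards).
models = {"DeepSeek-R1": 671, "Qwen3-235B-A22B": 235}

for name, params in models.items():
    for bits in (16, 8, 4):
        print(f"{name}: {bits}-bit weights ~ {weight_memory_gb(params, bits):.1f} GB")
```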

MCP (Glama) ▷ #general (16 messages🔥):

Multi-MCP interaction, New Feature Requests Discussion, Common MCP Servers for Tinkering, MonetizedMCP: Open-Source Framework for Payments, API Keys vs MonetizedMCP

Seeking Multi-MCP Interaction Guidance: A member inquired about creating an MCP that can interact with multiple other MCPs, such as calling the Atlassian MCP to get a ticket and then calling the Git MCP to create a branch.
- They are looking for an expert to demonstrate MCP server setup with an MCP client and help set up publicly available MCP servers.
New Feature Requests Channel Quest: A member inquired about a channel to discuss new feature requests, specifically regarding long-running tool invocations.
- They felt discussions were getting messy across multiple issues/pull requests and sought a better place for discussion.
Tinkering with Common MCP Servers: A member requested a list of common MCP servers to tinker with, supporting features like sampling, OAuth, prompts, resources, and resource templates.
- Another member suggested checking out the servers directory which looks alright.
MonetizedMCP Demo Showcased: A member shared MonetizedMCP, an open-source framework adding programmatic payments (crypto or fiat) to any MCP server, with a short demo video and a website.
- It can be used with the mcp-remote library.
API Keys Scrutinized Against MonetizedMCP: A member questioned the need for MonetizedMCP, suggesting that API keys with OAuth support in MCP, like in the MCP Connector, could achieve monetization.
- Another member explained that this provides a path for devs to potentially offer a paid product without needing to build an API/usage monitoring/rate limiting/Stripe integration/etc.

MCP (Glama) ▷ #showcase (4 messages):

self-hostable assistant, MCP from mobile, alpic-ai/grizzly, Visual testin

Piper, a self-hostable assistant for mobile MCP: A member shared Piper, a self-hostable assistant, because there weren’t good options to use MCP from mobile.
- Piper is hosted on GitHub.
Grizzly integrates cool features: Another member shared Grizzly, which they built over a weekend, that integrates some cool features that can be complementary to Piper.

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (13 messages🔥):

MOOC Dates, Assignment deadlines, Feedback on submissions, Certificate Declaration Form, Written Article form

MOOC Next Session Dates Remain Unconfirmed: A member inquired about dates for the next MOOC session this year, but nothing has been confirmed yet.
Assignment Deadlines have Passed: A user asked if pending quizzes would be reopened, but it was clarified that all assignments were due on May 31st.
Certificate Declaration Form Deadline Approaching: A member inquired about completing the Certificate Declaration Form and Written Article.
- They were advised to complete it ASAP, as the form will close very soon.
Request for Detailed Feedback on All Submissions: A member asked to get detailed feedback of all of their submissions, including agentX project and 2 lab assignments.

Cohere ▷ #💬-general (3 messages):

CMD-R follow-up, Fine-tuned adapters, Cohere sponsorship

Hugging Face Lacks CMD-R Follow-Up: Hugging Face is yet to announce a follow-up CMD-R model.
- Users may try pairing it with fine-tuned adapters (e.g. LoRA) or try Mistral with newer training data.
AWS Bedrock may be getting Command A: A user asked about the potential availability of Command A on AWS Bedrock.
- No confirmation was given in the discussion.
Seeking Sponsorship from Cohere: A member inquired about the right contact to ask Cohere for sponsorship for a post-secondary hackathon.
- No direct contacts were provided in the chat.

Cohere ▷ #🤝-introductions (2 messages):

Introductions, Community server

Cohere Community Server Introductions begin: A new member named Aashutosh joined to introduce himself to the community server.
- He is a second-year undergrad from India who is obsessed with LLMs and ML, and looks forward to creating real-world projects.
Stickied Message Highlighted: A stickied message was highlighted to encourage new members to introduce themselves.
- The message provided a template including Company/Industry/University, What you're working on, Favorite tech/tools you use, and What you hope to gain from this community.

Nomic.ai (GPT4All) ▷ #general (5 messages):

Deepseek-r1 on outdated CPU, Orange PI Open Hardware, Mac VRAM King

Outdated CPU runs Deepseek-R1: A user ran a large Deepseek-r1 model (over 400GB) from an outdated 5000 mt/s SSD at 0.1 tokens/second by removing RAM to make space.
- They attributed this to the amazing quality of MOE models, noting that PC industry storage has improved greatly, unlike RAM speeds.
Orange PI dreams of running Large Models: A user hopes that open hardware projects like Orange PI may attach a Gen 5 m.2 slot to its board.
- They expect that with OpenCL and existing 3D GPUs, it may be possible to make a very powerful “unified memory” machine that runs models as big as DeepSeek-R1 at quite a few tokens/second with proper offloading while using just a few watts.
Mac Boasts Massive VRAM for Model Enthusiasts: The user argued that a Mac with 512GB is the “VRAM” king, offering 448GB of VRAM at a price similar to its near equivalent of four of the newest AMD AI MAX 395+ 128GB mini PCs or laptops combined.
- They also pointed out the lower power consumption of the Mac compared to the combined AMD setup.

tinygrad (George Hotz) ▷ #general (1 messages):

PR review request, Github

PR review requested on Github: A member asked for a review of their Pull Request on Github.
Github PR review needed: A Github user has requested a review for their Pull Request.

tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):

GlobalCounters.mem_used, GlobalCounters.global_mem, Tensor Realization

GlobalCounters Unveiled: mem_used vs global_mem: A user inquired about the appropriate usage scenarios for GlobalCounters.mem_used versus GlobalCounters.global_mem in the tinygrad framework.
- It was noted that mem_used updates during buffer allocation/deallocation, while global_mem updates within ExecItem, seemingly during tensor realization.
Deep Dive into tinygrad’s Memory Counters: The distinction between GlobalCounters.mem_used and GlobalCounters.global_mem revolves around allocation timing and context within tinygrad.
- mem_used offers a real-time view of allocated/deallocated buffers, whereas global_mem reflects memory usage during tensor realization within ExecItem.
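The distinction described above can be pictured with a toy model. This is not tinygrad code: `ToyCounters`, `ToyBuffer`, and `run_kernel` are hypothetical stand-ins, and the assumption that the execution-side counter accumulates the bytes of the buffers each executed item touches is an interpretation of the discussion rather than a confirmed detail.

```python
class ToyCounters:
    mem_used = 0    # bytes currently allocated (goes up and down)
    global_mem = 0  # bytes touched by executed kernels (assumed to accumulate)

class ToyBuffer:
    def __init__(self, nbytes):
        self.nbytes = nbytes
        ToyCounters.mem_used += nbytes   # updated at allocation time

    def free(self):
        ToyCounters.mem_used -= self.nbytes  # updated at deallocation time

def run_kernel(*buffers):
    # Updated at execution time, independent of allocation and freeing.
    ToyCounters.global_mem += sum(b.nbytes for b in buffers)

a, b = ToyBuffer(1024), ToyBuffer(1024)
run_kernel(a, b)
run_kernel(a, b)
a.free()
print(ToyCounters.mem_used, ToyCounters.global_mem)  # 1024 4096
```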

MLOps @Chipro ▷ #events (2 messages):

Machine Learning Street Talk, AI programming for data analysis, AI4Legislation summer competition

Machine Learning Street Talk Discusses Generative AI: Join Machine Learning Street Talk (MLST) for a discussion on generative AI this Friday, the 6th, 9am PST; details available at this Discord event link.
AI Programming Webinar for Data Analysis: An industry expert, Liang Guo, is leading a webinar focused on AI programming for data analysis; RSVP at this Google Forms link.
AI4Legislation Summer Competition Announced: Silicon Valley Chinese Association is holding an AI4Legislation summer competition, part of an overarching series; find more information on GitHub.

Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 messages):

Gorilla, MCP, tool use, API actions

Gorilla is OG MCP: Gorilla predates the formal MCP standard by a year, but it’s functionally a proto-MCP system.
- It routes model queries into tool use, interprets structured tool schemas, and grounds generation in real API actions.
Gorilla Enables Interfaces: The team thinks of Gorilla as the MVP of MCP.
- It proves the core idea that LLMs don’t just need knowledge — they need interfaces.
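The pattern described above (a structured tool schema, a model-produced call, and a real action behind it) can be sketched in a few lines. This is a generic illustration, not Gorilla's code or the MCP protocol: the `TOOLS` registry, the `get_weather` tool, and the `dispatch` helper are all hypothetical, and the lambda stands in for a real API call.

```python
# Hypothetical tool registry: each tool declares its parameters and a callable.
TOOLS = {
    "get_weather": {
        "params": {"city": str},
        "fn": lambda city: f"weather for {city}",  # stand-in for a real API call
    },
}

def dispatch(call):
    """Validate a structured tool call against its schema, then execute it."""
    spec = TOOLS.get(call["name"])
    if spec is None:
        raise KeyError(f"unknown tool: {call['name']}")
    args = call.get("arguments", {})
    if set(args) != set(spec["params"]):
        raise TypeError(f"expected arguments {sorted(spec['params'])}")
    for key, expected_type in spec["params"].items():
        if not isinstance(args[key], expected_type):
            raise TypeError(f"argument {key!r} must be {expected_type.__name__}")
    return spec["fn"](**args)

# A structured call such as a model might emit instead of free-form text.
print(dispatch({"name": "get_weather", "arguments": {"city": "Berlin"}}))
```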