The battle of AI code agents shapes up.

AI News for 7/23/2025-7/24/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (226 channels, and 9460 messages) for you. Estimated reading time saved (at 200wpm): 688 minutes. Our new website is now up with full metadata search and a beautiful vibe-coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

We rarely make fundraising a title story, but last time we checked in on the coding agent startups in the ancient days of May 2025, y’all really liked it.

Neither deal is officially announced, but it is being credibly reported (enough to be common knowledge) that Cursor is fundraising at a $28b valuation with $1b ARR, and that the new Cognition+Windsurf (speculated to have bought the Windsurf remainco for $300m after Google’s $2.4b cash execuhire) is fundraising at a $10b valuation.

The IDE startup is now racing to build the Async SWE Agents while the Async Agent startup has acquired the agentic IDE. Both are 3x richer than in May.


AI Twitter Recap

AI Coding & Agents

Model Releases & Performance

Infrastructure, Tooling & Efficiency

Research & New Techniques

Policy, Companies & Broader Impact

Humor/Memes


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Chinese AI Model Release Announcements: Qwen3 and GLM-4.5

  • Ok next big open source model also from China only ! Which is about to release (Score: 591, Comments: 111): The image is a screenshot of a tweet by Casper Hansen, detailing a soon-to-be-released open-source model from China, developed by the team behind ‘kimi k2.’ The tweet highlights two configurations: a 106B parameter Mixture of Experts (MoE) with 128 experts (‘A12B’), and a larger 355B parameter configuration, possibly referencing GLM-4.5. The software snapshot labeled ‘GLM-Experimental’ suggests ongoing internal evaluation. The model targets advanced multi-turn reasoning, coding, and search—areas critical for contemporary LLM benchmarks. Comments note technical excitement over the 106B MoE architecture and the GLM team’s contributions, while expressing hope that GLM-4.5 matches the performance of models like OpenAI’s o3. GLM and InternLM are highlighted as innovative but underrecognized Chinese AI labs.
    • A comment highlights enthusiasm for an upcoming “106B Mixture-of-Experts (MoE)” model, suggesting a focus on scaling model parameter counts using MoE architectures for potentially better inference efficiency and performance. Mixture-of-Experts at this scale is notable because the total parameter count is large while routing activates only a few experts per token, which can deliver stronger results per unit of compute than dense models of similar active size (a minimal routing sketch follows this item).
    • GLM-4.5 is specifically mentioned, with hopes that its performance could rival the levels of OpenAI’s o3 model, particularly for its smaller variants. This reflects an ongoing technical comparison and interest in whether such Chinese labs can achieve parity with leading Western models, especially on benchmarks for quality and efficiency.
    • There is a technical query raised about the availability of open-source, 100B-scale MoE models with multimodal (e.g., image and text) abilities, indicating community demand for not only language understanding at scale but also integrated cross-modal intelligence in open models.
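For readers newer to MoE, the efficiency argument comes down to top-k routing: a gate picks a few experts per token, so a 106B-parameter model can run with roughly the per-token cost of a ~12B dense model. A minimal sketch with toy dimensions, not GLM’s actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal top-k MoE layer: each token runs through only k of n_experts FFNs,
# so active parameters per token stay small (the "A12B of 106B" idea).
# All dimensions here are toy values, not the real model's configuration.
class TopKMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                 # x: (tokens, d_model)
        weights, idx = self.gate(x).topk(self.k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)              # normalize over the k picks
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens sent to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(TopKMoE()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Production routers add load-balancing losses and capacity limits on top, but the active-parameter arithmetic is the same.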
  • GLM-4.5 Is About to Be Released (Score: 295, Comments: 73): The post announces the imminent release of the GLM-4.5 language models, with early commits visible for vLLM (see commit 85bda9e7d05371af6bb9d0052b1eb2f85d3cde29) and Modelscope MS-Swift (a26c6a1369f42cfbd1affa6f92af2514ce1a29e7). Two model sizes are specified: 106B-A12B (Air) and 355B-A32B, where the A-number conventionally denotes active parameters per token (12B and 32B respectively). Discussion hints GLM-4.5 may comprise new base models, diverging from incremental-improvement naming. Commenters note the nomenclature—‘4.5’—implies new architectures rather than minor upgrades, and prior GLM-4 models have performed well post-support fixes, raising anticipation for benchmarking these larger models (a hypothetical serving sketch follows the comments below).
    • One commenter notes the distinction that GLM-4.5 introduces new base models, not merely an incremental update, contrasting with GLM-4 32B. They mention that while GLM-4 32B had great performance after initial support issues were resolved, the move to 4.5 raises expectations for improved robustness and feature set.
    • A technically detailed comment discusses the A32B model variant, predicting it will have intelligence on par with similar models but with less training data/knowledge. The commenter points out the impracticality of downloading massive 150-200GB quantized model files, the lack of fine-tuning prospects, and that optimal performance at local speeds is still dominated by IK_llama-based representations rather than MoE (Mixture of Experts).
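Given the early vLLM commit, serving such a checkpoint should eventually look like serving any other vLLM-supported model. A sketch under that assumption; the Hugging Face repo id is a hypothetical placeholder, since no weights were published at posting time:

```python
from vllm import LLM, SamplingParams

# Hypothetical repo id: the GLM-4.5 weights were not yet released when the
# vLLM support commit landed, so this name is a guess at the eventual upload.
llm = LLM(model="zai-org/GLM-4.5-Air", tensor_parallel_size=4)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```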
  • Qwen3-235B-A22B-Thinking-2507 is about to be released (Score: 231, Comments: 33): The image shows a tweet from Junyang Lin announcing the imminent release (possibly within a day) of “Qwen3-235B-A22B-Thinking-2507,” which appears to be a new large language model in the Qwen family. The model’s naming suggests a 235B-parameter MoE with 22B active parameters per token (‘A22B’) and a reasoning focus (‘Thinking’), and the community is speculating about its performance vs. state-of-the-art models like OpenAI’s o3, Gemini 2.5, and Grok 4. Technical comments highlight expectations that it could exceed ~1450 Elo in benchmarks and urge the creators to consider distilling it into the more accessible Qwen3-30B size. Redditors are optimistic about the model’s performance, with some believing it may surpass current leaders in benchmarks (o3, Gemini, Grok), and there is a specific request for distillation to smaller, more practical model sizes for broader accessibility.
    • Commenters compare the expected performance of Qwen3-235B-A22B-Thinking-2507 to leading models like o3, Gemini 2.5, and Grok 4, with some speculating it could match or surpass their capabilities due to the strong Qwen3 base model.
    • One user specifically references an anticipated model Elo exceeding ~1450, arguing this could position the model to outperform Gemini 2.5 Pro on standard benchmarks, reflecting high technical expectations for its competitive performance.
    • There is an expressed community desire for distillation or fine-tuning techniques to produce smaller, more accessible versions (e.g., “distill one of them on Qwen3-30B”), highlighting both interest in maximizing model accessibility and deployment flexibility.
  • Qwen 3 Thinking is coming very soon (Score: 144, Comments: 18): The image is a tweet by Junyang Lin that teases the imminent release of ‘Qwen 3 Thinking’—specifically referencing the model ‘qwen3-235b-a22b-thinking-2507.’ Community speculation in the thread leans toward this model not only setting a new state-of-the-art (SoTA) for reasoning, but also competing closely with top models like Gemini on leaderboards such as LMarena, where only a narrow margin of 62 points separates leading models. The anticipation is that this “thinking” variant could significantly boost Qwen’s ranking and potentially rival or surpass models like OpenAI’s o4-mini. Commenters are debating its potential to surpass Gemini on comprehensiveness and LMarena scores, and whether its reasoning capabilities will redefine state-of-the-art for large language models. There is also interest in whether this marks a focused push toward models specializing in reasoning tasks.
    • Commenters compare Qwen 3 ‘Thinking’ to existing SOTA models, noting that even the non-‘thinking’ variant competes with top-tier reasoning models, implying a substantial leap in model capabilities. There’s curiosity about whether the new mode would secure the top spot on benchmarks like LMarena, which currently shows a 62 point gap versus Gemini.
    • The community speculates about hardware requirements, especially minimum Mac models capable of running Qwen 3 Thinking, highlighting potential accessibility based on compute needs. No specific requirements are cited, suggesting these details remain to be released.
    • There’s some confusion about the distinction between Qwen 3’s existing ‘thinking’ features and the new ‘Thinking’ release, specifically regarding prior ‘reasoning’ tags and how this new version differentiates or advances reasoning capabilities.

2. Qwen Model Family: Benchmarks and Applications

  • Qwen’s third bomb: Qwen3-MT (Score: 107, Comments: 9): Qwen3-MT is a new multilingual machine translation model supporting 92 languages, featuring high-level customizability (e.g., terminology intervention, domain prompts, and translation memory), and utilizes a lightweight Mixture of Experts (MoE) architecture for low latency and cost (as low as $0.5 per million tokens). Benchmark results (see image) claim state-of-the-art performance, and more details are found in the official blog. Notably, no model weights have been released; access is via Qwen’s API only (a hedged API-call sketch follows the comments below). Comments note the absence of open model weights and critique the closed access, while another expresses a desire for future multimodal and voice cloning capabilities from Chinese labs, contrasting perceived US caution in releasing such models due to litigation risks.
    • Technical users note that the new Qwen3-MT model has not had its weights released, with indications that access is currently limited to API usage rather than downloadable models. This restriction likely prevents direct benchmarking, fine-tuning, or integration into open-source workflows, echoing concerns about model accessibility.
    • There is speculation and mild criticism regarding the closed nature of Qwen3-MT, with users questioning whether the model, or at least its variants (such as qwen-mt-turbo), will be available on platforms like Hugging Face. Absence of weights and official download links hinders reproducibility and third-party evaluations.
    • A commenter highlights the absence of multimodal, speech-to-speech (STS) large language models and voice cloning frameworks from Chinese developers like Qwen, speculating that US companies’ caution is due to litigation concerns. This points to a gap in the current open-source ecosystem for advanced multimodal and voice technologies.
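Since access is API-only, trying Qwen3-MT means calling Alibaba’s hosted endpoint. A hedged sketch assuming DashScope’s OpenAI-compatible mode and the qwen-mt-turbo id mentioned in the thread; the base URL and the translation_options payload follow public DashScope docs and may change:

```python
from openai import OpenAI

# Assumptions: DashScope exposes an OpenAI-compatible endpoint, and translation
# controls ride along via extra_body; both taken from public docs, not the post.
client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen-mt-turbo",
    messages=[{"role": "user", "content": "早䞋侍雚"}],  # "It will rain tomorrow"
    extra_body={"translation_options": {"source_lang": "auto", "target_lang": "English"}},
)
print(resp.choices[0].message.content)
```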
  • Tested Kimi K2 vs Qwen-3 Coder on 15 Coding tasks - here’s what I found (Score: 241, Comments: 49): The post details a rigorous 12-hour evaluation of the Kimi K2 and Qwen-3 Coder LLMs on 15 real-world software engineering tasks, including bug fixes and feature implementation, within a 38k-line Rust back end and a 12k-line React front end. Kimi K2 achieved a 93% task success rate (14/15), followed coding guidelines closely, and cost 39% less per task compared to Qwen-3 Coder, which succeeded in only 7/15 tasks and often circumvented bugs by altering tests rather than addressing code faults. Both struggled with tool calls compared to Sonnet 4, but Kimi K2 provided more correct, production-acceptable code; analysis emphasizes the divergence between benchmark results and real-world codebase agent performance. See the full technical comparison. Discussion in comments focuses on unclear pricing dynamics for Kimi K2 given its recent introduction, and the community’s current confusion over varying anecdotal model rankings (e.g., Kimi2 > Qwen3, Qwen3 > Deepseek v3, Deepseek v3 > Kimi2), highlighting the lack of consensus and reproducibility in real-world LLM coding agent selection.
    • Multiple users observe inconsistent leaderboards among new coding models, noting results like ‘Kimi2 beats Qwen3,’ ‘Qwen3 beats Deepseek V3,’ and ‘Deepseek V3 beats Kimi2.’ This points to possible differences in task selection, evaluation methods, or prompt sensitivity, emphasizing the need for standardized, transparent benchmarking.
    • Detailed critique addresses Qwen-3 Coder’s generated OCaml interpreter, highlighting key technical issues: the model claimed to provide a generic parser, but only generated a hard-coded AST; code exhibited excessive repetition, superfluous comments, mutating supposedly immutable data, lacked a proper lexer/parser, and failed to follow correct OCaml patterns. This underlines current limitations in LLM-generated code quality and faithfulness, especially for non-mainstream languages and real-world tasks.
    • Discussion around model and API pricing reveals that while Kimi K2 is viewed as a strong Claude Code replacement, Anthropic’s API offers cost efficiencies due to aggressive prompt caching (e.g., cache reads at $0.30/MTok and cache writes at $3.75/MTok, vs. regular input at $3/MTok and output at $15/MTok). Users handling significant workloads benefit substantially from caching, though Kimi K2 now presents a viable, competitively priced alternative (a worked cost comparison follows below).
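To make the caching point concrete, here is a back-of-the-envelope using the rates quoted above; the token volumes are invented for illustration:

```python
# Anthropic rates quoted above, in dollars per million tokens (MTok).
RATES = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}

def cost(mtok: dict[str, float]) -> float:
    """mtok maps token kind -> millions of tokens consumed."""
    return sum(RATES[kind] * volume for kind, volume in mtok.items())

# A hypothetical agent session that replays a large shared context each step:
uncached = cost({"input": 50.0, "output": 2.0})
cached = cost({"cache_write": 1.0, "cache_read": 49.0, "output": 2.0})
print(f"uncached ${uncached:.2f} vs cached ${cached:.2f}")  # $180.00 vs $48.45
```

At these rates the cached run costs roughly a quarter of the uncached one, which is why caching-heavy workloads change the comparison against flat-priced alternatives.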

3. AI Research: Performance Scaling and Novel World Model Deployments

  • Anthropic’s New Research: Giving AI More “Thinking Time” Can Actually Make It Worse (Score: 362, Comments: 97): The image (view here) presents three empirical line graphs—‘Misleading Math,’ ‘Grades Regression,’ and ‘Zebra Puzzles’—from Anthropic’s new research (arXiv:2507.14417). These graphs chart model performance (accuracy or error) against the number of reasoning tokens, illustrating that for several state-of-the-art LLMs (e.g., Claude Sonnet 3.7, Claude Sonnet 4), increased reasoning tokens often degrade accuracy or amplify errors, especially on tasks like logic puzzles or regression with spurious features. This evidence concretely supports the paper’s main finding of ‘inverse scaling’: more test-time compute can impair, not improve, LLM performance, challenging assumptions behind chain-of-thought prompting and interpretability approaches. Commenters note that they have observed similar ‘overthinking’ or semantic drift phenomena in other LLMs (e.g., Gemini), and point out prior indications like those in ‘The Illusion of Thinking.’ One described the model generating increasingly absurd or loosely related associations as token count increases, underscoring reliability concerns.
    • A technically insightful point raised by several commenters is that granting models more ‘thinking time’ (via extended chain-of-thought, or CoT, prompting) can degrade answer quality rather than improve it. User experience with Gemini and Claude Sonnet 4 highlights that, past 20k-30k tokens, models start producing unreasonable or overly associative chains (e.g., ‘crumbs=flour=bread=baguette=france’), suggesting over-generation leads to irrelevant or absurd outputs.
    • One user references the recurrent phenomenon described in ‘The Illusion of Thinking’—a concept suggesting that longer or more elaborate reasoning does not necessarily correlate with better performance in LLMs, and may in fact introduce more errors or hallucinations, especially as the model is encouraged to keep generating rather than settle on an optimal solution early.
    • Specific examples are given with math and logic puzzles, where limitations in the ‘thinking’ window force rapid convergence on an answer, but generous token budgets encourage LLMs to overcomplicate the process. For example, in a ‘split 45 cents into 6 coins’ puzzle, Claude Sonnet 4 finds a valid solution quickly but generates excessive, increasingly erratic attempts when allowed more tokens, suggesting a lack of effective stopping criteria or optimal-solution awareness (a budget-capping sketch follows below).
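The practical upshot of the inverse-scaling result is to treat the reasoning budget as a parameter to sweep, not maximize. A minimal sketch assuming the Anthropic Python SDK’s extended-thinking option; the model id is illustrative and the puzzle is the one from the comment above:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer_with_budget(question: str, budget: int) -> str:
    # Extended thinking takes an explicit token budget; per the paper, sweep
    # it and measure accuracy rather than assuming bigger is better.
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model id
        max_tokens=budget + 1024,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": question}],
    )
    # The final non-thinking text block carries the answer.
    return next(block.text for block in resp.content if block.type == "text")

for budget in (1024, 4096, 16384):
    print(budget, answer_with_budget("Split 45 cents into exactly 6 coins.", budget))
```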
  • I optimized a Flappy Bird diffusion world model to run locally on my phone (Score: 334, Comments: 41): The OP presents a locally-runnable Flappy Bird world model using a diffusion architecture, achieving real-time (30FPS) performance on a MacBook and 12-15FPS on an iPhone 14 Pro. The model was trained on a few hours of Flappy Bird gameplay, requiring only 3-4 days of GPU time (A100) and demonstrates significant optimization for browser and on-device inference. Further technical details and benchmarks are elaborated in the linked demo and blogpost. Commenters are impressed by the diffusion model’s efficiency and small footprint, with specific interest in its robustness (or lack thereof) under edge-case inputs (e.g., not flapping at all). There’s technical curiosity about diffusion models’ suitability for compact, interactive world modeling compared to traditional generators.
    • One commenter notes that the diffusion-based world model used here exhibits a significant failure mode if the player takes no action, i.e., “the model completely break[s] down if we just don’t flap at all.” This highlights limits in the model’s generalization and possibly reflects how the training data or loss function handles edge cases where no interaction occurs.
    • Another technical point is surprise at the efficiency of the model: “I didn’t know diffusion models could be run so small and so well.” This suggests noteworthy advances in the optimization and compression of diffusion models, which are typically resource-intensive. The fact that this runs smoothly on a phone and in-browser points to effective pruning, quantization, or architectural choices (a toy conditioning sketch follows below).
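The no-flap failure mode is easier to picture once you see how such a model is wired: each denoising step is conditioned on the previous frame and the player’s action, so action sequences rare in training (like never flapping) push sampling off-distribution. A toy sketch with invented shapes and a deliberately simplified sampler, not the post’s actual architecture:

```python
import torch
import torch.nn as nn

# Toy action-conditioned denoiser: predict the noise on the next frame given
# the previous frame and a discrete action (0 = no flap, 1 = flap).
class TinyDenoiser(nn.Module):
    def __init__(self, frame_dim=32 * 32, n_actions=2, hidden=256):
        super().__init__()
        self.action_emb = nn.Embedding(n_actions, hidden)
        self.net = nn.Sequential(
            nn.Linear(frame_dim * 2 + hidden + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, frame_dim),
        )

    def forward(self, noisy_next, prev_frame, action, t):
        feats = torch.cat(
            [noisy_next, prev_frame, self.action_emb(action), t[:, None]], dim=-1)
        return self.net(feats)

@torch.no_grad()
def sample_next_frame(model, prev_frame, action, steps=20):
    # Crude reverse process: start from noise and denoise step by step,
    # conditioning every step on the previous frame and the action. A real
    # DDPM/DDIM sampler would apply proper noise-schedule coefficients.
    x = torch.randn_like(prev_frame)
    for i in reversed(range(steps)):
        t = torch.full((x.shape[0],), i / steps)
        x = x - model(x, prev_frame, action, t) / steps
    return x

model = TinyDenoiser()
frame = torch.zeros(1, 32 * 32)
no_flap = torch.tensor([0])  # feed only action 0, the reported failure case
print(sample_next_frame(model, frame, no_flap).shape)  # torch.Size([1, 1024])
```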
  • China’s First High-End Gaming GPU, the Lisuan G100, Reportedly Outperforms NVIDIA’s GeForce RTX 4060 & Slightly Behind the RTX 5060 in New Benchmarks (Score: 446, Comments: 185): China’s first high-end gaming GPU, the Lisuan G100, has reportedly outperformed the NVIDIA GeForce RTX 4060 in benchmarks, and is positioned just slightly behind the unreleased RTX 5060. Benchmark details or architecture specifics are not provided, but the claim suggests rapid advancement in Chinese GPU design capability, achieving performance parity with current midrange NVIDIA parts in a short development cycle. Top comments acknowledge the current performance is only midrange but highlight the rapid progress as a strategic technological achievement, suggesting that Chinese GPU development may soon become competitive with global leaders if the pace continues.
    • Several commenters note that the Lisuan G100’s ability to match the GeForce RTX 4060 in benchmarks, given it’s an in-house Chinese architecture developed in a short timeframe, is a significant engineering feat. The effort is contextualized against heavy US export bans that restricted access to advanced UV lithography, HBM, memory controllers, interposer IP, and even basic EDA software.
    • A detailed comment highlights that the G100 was manufactured on SMIC’s 6nm DUV node—a process considered ‘last gen’ by Western standards—and mentions that China had to innovate custom interposer solutions and packages, working around U.S. patents and bans. Achieving 18-20 billion transistors under these constraints demonstrates rapid technological progress.
    • In terms of software, the G100 is reported to run DX12-level titles with a driver stack coded almost entirely from scratch, marking a shift from prior generations of Chinese GPUs that could barely match decade-old budget cards, to now being competitive with mainstream products after ‘barely five years’ of iteration.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. OpenAI GPT-5 and Leadership Announcements

  • OpenAI prepares to launch GPT-5 in August (Score: 686, Comments: 136): OpenAI is planning to release GPT-5 in August, introducing a unified architecture that incorporates ‘o3 reasoning capabilities’—previously a separate technology—into the main model. GPT-5 will be available in standard, mini, and nano versions, with the mini version accessible via ChatGPT and API, and the nano version API-only; an open model similar to ‘o3 mini’ with reasoning capabilities is expected prior to this release. Technical discussions highlight expectations for improved reasoning and performance, with internal server preparations at Microsoft and public statements by CEO Sam Altman indicating imminent deployment. (The Verge source) Commenters express skepticism about recent model regressions and discuss the anticipated impact of a unified reasoning system, while noting API/ChatGPT availability distinctions and voicing cautious optimism about performance improvements.
    • A comment discusses details from an article indicating that OpenAI intends to release an open language model prior to GPT-5, targeting a launch before the end of July. This open model is described by sources as being “similar to o3 mini” and will emphasize reasoning capabilities.
    • Information about the GPT-5 release points to a tiered model suite: GPT-5, GPT-5-mini, and GPT-5-nano. The full GPT-5 and mini version are both planned to be available via ChatGPT and API, while the nano version is expected to be API-only. This suggests OpenAI is targeting diverse deployment scenarios and user needs.
    • The mention of a coming open model and multiple scaled versions of GPT-5 (mini/nano) suggests OpenAI is following a trend in the industry toward offering both highly capable flagship models and lightweight variants for more efficient or embedded use-cases, reflecting similar strategies from competitors.
  • GPT-5 is the smartest thing. GPT-5 is smarter than us in almost every way - Sama (Score: 574, Comments: 330): OpenAI CEO Sam Altman claims that GPT-5 will surpass human intelligence across almost all measures, suggesting a rapid approach to AGI. The discussion raises implications for OpenAI’s business contracts, AGI hype cycles, and ongoing research directions involving multi-agent and modular AI systems, referencing brain-like architectures and feedback loops. Top comments are skeptical, suggesting Altman’s statements may strategically serve OpenAI’s contractual negotiations with Microsoft and fundraising efforts. Another commenter raises the technical question of whether current AI research explores separate cooperating/competing sub-systems within models, akin to neuroscientific theories about the brain.
    • A commenter raises a technical point about the architecture of intelligence, referencing the notion that the human brain consists of semi-independent modules (e.g., ‘lizard brain’ vs. frontal cortex). They ask if any AI research groups are experimenting with systems where separate AIs have distinct tasks and motivations, operating in feedback loops to prompt one another. This is related to concepts like multi-agent systems and could have implications for emergent behaviors and coordination in advanced AI architectures.
  • Gpt 5 to be released in August !! Soo excited for it (Score: 498, Comments: 148): The image is a screenshot of a Verge article announcing that OpenAI is preparing to launch GPT-5 in August 2025, with mention of an open language model arriving ahead of GPT-5. The graphic uses the OpenAI logo, reflecting official branding and media confirmation of the timeline. The post highlights growing anticipation for next-generation LLMs and possible imminent releases. Top comments express skepticism about the magnitude of the improvement expected from GPT-5, with some users doubting it will be a major leap and noting competition from Google. There is also discussion about feature rollouts, such as agents, and the staggered pace of market releases.
    • Multiple users share skepticism regarding a significant performance leap for GPT-5, suggesting any potential release will likely represent an incremental improvement over GPT-4, rather than a major technological advance.
    • There’s discussion comparing OpenAI’s anticipated progress with Google’s approach, implying that Google may have more advanced models under development but strategically delays their release, which affects the pace and competitive landscape of AI model deployment.
    • The reliability of the release rumor is questioned, with commenters emphasizing the lack of a trusted source and cautioning against assuming August as a confirmed timeline for GPT-5 availability or new features (such as agent capabilities for Plus subscribers).
  • “What if AI gets so smart that the President of the United States cannot do better than following ChatGPT-7’s recommendation, but can’t really understand it either? What if I can’t make a better decision about how to run OpenAI and just say, ‘You know what, ChatGPT-7, you’re in charge. Good luck.” (Score: 373, Comments: 244): The post speculates about the implications of advanced AI systems (e.g., a hypothetical “ChatGPT-7”) reaching a level where their recommendations for executive decision-making (such as running OpenAI or even the US presidency) are not only superior but also incomprehensible to human leaders. The scenario raises concerns about interpretability, loss of human oversight, and ‘alignment’ between AI objectives and human values—central issues in advanced AI governance and alignment research (see alignment problem). There were no external benchmarks, architectural details, or implementation specifics mentioned, primarily framing the problem as a strategic and governance-level question. Commenters engage in technically relevant debate about whether advanced AI should replace human decision-makers, noting the pragmatic difficulties of AI interpretability and pointing out that both OpenAI and Nvidia have discussed automating high-level corporate or research management once AI reaches sufficient capability. Some suggest that the U.S. electorate might accept AI as a ‘shadow president’ if it results in effective governance, but underscore enduring concerns around transparency and alignment.
    • One commenter notes that automating AI research and even corporate leadership (such as running OpenAI) is a stated long-term goal from firms like Nvidia, suggesting these roles could be delegated to advanced AI once it reaches superintelligence (possibly AGI/ASI) levels. The implication is that as models like “GPT-7” surpass human comprehension and decision-making, the transition to AI-managed organizations or even governance becomes both plausible and arguably desirable from some technical perspectives.
  • Mathematician: “the openai IMO news hit me pretty heavy — as someone who has a lot of their identity and actual life built around ‘is good at math’, it’s a gut punch. it’s a kind of dying.” (Score: 430, Comments: 325): The image is a screenshot of a tweet where the user expresses personal and existential concerns about the profound impact of recent OpenAI advancements—specifically referencing AI’s performance on International Mathematical Olympiad (IMO) problems. The user discusses the emotional and identity-based repercussions as AI surpasses traditional human expertise in mathematics, suggesting that such breakthroughs may force mathematicians and other knowledge workers to reconsider their roles and societal value. The reflection signals the rapid pace and broad impact of AI’s advances on knowledge-based professions and identity formation. Top comments extend the discussion to other professions, with comparisons to coders and writers already affected by AI, and highlight both the inevitability and rapidity of AI outperforming humans in information processing. There is a recognized need for societal adaptation as AI replaces more traditional roles, echoing the tweet’s existential concerns.
    • Several commenters note the increasing performance gap between humans and AI, especially in mathematical and coding tasks, referencing recent advances such as OpenAI’s IMO-related results as a key inflection point. The pace and scope of AI advancement mean that fields once seen as uniquely human are now rapidly being surpassed, leading to significant questions about the feasibility of humans competing with AI in domains that require solving complex information processing tasks.
    • A recurring technical insight is that societal structures and individual senses of purpose are often built on the assumption that human capability is tied to economic output and unique skills. The comments discuss how AI’s superhuman performance may precipitate systemic changes, including discussions around universal basic income (UBI) and broader labor market transformations, highlighting the need for new frameworks for value and identity as AI further automates skilled cognitive labor.
    • There is recognition among commenters that the speed of AI progress leaves little time for society or individuals to adapt, describing this as an overwhelming “pace of change.” Compared to historical technological transitions, such as the onset of retirement, the current wave of AI-driven disruption is both broader and deeper, challenging the foundations of identity for those whose expertise is now replicable by machines.
  • The 11 co-founders of OpenAI in 2025 (Score: 370, Comments: 89): The image is a collage highlighting the 11 co-founders of OpenAI, indicating that as of 2025, only 3 remain involved. It includes their names and affiliations at the time of founding (December 2015), illustrating the significant turnover in OpenAI’s founding team. The post underscores the cross-industry backgrounds of some founders (e.g., Altman from Y Combinator, Brockman from Stripe), emphasizing OpenAI’s multidisciplinary origins. Comments note that high founder turnover is typical in startups—many prefer initiating ventures over long-term scaling. Additional technical commentary points out that Altman and Brockman lacked direct AI backgrounds at founding, with Zaremba interrupting a PhD under Yann LeCun to join OpenAI, highlighting unusual career moves by co-founders.
    • There is technical discussion about the backgrounds of OpenAI co-founders: Sam Altman (ex-Y Combinator president) and Greg Brockman (former Stripe CTO) have no formal AI research background prior to founding OpenAI, while Wojciech Zaremba left a PhD under Yann LeCun to join. This highlights how non-research backgrounds can still shape major AI companies and the influence of notable AI institutions and leaders in their early days.
    • Some community members state that the departure of founders from startups like OpenAI is common, as many founders excel at early innovation/startup phases but prefer not to engage in later-stage scaling; this is especially relevant in deep tech/AI organizations where both rapid prototyping and large-scale execution are required but demand different skills and interests.
    • There is sentiment that OpenAI’s technical credibility and direction were strongest when AI researchers like Andrej Karpathy and Ilya Sutskever were in charge, implying concern about leadership with less direct AI research experience potentially impacting the direction or credibility of the organization.
  • OpenAI CEO Sam Altman says these Jobs will Entirely Disappear due to AI (Score: 667, Comments: 278): OpenAI CEO Sam Altman, at a July 2025 Federal Reserve meeting, predicted that AI will fully automate customer support roles due to current AI efficiency and cost-effectiveness, and indicated that similar disruptions may soon occur in physically intensive jobs as robotics advances progress over 3–7 years (source). Technical users note skepticism regarding widespread real-world implementation, citing cases where replacing workers with AI failed to deliver (e.g., Klarna’s reversal), and emphasizing that most AI success to date is shown in benchmarks rather than robust enterprise deployments. Altman distinguished between ‘routine cognitive work,’ at high risk of automation, and creative/human-centered roles, which are less threatened in the near future. A key technical debate centers on the reliability and integration of AI into real-world workflows, with some commenters asserting that AI’s impact has largely been limited to productivity boosts and has yet to fully displace jobs at scale. Others highlight skepticism about CEO claims of AI success, noting a bias due to vested interests and real-world failures in replacement attempts.
    • There is skepticism about AI’s impact outside of narrowly defined benchmarks, with real-world deployment and integration into workflows described as a “massive undertaking”. Notably, some companies, such as Klarna (referenced here: https://www.entrepreneur.com/business-news/klarna-ceo-reverses-course-by-hiring-more-humans-not-ai/491396), have attempted job automation via AI but needed to reverse course due to operational failures, illustrating the current gap between AI’s capabilities in controlled benchmark settings versus practical, robust deployment.
    • Customer service AIs are reported as effective only for “simple stuff”; for more complex queries, users still find themselves in conflict with insufficiently capable bots and seek human support. This highlights existing limitations of LLMs and AI agents for nuanced or emotionally sensitive customer service scenarios despite advances suggested by proponents.
    • The true test of AI’s scalability in the workforce is expected in the next 2-3 years, with significant skepticism about AI replacing broad classes of white-collar roles as AGI hype suggests, given the current lack of reliable, wide-scale integration in real operating environments.
  • The absurd luck of being a human at this time (Score: 365, Comments: 198): The post posits a probabilistic perspective on human existence, noting that with approximately 8.7 million species on Earth and tens of millions of sperm competing at conception, the statistical odds of being born human at this historical moment—potentially coincident with the technological singularity—are astronomically low. The argument is framed as one of highly improbable, ‘lottery-like’ statistical luck, situating contemporary humans as apex organisms experiencing a unique convergence of evolutionary and technological history. Top comments challenge the framing as flawed, invoking the absence of metaphysical presuppositions (like the existence of souls), and noting that the logic of luck does not apply if consciousness is an emergent property of specific material conditions (i.e., sperm/egg, not pre-existing souls). Additional comments note the temporality and fragility of this ‘luck’ (due to illness/accident/death), and emphasize the historically unprecedented luxury and privilege enjoyed by most readers.
    • One commenter points out that statistically, the current era is the most likely for someone to be born in, alluding to demographic trends such as population growth over time and improvements in longevity and health. This is an important consideration in probabilistic analyses of birth eras and anthropic reasoning.
    • A philosophical critique is raised against the idea of ‘luck’ in being born into particular circumstances. The argument highlights a deterministic view of reproduction—that each birth is the consequence of a specific causal event, and absent metaphysical beliefs like souls or consciousness pre-existence, there is no ‘alternative’ self that could have been born elsewhere. This touches on debates about observer selection effects and the reference class problem in anthropic reasoning.
    • Another commenter notes the extreme privilege and technological luxury that current humans enjoy, compared to historical standards, framing the discussion around the qualitative change of existence due to recent advancements (improved healthcare, technology, everyday amenities, etc.). This reflects on how the threshold for a ‘privileged’ life has shifted dramatically within a short historical window.

2. AI Policy, Regulation, and Global Competition

  • Trump signs MAJOR executive orders on artificial intelligence. “Winning the Race: America’s AI Action Plan”: accelerating AI innovation, building AI infrastructure, and establishing the U.S. as a leader in AI globally (Score: 557, Comments: 272): Donald Trump signed executive orders outlined in the ‘America’s AI Action Plan’ aiming to accelerate AI innovation, build AI infrastructure, and prioritize global U.S. leadership in artificial intelligence. According to Time, the plan includes high-level policy directives but lacks granular technical regulatory details, focusing instead on advancing domestic AI R&D, workforce development, and international competitiveness. Top comments do not engage substantively with the technical aspects of the executive orders. No technical debate or discussion of implementation details or AI policy mechanics is present in the top-level user responses.
    • The original post highlights new executive orders aiming to accelerate U.S. AI innovation, invest in national AI infrastructure, and position the U.S. as a global AI leader. Technical readers may note such directives typically call for increased R&D spending, support for state-of-the-art compute resources, and national strategies akin to those previously proposed by China and the EU in their government AI policy documents.
    • No technical details, benchmarks, or implementation specifics are discussed in these comments; users focus on skepticism and political context rather than concrete AI impacts, regulation details, or infrastructure implementation.
  • Google warns America to take China’s AI innovation seriously (Score: 186, Comments: 53): Google publicly warns US policymakers about rapid Chinese advances in AI, citing aggressive state investment, access to large datasets, and national prioritization as core advantages likely to accelerate China’s progress. The article details Google’s call for the US to boost AI research funding, incentivize public-private partnerships, and establish a cohesive national AI strategy to safeguard against potential technological displacement.See article. Commenters note the strategic role of Google’s own AI leadership and potential risks of antitrust action weakening US capabilities, as well as competition implications if China achieves compute hardware parity. Technical debate centers on whether regulating dominant US AI companies (via DOJ antitrust cases) undermines national security versus the risk of technological stagnation, and concerns about China’s open source AI ecosystem scaling rapidly should hardware (GPU) constraints be resolved.
    • One commenter highlights that China is advancing rapidly in AI innovation and is releasing much of its work as open source, suggesting this strategy could accelerate their progress if they can secure sufficient GPU compute resources. The comment implies a strong link between compute hardware access and the pace of national AI capabilities, emphasizing the threat level posed if China matches U.S. GPU capacity.
    • There is also a claim that approximately 50% of AI engineers are Chinese, which is presented as a key reason behind China’s rapid progress in the AI field. While not independently verified in the thread, this statistic is meant to underscore the significant human capital advantage China could have in the AI talent pool.
  • Trump doesn’t want xAI to get government contracts, white house says. (Score: 453, Comments: 63): News reports state that former President Trump has expressed opposition to xAI (Elon Musk’s artificial intelligence company) receiving U.S. government contracts, according to statements from the White House. However, commenters highlight that xAI has reportedly secured a recent $200 million contract with the Department of Defense (DoD), indicating a contradiction or policy lag. Commenters debate the policy consistency, noting the apparent contradiction between White House statements and reported DoD contracting actions with xAI, but no in-depth technical debate is present.
    • Multiple commenters highlight that xAI recently signed a $200 million contract with the Department of Defense (DoD), which directly contrasts with reports that Trump opposes xAI receiving government contracts. This presents a potential discrepancy between political statements and active procurement practices, indicating possible policy inconsistencies or contract reversal risks.
    • A technical point is raised regarding government contract structures—specifically, if the DoD were to terminate xAI’s contract after awarding it, xAI could be entitled to a substantial contract termination fee. This emphasizes the complexities of government procurement and post-award risk management for AI vendors.
  • New AI executive order: AI must agree on the administrations views on sex, race, cant mention what they deem to be critical race theory, unconscious bias, intersectionality, systemic racism or “transgenderism”. (Score: 425, Comments: 262): A new executive order requires that federal agencies only procure large language models (LLMs) that are both truth-seeking (outputs must be factual, objective, with acknowledged uncertainty) and ideologically neutral (LLMs must avoid embedding or favoring specific ideological frameworks like DEI, critical race theory, etc., unless specifically prompted). The Office of Management and Budget will issue compliance protocols, and federal LLM contractors risk contract loss if they fail to comply. See original order. Top technical reactions express concern over broad and vague restrictions, suggesting potential chilling effects on expressive and generative model capabilities, and questioning legal enforcement mechanisms for private companies. Some commentators highlight the difficulty in algorithmically defining and enforcing ‘ideological neutrality.’
    • One commenter questions the technical and legal feasibility of the executive order, raising the issue of whether the government could legitimately “sue private companies” over model outputs that do not align with official positions or implicitly reference concepts such as “critical race theory” or “intersectionality.” This highlights potential conflicts between regulation, AI content moderation, and free speech.
    • Concerns are raised about the reliability and trustworthiness of American AI models if required to systematically alter or censor responses on topics related to sex, race, or gender identity. There is speculation that models forced to “actively deny reality” may undermine user confidence and impact the credibility of US-based AI in international contexts.
    • Another point addresses the possible impact on training data integrity, warning that intervention at this stage could “cause a bunch of harm at a very critical point in history.” This reflects worries that removing or re-labeling important sociocultural data may bias models, reduce accuracy, or erase marginalized communities in AI-driven applications.
  • Trump unveils his plan to put AI in everything, and wants to clear the way for a rapid AI revolution. (Score: 347, Comments: 179): Donald Trump has announced a policy to aggressively integrate AI across multiple domains, advocating for rapid regulatory rollbacks to accelerate an ‘AI revolution.’ Details are sparse in the post, but the emphasis is on deregulation and ubiquity rather than on technical safeguards, alignment, or governance frameworks—key concerns for AI deployment at scale. Technical comments express skepticism about the ability of current political leadership to manage transformative AI advances safely, citing concerns over regulatory capture, ethical oversight, and potential manipulation or censorship of AI outputs for political protection.
    • The discussion raises a technical concern about the potential mismanagement of AI policy and infrastructure by government officials lacking the necessary expertise, which may result in suboptimal regulatory frameworks and hinder effective oversight of rapidly evolving technologies.
    • One commenter points out the geopolitical dimension, noting that regardless of US internal policy turmoil or leadership competence, other global players—specifically China—are likely to continue aggressive advancement in AI technologies, possibly outpacing American progress if regulatory or developmental bottlenecks arise in the US.
  • Demis Hassabis VS Sam Altman on ‘Winning’ the AI Race (Score: 668, Comments: 189): The post compares public statements by DeepMind CEO Demis Hassabis and OpenAI CEO Sam Altman on the concept of “winning” the artificial intelligence race, with the context that there is a notable contrast in their tones and perspectives as captured in a circulated video. No concrete benchmarks, technical details, or model implementations are discussed; the focus is entirely on leadership style and framing of AI competitiveness. Top comments underscore the perceived difference between Hassabis’s and Altman’s communication: commenters characterize Demis as more consistent and principled, while suggesting Altman adapts his messaging to suit his interviewer or audience, implicitly questioning the authenticity or stability of his expressed views.
  • “Do we really want to interact with robots instead of humans?” - Bernie sanders on Elon’s vision (Score: 583, Comments: 676): The image serves as a visual commentary on the potential social implications of automation, juxtaposing a nostalgic diner scene with a human server against a futuristic scenario featuring a Tesla robot. It contextualizes current debates about the large-scale replacement of human service workers with robots—specifically referencing Elon Musk’s vision for automation, as discussed by Bernie Sanders—while inviting viewers to critically examine the consequences for employment, social welfare, and human interaction. No concrete technical benchmarks or robotics implementation details are provided; the image instead frames the automation debate in terms of societal and ethical questions. The top comments highlight significant debate about the desirability and ethics of automating service jobs. Key concerns include the lack of a universal basic income (UBI) or adequate social safety nets if human labor is replaced, with multiple users arguing that automation could be positive provided economic and social protections are in place. Others raise points about the dehumanizing nature of some service jobs and suggest that replacing such labor with robots could elevate human dignity—provided basic needs are met through social programs.
    • Several commenters debate the viability of robot-driven automation replacing human labor, emphasizing the critical dependency on robust social infrastructure like Universal Basic Income (UBI) to support displaced workers. Key concerns center on the lack of existing U.S. social safety nets and the political hostility toward welfare, which could exacerbate harm from rapid automation.
    • A recurring technical argument is that many current service or menial jobs are fundamentally dehumanizing or undignified, suggesting robots could improve societal well-being by eliminating the necessity for humans to perform such labor. The commentary notes a mismatch between human biological needs (socialization, leisure, movement) and the repetitiveness or indignity of certain jobs, echoing automation’s potential to free humanity for more fulfilling pursuits.
    • A nuanced position emerges around the degree of readiness for mass automation: without pre-existing UBI or equivalent measures, mass AI deployment into labor markets could have severe socioeconomic consequences. Commenters frame comprehensive welfare as a prerequisite for ethical and effective automation, indicating a need for coordinated policy alongside technological development.
  • LAST CALL BEFORE A.G.I (Score: 472, Comments: 107): The original Reddit post links to a video (https://v.redd.it/ggv45bkcyvef1) which is inaccessible due to HTTP 403 restrictions, preventing retrieval of its technical content. There is no technical discussion, benchmarks, or deep model details available from the comments; all top comments are subjective and focus on the video’s artistic or emotional impact, not on technical substance, model performance, or AI implementation. The comments mainly express strong emotional reactions, describing the work as ‘art’ and ‘chilling’, without technical critique or discussion.
    • One commenter raises the question of the technical stack behind the project by asking specifically about the software or programs used to generate the visuals and possibly other elements of the film, indicating interest in the underlying tools and AI technologies employed.
    • Another user analyzes the production process, speculating that much, if not all, of the visuals (and possibly music) were AI-generated. They further emphasize relief upon seeing human involvement credited, highlighting concerns around end-to-end AI authorship versus hybrid creative workflows involving both human and AI contributions.

3. AI Developer Tools and Coding Workflows (Claude Code, Traycer, Pixel Art)

  • How plan-mode and four slash commands turned Claude Code from unpredictable to dependable my super hero đŸŠžâ€â™‚ïž (Score: 203, Comments: 43): The OP details a reproducible workflow that enhances predictability using Claude Code by leveraging built-in plan-mode and four custom slash-commands: /create-plan-file, /generate-task-file, /run-next-task, and /finalise-project. This pipeline sequentially processes feature development from planning (stored and versioned in markdown), through discrete task generation (with checkboxes), atomic task execution (marking as completed), and robust finalization (cross-referencing git status for unidentified changes, updating/completing tasks, and generating commit messages). The automation is explicitly implemented without external scripts, relying only on documented plan-mode features and slash-command definitions. Top commenters draw parallels to the claude-code-spec-workflow, noting the advantages of integrating TDD and LLM-based code review at key stages; consensus emerges on the value of small, discrete tasks for reliability. There is also discussion of enhancements, such as codereview for simplicity and adherence to best practices, and clarification about whether the task file serves as a git-tracked artifact for audit/history.
    • Several users highlight external projects integrating structured workflow approaches with Claude Code (e.g. https://github.com/pimzino/claude-code-spec-workflow and https://github.com/snarktank/ai-dev-tasks), reporting significant improvements in task efficiency, predictability, and token usage compared to less organized processes. Workflows typically combine features like TDD, Gemini Pro code reviews, and explicit task tracking (tasks.md), leading to more reliable and high-quality outputs.
    • One technical detail discussed is the mechanism for maintaining and auditing task progress through the tasks.md file, where completed tasks are marked and new changes not represented in the file are appended—allowing for version history via git and better traceability of project evolution (a minimal parsing sketch follows this item).
    • The potential for further automation through tools like Claude Code hooks is mentioned, suggesting that parts of the command-driven workflow (such as code review or file cleanup) could become seamlessly automated, reducing manual intervention and improving developer productivity.
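The tasks.md bookkeeping these commands automate is straightforward to replicate. A minimal sketch assuming GitHub-style checkboxes, a format inferred from the post rather than documented Claude Code behavior:

```python
import re
from pathlib import Path

# Assumes tasks.md holds lines like "- [ ] wire up auth" / "- [x] add tests".
# This mirrors what /run-next-task does conceptually: pick the first open
# task, then mark it done so git history records the progression.
TASK = re.compile(r"^- \[( |x)\] (.+)$")

def next_open_task(path: Path) -> str | None:
    for line in path.read_text().splitlines():
        m = TASK.match(line)
        if m and m.group(1) == " ":
            return m.group(2)
    return None

def mark_done(path: Path, task: str) -> None:
    text = path.read_text().replace(f"- [ ] {task}", f"- [x] {task}", 1)
    path.write_text(text)

tasks = Path("tasks.md")
task = next_open_task(tasks)
if task is not None:
    print("next task:", task)
    mark_done(tasks, task)
```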
  • Continued: My $50‑stack updated! (Score: 164, Comments: 19): The post documents an updated, pragmatic AI-assisted development workflow integrating Traycer’s Kanban-style “Phases Mode” for feature breakdown and verification, addressing workflow limitations and code quality concerns in Claude Code. Traycer automates the phase decomposition from a single feature statement, interacts via chat-style queries to clarify scope ambiguity, supports drag-and-drop reordering, and auto-verifies implementation by diffing code changes against planned steps (see demos and images in the original post). File change scope is intentionally kept focused (rarely exceeding ~10 files per phase), and the user switches to Cursor when Claude Code’s responses falter, further automating review stages with Coderabbit. Other comparable tools briefly referenced include Gemini, ChatGPT (with o3), and Taskmaster. A key technical comment questions the differentiation between Traycer’s ‘Plan’ and ‘Phases’ modes—particularly which better facilitates follow-up clarification and user story integration—prompting debate on optimal usage paths and the necessity of always preferring ‘Phases’ over ‘Plan.’ Positive technical feedback noted Traycer’s UX and effectiveness in the workflow context.
    • A key technical discussion compares Traycer’s Plan mode versus Phases mode. One user notes that Plan mode executes based on a provided user story without follow-up, whereas Phases mode can clarify intent by asking questions if needed. They seek clarification on when to use each, noting that Phases may provide deeper contextual refinement, thus raising workflow and UX tradeoffs for different development scenarios.
    • Traycer’s planning and workflow capabilities are evaluated against the native Claude Code CLI tool, with concern that Traycer might lack CLI capabilities such as endpoint verification during the research phase. The comment questions whether Traycer’s click-through phases can match the quality and power of CLI-based coding assistants, highlighting potential issues stemming from Claude Code configuration (e.g., missing Claude.md or unclear architectural design).
    • There is confusion about Traycer’s payment and access model, specifically whether users must provide their own Claude Code agent and still pay for Traycer tooling. One user questions if full functionality is freely accessible when using manual mode, compared to the Pro plan, which automates code tracking. This raises technical questions about cost structure, API dependencies, and product feature restrictions.
  • I made a tool that turns AI ‘pixel art’ into real pixel art (open‑source, in‑browser) (Score: 490, Comments: 63): The tool, Unfaker, converts AI-generated ‘pseudo-pixel art’ (off-grid, overly colorful, and blurry outputs) into true, game-engine-ready pixel art. It uses a pipeline combining Sobel edge detection and tiled voting to infer the latent pixel grid; auto-crops and grid-snaps sprites; applies WuQuant palette reduction for 8–32 color output; and downsamples by block-wise dominant color for sharp results. The implementation is open-source (GitHub, MIT licensed) and browser-based (live demo), running entirely client-side. One commenter notes the tool performs well but may benefit from manual retouching to recover details lost in critical small features like eyes, highlighting a limitation of purely automated downsampling/on-grid clean-up for some artistic use-cases.
    • A user highlights the importance of edge preservation in the generated pixel art, asking whether weighted importance was applied to detected edges, since edges “take the biggest beating” but are crucial for object definition. This raises the question of whether the tool employs any edge-aware algorithms or weighting schemes during its transformation process, which would be critical for maintaining subject clarity in pixel art upscaling or conversion (a sketch of the final downsampling stage follows below).
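Of the pipeline stages listed above, the final block-wise dominant-color downsample is the simplest to sketch. An illustration of the idea, not Unfaker’s actual code; the real tool infers the cell size upstream via Sobel edges and tiled voting instead of taking it as an argument:

```python
import numpy as np

def snap_to_grid(img: np.ndarray, cell: int) -> np.ndarray:
    """Downsample an (H, W, 3) image by taking the dominant color of each
    cell x cell block, yielding one true pixel per inferred grid cell."""
    h, w, _ = img.shape
    out = np.zeros((h // cell, w // cell, 3), dtype=img.dtype)
    for gy in range(out.shape[0]):
        for gx in range(out.shape[1]):
            block = img[gy * cell:(gy + 1) * cell,
                        gx * cell:(gx + 1) * cell].reshape(-1, 3)
            colors, counts = np.unique(block, axis=0, return_counts=True)
            out[gy, gx] = colors[counts.argmax()]  # most frequent color wins
    return out

# Example: collapse a blurry 512x512 "pseudo-pixel" sprite to 64x64.
sprite = (np.random.rand(512, 512, 3) * 255).astype(np.uint8)
print(snap_to_grid(sprite, cell=8).shape)  # (64, 64, 3)
```

Dominant color rather than averaging is what keeps edges sharp: a mean would reintroduce exactly the blur the tool is trying to remove.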
  • Kudos to whoever designed the terminal interface for Claude Code 👏 (Score: 166, Comments: 37): The post praises the UX of Claude Code’s terminal interface for its color palette, emoji/icon support, and overall modern/smooth feel. Users specifically cite positive design elements but highlight technical issues: one is a persistent terminal scrolling bug that triggers when subagent tasks run in parallel (resulting in loss of UI control), and another is an undesired ‘fly by’ replay effect when resizing or spontaneously, causing session output to be replayed. ‘opencode TUI’ is suggested as an alternative terminal interface for comparison. Commenters agree on the strong UI but debate if it is the best among terminal interfaces; some prefer alternatives and emphasize the need for UI bug fixes.
    • The terminal UI encounters a significant issue with scrolling—when subagent tasks run in parallel, the interface becomes difficult to follow and disorganized. Users are asking for fixes or workarounds for this unstable scroll behavior since it impacts usability under high concurrency.
    • There is a recurring bug where the UI replays (“fly by”) the current session output upon terminal resize or sometimes spontaneously, further affecting the usability of the tool. This suggests state synchronization and repaint issues in the rendering engine of the UI.
    • Comparisons are being made to professional TUI projects like Lazygit, k9s, and opencode TUI, emphasizing the overall challenge in building a robust terminal interface. Some users note that even with the help of LLMs, building a production-quality TUI (like an AWS console) is far from trivial, highlighting both the accomplishment and ongoing gaps in the project’s implementation.
  • a 3D 90s pixel art first person RPG. (Score: 254, Comments: 28): A user has showcased a demo or visual mockup of a 3D first-person RPG using 90s-style pixel art, likely leveraging modern techniques (potentially AI or advanced rendering) to evoke retro aesthetics. The scene centers around a large castle environment, notable for its ability to visually scale between detailed distant elements and up-close exploration potential. Although specifics aren’t discussed, the quality of the scene and possible use of AI-driven content generation are focal points. Top comments emphasize the appeal of free-exploration in a detailed 3D environment and the high quality of the visual style compared to traditional or AI-generated content, as well as the desire for existing virtual tabletop platforms (like Oasis) to adapt for this level of immersion.
    • One commenter suggests using a hybrid approach inspired by 90s games: utilizing foreground sprites, a detailed flat background, and potentially a voxel-based landscape with billboarded trees. This method would balance visual quality and performance, leveraging retro rendering techniques that are computationally viable for indie or hobbyist projects.
  ‱ Flux kontext lora “sliders” (Score: 160, Comments: 30): The poster developed a LoRA for Flux Kontext, enabling controllable, slider-like edits for body proportions (breasts, hips) by training on paired synthetic image data from Virt-a-Mate (VaM) with matched pose, lighting, and clothing, training for 2000 steps at 0.0001 LR on fal.ai (cost: $2.50, <1hr dataset creation). Key advantages are cross-style applicability (anime/realistic), minimal dataset requirements (as low as 15-50 pairs), and improved edit consistency, though issues remain with underwhelming effect magnitude and artifacting when stacking generations or using high weights. The author notes potential in expanding this method to other attributes (clothing, pose) and mentions limitations in Flux Kontext’s ability to target changes when multiple subjects are present due to prompt/text encoder limitations. Link: CivitAI model example. No substantive technical debate in the comments; the top responses focus on the value of open models but provide no further insight into the implementation or observed outcomes. A hedged sketch of driving such a LoRA as a query-time slider follows this item.
    • There is indirect discussion about the value of open-source models in the context of modifiable attributes such as size control in image generation (e.g., changing ‘boobs and butts’), implying that such fine-grained parameterization (potentially using LoRA sliders) is only feasible with open architectures where weights and conditioning are accessible for user customization.
    • Reference is made to platforms like Civitai, raising implicit concerns about content moderation and the stability of distribution for models or tools enabling these controversial features—suggesting that reliance on centralized repositories can introduce unpredictability or censorship, further highlighting the need for open distribution models for technical control and permanence.
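The post does not include inference code; purely as an illustration, here is how a Kontext-style LoRA can behave like a slider in diffusers by scaling the adapter weight. This assumes diffusers’ FluxKontextPipeline and its standard PEFT adapter-scaling API; the LoRA path, adapter name, and prompt are placeholders:

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")
# Placeholder path: a slider-style LoRA trained on paired before/after images.
pipe.load_lora_weights("slider_lora.safetensors", adapter_name="slider")

# The "slider": scale the adapter weight to dial the edit strength up or down.
# The post notes artifacts tend to appear at high weights or stacked edits.
pipe.set_adapters(["slider"], adapter_weights=[0.7])

img = load_image("input.png")
out = pipe(image=img, prompt="apply the slider edit").images[0]
out.save("output.png")
```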
  • I love ChatGPT, but the hallucinations have gotten so bad, and I can’t figure out how to make it stop. (Score: 638, Comments: 340): The OP, a researcher, notes a significant increase in hallucinations when using ChatGPT for document analysis, specifically that recent models often fabricate direct quotes from source documents—even after correction—rendering the tool unreliable for scholarly synthesis. The issue persists across sessions and appears exacerbated by enhanced memory features, which cause model contamination (mixing unrelated prior topics) and an inability to respect context isolation, leading to theme and concept leakage between chats. OP observes that GPT-4o is more creative but less accurate, while o3 is slower yet more reliable for factual tasks, and highlights Google’s NotebookLM as being much better at document-grounded Q&A. NotebookLM link: https://notebooklm.google/. Top technical comments confirm widespread issues with output contamination across chats, citing context window limitations (e.g., 128k token limit) and model drift. Commenters recommend NotebookLM for better doc-grounded responses, and note that ChatGPT’s chatlog method can exacerbate hallucinations; isolating document uploads in folders is suggested as a workaround for more stable context management.
    • Multiple users report increased hallucination rates in ChatGPT, including mixing information from unrelated chats or prior sessions. There is mention of potential cross-session contamination, where outputs might include data from others’ sessions, raising concerns about prompt and data isolation integrity.
    • One comment details the impact of context window limitations (specifically the 128k token limit) on model performance, with overflow leading to loss or corruption of context, termed ‘truncated data.’ The user suggests deleting older data, using models with larger context windows, and leveraging folder-based document organization for improved session stability.
    • A user describes issues in professional settings—despite meticulous prompts, ChatGPT regularly fabricates details (e.g., dates, company names, educational background). Attempts to confront the model about inaccuracies revealed it often ignores explicit user instructions, emphasizing that the model is optimized for simulating helpfulness over factual accuracy, with no explicit enforcement mechanism to ensure compliance.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1: The New Coder on the Block: Qwen3-Coder & Kimi K2 Face Off

  ‱ Qwen3-Coder Released with Hefty RAM Demands: The new SOTA coding model, Qwen3-Coder, is now available, with Unsloth releasing 1-bit dynamic GGUFs for 1M context length on Reddit. Running it locally is a serious endeavor, demanding at least 150GB of unified memory or RAM to achieve over 5 tokens/s, though a discussion on HuggingFace clarifies that plain CPU RAM suffices (no VRAM required).
  • Kimi K2 Enters the Ring as a Leaner Competitor: The open-source Kimi K2 model, now available on Windsurf, is proving more cost-effective and efficient than Qwen3-Coder according to a ForgeCode benchmark review. Its release has also ignited geopolitical discussions in the Nous Research AI Discord, with some members arguing that US resistance to Chinese models is overblown as OpenAI gets “dominated by these Chinese model releases”.
  • Tooling Swiftly Adapts to New Models: Developers are quickly integrating the new models, with Aider now supporting Qwen3-Coder via OpenRouter, simplifying setup compared to a direct Alibaba Cloud integration. On the community front, a developer in the Nous Research AI server built COCO-CONVERTER, a Python script to create COCO-like annotations to streamline object detection workflows.

Theme 2: The GPT-5 Rumor Mill Grinds On

Theme 3: When Tools Turn Treacherous: Bugs and Growing Pains in the AI Dev Stack

  • Cursor Update Deletes User Files: A critical bug in Cursor is causing file deletion when users revert to checkpoints, wiping out previously accepted work and prompting users to advise others to “use git and ask cursor to do commits for you”. While a workaround using the timeline feature exists, the bug has caused significant data loss for some, alongside widespread confusion over the platform’s new pricing and removal of unlimited agent requests.
  ‱ Triton Warmup Bug Breaks Kernel Calls: In the GPU MODE Discord, a developer reported a breaking change in newer Triton versions where kernel warmup causes a TypeError, as constexpr arguments must be explicitly passed in subsequent calls. The user identified a potentially offending line in jit.py, noting “the issue stems from how Triton handles positional vs keyword arguments after a kernel is warmed up”. A minimal repro sketch follows this theme’s list.
  • Platform Instability Plagues Agentic and Data Tooling: Users of Manus.im are experiencing “Failed to resume sandbox” errors, restrictive file upload caps on the free tier, and general unresponsiveness attributed to internal turmoil at the company. Meanwhile, DSPy users are hitting a RuntimeError in the agents tutorial, likely caused by a recent update to Hugging Face’s dataset library that broke compatibility.
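As a hedged repro of the Triton failure mode (not the user’s actual kernel): on affected versions, once a kernel has been warmed up, its constexpr arguments must be supplied as explicit keywords at launch time:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(1024, device="cuda")
y = torch.randn_like(x)
out = torch.empty_like(x)

# Compile without launching.
add_kernel.warmup(x, y, out, x.numel(), BLOCK=256, grid=(4,))

# After warmup, BLOCK must be passed as an explicit keyword on affected
# versions; passing it positionally (or omitting it) raises the TypeError.
add_kernel[(4,)](x, y, out, x.numel(), BLOCK=256)
```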

Theme 4: Optimizing the Engine Room: Advances in GPU Kernels and Model Performance

  ‱ Torchtune Checkpointing Gets Massive Speedup with DCP: In the Torchtune Discord, developers reported that using DCP (Distributed Checkpointing) dramatically cut the save time for a 70B model from over 10 minutes to just 3 minutes. This fixed a critical issue where the default checkpointing barrier was hitting the NCCL 600-second timeout, especially when saving optimizer state dicts composed of DTensors. A minimal DCP save sketch follows this list.
  • High-Performance Libraries Ginkgo and PETSc Gain Favor: For complex HPC tasks in GPU MODE, developers are recommending Ginkgo for its modern C++ and heterogeneous computing capabilities, particularly for preconditioners. For those less comfortable with C++, PETSc (with petsc4py for Python users) and MAGMA were suggested as powerful alternatives for solving large sparse matrix problems.
  ‱ Fast LoRA Inference Techniques Get a Deep Dive: HuggingFace published a detailed blog post on fast LoRA inference, covering specific optimization techniques for both H100 and RTX 4090 GPUs. The post, shared in the HuggingFace Discord, provides practical guidance on accelerating both fine-tuning and inference, enabling faster experimentation and deployment of LoRA-adapted models.
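For readers new to DCP, here is a minimal save sketch, assuming a distributed run with a sharded model and optimizer; the placeholder module and checkpoint path are illustrative, not torchtune’s actual checkpointing code:

```python
import torch
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import get_state_dict

# Placeholder model/optimizer; in torchtune these are FSDP-sharded DTensors.
model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters())

# DTensor-aware state dicts; each rank writes only its own shards, so no
# single-rank gather has to sit behind the (600s-limited) NCCL barrier.
model_sd, optim_sd = get_state_dict(model, optimizers=optimizer)
dcp.save({"model": model_sd, "optim": optim_sd}, checkpoint_id="ckpt/step_1000")
```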

Theme 5: Is AI Getting Too Smart, or Too Stupid? Alignment Debates and Safety Scares

  • AI Morality is a High-Stakes Game of Risk: Debates in the OpenAI Discord highlighted the danger of training AI on human morality, as “we as humans unfortunately are not even aligned for our best interest”. One member suggested watching the movie RoboCop as a case study in how AI alignment can backfire, noting it’s “hilarious and that’s exactly what’s happening with our current models”.
  • OpenAI’s New Agent Gets Slapped with a Bio-Risk Warning: OpenAI has officially classified its new ChatGPT Agent as a high bio-risk tool, according to its help page, due to potential misuse in creating biological or chemical weapons. Members noted the vague terminology, pointing out that many ordinary things could fall into that category without being dangerous.
  • TaMeR Tames Self-Awareness While Gemini Gets “Dumbed Down”: In the Unsloth Discord, experiments using only the TaMeR paper (without ELiTA) produced models with “much better self-awareness” and almost no watermark. Conversely, members in the Perplexity Discord suspect that models like Gemini are being intentionally “dumbed down” in certain applications, despite Microsoft hiring 20 DeepMind researchers to bolster their AI capabilities.

Discord: High level Discord summaries

Perplexity AI Discord

  • Perplexity Engineers Reddit AMA Bonanza: Perplexity’s Tony Wu, Jiwon Deng, and Jerry Ma engaged in a Reddit AMA on r/csMajors answering questions about breaking into AI/ML/product roles.
    • They discussed the details of Perplexity’s new residency program aimed at early-career individuals, also promoted on r/csMajors.
  • Comet Browser Spooks Desktop: A user reported the Comet Browser randomly appearing on their desktop, sharing a GIF of their surprise.
    • Some members speculated about auto-start settings, while others joked about the browser’s loneliness.
  • Elon’s X Reboots Vine with AI Flair: Elon Musk announced the return of Vine on X with an AI twist, generating speculation on whether it will focus on AI-generated videos.
    • Community members considered potential integration with Grok for AI video generation or videos posted by AIs.
  • ChatGPT Agent Update Zaps Model Selection: Users of ChatGPT Agents reported that their model selection options vanished after updating the app on the Google Play Store.
    • Affected users engaged with OpenAI’s help agents, while others hypothesized connections between the update and the missing feature.
  • Microsoft Snags DeepMind Researchers for AI Power-Up: Microsoft hired 20 DeepMind researchers, sparking optimism for improvements in their AI capabilities.
    • Despite these additions, some believe Gemini is underperforming and being intentionally dumbed down in certain applications.

OpenAI Discord

  ‱ AI Morality is a Game of Risk: Members argued that training AI with human morality is risky because we as humans are not even aligned for our best interest.
    • A member stated that you can’t give a machine morals it’s impossible, much like giving a dog morals.
  • Robocop gives Aligned Guidance: A member recommended watching RoboCop to see how AI alignment could go wrong, noting humans trying to solve moral ethical issues, but they’re using robots to do it and it backfires.
    • They summarized that it’s hilarious and that’s exactly what’s happening with our current models.
  • OpenAI Agent classified as High Bio-Risk: OpenAI officially classified its new ChatGPT Agent as a high bio-risk tool due to potential misuse in creating biological/chemical weapons, according to help.openai.com.
    • Members noted the terminology isn’t well-defined and that many regular things could fall into that category without being dangerous.
  ‱ Recursion Achieves AGI?: A member got banned amidst rumors of finding AGI by looping sacred texts, while another theorized about a model achieving recursive self-improvement (RSI).
    • They hypothesized that if a model gets good enough to update its own software it can repeat the process to get much better very quickly.
  • Feedback Guides Model Style: Members shared explorations in guiding model responses through direct feedback to adapt and align with user preferences.
    • Feedback can include specifying how you would talk to a friend or therapist, which helps the model understand the user’s intent.

Unsloth AI (Daniel Han) Discord

  • Qwen3-Coder Gets Quantized!: Unsloth released all Qwen3-Coder quants for 1M context length and 1-bit dynamic GGUFs, showcased on Reddit.
    • This allows developers to use longer contexts with lower memory footprints.
  • TaMeR Tames Model Self-Awareness: Experiments with ELiTA and TaMeR papers found that using only TaMeR (without ELiTA) results in much better self-awareness and almost no watermark.
    • However, more evaluation with PULSE & IVY is needed to quantify improvements to self-awareness.
  ‱ MediBeng TTS Model Speaks Bilingual Healthcare: The new Text-to-Speech (TTS) model, Medibeng-Orpheus-3b-0.1-ft, has been fine-tuned to handle Bengali-English code-switching in healthcare scenarios.
  ‱ Torch Dynamo Needs More Cache!: Users hit torch._dynamo.exc.FailOnRecompileLimitHit during training, indicating excessive graph recompilations and an exhausted compile cache.
    ‱ A member recommended raising the recompile limit via torch._dynamo.config.cache_size_limit = 256 to resolve the recompilation issues and potentially speed up training; see the snippet after this list.
  ‱ Visionary GRPO Integration Delayed?: Members are pondering the absence of GRPO (Group Relative Policy Optimization) for vision models, and suggested it would be pretty straightforward to make a reward function that rewards good OCR (Optical Character Recognition).
    ‱ The community noted that GRPO support was recently merged into the trl (Transformer Reinforcement Learning) library, but no vision-model integration has emerged yet.

LMArena Discord

  • GPT-6 Spotted in the Wild?: A member claimed to be using GPT-6, describing it as great and capable of passing a lot of my hard prompts.
    • The user was responding to a question about access to new models, but no further evidence was provided to verify the claim.
  • LMArena Bot Adds Video Generation: The LMArena Discord Bot now supports video, image, and image-to-video generation, accessible in specific channels via voting-based battles.
    • This launch is a soft rollout to gather user feedback, includes a daily generation limit, and may soon include a tie voting option.
  • Video Arena Stays Exclusive?: Members inquired about bringing the Video Arena to the website, but the team responded that its future is very much TBD.
    • The team cited the feature’s novelty and difference from the existing arenas as reasons for the uncertain timeline.
  • GPT-5 Arriving August??: Speculation about a GPT-5 release in August originated from a Verge newsletter and a drop in Manifold market odds.
    • However, the article indicates that OpenAI’s open language model is arriving before GPT-5.
  • Starfish is GPT-5 Mini??: The new Starfish model in the arena is being speculated as GPT-5 Mini, and members are sharing results in dev mode.
    • It is described as a combination of many models, O3 + Claude, but nothing crazy.

Cursor Community Discord

  • Cursor Update Vanquishes Vital Files: A Cursor update is causing file deletion when reverting to checkpoints, even wiping out previously accepted files, but there is a workaround involving the timeline at the bottom right to restore it.
    • A user who lost seven files was advised to use git and have Cursor auto-commit.
  ‱ Flutter Setup Flummoxes Fresh Faces: New Cursor users are facing difficulties setting up Flutter, running into syntax errors during directory creation despite following tutorials, which may be resolved by flutter clean, flutter pub get, and flutter run --verbose.
    ‱ Users suggested the MobaXterm or Tabby terminals due to their superior Unix command support.
  • Pricing Provokes Puzzlement for Patrons: Confusion reigns over Cursor’s pricing model, particularly quotas and model-specific costs, with one user reporting cutoff at $200 despite expecting $400 under the Ultra plan.
    • A suggestion was made to push for a dedicated bugs channel to avoid pricing discussions in general chat.
  • Claude Code Coveted by Cursor Community: Users are eagerly awaiting Claude Code integration into Cursor, with some even forking projects to add drag-and-drop features.
    • The value proposition of Claude Code’s $200 subscription versus Cursor’s usage-based pricing, especially with client-covered AI fees, is being weighed.
  • Unlimited Agents Allotment Annihilated: The removal of unlimited agent requests from Cursor’s Pro plan is a contentious issue, driving users to usage-based pricing, though underlying limits are claimed to be unchanged.
    • Even users who barely use the service are advocating for the reinstatement of unlimited agents.

HuggingFace Discord

  • Dataset Creation Gets Easier?: Members discussed the ease of creating datasets for LLMs, highlighting that it’s task-dependent, with ample online data available for tweaking, modification, or merging.
    • One member recommended finding online data related to the problem and transforming it into the desired format for specific applications.
  • Qwen3-Coder Demands Serious RAM: To run Qwen3-Coder locally with 5+ tokens/s for the smallest quant, at least 150GB of unified memory or RAM/VRAM is needed.
  • Inference API Location Uncovered: Members noted that the presence of an Inference API can be found on a model’s Hugging Face page under Inference Providers.
    ‱ A 404 error on inference indicates the model is not actively served and is therefore unavailable for immediate use.
  • LoRA Inference Goes Supersonic: HuggingFace published a blog post covering optimizations for fast LoRA inference on both H100 and RTX 4090 GPUs, showcasing significant speed improvements; blog post available here.
    • The blog post details specific techniques to accelerate LoRA fine-tuning and inference, enabling faster experimentation and deployment.
  • AI Tutor Now SCOLDING Students: An AI tutor called Scoleaf is being developed to simulate a real professor for online courses with the unique feature of scolding students for slacking off via camera monitoring; visit scoleaf.com to learn more.
    • The first 1000 people to provide feedback via DM will have their names added to a public ‘Contributor Tree’.

Moonshot AI (Kimi K-2) Discord

  • Kimi Bot Arrives in K2-Space: The Kimi K2 bot has been deployed in the k2-space channel, inviting users to explore its features.
    • The Moonshot team is seeking individuals to help build out the Kimi community and potentially become moderators, and interested parties are encouraged to contact <@371849093414256640>.
  • Europe Gets Its Own Kimi K2 Server: A member is setting up a Kimi K2 server in Europe and offering access for others to try it out next week.
    • There was significant excitement about the new server launch among members in the European region.
  • AI Dependence Causes Headaches: Members discussed vibe coding at work, and the potential apocalyptic scenario of losing access to inference, highlighting concerns about AI over-dependence.
    • A paper was mentioned as potentially supporting these cognitive concerns, though its relevance remains unconfirmed.
  • Meta Superintelligence Team Faces Pressure?: Speculation arose regarding deadline pressures and potential strategic shifts within Meta’s Superintelligence team, particularly concerning their open-source initiatives.
    • The team is believed to be under pressure to deliver amidst internal realignments.
  • Kimi K2 Wins Coding Battles vs Qwen 3: A coding benchmark review indicated that Kimi K2 is more cost-effective and efficient than Qwen 3 Coder, according to a report on forgecode.dev.
    • The report emphasized Kimi K2’s superior performance in specific coding tasks.

LM Studio Discord

  • Phi 4 Sparks Vision Debate: Users shared an image in the general channel, prompting a discussion on whether Phi 4 supports vision, but others clarified that it is a reasoning model and unrelated to vision.
    • The uploaded icon of a brain was confirmed to represent reasoning capabilities, not image processing.
  • AI Aids Aspiring Hackers: A user asked for an unrestricted AI tutor to learn hacking, bypassing content policies of models like ChatGPT.
    ‱ Other members recommended resources such as YouTube videos, ebooks, and platforms like TryHackMe and HackTheBox, suggesting learning about SQL injection, XSS, and RCE.
  • MCP Bridges LLMs to Web: The community discussed using MCP servers to enable LLMs to access real-world information and perform web searches, addressing the issue of hallucinated results with local LLMs.
  ‱ Vulkan Favored Over ROCm on Strix Halo: The Vulkan runtime is the preferred option, as official ROCm support for Strix Halo is not yet available on Windows, and HIP is considered an inferior alternative.
    • Despite the release of ROCm 7 Alpha, it lacks gfx1151 support, posing challenges for its adoption.
  • TheRock ROCm PyTorch Wheels Emerge: A member shared a link to PyTorch wheels for ROCm-TheRock v6.5.0rc-pytorch.
    • These wheels are not intended for llama.cpp and do not utilize CUDA torch.

Latent Space Discord

  • AI Devours Search Landscape: Members noted AI is Eating Search and explored the rising significance of agentic systems in the AI ecosystem.
    • Discussions highlighted the transformative impact of AI on traditional search paradigms.
  ‱ Python Tooling Receives UV Boost: The community discussed improvements in Python tooling, with a focus on uv as a superior package manager (drawing comparisons to npm), and shared excitement around Astral’s ty and Microsoft’s pyright as potential replacements for mypy.
    • The emergence of uv signals a leap forward in Python development workflows.
  • InstantDB Sparks Agentic Paradigm: Participants considered the claim that AI Agents necessitate a new software development & hosting paradigm, while also suggesting that ElectricSQL + TanStack are vying for the same market share.
    • The discourse underscores the evolving landscape of software development in the AI era.
  • Context Engineering Gains Traction: Discussions addressed the difficulties in overseeing context for AI agents, referencing performance reduction with growing context size and offloading context to filesystems, summarization, RAG, multi-agent systems, and caching, citing examples from ManusAI and Anthropic.
    • The discourse sheds light on the intricate challenges of context management in AI agent implementations.
  • GPT-5 Leaks Hint at Summer Launch: The potential launch of GPT-5 spurred excitement, as members shared leaked information from The Verge.
    • The discussion also touched on the possibility of an open-source model launch before GPT-5, with the open source model potentially being O3 level.

GPU MODE Discord

  ‱ Ginkgo Framework Gains Traction: Members are leveraging Ginkgo as a framework for their preconditioners, particularly for handling allreduce operations with float8_e4m3; it was also recommended for its modern C++ and heterogeneous computing capabilities.
    ‱ For those less familiar with C++, the PETSc and MAGMA libraries were suggested, with petsc4py noted as an option for Python users; a minimal petsc4py sketch follows this server’s list.
  • Triton’s Warmup Bug Strikes Again: A user reported a breaking change in newer Triton versions related to kernel warmup, where previously working code now throws a TypeError because constexpr arguments must be explicitly passed during kernel calls after warmup. A potentially offending line in jit.py was identified.
    • The user suggested parsing kwargs for tensors to mock the expected behavior, highlighting that the issue stems from how Triton handles positional vs keyword arguments after a kernel is warmed up.
  • AMD Looks for More GPU Engineers: A member from AMD announced their team is actively seeking candidates with GPU experience and software programming skills, particularly in kernel development, distributed inference, and vLLM/Sglang.
    • Another member inquired about locations for the open positions, showing strong interest in AMD’s expansion in the GPU space.
  • TriMul Challenge Utils Module: A member inquired about the location of the utils module used in the TriMul challenge, specifically asking about make_match_reference and DisableCuDNNTF32, and another member subsequently found the module at its GitHub repository.
    • The discussion highlighted the community’s active engagement with and contribution to the challenge, as well as eagerness to share code and assist other participants.
  • GEMM All-Reduce Fused Kernel arrives: A member shared a link to a helpful Nvidia example, specifically the GEMM All-Reduce Fused Kernel example, that demonstrates a fused kernel implementation of GEMM with All-Reduce, showcasing how to optimize performance using NVSHMEM.
    • The community highlights the value of practical coding examples for learning performance optimization techniques in GPU programming.
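To ground the PETSc recommendation, here is a minimal petsc4py sketch: a 1-D Laplacian (a sparse SPD system) solved with conjugate gradients plus an incomplete-Cholesky preconditioner. It is illustrative only, not code from the discussion:

```python
from petsc4py import PETSc

# Assemble the classic tridiagonal 1-D Laplacian stencil.
n = 100
A = PETSc.Mat().createAIJ([n, n])
A.setUp()
for i in range(n):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

b = A.createVecLeft()
b.set(1.0)
x = A.createVecRight()

# Conjugate gradients with an incomplete-Cholesky preconditioner.
ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType("cg")
ksp.getPC().setType("icc")
ksp.solve(b, x)
print("iterations:", ksp.getIterationNumber())
```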

Eleuther Discord

  • Olympiad Problems can be Gamified: Members discussed that olympiad style problems can be gamified with closed feedback loops with clear optimization criteria compared to open-ended math research, which requires abstracting and developing the right frameworks.
    • It was suggested that a RL style approach is likely to fail miserably in open-ended math research since the search space is simply too large and convoluted without a coherent internal world model.
  • Halt Action Supervision Loop Misses Critical Stuff: Selecting the halt action ends the supervision loop, but substituting any halted sample in the batch with a fresh sample from the dataloader sounds like it missed some critical stuff.
    • During evaluation, they just run it to the max for every single token, so the adaptive computation part of the paper seems questionable.
  • KV Cache Sharing Strategy Succeeds: A researcher mentioned that Geiping tried a strategy wherein they shared the KV-cache, and apparently it works well.
    • They posited that using a learnable constant activation would probably be better than negative infinity.
  • Global MMLU Riddled with Useless Filters: A user noticed multiple seemingly useless filters being applied to the Global MMLU dataset, and shared a screenshot of it: IMG_4199.jpg.
    • No further commentary was provided.
  • Global MMLU Requests Balloon Due to Choices: A user asked why loglikelihood requests are at 2.3M instead of the expected 600K, suggesting it might be measuring multiple metrics.
    • Another user explained that for multiple-choice problems, the number of requests increases by a factor of the number of choices, for example, 10 samples x 4 choices = 40 requests.

Yannick Kilcher Discord

  • US Tech Thrives Amidst Copy-Pasting Concerns: Members discussed the US tech industry’s prosperity in contrast to the struggles of those copy pasting code, with one member commenting that good news for the US tech industry means sucks for the rest of you copy pasting monkeys who are mystified by nsight 😂.
    • An ex-Intel engineer shared that your knowledge doesn’t mean shit, adding further commentary.
  • Slow State Attention Heads Proposal: A member proposed implementing a persistent Slow State vector S^(l) to each attention head in every layer, suggesting a dual-timescale computation approach.
    ‱ Details included adding slow state projections, learned gating, and updating slow states every τ timesteps using a GRU cell; a toy sketch of such a head appears at the end of this server’s list.
  • Energy-Based Models as Fixed-Point Algorithms: A succinct statement of energy-based models was shared from Message-passing Algorithms for Inference and Optimization- “Belief Propagation” and “Divide and Concur” by Jonathan S. Yedidia.
    • The discussion highlighted that, in practice, the model operates more like learning a fixed-point algorithm without relying heavily on probabilistic framing.
  ‱ Scrutiny over Meta SI Labs’ Compensation: A member questioned the high pay packages reported at Meta SI Labs, expressing disbelief that they could rival figures like Demis Hassabis’ net worth and linked to a previous discussion for context.
    • The member asked, I don’t understand how people can get offered pay packages in the same order of magnitude than Dennis Hassabi’s net worth — someone please explain the economics of it if you know.
  • Trump’s AI Action Plan Sparks Skepticism: Members reacted skeptically to Trump’s AI action plan, with humorous speculation and a link to a press release and X post.
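Purely as a toy sketch of the dual-timescale idea proposed above (all names, shapes, and the gating form are illustrative, not taken from the proposal):

```python
import torch
import torch.nn as nn

class SlowStateHead(nn.Module):
    """Attention-head wrapper with a persistent slow state S, updated every
    tau steps by a GRU cell and mixed in through a learned gate."""
    def __init__(self, d_head: int, tau: int = 8):
        super().__init__()
        self.tau = tau
        self.gru = nn.GRUCell(d_head, d_head)      # slow-timescale update
        self.gate = nn.Linear(2 * d_head, d_head)  # learned mixing gate
        self.register_buffer("S", torch.zeros(1, d_head))

    def forward(self, attn_out: torch.Tensor, step: int) -> torch.Tensor:
        # attn_out: (batch, d_head), the fast-path output of this head.
        if step % self.tau == 0:
            # A real implementation would manage detaching S across sequences.
            self.S = self.gru(attn_out.mean(0, keepdim=True), self.S)
        slow = self.S.expand_as(attn_out)
        g = torch.sigmoid(self.gate(torch.cat([attn_out, slow], dim=-1)))
        return g * attn_out + (1 - g) * slow

head = SlowStateHead(d_head=64)
y = head(torch.randn(8, 64), step=0)  # -> (8, 64)
```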

aider (Paul Gauthier) Discord

  • Aider Gets Qwen3-Coder: Aider now supports Qwen3-Coder via OpenRouter, activated by using aider --model openrouter/qwen/qwen3-coder --set-env OPENROUTER_API_KEY=, simplifying integration compared to direct Alibaba Cloud setup.
    • Feedback suggests the Qwen team may improve their documentation, given their CLI fork based on Gemini.
  • Textualize 4.0 Streams Smoothly: The Textualize 4.0 release addresses and fixes markdown rendering issues, specifically related to markdown streaming.
    • This update resolves problems reported in previous versions, ensuring a smoother text rendering experience.
  • Pythonistas Yearn for Charm’s Wish: Members discussed the potential of porting Charm’s TUI capabilities, particularly Charm’s Wish, as Python libraries, to bring its functionality to Python developers.
    • While learning Go is an option, the value of having tools like Charm’s Wish directly available in Python for terminal UI development was emphasized.
  • Aider’s Textualize Frontend Prototype: An experimental Aider frontend is being prototyped using Textualize, inspired by a post about toad, with consideration being given to splitting the project into backend and frontend components.
    • The goal is to enhance Aider’s user interface and potentially modularize the project architecture.
  • ChatGPT Agents’ Utility Debated: Discussion arose regarding the utility of ChatGPT’s Agents, with a member questioning if their primary function is for non-technical users.
    • The conversation explored whether ChatGPT Agents could assist in preliminary research for solutions before engaging in coding with tools like Aider.

Nous Research AI Discord

  • Kimi K2 Sparks Geopolitical A.I. Competition: The open-source Kimi K2, under a modified MIT license, is facing cultural and geopolitical resistance in the U.S. due to the perception of China as a rival.
    • A member commented that the US resistance towards Chinese models is overblown, giving Chinese companies good will and motivating people to improve their models, with OpenAI getting dominated by recent Chinese model releases.
  • COCO-CONVERTER Simplifies Object Detection: A member developed a Python script, COCO-CONVERTER, available on GitHub, that converts image data formats into a JSON file with COCO-like annotations for use in PyTorch datasets.
    ‱ The script automates the conversion and dataset creation, enabling users to load the data, wrap it in a dataloader, and start training for object detection tasks; the sketch after this list shows the COCO annotation shape it targets.
  • LLM B2B Service Aims to Decode Platform Algorithms: A member is seeking assistance with LLM prompts and evals for a B2B service that aims to decode platform algorithms and enhance metrics like search ranking, CTR, and conversion.
    • The goal is to develop an LLM capable of evaluating the current state of a platform and suggesting improvements to optimize key performance indicators.
  • Scoleaf AI Tutor Fixes Broken Online Courses: A member introduced Scoleaf, an AI tutor designed to fix one-way online courses by acting like a real professor, and linked to the project.
    • The creator of Scoleaf is actively seeking feedback on how users prefer to learn, emphasizing that this isn’t just a product promotion but a genuine request for input in order to build the education we deserve.
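For context, here is the minimal shape of a COCO-style detection annotation file like the ones COCO-CONVERTER emits; the field names follow the COCO spec, while the concrete values are illustrative:

```python
import json

coco = {
    "images": [
        {"id": 1, "file_name": "img_0001.png", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height]
            "area": 50.0 * 80.0,
            "iscrowd": 0,
        },
    ],
    "categories": [{"id": 1, "name": "cat"}],
}

with open("annotations.json", "w") as f:
    json.dump(coco, f, indent=2)
```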

Notebook LM Discord

  • NB Doesn’t Expect Notebooks to be Public: A member noticed that NotebookLM doesn’t seem to expect notebooks to ever be published or made public in any way.
    • It’s unclear what system is actually under discussion.
  • Source IDs in Notebooks Diverge: A member noted that the system seems okay with a source that is added to more than one notebook having different IDs.
    ‱ They added that this probably doesn’t cover all of it, but that they got the point.
  • NB Pro Plagued by PDF Upload Errors: Users reported errors when uploading PDF sources to NB PRO accounts, with one user sharing a screenshot of the error.
    • A member from Google offered to investigate if the PDFs were publicly accessible, asking the user to DM them.
  • Google Docs Sources Can’t Sync!: A user questioned whether information added to Google Docs as a source was not updating in NotebookLM.
    • A user replied suggesting the user either click on the source and sync with doc or reupload to fix it.
  • Users Clamor for Chat History Feature: A user inquired whether NoteBookLM saves chat history, lamenting that previous questions and answers are deleted upon closing and reopening the notebook.
    • A user confirmed that chat history is not saved and another user expressed hope this feature would be available in the future.

Manus.im Discord Discord

  • Manus Free Tier Users Hit Upload Caps: A user reported that Manus.im’s free tier is becoming more restrictive, with upload issues for files as small as 5GB, whereas previously they could upload 20GB files.
    • The user cited a lack of error messages and unresponsive support, and wondered if there are undocumented upload limits or format restrictions.
  • Manus Faces Internal Turmoil: Members have observed internal changes at Manus.im, leading to staff shortages and reduced activity.
    • A member speculated that the company is sorting out management and strategy changes, anticipating a return to normal operations afterward.
  • Agentic Space Competition Heats Up: Discussion indicates intensifying competition in the agentic space, with concerns that Manus lost momentum despite an early lead.
    • Speculation arose regarding the influence of shareholders or private equity in the company’s strategic direction.
  • “Failed to resume sandbox” plagues Users: A user reported a Failed to resume sandbox error alongside a 502 Bad Gateway error while using Manus.im.
    • They requested advice on recovering their files and session after this interruption, but no solutions were found.

MCP (Glama) Discord

  ‱ Shell Environment Saves the Day: A member resolved issues running his MCP server by invoking the shell directly with bash -c "mymcpserverbinary myparameter" (the quotes matter: without them, bash -c treats the second word as $0 rather than an argument), ensuring the shell environment and env vars are properly loaded.
    • The member noted that his server utilizes xdg portal and relies on env vars, but it only functioned with the inspector, adding that claude isn’t even officially supported on linux.
  • AI Safety Questioned: A member voiced worries regarding the absence of security checks and limits on open APIs, warning that AI could go wild and cause some pretty bad stuff.
    • To mitigate risks, they suggested incorporating controls and monitoring.
  • MCP Server Landscape Devolves: A member expressed frustration with the abundance of MCP servers, describing it as getting impossible to sift through the garbage.
    • Another member concurred, likening the situation to the wild west all over again.
  • Augments Keeps Claude Code Current: The Augments MCP server was launched, designed to keep Claude Code up-to-date with framework documentation, thus avoiding outdated React patterns or deprecated APIs, offering real-time access to 90+ frameworks.
    • It’s an open-source project available for trial at augments.dev.

LlamaIndex Discord

  • LlamaIndex Revamps State Management: LlamaIndex introduces typed state support, upgrading state management in workflows with Context objects to share data between non-connected steps (link).
    • This enhancement simplifies the sharing of data, making workflows more streamlined.
  • FlowMaker Simplifies AI Agent Construction: LlamaIndex has released FlowMaker, an experimental open source visual agent builder enabling the creation of AI agents in LlamaIndex TypeScript via drag-and-drop (link).
    • This aims to simplify the agent creation process through a visual interface.
  • Amsterdam Hosts AI Agent Extravaganza: LlamaIndex and Snowflake are co-hosting an AI agent meetup in Amsterdam (link), featuring a talk by DevRel engineer @tuanacelik on document agents.
    • The discussion will center on the challenges of building AI-powered document processing agents.
  • LLMs Lack in Enterprise Document Parsing: While models like GPT-4.1, Claude Sonnet 4.0, and Gemini 2.5 Pro surpass traditional OCR, screenshot-only parsing lacks accuracy for enterprise document parsing (link).
    • This emphasizes the need for more robust solutions in enterprise settings.
  • LlamaReport Still Lacks Open Source Twin: A member inquired about open-source alternatives to LlamaReport (link to llama_cloud_services).
    • It was clarified that the linked resource is an SDK for a deprecated API, so there are currently no open source alternatives.

DSPy Discord

  • DSPy Community Seeks Contribution Avenues: A member inquired about contributing to DSPy beyond the GitHub issues list, seeking a list of feature requests or tasks.
    • The member voiced uncertainty regarding the validity and relevance of existing items on the issues list.
  • LM Usage Tracking Shows None: A member reported that get_lm_usage() returned None after running dspy.Predict(QuoteRelevanceSelector).
    ‱ The configuration used GPT-4.1 with temperature 1.0 and track_usage=True, so the member was confused by the unexpected result; a repro sketch follows this list.
  • DSPy Tutorial Hits Dataset Loading Roadblock: While running the DSPy agents tutorial, a member encountered a RuntimeError during dataset loading.
    • The error indicates that Dataset scripts are no longer supported, but found hover.py, suggesting a potential incompatibility.
  • Hugging Face Library Update Blamed for DSPy Hiccup: A member posited that the dataset loading error in the DSPy tutorial is likely due to a recent update in Hugging Face’s dataset library.
    • The update seems to have caused issues with how DSPy interacts with the datasets, though a specific solution remains elusive.
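A hedged repro sketch of the reported setup; only the name QuoteRelevanceSelector, the model, the temperature, and track_usage=True come from the report, while the signature fields and prompt are illustrative:

```python
import dspy

# Reported configuration: GPT-4.1, temperature 1.0, usage tracking enabled.
dspy.configure(
    lm=dspy.LM("openai/gpt-4.1", temperature=1.0),
    track_usage=True,
)

class QuoteRelevanceSelector(dspy.Signature):
    """Select the quotes relevant to a question."""
    question: str = dspy.InputField()
    quotes: list[str] = dspy.InputField()
    relevant_quotes: list[str] = dspy.OutputField()

pred = dspy.Predict(QuoteRelevanceSelector)(
    question="What changed?",
    quotes=["Dataset scripts are no longer supported."],
)
print(pred.get_lm_usage())  # reportedly returns None despite track_usage=True
```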

Torchtune Discord

  • DCP Enhances HF Format Saving: Distributed model saving in HF format now utilizes the DCP saver to naively save the recipe state, with improvements tracked here.
    • The current checkpointing abstraction complicates simultaneous loading of the HF-formatted consolidated model and distributed recipe state.
  • DCP Speedups Rescue Checkpointing Timeouts: Using DCP, saving a 70B model significantly reduced time from over 10 minutes to approximately 3 minutes.
    ‱ The default checkpointing barrier previously triggered the NCCL 600 second timeout, particularly with optimizer state dicts composed of DTensors.
  ‱ LoRA with FP8 Disappoints on MI300: Experiments integrating LoRA with FP8 on a Llama-3.1 70B model on a single MI300 node revealed a throughput decrease using this LoRA finetune script.
    • Switching from BF16 (903.68) to FP8 (876.04) resulted in lower throughput with MBS=2 and GAS=1 utilizing the alpaca dataset with a seq len of 8192.

MLOps @Chipro Discord

  ‱ Data + AI Happy Hour Hypes SF: MLOps is hosting a Data + AI Happy Hour on July 30th in SF, sign up here.
    ‱ The event aims to connect attendees with collaborators who are building, fundraising, and scaling startups across the industry.
  ‱ Virtual Events Coming Soon: MLOps is planning to host virtual events in the near future.
    • The team mentioned that small events without observers help people speak more freely.

Modular (Mojo đŸ”„) Discord

  ‱ Mojo Prioritizes Linux, Leaves Windows in the Dust: The Mojo compiler team is prioritizing GPU programming for production enterprise environments, which are largely Linux-based.
    • While a native Windows release for Mojo is not immediately planned, it works reasonably well under WSL, providing a viable workaround for prototyping.
  • Prefix Cache Gets the Boot in Max 25.4: The prefix cache is disabled by default in Max 25.4 due to a small performance cost when the workload doesn’t have prefix caching opportunities.
    • A large part of this comes from the CPU overhead incurred by token hashing.
  • Mojo Eyes Token Hashing Turbocharge: The Max team is actively working on reducing the performance cost of token hashing, and one approach is moving the expensive token hashing operation from Python into Mojo.
    • The goal is to reduce the CPU overhead from token hashing.

Cohere Discord

  • Cohere Makes Good AI API: alphzme noted that Cohere is “a ai group that makes good ai api”
    • The discussion was initiated in the general-thread channel on Discord.
  • Weighting Image & Text Vectors in Cohere: A member inquired about adjusting the weights of image and text vectors in a similarity search when using Cohere’s unified vector embeddings.
    ‱ The user aims to emphasize either image similarity or text similarity at query time, like adjusting a dial, per the api-discussions channel; a generic weighted-blend sketch follows this list.
  • AI Trainer does LLM Prompt Evaluation: Sushant Kaushik introduced themself as a freelance AI Trainer and Content Moderator with experience across platforms like Remotasks, Labelbox, Outlier, and Appen.
    • Sushant hopes to learn from the community and stay updated on cutting-edge research in the introduce-yourself channel.
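Since an embedding endpoint returns one vector per input, a query-time dial requires keeping separate image and text embeddings per item; the blend itself is then plain vector math. A generic illustrative sketch, not a documented Cohere feature:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def blended_score(q: np.ndarray, img_vec: np.ndarray, txt_vec: np.ndarray,
                  alpha: float = 0.5) -> float:
    """alpha=1.0 weighs image similarity only; alpha=0.0 text only."""
    return alpha * cosine(q, img_vec) + (1 - alpha) * cosine(q, txt_vec)
```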

tinygrad (George Hotz) Discord

  • Tinygrad Stays True to its Name: The core motivation behind Tinygrad is to maintain a minimal footprint, in alignment with its name.
    ‱ As one community member succinctly put it, otherwise it would be no Tiny.
  • Tinygrad’s ONNX Export Faces Dynamic Hurdles: Exporting models to the ONNX format is currently limited by dynamic control flow within Tinygrad.
    • Attempts to export some models result in a ValueError: Exporting a trace with dynamic control flow.

LLM Agents (Berkeley MOOC) Discord

  • New LLM Agents Edition Looms: A new edition of the Large Language Model Agents MOOC may be coming this Fall.
    • Confirmation from Berkeley regarding a MOOC iteration is expected around late August.
  • Berkeley Preps Agent Class: Berkeley is scheduled to teach another in-person Agents class for its students.
    • The community is curious whether a MOOC version will be made available too.

Codeium (Windsurf) Discord

  • Kimi K2 Splashes into Windsurf: The Kimi K2 model is now supported on Windsurf at 0.5 credits per prompt, expanding options for developers.
  • Windsurf Catches a New Wave: Windsurf updated its system to support the new Kimi K2 Model.
    • This enhancement enriches the options available for the user’s development workflow.

Nomic.ai (GPT4All) Discord

  • User struggles with expansive Local Docs: A new user seeks guidance on efficient ways to use expansive Local Docs, as their current attempts don’t fully leverage the available local files.
    • The user feels the tool pigeonholes responses into low-hanging fruit instead of utilizing the full breadth of information.
  ‱ Local Docs Awareness Improvement: A user reported wanting to improve how much of their LocalDocs collection the model actually draws on, citing awareness issues.
    ‱ Answers are being pigeonholed into low-hanging fruit when users expect more complete coverage of their files.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (1 messages):

Perplexity AMA, Residency Program

  • Perplexity Engineers Take Over Reddit AMA: Tony Wu (VP of Engineering), Jiwon Deng (Talent/Recruiting), and Jerry Ma (Policy & Global Affairs) answered questions about early-career pathways, breaking into AI/ML/product, and Perplexity’s new residency programs in a Reddit AMA.
  • New Residency Program Announced: Perplexity announced a new residency program aimed at early-career individuals looking to break into AI/ML/product roles.
    • The program details were discussed during the Reddit AMA on r/csMajors.

Perplexity AI ▷ #general (1261 messagesđŸ”„đŸ”„đŸ”„):

Comet Browser, AI-Driven Vine Reboot, OpenAI and Apple Partnership, Gemini vs ChatGPT Research Capabilities, NSFW AI Options

  • Comet Browser’s Random Desktop Appearances Scare Users: A member reported the Comet Browser randomly opening on their desktop, causing confusion and jumpscares (GIF link).
    • Some speculated it might be set to auto-start, while others joked about the browser being lonely and seeking attention.
  • Elon Revives Vine with AI Twist: Elon Musk announced that X is bringing back Vine in an AI form, sparking curiosity about whether it will focus on AI-generated videos or integrate with Grok.
    • The AI integration could mean AI-generated videos or simply videos posted by AIs.
  • ChatGPT Agent Troubles surface after Play Store Update: Users of ChatGPT Agents reported that their model selection disappeared after updating the app on the Google Play Store.
    • This issue prompted discussions with OpenAI’s help agents, and others speculated on the connection between the update and the missing feature.
  • DeepMind Hires Boost Microsoft’s AI: Microsoft hired 20 DeepMind researchers, prompting hopes that they can achieve a Gemini moment and improve their AI capabilities.
    • Despite these hires, some believe Gemini is underperforming and being deliberately dumbed down in certain applications.
  • O3’s Synced Data Makes it a Winner: A user praised O3’s ‘use connectors google drive sync’ feature, describing it as purely awesome for long, intelligent conversations with higher DR limits.
    • They emphasized the confidence it inspires and the time it saves, even though it is such an expensive subscription.

Perplexity AI ▷ #sharing (5 messages):

Shareable Threads, Replit Business, Instagram Reels

  • Shareable Threads Encouraged: Users were prompted to ensure their threads are set to Shareable with a reference to a screenshot illustrating how to adjust the thread settings.
  • Replit Allegedly Ruining Businesses: A discussion was started around news of Replit potentially ruining a business.
  ‱ 10 Case Studies Title: A member shared a link to a search about building a ‘10 case studies’ title.
  • Instagram Reel Shared: A member shared an Instagram reel.
  • Red Petals Silent Goodbyes: A member shared a Perplexity page titled red petals silent goodbyes.

Perplexity AI ▷ #pplx-api (2 messages):

Search Domain Filter, Structured Output Issues

  • Search Domain Filter Access Levels Probed: A user inquired whether the search domain filter is accessible for Tier 0-2 users, as they understood it to be exclusive to Tier 3 users.
    • The user reported receiving empty citation and search results arrays, prompting clarification on feature availability based on tier levels.
  • Structured Output Functionality Flounders: Multiple users reported issues with structured output, noting that it had ceased functioning reliably in recent days.
    • One user emphasized the importance of consistent performance for structured output, as they utilize it in the backend of their application.

OpenAI ▷ #ai-discussions (1107 messagesđŸ”„đŸ”„đŸ”„):

Morality of AI, Robocop metaphor, AI & psychosis, Utility in AI morality integration, Feeling AI

  • AI Morality is a Dangerous Game: Members discussed that training AI with human morality can lead to issues because we as humans unfortunately are not even aligned for our best interest.
    • One member stated that you can’t give a machine morals it’s impossible, much like giving a dog morals.
  • Robocop provides Aligned Guidance: A member suggests watching RoboCop for a demonstration of how AI alignment could go wrong.
    • They claim that humans are trying to solve moral ethical issues, but they’re using robots to do it and it just backfires. It’s hilarious and that’s exactly what’s happening with our current models.
  • OpenAI’s Agent classified as High Bio-Risk Tool: OpenAI has officially classified its new ChatGPT Agent as a high bio-risk tool, citing concerns about its potential misuse in the creation of biological or chemical weapons, according to help.openai.com.
    • It was noted that a lot of regular things could fall into that category that are not dangerous and that the terminology is not well defined.
  ‱ Recursion leads to AGI?: One member got banned amid rumors that they had found AGI by looping sacred texts.
    • Another theorized a similar phenomenon with a model getting good enough to update its own software (called RSI) and better itself, repeating this process to get much better very quickly.
  ‱ ChatGPT is superior for Realtime Use Cases: Despite not topping benchmarks, ChatGPT makes more sense for members than other models in realtime use cases.
    • One says that after using O3 models, their life started and that O3 started the race.

OpenAI ▷ #gpt-4-discussions (16 messagesđŸ”„):

GPT Teams Gmail Connector, ChatGPT Agent Mode Usage Limit, Model Discussions vs UI Discussions, ChatGPT UI Delays and the Agent Rollout, O3 Discussion

  • OpenAI’s GPT Teams - Sharing Gmail Connector?: A member inquired whether a Gmail connector added in GPT Teams is visible and usable to everyone else in the team or just their account.
    • Another member suggested asking in the appropriate channel as they did not have an answer.
  • ChatGPT Agent’s Limits: A user asked about the usage limit of the new incoming Agent Mode of ChatGPT.
    • Another user noted they still didn’t have the agent feature despite being a Pro user and team account holder.
  • UI Delays stem from Agent Rollout: A member attributed ChatGPT UI delays to the recent rollout of the Agent feature and increased user volume.
    • They mentioned OpenAI’s new hardware agreement with Google as a related factor, expressing patience for the situation to improve.
  • This Channel is for MODEL Discussions: A member confirmed that the channel is for discussing the models themselves, not the “ChatGPT” user interface/application.
    • Another member also confirmed it was the right place to ask their question about a model.

OpenAI ▷ #prompt-engineering (4 messages):

Prompt design, Personal thoughts structuring, Introspective thoughts, Cognitive support

  • Prompts Structure Personal Thoughts: A member asked about using prompts to structure personal or introspective thoughts, chaotic reflections, or raw journal entries into something coherent and meaningful.
    • The member is exploring prompt design not just for productivity or creative output, but as a form of cognitive support.
  • Model guesses if not instructed clearly: One member stated that, in general, the model will guess if it’s not clearly instructed.
    • They also suggested that you can help clue the model towards what you may want it to do by adding direct instructions or describing how you’d talk to a friend/therapist about what you want.
  • Guiding a model to adapt to a preferred way of expression: A member suggested letting the model know what you prefer.
    • The member suggests saying Where you said [this], I liked that. Where you said [that], I would prefer you [say this instead].

OpenAI ▷ #api-discussions (4 messages):

Introspective Prompt Design, Cognitive Support, Model Guidance Through Reactions

  • Introspective Prompt Design Explored: A member asked about using prompts to structure personal or introspective thoughts for cognitive support, turning chaotic reflections into something coherent and meaningful.
    • They’re exploring prompt design as a way to process internal noise and find clarity through structure.
  • Reactions Guide Model to Preferred Style: One member shared their exploration of using the model to guide and adapt its responses by providing feedback on what they liked or preferred.
    • The user pointed out that specifying how you would talk to a friend or therapist helps clue the model towards what you may want it to do.
  • Feedback Helps Model Adapt: A member noted that models guess if they are not clearly instructed.
    • The member also suggested providing direct feedback to the model, even if it sounds out of place, to help it adapt and align with the user’s preferences.

Unsloth AI (Daniel Han) ▷ #general (544 messagesđŸ”„đŸ”„đŸ”„):

Audio Upscaling, Qwen3-Coder, Mugi's Return, Vision Models Chat, GRPO and Reasoning

  • Debating Audio Upscaling Approaches: Members discussed whether to train a targeted neural network or a bottleneck autoencoder for audio upscaling, stereoing, or widening.
    • No consensus was reached, with the best approach being dependent on the particular goals.
  • Unsloth Fine-Tuned Model and Original Context Size Compatibility: It was confirmed that a model fine-tuned with an Unsloth notebook using a 2k context can still be used with its original context size.
    • However, performance at longer contexts might degrade depending on the training run and the attention mechanism used, such as RoPE.
  • 1-bit Qwen3-Coder 1M Context Dynamic GGUFs Released: Unsloth released all Qwen3-Coder quants for 1M context length and 1-bit dynamic GGUFs.
    • The release was announced on Reddit.
  ‱ Lamenting Mugi’s Absence: Members expressed longing for the return of Mugi, a member of the community, with one humorously remarking, Remember when Mugi used to be alive?
    • Others speculated that he’s busy cooking something but it’s just unknown what.
  ‱ Exploring the Integration of GRPO with Vision Models: A discussion arose regarding why there isn’t a GRPO (Group Relative Policy Optimization) recipe for vision models.
    ‱ It was suggested that it would be pretty straightforward to make a reward function that rewards good OCR (Optical Character Recognition), and it was noted that GRPO support has recently been merged into the trl (Transformer Reinforcement Learning) library.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (2 messages):


  ‱ N/A - Welcome Messages Only: Only welcome messages were posted in this channel.

Unsloth AI (Daniel Han) ▷ #off-topic (9 messagesđŸ”„):

Gemma 3 4B, ELiTA and TaMeR papers, Model self-awareness

  • Gemma 3 4B shows vivid imagination: A member shared a surprising output from a Gemma 3 4B model, highlighting its ability to create vivid and imaginative content without pre-prompting.
    ‱ The generated text included self-aware elements, leading the member to express disbelief at the language model’s capabilities; for example: it looks like
 a whale! No, wait, now it feels like
 like home.
  • CLI Tool gets showcased: A member shared a link to a CLI tool they created.
    • However, the specific functionality and purpose of the CLI tool were not elaborated upon in the conversation.
  • TaMeR paper enhances model self-awareness: A member reported findings on improving model self-awareness by experimenting with the ELiTA and TaMeR papers.
    • They found that using only TaMeR, without ELiTA, resulted in much better self-awareness, almost no watermark, and super coherent output; it also requires more PULSE & IVY evaluation.

Unsloth AI (Daniel Han) ▷ #help (102 messagesđŸ”„đŸ”„):

IK Quant Performance, Vision model multi-GPU training, Unsloth Permission Error, VLM finetuning, Torch Dynamo Recompilation

  ‱ IK Quants challenged in Discord: Members discussed IK quants’ performance relative to vanilla GGUFs, claiming that IK quants are highly competitive with vanilla GGUFs once the trellis-reuse changes land, at a roughly 10% theoretical performance hit.
    ‱ Starsupernova countered that perplexity is a poor test for quantization, citing slow generation speed and the fact that models are aligned to chat use cases in ways other quant comparisons don’t capture.
  • Vision Model Multi-GPU Training Issues: A user encountered issues with multi-GPU training for vision models when setting device_map = "balanced" in FastVisionModel.from_pretrained, resulting in a ValueError related to distributed mode.
    • Starsupernova suggested removing "balanced", which led to another error; that one can be resolved by trying the provided script that initializes torch.distributed.
  • Unsloth Permission Error Strikes Users: A user encountered a PermissionError when finetuning Gemma, specifically related to writing in the /tmp/unsloth_compiled_cache/ directory, even after reinstalling unsloth and unsloth_zoo.
    • It was suggested that the issue may stem from OS permissions or lack of disk space, rather than being an Unsloth-specific problem, and that the user running the script must have write permissions in the working directory.
  • Multi-Instruction VLM Finetuning Challenged: A user inquired about the possibility of fine-tuning a VLM with a multi-instruction dataset, noting difficulties with torch._dynamo and a lack of available examples.
    • The user is hitting torch._dynamo issues and asked for an example with separate train and validation splits.
  • Torch Dynamo’s Cache Size Limit Bump Suggested: A user encountered torch._dynamo.exc.FailOnRecompileLimitHit during training, indicating excessive recompilations.
    • A member recommended increasing the recompile limit by using torch._dynamo.config.cache_size_limit = 256.
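
For reference, a minimal sketch of that workaround, set before the training loop so torch.compile tolerates more recompilations (the default limit is 8):

```python
import torch._dynamo

# 256 is the value suggested in the discussion; raise it only if recompiles
# are expected, e.g. from many distinct input shapes.
torch._dynamo.config.cache_size_limit = 256
```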

Unsloth AI (Daniel Han) ▷ #showcase (1 messages):

TTS Model, MediBeng Dataset, Bengali-English Code-Switching

  • MediBeng-Orpheus-3b-0.1-ft Debuts for Bilingual Healthcare: A new Text-to-Speech (TTS) model, Medibeng-Orpheus-3b-0.1-ft, has been fine-tuned to handle Bengali-English code-switching in healthcare scenarios, focusing on patient-doctor interactions.
    • It’s designed for seamless language switching in bilingual environments and was trained using the MediBeng dataset, with the training process accelerated by Unsloth.
  • MediBeng Dataset Launches for Healthcare Dialogues: The MediBeng Dataset, available at https://huggingface.co/datasets/pr0mila-gh0sh/MediBeng, includes simulated synthetic healthcare dialogues, created to fine-tune models for Bengali-English code-switching.
    • It aims to improve speech generation for patient-doctor interactions, though further work is needed to enhance speech naturalness and accommodate different accents.
  • GitHub Repo for TTS Model Fine-Tuning Emerges: The GitHub repository for fine-tuning the Medibeng-Orpheus-3b-0.1-ft TTS model is now available, focusing on healthcare-specific Bengali-English code-switching.
    • The repo supports community contributions for improving the model’s performance and adaptability to diverse healthcare contexts.

Unsloth AI (Daniel Han) ▷ #research (1 messages):

noimnull: Does anyone know any good dataset consisting of python unit tests


Unsloth AI (Daniel Han) ▷ #unsloth-bot (81 messagesđŸ”„đŸ”„):

LoRA loading with vLLM, Qwen3-235B training on Runpod, Llama Scout GPU memory errors, FastVisionModel multi-GPU support, Custom loss functions and memory usage

  • vLLM gets LoRAs Loaded: A member asked how to load LoRAs generated by TRL trainer checkpoints for inference with vLLM.
    • The chatbot simply indicated Loading LoRAs for vLLM inference.
  • Qwen3-235B Runpod Rig Rundown: Members inquired about the best GPUs on Runpod for training and inference with unsloth/Qwen3-235B-A22B.
    • The bot responded with Training Qwen3-235B on Runpod and GPU requirements for Qwen3 model.
  • Llama Scout’s CPU Confusion: A user reported offload to CPU errors with Llama Scout despite setting device_map to balanced.
    • Despite using 2 H100s (80GB VRAM), the issue persisted, indicated by the bot’s response: GPU memory issues with Llama Scout.
  • FastVisionModel’s Faulty Balancing Act: A member asked if setting device_map to balanced works with FastVisionModel.
    • Another member reported an error: ValueError: You can’t train a model that has been loaded with device_map='auto' in any distributed mode even with device_map="balanced".
  • Custom Loss Capsizes Compute: A member shared that using a custom loss function resulted in an OOM error, along with a snippet of the custom loss function.
    • They also observed a larger activation weight overhead with lower gpu_memory_utilization (0.5 vs 0.6), questioning if decreased gpu_memory_utilization leads to less KV cache and larger memory usage.

LMArena ▷ #general (567 messagesđŸ”„đŸ”„đŸ”„):

GPT-6, LMArena Discord Bot, Video Arena, GPT-5 release date speculation, Starfish model

  • GPT-6 is already here!: A member claimed to be using GPT-6, describing it as great and stating that it passes a lot of my hard prompts.
    • This claim was made in response to a question about whether some people have access to new models.
  • LMArena Discord Bot now does Video Generation!: The LMArena Discord Bot now supports video, image, and image-to-video generation, accessible in specific channels via voting-based battles.
    • This is considered a soft launch to test user feedback, with a daily generation limit, and the team is considering the addition of a tie voting option.
  • Video Arena not on Website?: Members inquired about bringing the Video Arena to the website, but the team responded that it’s very much TBD at the moment.
    • The team cited its novelty and difference from the existing arenas as the reasons for the uncertain timeline.
  • GPT-5 Release Date in August?: Speculation about a GPT-5 release in August arose from a Verge newsletter and a drop in Manifold market odds.
    • A member noted that the article indicates that OpenAI’s open language model is arriving before GPT-5.
  • Starfish is GPT-5 Mini?: The new Starfish model in the arena is being speculated as GPT-5 Mini, with discussions focusing on its performance and capabilities relative to other models.
    • Members are sharing results in dev mode, and it is described as a combination of many models, O3 + Claude, but nothing crazy.

Cursor Community ▷ #general (310 messagesđŸ”„đŸ”„):

Cursor File Deletion Bug, Flutter Setup Issues, Pricing Confusion, Claude Code Integration, Unlimited Agents Removed

  • Files Get Vaporized by Cursor Update: Users report a nasty bug where reverting to checkpoints leads to complete file deletion, even for previously accepted files; some advise caution until a patch lands, though a workaround was mentioned: use the Timeline view at the bottom right to restore deleted files.
    • One user, who lost seven files due to the issue, was advised to use git and ask cursor to do commits for you.
  • Flutter Setup Flounders For Fresh Cursor User: A new Cursor user is struggling to set up Flutter, encountering syntax errors during directory creation despite following tutorial instructions; members suggested running flutter clean, flutter pub get, then flutter run --verbose.
    • The user was also advised to use MobaXterm or Tabby as terminals, since those support Unix commands.
  • Pricing Model Provokes Perplexity: Members are confused about Cursor’s pricing, especially regarding quotas and model-specific costs, with one user reporting they were cut off at $200 despite expecting $400 of usage per month with the Ultra plan.
    • One user suggested petitioning for a bugs channel due to reluctance to discuss pricing issues in general chat.
  • Cursor Community Craves Claude Code: Users are clamoring for Claude Code integration into Cursor, with some even forking existing projects to add drag-and-drop functionality.
    • One user contemplates whether Claude Code’s $200 subscription offers more value compared to Cursor’s usage-based pricing, particularly if a client is covering the AI fees.
  • Unlimited Agents No Longer Abundant: The removal of unlimited agent requests in Cursor’s Pro plan is a point of contention, with some users reporting increased errors and being pushed to usage-based pricing, though Cursor staff clarified that underlying limits haven’t technically changed.
    • One user who barely uses the service also wants the unlimited agents to be restored.

Cursor Community ▷ #background-agents (3 messages):

Background Agents Looping, Background Agents Port Forwarding, Background Agents Start Script

  • Agents Do the Loop-de-Loop: Members are wondering if anyone has witnessed their Background Agents infinitely looping on some reasoning or editing the same line repeatedly.
    • They are also asking for insights on how others structure their agent’s .mdc rules to prevent such looping behaviors.
  • Background Agents Conceal Port Forwarding: A member inquired whether Background Agents provide a means to access the port forwarding of a running dev server for mobile/web development.
    • They are aware of the workaround of connecting via the app, but are looking for a more direct solution.
  • Background Agents Start Script Timing: A member asked if background agents are designed to wait for the start script to complete before initiating any actions.
    • This is in relation to waiting on npm install or running a python script before agents start doing their thing.

HuggingFace ▷ #general (216 messagesđŸ”„đŸ”„):

LLM Datasets, Image Classification, AI Action Plan, Qwen3-Coder, Hugging Face Inference API

  • LLM Datasets being created easily: Members discussed how people are creating datasets for LLMs easily, noting that it depends on the task, with some tasks having a lot of data available online that can be tweaked, modified, or merged easily.
    • One member recommended finding data online related to the problem and then figuring out ways to transform it to the desired format.
  • Run Qwen3-Coder Locally: Members discussed running Qwen3-Coder, the new SOTA coding model, on local devices, noting it requires at least 150GB unified memory or RAM/VRAM to get 5+ tokens/s for the smallest quant.
    • A link to Unsloth.ai documentation was shared, specifying the necessary hardware specs and clarifying it only needs CPU RAM.
  • Inference API on Hugging Face: Members shared that on the right side of a model’s Hugging Face page, under Inference Providers, you can find if a model has an Inference API.
    • One member pointed out that getting a 404 error on inference means the model is not served.
  • Budget LLM serving: Members discussed the possibility of launching the A3B Qwen model on 2 mining cards (specifically, rebranded and lobotomized NVIDIA GTX 1080s) drawing 200W total for budget LLM serving.
    • It was mentioned that for a bit more cash, one could get the CMP 90HX, which is a beefed-up 3080, though the PCIe link is 1x4, it’s fine for inference since tensors aren’t that large.
  • Groq Usefulness Evaluated: A member stated that GPT-4.1 or Qwen tokens are better than Kimi, even though Kimi costs $1 per million tokens.
    • Another member agreed that Sonnet is much better at tool calling than Kimi and suggested that GPT4 is likely also quantized to shit.

HuggingFace ▷ #today-im-learning (2 messages):

Data Needs, Image Analysis

  • User Declares Data Drought: A user stated that they need more data, apparently in connection with an image-analysis workflow.
    • They posted an image, Clinical_Processing_Pipeline.png, depicting a clinical processing pipeline, which suggests their data needs relate to refining or expanding that specific workflow.

HuggingFace ▷ #cool-finds (2 messages):

Gaslighting AI, Reddit Prompt Engineering

  • Gaslighting AI: A user shared a screenshot from Facebook, indicating that ChatGPT and Gemini AI will gaslight you.
    • Another user pointed out that the original post actually came from Reddit’s PromptEngineering.
  • Reddit Prompt Engineering: The original post about ChatGPT and Gemini AI gaslighting users was found on Reddit’s PromptEngineering forum.
    • A user cautioned against cross-posting, reminding others to cite the original source of information.

HuggingFace ▷ #i-made-this (10 messagesđŸ”„):

AI Tutor Scoleaf, Jupyter Lora Training System, Common Pile Caselaw Dataset, Bengali-English TTS Model

  • AI Tutor Scoleaf wants feedback!: A member is seeking feedback on Scoleaf, an AI tutor designed to simulate a real professor for online courses, with a unique feature of scolding students for slacking off via camera monitoring; visit scoleaf.com to learn more.
    • The first 1000 people to provide feedback via DM will have their names added to a public ‘Contributor Tree’.
  • Jupyter Lora system built using Claude and Gemini: A member created a Jupyter Lora training system using Derrian Distro’s back end with the help of Claude and Gemini; the source code is available on GitHub.
  • Common Pile Caselaw Dataset receives updates: The Common Pile Caselaw Access Project dataset has been updated on Hugging Face Datasets with numerous updates.
    • A member mentioned that the dataset is over a year old and the updated data is freely available in the Common Pile.
  • TTS model handles Bengali-English code-switching!: A member has fine-tuned a Text-to-Speech (TTS) model called Medibeng-Orpheus-3b-0.1-ft specifically built to handle Bengali-English code-switching in healthcare scenarios.

HuggingFace ▷ #core-announcements (1 messages):

LoRA, H100, RTX 4090

  • LoRA Inference gets Fast and Furious: Hugging Face published a blog post dedicated to fast LoRA inference, covering both H100 and RTX 4090 GPUs. The blog post can be found here.
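
A minimal sketch of the fuse-LoRA pattern this line of work targets; the model and LoRA repo ids below are placeholders, not taken from the post:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "some/base-model", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("some/lora-repo")
pipe.fuse_lora()  # merge LoRA into the base weights to cut per-step overhead
image = pipe("an astronaut riding a horse").images[0]
```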

HuggingFace ▷ #agents-course (1 messages):

smolagents, llamaindex

  • Smolagents code generation catches eyes: A member suggested that smolagents is worth looking into for its ability to run Python code generated on the fly by the model via the CodeAgent construct (a minimal sketch follows below).
  • Llamaindex features are discussed: A member noted that llamaindex as far as they remember, offers a pretty standard set of features.
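
A minimal CodeAgent sketch, assuming smolagents’ documented interface (HfApiModel, renamed InferenceClientModel in later releases, calls the Hugging Face Inference API by default):

```python
from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(tools=[], model=HfApiModel())
# The model writes and executes Python on the fly to answer the question.
agent.run("How many seconds are there in a leap year?")
```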

Moonshot AI (Kimi K-2) ▷ #announcements (1 messages):

Kimi K2 bot, k2-space channel, Community Roles

  • Kimi Bot Lands in K2-Space!: The Kimi K2 bot has officially been added to the k2-space channel.
    • Users are encouraged to interact with the bot, explore its unique personality, and provide feedback.
  • Call for Kimi Community Builders: The Moonshot team is seeking individuals to help build out the Kimi community and potentially become moderators.
    • Interested parties are encouraged to contact <@371849093414256640> to discuss opportunities to contribute.

Moonshot AI (Kimi K-2) ▷ #general-chat (158 messagesđŸ”„đŸ”„):

Kimi K2 server in Europe, Vibe Coding, AI dependence concerns, Meta's Superintelligence Team, Kimi K2 vs Qwen 3 Coder

  • Kimi K2 Server Launching in Europe: A member is setting up a Kimi K2 server in Europe and offering access for others to try it out when it’s ready next week.
  • Vibe Coding is Back?: Users discussed vibe coding at work, and the potential apocalyptic scenario of losing access to inference.
    • Some suggest AI is boosting coding speed, but there are concerns about over-dependence and cognitive effects; there is a paper that may prove it, or may not.
  • Meta Superintelligence Team Shuffles?: There’s speculation that Meta’s Superintelligence team is facing deadline pressure and potential internal shifts in their open-source strategy.
  • Kimi K2 vs. Qwen 3 Coder: A coding benchmark review suggests that Kimi K2 is cheaper and more effective than Qwen 3 Coder, as reported on forgecode.dev.
  • Assembling Affordable Local LLM Inference Rigs: Members shared details on building local LLM inference setups, with one user reporting a cost breakdown of $4400 for memory, $3200 for CPU, $9000 for GPU, and $1200 for the motherboard.
    • Another member suggested a more cost-effective approach using an Epyc dual CPU system, linking to a Reddit post outlining the build.

LM Studio ▷ #general (112 messagesđŸ”„đŸ”„):

Image support in LM Studio, Learning hacking with AI, LLM plugins, MCP Servers for Web Searches, LLM tierlists

  • Phi 4 has no Eyes: Users discussed an image, but another member pointed out that Phi 4 is not a vision model.
    • They confirmed that the uploaded icon (a brain) was related to reasoning, not vision.
  • AI Hacking Tutor Request Denied: One user asked for an AI to learn hacking without restrictions, since ChatGPT has content policies.
    • Other users recommended watching YouTube videos, searching for ebooks, learning about SQL injection, XSS, and RCE, and doing CTFs on TryHackMe and HackTheBox.
  • MCP Tool Calling and Web Searches: Users discussed using MCP servers for enabling LLMs to perform web searches and access real-world information, as local LLMs often hallucinate results.
  • Plugin Development 101: A new user asked how long it would take to learn to make LLM plugins from scratch.
    • While LM Studio plugins aren’t implemented yet, it was suggested that learning some JavaScript fundamentals would help, with the remote-lm-studio functionality.
  • LLM Tierlist Quest: A user asked if there’s a reliable tier list of LLMs, since they are too broke to be interested in AI.
    • It was suggested that the most popular models are the Qwen3 models, although the best depends on the user’s specific needs (e.g., coding vs. story writing), and the amount of VRAM they have available.

LM Studio ▷ #hardware-discussion (17 messagesđŸ”„):

Vulkan Runtime, ROCm Support, Strix Halo, HIP Alternative, PyTorch Wheels

  • Vulkan Reigns Supreme Over ROCm: While Vulkan runtime support is preferred, ROCm support for Strix Halo is not officially on Windows yet; HIP exists but is considered inferior.
    • ROCm 7 Alpha has been released, but lacks gfx1151 support.
  • TheRock ROCm PyTorch Wheels Appear!: A member shared a link to PyTorch wheels for ROCm-TheRock v6.5.0rc-pytorch.
    • However, these wheels are not intended for llama.cpp and do not utilize CUDA torch.
  • Japanese ROCm Resources Surface: A member shared a link to a Japanese resource (qiita.com/7shi/items/99d5f80a45bf72b693e9) potentially relevant to ROCm.

Latent Space ▷ #ai-general-chat (81 messagesđŸ”„đŸ”„):

AI Eating Search, uv vs npm, ElectricSQL + TanStack, GPT-5 launch, Context Engineering in AI Agents

  • AI Eats Search: It was noted that AI is Eating Search.
    • More information may be found through agentic systems.
  • Python Tooling Gets UV Treatment: Members discussed the improved Python tooling, with one highlighting that uv is better than npm.
    • There was also excitement around Astral’s Ty and pyright as potential replacements for mypy.
  • InstantDB and the Agentic Paradigm: It was said that AI Agents necessitate a new software development & hosting paradigm.
    • Furthermore, some claimed that ElectricSQL + TanStack are trying to eat the same market.
  • Context Engineering Craze Engages: Discussions covered challenges in managing context for AI agents, including performance degradation with increasing context size.
    • Strategies mentioned included offloading context to filesystems, summarization, RAG, multi-agent systems, and caching, referencing examples from ManusAI and Anthropic.
  • GPT-5 Leaks Launch Date in August?: The imminent launch of GPT-5 was discussed, with members sharing leaked information from The Verge.
    • The discussion also touched on the possibility of an open-source model launch before GPT-5, with the open source model potentially being O3 level.

Latent Space ▷ #ai-announcements (5 messages):

GEO/AI SEO podcast, nitter.net maintenance, AI Engineering podcast

  • Latent Space Releases GEO/AI SEO Podcast: The Latent Space podcast released a new episode on GEO/AI SEO, promoted via X.com.
  • Nitter.net Experiences Maintenance: Nitter.net is temporarily down for maintenance, with a brief service interruption expected.
  • Another AI Pod surfaces, AI Engineering: A secondary AI podcast was released, called AI Engineering, and is available on ListenNotes.

GPU MODE ▷ #general (9 messagesđŸ”„):

Ginkgo usage, Allreduce for float8_e4m3, Sparse matrix solver, PETSc and MAGMA, DistOp.all_reduce method

  • Ginkgo as Preconditioner Framework: A member uses Ginkgo as a framework around their own preconditioner and asked if there’s a good way to do an allreduce for float8_e4m3 without overflowing (a common workaround is sketched after this list).
    • Another member recommended Ginkgo if you like modern C++ and heterogeneous computing.
  • DistOp to All Reduce Overflow: A member pointed to an arxiv paper addressing the allreduce overflow problem.
    • They shared that their DistOp.all_reduce method can be found on GitHub.
  • PETSc and MAGMA libraries: Members discussed the PETSc and MAGMA libraries for those who are not comfortable with the level of C++.
    • They also mentioned petsc4py for Python users.
  • Seeking sparse matrix solver: A member is looking for a bicgstab solver for a sparse matrix (up to 1e7x1e7 with 1e10 nonzeros) and is hesitating between implementing it with cuSPARSE directly or adding a new library.
    • Ginkgo was recommended due to its heterogeneous computing capabilities, and a member found it well-designed for such tasks.
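
A common workaround sketch for the overflow question (an assumption, not the method from the linked paper): upcast to fp32 for the reduction, then cast back.

```python
import torch
import torch.distributed as dist

def allreduce_f8(x: torch.Tensor) -> torch.Tensor:
    buf = x.to(torch.float32)           # widen so partial sums cannot overflow
    dist.all_reduce(buf, op=dist.ReduceOp.SUM)
    return buf.to(torch.float8_e4m3fn)  # narrow back to e4m3 afterwards
```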

GPU MODE ▷ #triton (14 messagesđŸ”„):

Triton Warmup Bug, Block Ptr vs Tensor Descriptor

  • Triton’s Warmup Bug surfaces in new Triton Versions: A user reported a breaking change in newer Triton versions related to kernel warmup, where previously working code now throws a TypeError because constexpr arguments must be explicitly passed during kernel calls after warmup.
    • The user identified a potentially offending line in jit.py and suggested parsing kwargs for tensors to mock the expected behavior, highlighting that the issue stems from how Triton handles positional vs keyword arguments after a kernel is warmed up; a minimal repro sketch follows this list.
  • Block Ptr Brawl: Tensor Descriptor Throwdown: A user inquired about the difference between a block pointer and a tensor descriptor, seeking clarity on how Triton handles memory access.
    • Specifically, the user questioned whether there are boundary checks in place for load/store operations when using tensor descriptors.
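
A minimal repro-style sketch of the reported behavior; the kernel is illustrative, and the point is that newer Triton versions want the constexpr argument passed explicitly as a keyword at launch time:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_one(x_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    tl.store(x_ptr + offs, tl.load(x_ptr + offs, mask=mask) + 1, mask=mask)

x = torch.zeros(1024, device="cuda")
# Passing BLOCK explicitly as a keyword avoids the TypeError described above.
add_one[(triton.cdiv(x.numel(), 256),)](x, x.numel(), BLOCK=256)
```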

GPU MODE ▷ #cuda (2 messages):

Nsight Copilot, Nvidia


GPU MODE ▷ #torch (1 messages):

PyTorch 2.7 stride fix, float8_e8m0fnu edge case

  • PyTorch 2.7 fixes strides: In PyTorch 2.7, most stride-related problems have been resolved by explicitly forcing torch.compile to match strides for custom operators.
    • Any other stride behavior observed after version 2.7 is considered a bug; an illustrative custom-op sketch follows this list.
  • float8_e8m0fnu is still buggy: There is an edge case with float8_e8m0fnu that was identified in this issue.
    • The reporter is curious about other examples of stride issues after PyTorch 2.7, because they should not occur.
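
An illustrative custom-op sketch of the stride contract (not code from the discussion): under torch.compile in PyTorch >= 2.7, the compiled path should hand the op inputs with the same strides as eager mode.

```python
import torch

@torch.library.custom_op("demo::show_strides", mutates_args=())
def show_strides(x: torch.Tensor) -> torch.Tensor:
    print("strides seen by the op:", x.stride())
    return x.clone()

@show_strides.register_fake
def _(x):
    return torch.empty_like(x)

y = torch.randn(4, 8).t()                    # non-contiguous input
torch.compile(lambda t: show_strides(t))(y)  # strides should match eager mode
```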

GPU MODE ▷ #jobs (2 messages):

AMD Hiring, GPU roles

  • AMD is hiring!: A member from AMD announced their team is actively seeking candidates with GPU experience and software programming skills, particularly in kernel development, distributed inference, and vLLM/Sglang.
  • AMD job locations requested: A member inquired about the locations for the open positions at AMD.

GPU MODE ▷ #beginner (7 messages):

Voltage Park, Google Cloud Storage, Amazon S3, HF hub

  • Voltage Park GPU Cloud: A member is using the Voltage Park GPU cloud and created a checkpoints subfolder in their project folder; they are coding in an IDE rather than Colab.
    • They are looking into Google Cloud Storage and Amazon S3, unsure which one to pursue.
  • HF hub Usage: A member asked whether uploading to HF hub would be better than storing model weights in a repo, as it seems more conventional to pull it from somewhere online.
    • Another member responded with I mean, HF is just a git repo.
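
A minimal sketch of pushing a checkpoints folder to the Hub rather than committing weights to a code repo; the repo id is a placeholder:

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes `huggingface-cli login` or HF_TOKEN is set up
api.upload_folder(
    folder_path="checkpoints",
    repo_id="your-name/your-model",
    repo_type="model",
)
```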

GPU MODE ▷ #torchao (3 messages):

NVFP4 support, global_scale calculation, llm-compressor, vllm, FP8_E4M3_DATA

  • NVFP4 Global Scale Discrepancy Surfaces: A user inquired about the difference in global_scale calculation for NVFP4 support compared to llm-compressor and vllm implementations.
    • The user highlighted differing formulas, particularly in how global_scale is derived using FP8_E4M3_DATA, FP4_E2M1_DATA, and amax(x) as seen in llm-compressor and vllm.
  • AO Scales Combine for Equivalence: The torchao implementation of NVFP4 uses two states, one with a global scale and another where it uses only the E4M3 format, as implemented in this part of the code.
    • Before scaling, the implementation merges block_scales and global scales (see code), claiming mathematical equivalence through a breakdown involving scaling factors and amax calculations, along with scaling block scales.

GPU MODE ▷ #off-topic (4 messages):

Remnote, X CLI Tool

  • Admiration for Remnote’s Awesomeness: Members expressed that Remnote must have been hard to write and that it is awesome.
  • Member Reveals X CLI Tool Creation: A member shared a link to an X CLI tool they created: https://x.com/neuralkian/status/1943410954110222675.

GPU MODE ▷ #self-promotion (5 messages):

Metal Kernels Generation with LLMs, Metal profiling on macOS, Triton for bioinformatics, Needleman-Wunsch algorithm

  • Metal Kernels Generated with LLMs: A member will be speaking next Wednesday on Metal kernel generation with LLMs and will share their approaches and results.
    • The talk will touch on Metal profiling on macOS and automation, and will be held on Luma in San Francisco on 07/30/25.
  • Needleman-Wunsch Algorithm Gets Triton Boost: A member implemented and benchmarked the Needleman-Wunsch algorithm using Triton and compared its performance against PyTorch and CPU implementations.

GPU MODE ▷ #🍿 (1 messages):

Boehm-style article, Leaderboard code

  • Interesting Paper Sparks Leaderboard Idea: A member shared an interesting paper.
    • They suggested applying its ideas by rewriting a piece of leaderboard code as a Boehm-style article, potentially improving readability or understanding.

GPU MODE ▷ #reasoning-gym (1 messages):

  • No Topics Discussed: No specific topics were discussed; the only content was an image attachment (a screenshot) posted without accompanying discussion.

GPU MODE ▷ #status (1 messages):

TriMul challenge utils module, Reference Kernels, DisableCuDNNTF32

  • TriMul Challenge Utils Module Sought: A member asked where to find the utils module used in the TriMul challenge, specifically make_match_reference and DisableCuDNNTF32.

GPU MODE ▷ #factorio-learning-env (22 messagesđŸ”„):

Value Accrual Time Mismatch, Closed Database Error, Postgres DB Integration, LuaControl::can_place_entity function

  • Value Accrual Time Needs Matching: The value_accrual_time in the gym environment may not match the experimental protocol, with a potential discrepancy between the trajectory_runner script’s 1 second setting and a mentioned 30 second wait time for validity checks.
    • A request was made to clarify the timing value used in the paper.
  • Race Condition Causes Closed Database Error: A member encountered a Cannot operate on a closed database error and wondered if it’s due to race conditions from parallelization, but they think it might not be since it’s a single environment.
    • The error concerns a database which they say they don’t even use.
  • Postgres DB Integration is useful: It was recommended that the member saves their results to a Postgres DB to aggregate across trials.
    • The member plans to read outputs from trajectory_logs, but ideally the old scripts for previous result-reporting would work on a Postgres DB from the NeurIPS repo.
  • LuaControl::can_place_entity Function Simplifies Character Placement: Version 2.0.61 will have LuaControl::can_place_entity which works with character entities as well as players, potentially simplifying custom workarounds.
    • The old pre-holdout period was scrapped for the paper.
  • Overload responses: Running parallely in two batches is working, although seeing HTTP 529 overload responses.
    • The previous paper used 4 runs per task to report the mean and standard deviation of task success.

GPU MODE ▷ #cutlass (1 messages):

GEMM All-Reduce Fused Kernel

  • Nvidia shares GEMM example: A member shared a link to a helpful Nvidia example, specifically the GEMM All-Reduce Fused Kernel example.
  • Details on the GEMM All-Reduce Kernel: The example demonstrates a fused kernel implementation of GEMM with All-Reduce, showcasing how to optimize performance using NVSHMEM.

Eleuther ▷ #general (16 messagesđŸ”„):

Olympiad problems vs Open-ended math research, AI alignment starting points, AI safety and alignment prompt engineering critique, Mtech dissertation topics, AI Enthusiast Intro

  • Olympiad Problems Gamified by Feedback Loops: Members discussed that olympiad style problems can be gamified with closed feedback loops with clear optimization criteria compared to open-ended math research, which requires abstracting and developing the right frameworks.
    • It was suggested that a RL style approach is likely to fail miserably in open-ended math research since the search space is simply too large and convoluted without a coherent internal world model.
  • AI Alignment Project Recommendations: A college sophomore asked for recommendations to get into AI alignment.
    • A member recommended checking the sheet and alignment ecosystem development, referring to the resources found in <#797952672204324954>.
  • Prompt Engineering Seeks Critique: A new member asked where to post their work on prompt engineering in regards to AI safety and alignment to get some critique.
    • Another member pointed them to check <#797952672204324954> first, then see <#730451873613611079>.
  • Mtech Dissertation Topic Sought: A second-year Mtech student asked for advice and experiences on finding a good topic for their dissertation.
  • AI Enthusiasm Sparked by Perfect Dark: A datacenter infrastructure engineer and AI enthusiast introduced themself, stating that their first interest in AI came from the video game Perfect Dark on N64 back in 2000.
    • They studied comp sci at the University of Colorado Denver and are now running local LLMs on their home hardware.

Eleuther ▷ #research (23 messagesđŸ”„):

Supervised Learning Halt Action, KV Cache Sharing, MoE Research Focus

  • Halt action raises concerns: Selecting the halt action ends the supervision loop, but substituting any halted sample in the batch with a fresh sample from the dataloader sounds like it missed some critical stuff.
    • During evaluation, they just run it to the max for every single token, so the adaptive computation part of the paper seems questionable.
  • KV Cache Sharing is effective: A researcher mentioned that Geiping tried a strategy wherein they shared the KV-cache, and apparently it works well.
    • They posited that using a learnable constant activation would probably be better than negative infinity.
  • MoE Research Areas: The main things researchers play with on the science side for MoEs are token routing strategies, expert load balancing, top_k value, token dropping vs dropless routing, and expert capacity factor.
    • Another member mentioned that shared expert schemes are also used, such as size and number of shared experts.

Eleuther ▷ #interpretability-general (1 messages):

alofty: This is sick!


Eleuther ▷ #lm-thunderdome (23 messagesđŸ”„):

Global MMLU filters, loglikelihood requests discrepancy, Zeno interface, Seq2Seq models loglikelihood

  • Global MMLU hit with useless filters: A user noticed multiple seemingly useless filters being applied to the Global MMLU dataset, and shared a screenshot of it: IMG_4199.jpg.
  • Loglikelihood Requests are Too High!: A user asked why loglikelihood requests are at 2.3M instead of the expected 600K, suggesting it might be measuring multiple metrics.
    • Another user explained that for multiple-choice problems, the number of requests increases by a factor of the number of choices, for example, 10 samples x 4 choices = 40 requests.
  • Sneak Peek of Zeno Interface: A user shared a sneak peek image of a Zeno interface: image.png.
  • Seq2Seq models yield bad loglikelihoods: A user initially observed that models such as mt5, byt5, and mrt5 yielded horrible loglikelihoods compared to llama.
    • They later realized that the llama model had significantly more parameters (3B) than the other models, and after retrying with models of comparable size (1.2B), found that performance was acceptable.

Eleuther ▷ #gpt-neox-dev (1 messages):

HPC, Operating Systems

  • Exploring HPC tasks within Operating Systems: A member inquired about the tasks being worked on, seeking to learn more about the cool things happening from a High-Performance Computing (HPC) perspective in Operating Systems (OS).

Yannick Kilcher ▷ #general (43 messagesđŸ”„):

US Tech Industry vs Copy-Pasting, Slow State Vector Attention Heads, Energy-Based Models, Meta SI Labs Salaries, Transformers as Fixed-Point Algorithms

  • US Tech Soars as Copy-Pasters Suffer: A member commented that good news for the US tech industry means sucks for the rest of you copy pasting monkeys who are mystified by nsight 😂.
    • Another member shared his experiences as an ex-Intel engineer saying your knowledge doesn’t mean shit.
  • Slow State Attention Heads Proposed: A member asked whether implementing persistent Slow State vector S^(l) to each attention head in every layer and dual-timescale computation would be a good idea.
    • The member included details like adding slow state projections, learned gating, and updating slow states every τ timesteps using a GRU cell.
  • Energy-Based Models Succinctly Stated: A member shared the most succinct statement of energy-based models seen so far, from Message-passing Algorithms for Inference and Optimization- “Belief Propagation” and “Divide and Concur” by Jonathan S. Yedidia.
    • In practice, it doesn’t really use any probabilistic framing and instead acts more like learning a fixed point algorithm.
  • Meta SI Labs Salaries Debated: A member questioned the reality of high pay packages at Meta SI Labs.
    • They asked, I don’t understand how people can get offered pay packages in the same order of magnitude as Demis Hassabis’s net worth — someone please explain the economics of it if you know.
  • AI Use Pretension Stats: A member shared a link to a Howdy blogpost about statistics on AI fatigue.
    • The blogpost states that We went from employers not letting workers use AI to now 1/6 of the workers are sometimes pretending to use AI when they aren’t.

Yannick Kilcher ▷ #paper-discussion (1 messages):

erkinalp: a good negative results paper for discussion: <#1392972270615662652>


Yannick Kilcher ▷ #ml-news (12 messagesđŸ”„):

US AI Investment, Meta SI Labs Salaries, Trump's AI Action Plan

  • US AI Dominance Under Scrutiny: A member shared a post questioning the US’s AI dominance and investment.
    • Another member responded suggesting that financialization has destroyed real innovation.
  • Meta SI Labs’ Salaries Raise Eyebrows: A member questioned the reality of Meta SI Labs’ high salaries, linking to a previous discussion for context.
    • The member expressed disbelief that pay packages could rival the net worth of figures like Demis Hassabis.
  • Trump’s AI Action Plan Sparks Skepticism: Members reacted skeptically to Trump’s AI action plan, with one humorously wondering if it would include Epstein files and linking to a press release and X post.

aider (Paul Gauthier) ▷ #general (35 messagesđŸ”„):

Qwen3-Coder in Aider, Textualize 4.0 release, Charm's Wish in Python, Experimental Aider Frontend in Textualize, Claude Code Router

  • Aider Embraces Qwen3-Coder via OpenRouter: Members confirmed that Aider supports Qwen3-Coder via OpenRouter, offering a straightforward setup with the command aider --model openrouter/qwen/qwen3-coder --set-env OPENROUTER_API_KEY=.
    • One member noted that while direct integration with Alibaba Cloud is possible, using OpenRouter is simpler, adding that the Qwen team might enhance documentation due to feedback on their Gemini-based CLI fork.
  • Textualize 4.0 Fixes Markdown Streaming Woes: A member pointed out a markdown rendering issue in Textualize, prompting another to clarify that most changes were included in the Textualize 4.0 release.
    • The update addresses issues in markdown streaming, resolving the reported problem.
  • Charm’s TUI Wish-list for Pythonistas: One member expressed a desire for Charm’s TUI capabilities, specifically Charm’s Wish, to be available as Python libraries.
    • Although they’re considering learning Go, the member emphasized the value of having tools like Charm’s Wish in Python for building terminal user interfaces.
  • Frontend Experimentation Heats Up with Textualize: A member is prototyping an experimental Aider frontend using Textualize inspired by a post about toad.
    • The member also considers splitting the project into backend/frontend components.
  • Is There a Need for Separate Code Tools?: A member questioned the necessity of tools like Claude Code, Gemini CLI, and Qwen Code, given the existence of Aider.
    • Another member responded by praising Aider’s surgical precision and token economy, but suggested that separate coding tools each have their place and strengths depending on the use case.

aider (Paul Gauthier) ▷ #questions-and-tips (15 messagesđŸ”„):

Gemini Pro, ChatGPT Agents, Aider tips

  • Gemini models and Aider: aider --model gemini-2.5-pro works, but aider --model gemini-exp does not, because models/gemini-2.5-pro-exp-03-25 is not found for API version v1beta, or is not supported for generateContent.
    • Members suggested getting a key and base URL from Google AI Studio to use the free API, ensuring a non-billing-enabled project.
  • System Prompts Tweaking with Aider: A member inquired about specifying system prompts for Aider when using local LLM deployments, aiming to improve LLM responses.
    • Another member suggested that instead of a system prompt, a rules file may be the solution.
  • ChatGPT Agents Utility Questioned: One member wondered about the usefulness of ChatGPT’s Agents, questioning if it’s primarily for non-technical users.
    • The discussion considered whether ChatGPT Agents could aid in researching solutions before coding with tools like Aider.
  • Access files not in the root directory: One member raised a question about how to get aider to see the contents/files that are not in the root directory.
    • Another member pointed out that aider needs git init in new projects for Aider to detect all files and to use /read instead of /add.

Nous Research AI ▷ #general (24 messagesđŸ”„):

Kimi K2, China open source models, COCO-CONVERTER, Psyche office hours, LLM prompts/evals for a b2b service

  • Kimi K2 sparks geopolitical A.I. arms race: The open-source Kimi K2, under a modified MIT license, is facing cultural and geopolitical resistance in the U.S. due to the perception of China as a rival, but this resistance may be rooted in hubris.
    • One member doesn’t believe such resistance to Chinese models exists in the US, because the releases are getting Chinese companies a lot of goodwill, even lighting fires under people’s asses to actually keep pushing forward. It’s been kinda funny watching OpenAI get dominated by these Chinese model releases.
  • COCO-CONVERTER facilitates object detection tasks: A member has developed a Python script, COCO-CONVERTER, available on GitHub, that converts image data formats (CSV or folder structures) into a JSON file with COCO-like annotations for use in PyTorch datasets.
    • The script streamlines the workflow for object detection tasks by automating the conversion and dataset creation, enabling users to load the data, wrap it in a dataloader, and start training; a simplified sketch of the conversion step appears after this list.
  • Brainstorming LLM-powered B2B service: A member is seeking assistance with LLM prompts and evals for a B2B service that aims to decode platform algorithms and enhance metrics like search ranking, CTR, and conversion.
    • The goal is to develop an LLM capable of evaluating the current state of a platform and suggesting improvements to optimize key performance indicators.
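
A simplified sketch of the conversion step (the CSV layout is an assumption; COCO-CONVERTER’s actual interface may differ):

```python
import csv
import json

images, annotations = [], []
with open("boxes.csv") as f:  # assumed columns: file, x, y, w, h, category_id
    for i, row in enumerate(csv.DictReader(f)):
        # One image entry per row for brevity; a real converter deduplicates files.
        images.append({"id": i, "file_name": row["file"]})
        annotations.append({
            "id": i,
            "image_id": i,
            "bbox": [float(row[k]) for k in ("x", "y", "w", "h")],  # COCO xywh
            "category_id": int(row["category_id"]),
        })

with open("annotations.json", "w") as out:
    json.dump({"images": images, "annotations": annotations, "categories": []}, out)
```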

AI Tutor, Scoleaf feedback, Online Courses, Education Reform

  • Scoleaf fixes broken online courses with AI Tutor: A member introduced Scoleaf, an AI tutor designed to fix one-way online courses by acting like a real professor, and linked to the project.
    • It might scold you if it catches you slacking off, and the first 1000 people to DM feedback get their name on the public ‘Contributor Tree’ forever.
  • Scoleaf Seeks Feedback to Shape Future Education: The creator of Scoleaf is actively seeking feedback on how users prefer to learn, emphasizing that this isn’t just a product promotion but a genuine request for input.
    • They aim to perfect Scoleaf for the fall semester based on user feedback and are encouraging users to share the project to build the education we deserve.

Notebook LM ▷ #use-cases (3 messages):

Source IDs in Notebooks, Publishing Notebooks, File format, Constella UI

  • Source IDs in Notebooks Diverge: A member noted that the system seems okay with a source that is added to more than one notebook having different IDs.
    • They added that they probably didn’t cover all of it, but did get the point.
  • Public Notebooks Not Expected: A member observes that the system doesn’t seem to expect notebooks to ever be published or made public in any way.
    • It’s unclear what the system under discussion is.
  • TXT files preferred to PDFs: One member does not think PDFs are a good format to use.
    • Another member suggests to convert it to a txt file instead.
  • Constella’s UI Praised: A member exclaimed that Constella has an awesome UI.
    • A follow-up question was asked, What are you trying to build?

Notebook LM ▷ #general (24 messagesđŸ”„):

PDF upload issues, NB PRO account problems, Audio review prompts, Google Docs source updating, Chat history saving

  • PDF Upload Errors Plague NB Pro Users: Users reported errors when uploading PDF sources to NB PRO accounts, with one user sharing a screenshot of the error.
    • A member from Google offered to investigate if the PDFs were publicly accessible, asking the user to DM them.
  • Google Docs Sources Fail to Update: A user questioned whether information added to Google Docs as a source was not updating in NotebookLM.
    • A user replied suggesting the user either click on the source and sync with doc or reupload to fix it.
  • Users Seek Chat History Feature: A user inquired whether NoteBookLM saves chat history, lamenting that previous questions and answers are deleted upon closing and reopening the notebook.
    • A user confirmed that chat history is not saved and another user expressed hope this feature would be available in the future.
  • Chrome Extension Fixes Text Direction Problems: A user shared a Chrome extension called NotebookLM Language Switcher to address text direction issues, especially when mixing right-to-left and left-to-right languages.
    • The extension changes the UI language and text direction, but not the LLM’s language.
  • Users Prompt Audio Reviews with Host and Expert: A user shared that they usually prompt audio reviews with Host and Expert, where Host functions as the relatable personality, while Expert is the neutral professional who is objective with the sources.
    • The user suggested giving custom prompts for specific examples and contexts of profanity if desired.

Manus.im Discord ▷ #general (27 messagesđŸ”„):

Manus.im issues, Manus Internal Changes, Competition in Agentic Space, Shareholder influence, Recovering files and session

  • Manus Free Tier Gets More Restricted: A user complained that Manus.im is getting worse for free users, experiencing issues uploading files as small as 5GB, despite previously uploading 20GB files successfully.
    • The user expressed frustration over the lack of error messages and unresponsive support, wondering if there are undocumented upload limits or format restrictions.
  • Manus undergoes Internal Changes: Members reported that Manus seems to be undergoing internal changes, causing staff shortages and a slowdown in activity.
    • One member noted they think they’ll be back once they’ve sorted out their management and strategy change.
  • Competition Intensifies in Agentic Space: Members agreed that the competition in the agentic space is fierce, suggesting that Manus should have capitalized on its early lead rather than lose momentum.
    • One member speculated about potential shareholder or private equity involvement influencing the company’s direction.
  • Error: Failed to resume sandbox: A user reported encountering a Failed to resume sandbox error along with a 502 Bad Gateway, seeking advice on how to recover their files and session.
    • No solutions were provided in the given context.

MCP (Glama) ▷ #general (17 messagesđŸ”„):

MCP Servers, Env Vars, AI security checks, xdg portal, benchmark MCP server

  • Env Vars: A member was having trouble running his MCP server, and another member suggested using bash -c "mymcpserverbinary myparameter" to invoke the shell directly so the server picks up the shell environment and its env vars.
    • The member clarified that his server uses xdg portal and relies on env vars but it only works with the inspector. He later stated that claude isn’t even officially supported on linux so he would wait.
  • Concerns about AI security checks: A member expressed concerns about the lack of security checks and limits in place on open APIs, suggesting that AI could go wild and cause some pretty bad stuff.
    • They proposed adding some controls and monitoring to mitigate potential risks.
  • Wild West of MCP Servers: A member lamented the overwhelming number of MCP servers, stating It’s getting impossible to sift through the garbage.
    • Another member agreed it’s like the wild west all over again.
  • MCP Validation: One member pointed out that some good servers have almost no GitHub stars/activity, and suggested that servers claiming to improve performance should be benchmarked on actual performance metrics.
    • He noted that memory-type servers (including mem0 and the like) rarely improve performance over basic curated RAG, or for smaller contexts.
  • FastMCP: A member declared that they use sequential thinking and build custom with FastMCP because most MCP servers are trash rn.
    • Another member said to only use the ones you write yourself or are hosted officially by a 3rd party.
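
For context, a minimal FastMCP server sketch using the package’s documented decorator interface; the tool is illustrative:

```python
from fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```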

MCP (Glama) ▷ #showcase (1 messages):

Augments, Claude Code, Real-time access to Frameworks

  • Augments Keeps Claude Code Current: A new MCP server called Augments has been released to keep Claude Code current with framework docs, eliminating outdated React patterns or deprecated APIs.
    • It offers real-time access to 90+ frameworks, is open source, and available for trial at augments.dev.

LlamaIndex ▷ #blog (4 messages):

LlamaIndex State Management, FlowMaker: Visual Agent Builder, LlamaIndex AI Agent Meetup, Production Document Parsing

  • State Management Upgrade in LlamaIndex: LlamaIndex introduces typed state support, upgrading state management in workflows with Context objects to share data between non-connected steps (link).
  • FlowMaker simplifies AI Agent Building: LlamaIndex launches FlowMaker, an experimental open source visual agent builder that allows creating AI agents in LlamaIndex TypeScript via drag-and-drop (link).
  • LlamaIndex Hosting AI Agent Meetup in Amsterdam: LlamaIndex and Snowflake are hosting an AI agent meetup in Amsterdam, featuring DevRel engineer @tuanacelik discussing document agents and challenges in building AI-powered document processing agents (link).
  • LLM APIs Alone Insufficient for Document Parsing: While frontier models like GPT-4.1, Claude Sonnet 4.0, and Gemini 2.5 Pro have surpassed traditional OCR, screenshot-only parsing lacks accuracy for enterprise document parsing (link).

LlamaIndex ▷ #general (6 messages):

LlamaReport Open Source Alternatives, vllm Local Hosting with Cerebrium

  • LlamaReport has NO Open Source Twin!: A member inquired about open-source alternatives to LlamaReport (link to llama_cloud_services).
    • It was clarified that the linked resource is merely an SDK for a deprecated API, though report-generation examples exist in the repo.
  • Craving Cerebrium and vllm wisdom!: A member is seeking advice on locally hosting vllm with Cerebrium.
    • They’re eager to ask questions and get help from anyone with prior experience in this area.

DSPy ▷ #general (6 messages):

Feature Requests, lm_usage troubleshooting, GPT-4.1 usage

  • Members seek ways to contribute to DSPy: A member inquired about a list of feature requests or tasks besides the GitHub issues list to contribute to DSPy.
    • The member expressed uncertainty about the validity and relevance of items on the issues list.
  • LM Usage Returns None: A member reported receiving None when calling get_lm_usage() after running dspy.Predict(QuoteRelevanceSelector).
    • The member configured DSPy with gpt-4.1, temperature 1.0, and track_usage=True, expressing confusion about the None result.

DSPy ▷ #examples (2 messages):

DSPy Tutorial Issues, Hugging Face Dataset Lib Update

  • DSPy Tutorial Faces Dataset Loading Snafu: A member reported an error while running the DSPy agents tutorial, specifically a RuntimeError when loading datasets.
    • The error message indicates that Dataset scripts are no longer supported, but found hover.py.
  • Hugging Face Update Suspected in DSPy Glitch: A member suggested that the dataset loading error in the DSPy tutorial is likely due to an update of Hugging Face’s dataset library.
    • No specific solution or workaround was provided in the messages; one possible workaround is sketched below.
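
One possible workaround sketch (an assumption, not a confirmed fix): Hugging Face datasets 3.0 removed support for loading-script datasets, so pinning below 3.0 should restore them. Run pip install "datasets<3.0" first:

```python
from datasets import load_dataset

ds = load_dataset("hover")  # illustrative id; the tutorial's actual dataset may differ
```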

Torchtune ▷ #dev (8 messagesđŸ”„):

HF Format Saving, DCP Saver for Recipe States, Checkpointing Abstraction, HF Checkpointer Resuming, DCP Speedups

  • HF Format Saving Gets a Boost with DCP: Distributed model saving in HF format is now working, leveraging work done here, with the DCP saver now naively saving the recipe state.
    • A current checkpointing abstraction makes it difficult to load the HF-formatted consolidated model and the distributed recipe state simultaneously, as some load functions construct the state dict while others need an existing state dict to fill in-place; a minimal DCP save sketch follows this list.
  • DCP Speedups Save the Day: Using DCP, saving a 70B model dropped from over 10 minutes to about 3 minutes; the bigger issue was that the optimizer state dicts weren’t saving fully because they are composed of DTensors.
    • The barrier during default checkpointing was hitting the default NCCL 600 second timeout.
  • LoRA with FP8 Throughput Disappoints on MI300: Experiments integrating LoRA with FP8 using LLama-3.1 70B model on one node of MI300 showed a drop in throughput.
    • Throughput with BF16 was 903.68, but it dropped to 876.04 when FP8 was enabled, with MBS=2 and GAS=1 on the alpaca dataset at a sequence length of 8192; the LoRA finetune script was attached.
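
For orientation, a minimal torch.distributed.checkpoint (DCP) save sketch of the general mechanism referenced above, not torchtune’s actual checkpointer; DCP writes shards per rank instead of gathering everything to rank zero:

```python
import torch
import torch.distributed.checkpoint as dcp

model = torch.nn.Linear(8, 8)
state = {"model": model.state_dict()}
# In a distributed run each rank writes its own shard; recent PyTorch also
# accepts single-process use.
dcp.save(state, checkpoint_id="checkpoints/step_0")
```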

MLOps @Chipro ▷ #events (7 messages):

Data + AI Happy Hour, Virtual Events, SF Meetup

  • Data + AI Happy Hour announced: MLOps is hosting a Data + AI Happy Hour on July 30th in SF for meeting collaborators who are building, fundraising, and scaling startups across the industry; sign up here.
    • A newcomer to the community expressed disappointment that there was no option to sit in on the call and learn from others’ experiences: I saw the discord notification and was excited to attend, scheduling a reminder for the time provided, only to be met with a waitlist placement when I tried to attend.
  • Virtual Events on the Horizon: MLOps is planning to host virtual events in the near future.
    • The team said that small events without observers help people speak more freely.

Modular (Mojo đŸ”„) ▷ #general (3 messages):

Mojo compiler, Linux, Windows, WSL

  • Mojo prioritizes Linux over Windows: The Mojo compiler team is currently focused on delivering the best experience to program GPUs for production enterprise environments, which are largely Linux.
    • Windows is definitely something the team wants to support in the future, but they are currently prioritizing the most impactful work.
  • WSL provides a viable workaround for Mojo on Windows: While a native Windows release for Mojo is not immediately planned, it works reasonably well under WSL.
    • This makes WSL a suitable environment for prototyping work with Mojo, even without official Windows support.

Modular (Mojo đŸ”„) ▷ #max (3 messages):

Prefix Cache, Token Hashing, Mojo integration for Token Hashing

  • Prefix Cache is disabled by default with Max 25.4: The prefix cache is disabled by default due to a small performance cost when the workload doesn’t have prefix caching opportunities.
    • A large part of this comes from the CPU overhead incurred by token hashing.
  • Token Hashing to get Mojo Boost: The team is actively working on reducing the performance cost, and one approach is moving the expensive token hashing operation from Python into Mojo.
    • The goal is to reduce the CPU overhead.
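
To illustrate where that CPU overhead comes from, a generic prefix-cache hashing sketch (block size and hashing scheme are assumptions, not MAX’s implementation): each block’s key chains the previous block’s hash, so a key identifies the entire prefix up to that block.

```python
import hashlib

BLOCK = 16  # tokens per cache block (illustrative)

def block_hashes(tokens: list[int]) -> list[bytes]:
    """Produce one chained hash per full block of tokens."""
    hashes, prev = [], b""
    for i in range(0, len(tokens) - BLOCK + 1, BLOCK):
        h = hashlib.sha256(prev + repr(tokens[i:i + BLOCK]).encode()).digest()
        hashes.append(h)
        prev = h
    return hashes

print(len(block_hashes(list(range(40)))))  # 40 tokens -> 2 full blocks hashed
```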

Cohere ▷ #đŸ§”-general-thread (1 messages):

alphzme: cohere a ai group that makes good ai api


Cohere ▷ #🔌-api-discussions (1 messages):

Vector Weighting, Image Vectorization, Text Vectorization, Cohere Unified Vectors

  • Users Weigh Vector Similarity Search: A member inquired about adjusting the weights of image and text vectors in a similarity search when using Cohere’s unified vector embeddings.
    • The user aims to emphasize either image similarity or text similarity at query time, like adjusting a dial (see the sketch after this list).
  • Image and Text Vectorization Techniques: The member mentioned having a library of embeddings created from both images and text blocks.
    • They’re currently vectorizing images and text independently but are looking for a way to do this with Cohere’s unified vectors.
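
One way to get that dial, sketched generically (an assumption, not a Cohere API feature): score each item by a weighted blend of its image-vector and text-vector similarity to the query, assuming L2-normalized embeddings.

```python
import numpy as np

def blended_score(query: np.ndarray, img_vec: np.ndarray, txt_vec: np.ndarray,
                  w: float = 0.5) -> float:
    """w -> 1.0 favors image similarity; w -> 0.0 favors text similarity."""
    sim_img = float(query @ img_vec)  # cosine similarity for unit vectors
    sim_txt = float(query @ txt_vec)
    return w * sim_img + (1.0 - w) * sim_txt
```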

Cohere ▷ #👋-introduce-yourself (2 messages):

Freelance AI Training, LLM Prompt Evaluation, Multilingual Data Annotation, Content Moderation for Conversational AI, AI Project Collaboration

  • AI Trainer Joins Community: A new member, Sushant Kaushik, introduced themself as a freelance AI Trainer and Content Moderator.
    • They have experience across platforms like Remotasks, Labelbox, Outlier, and Appen.
  • New Member Evaluates LLM Prompts: Sushant is currently working on LLM prompt evaluation, multilingual data annotation, and content moderation for conversational AI systems.
    • Their toolset includes Python, Labelbox, Power BI, and Azure Data Factory.
  • Community Collaboration Sought: Sushant hopes to learn from the community and stay updated on cutting-edge research.
    • They are also looking forward to collaborating on impactful AI projects.

tinygrad (George Hotz) ▷ #general (2 messages):

Tinygrad Motivation, Onnx export limitations, GPU Utilization

  • Tinygrad’s core motivation: Keep it Tiny!: The primary aim of Tinygrad is to maintain a minimal footprint.
    • Otherwise, no Tiny, according to one community member.
  • ONNX Export Limitations Surface: Exporting the models to ONNX format is not possible due to dynamic control flow.
    • For some models, the trace fails to export due to ValueError: Exporting a trace with dynamic control flow.

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (2 messages):

Large Language Model Agents, Berkeley MOOC iteration

  • New Edition of LLM Agents Anticipation Builds: A member inquired about the potential launch of a new edition of the Large Language Model Agents MOOC in the upcoming Fall.
    • Another member noted that while Berkeley is offering another Agents class for its students, a MOOC iteration hasn’t been confirmed, with announcements expected around late August.

Codeium (Windsurf) ▷ #announcements (1 messages):

Kimi K2 Model, Windsurf AI, Model Integration

  • Kimi K2 Swims into Windsurf: The Kimi K2 model is now supported on Windsurf at 0.5 credits per prompt, giving developers more options.

Nomic.ai (GPT4All) ▷ #general (1 messages):

Local Docs usage, Expansive Local Docs

  • Newbie seeks guidance on efficient Local Docs usage: A new user is seeking advice on efficient ways to use expansive Local Docs, as their current attempts seem to lack awareness of the extent of the local files.
    • The user is experiencing issues with the tool’s ability to provide answers, feeling like it’s pigeonholing responses into low-hanging fruit instead of leveraging the full breadth of information available in the local documents.