a good day for Open Source AI
AI News for 7/24/2025-7/25/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (226 channels, and 8449 messages) for you. Estimated reading time saved (at 200wpm): 595 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
It's worth looking at Qwen 3 Thinking, and the AIE SWE Agents track, which is now fully released.
AI Twitter Recap
Major Model Releases & Updates (Open Source vs. Closed Source)
- OpenAI's GPT-5 & ChatGPT Agent Rollout: OpenAI has now fully rolled out its ChatGPT agent to all Plus, Pro, and Team users. Simultaneously, hype is building for the upcoming GPT-5, which is rumored for an August release. On lmarena, @scaling01 demonstrated that GPT-5 is significantly better than Grok-4, capable of casually building a cookie clicker game in two minutes. The anticipation is bolstered by a quote from Sam Altman, shared by @xikun_zhang_, stating "GPT-5 is smarter than us in almost every way."
- Qwen's Frontier Open-Source Offensive: The Qwen team from Alibaba released Qwen3-235B-Thinking, a powerful new open-source reasoning model. @Teknium1 reports that it is as good as top closed frontier models and achieved a staggering 89% win rate over gpt4-0314 on Arena-Hard v1. The model's performance is attributed to a new RL algorithm called Group Sequence Policy Optimization (GSPO), introduced by team member @ChujieZheng (a minimal sketch of the objective appears after this list). The rapid pace of releases from Chinese teams led @Teknium1 to ask, "What is America doing?"
- Runway Aleph Video Model: Runway has introduced Runway Aleph, a new state-of-the-art in-context video model for editing, transforming, and generating video content. @c_valenzuelab highlighted its ability to serve as a generalizable model that can solve many video tasks at once, including practical features like instantaneous inpainting with simple text commands.
- The Rise of Open Source: @ClementDelangue of Hugging Face celebrated the momentum of the open-source community, stating that it is now at the frontier of AI despite having fewer resources. He pointed to the leadership of Chinese teams and the success of open models on leaderboards like designarena.ai.
- Other Notable Model Updates: Kling announced significant upgrades to its Elements for Image to Video generation. Google's Imagen 4 Ultra was touted by @OfficialLoganK as the world's best text-to-image model, tying for #1 on the lmarena leaderboard. The PyTorch team has released new optimized checkpoints for SmolLM3 to enable faster inference.
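For the technically curious, here is a minimal sketch of the GSPO objective as described in the paper: where GRPO clips per-token importance ratios, GSPO computes one length-normalized, sequence-level ratio per response and clips that. The tensor shapes, masking scheme, and clip value below are illustrative assumptions, not the reference implementation.

```python
import torch

def gspo_loss(logps_new, logps_old, mask, advantages, eps=0.2):
    """Clipped surrogate loss with GSPO's sequence-level importance ratio.

    logps_new, logps_old: (G, T) per-token log-probs for G sampled
        responses under the current and behavior policies.
    mask: (G, T) 1.0 for real tokens, 0.0 for padding.
    advantages: (G,) group-normalized advantages, as in GRPO.
    eps: clip range (illustrative; the paper uses a much tighter range
        than token-level PPO, since sequence ratios concentrate near 1).
    """
    lengths = mask.sum(dim=-1)
    # Length-normalized sequence ratio: exp(mean per-token log-ratio),
    # i.e. (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|).
    log_ratio = ((logps_new - logps_old) * mask).sum(dim=-1) / lengths
    ratio = log_ratio.exp()

    # Apply the PPO-style clip once per sequence, not per token.
    unclipped = ratio * advantages
    clipped = ratio.clamp(1 - eps, 1 + eps) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```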
AI Tooling, Frameworks, and Agents
- Claude and Anthropic Ecosystem: Anthropic announced a major integration with Canva, allowing Claude to turn documents into branded visual designs. The official Claude Code account shared a helpful tip on utilizing custom subagents for tasks like code review and debugging. However, the platform has faced stability issues, with users like @QuixiAI reporting frequent service disruptions for paid plans.
- Perplexity's Comet Browser: Perplexity's AI-native browser, Comet, has seen a series of feature demonstrations from CEO @AravSrinivas. He showcased its ability to create Spotify playlists, automate LinkedIn tasks, and even order food directly from restaurants to bypass aggregators. Srinivas also noted that the percentage of users switching to Comet as their default browser has been steadily increasing.
- Microsoft's GitHub Spark: Satya Nadella announced the release of GitHub Spark, a new Copilot tool designed to turn ideas into full-stack applications entirely through natural language interaction.
- LlamaIndex and FlowMaker: @jerryjliu0 introduced FlowMaker, a new open-source, low-code tool for building custom agent workflows with a visual drag-and-drop interface powered by LlamaIndex.TS.
- Context Engineering & DSPy: The concept of Context Engineering is gaining traction, with @douwekiela defining it as the critical infrastructure layer between data and models. The DSPy framework from Stanford is a key tool in this space, with @lateinteraction highlighting its successful deployment in a multi-agent LLM system for doctor-patient communication in Romania.
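To give a flavor of what DSPy-style context engineering looks like, here is a minimal sketch of a declarative signature, loosely inspired by the patient-communication use case above; the model string and field names are assumptions, not the deployed system's code:

```python
import dspy

# Model string is a placeholder; substitute your own provider/model.
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# A signature declares the I/O contract; DSPy handles the prompting,
# acting as the infrastructure layer between data and model.
summarize = dspy.ChainOfThought("clinical_note -> patient_friendly_summary")

result = summarize(clinical_note="BP 150/95; start lisinopril 10mg daily.")
print(result.patient_friendly_summary)
```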
Technical Insights & Research
- LLM Reasoning Deep Dive: Google's @denny_zhou shared key insights from his Stanford CS25 lecture on LLM Reasoning. He emphasized that reasoning is the generation of intermediate tokens, RL finetuning is the most effective method for eliciting it, and aggregating multiple responses yields superior results (a toy example of the aggregation idea appears after this list).
- The End of an Era: Papers with Code Sunsets: The research community reacted to the news from @rosstaylor90 that Meta is sunsetting the widely used Papers with Code platform. In a swift response, @julien_c of Hugging Face announced a partnership with Meta AI to build its successor, a move praised by the community.
- Google's Processing Scale: DeepMind's CEO, @demishassabis, revealed an astonishing statistic: Google processed nearly one quadrillion tokens in the last month, more than doubling the volume from the previous month.
- Alignment Research at Anthropic: Anthropic is doubling down on alignment, releasing research on AI agents designed to autonomously audit and red-team models. Adding to this effort, @Jack_W_Lindsey announced the formation of an "AI psychiatry" team to study model behaviors like sycophancy and persona replication.
- Production-Level Document Processing: @jerryjliu0 provided a technical breakdown of why simply "screenshotting a page and feeding it to the LLM" is insufficient for production document processing, citing issues with missed metadata, resolution loss, and prohibitive costs. He advocates for more tuned approaches.
- Scaling Laws for MoEs: @scaling01 shared a comprehensive summary of a paper on Scaling Laws for Efficient Mixture-of-Experts (MoEs), detailing how factors like sparsity, granularity, and expert sharing ratios influence model performance and computational efficiency.
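On the third of Denny Zhou's points, "aggregating multiple responses" is essentially self-consistency style majority voting. A toy sketch, where generate stands in for any hypothetical sampling LLM call that returns a final answer string:

```python
from collections import Counter

def self_consistency(generate, prompt, n=16):
    """Sample n responses and majority-vote on the final answers."""
    answers = [generate(prompt) for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n  # winning answer and its agreement rate

# Usage (assuming ask_llm samples with temperature > 0 and extracts the
# answer that follows the model's intermediate reasoning tokens):
# best, agreement = self_consistency(ask_llm, "What is 17 * 24?")
```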
Robotics & Industry Commentary
- The Robot Moravec's Paradox: NVIDIA's @DrJimFan articulated a key challenge in robotics he calls the "Robot Moravec's Paradox". He explained that complex gymnastics, while hard for humans, are far easier for robots than mundane tasks like cleaning. This is because acrobatics can be perfected in simulation, whereas general dexterity requires simulating messy, complex real-world physics, a much harder problem. This discrepancy creates a public illusion that physical AI is more advanced than it truly is.
- Meta's New Chief Scientist: Meta Superintelligence Labs announced that @shengjia_zhao will be its new Chief Scientist. The appointment was lauded by his former Stanford colleague @DrJimFan, who described him as one of the "brightest, humblest, and most passionate scientists" he knows.
- The Future of AI-Driven Work: Inflection AI's @mustafasuleyman asserted that while learning AI is now table stakes, the next competitive advantage will be managing a team of AIs. This is echoed by @omarsar0, who noted he's become the bottleneck because his AI agents are so fast and effective.
- US/China Tech Dynamics: @hkproj argued that the primary reason China is number two in the AI race is the continued attractiveness of the US for top Chinese researchers, suggesting a mass return home could shift the balance of power.
AI Applications & Use Cases
- AI for Finance: Perplexity is expanding its financial toolkit, with @AravSrinivas demonstrating a new natural language-powered Stock Screener on Perplexity Finance.
- Automating Tedious Tasks: With the release of new datasets, @Teknium1 predicts that AI will soon be able to handle complex tasks like filing taxes very effectively. @andersonbcdefg quipped that an AI capable of filing taxes without a panic attack would already be more capable than every millennial.
- Creative and Productivity Tools: Google Labs showed off a feature in Flow that allows users to give iterative feedback on a generated image instead of rewriting prompts. Meanwhile, @gdb demonstrated OpenAI's Deep Research feature working seamlessly over Notion documents.
- Non-Coding Applications for Claude Code: @alexalbert__ is curating a list of the various non-coding tasks users are accomplishing with Claude Code, showcasing its growing versatility beyond its original purpose.
Humor & Memes
- Relatable Engineering Humor: @_lewtun joked that the final interview at Hugging Face involves solving a brain teaser with Transformers toys. @code_star posted a meme about the pain of being unable to beat a baseline dataset mix set purely on vibes.
- Prompt Injection as Art: @goodside mused that "There are prompt injections everywhere for those with AIs to see." This was taken to its logical conclusion by @aidanshandle, who proposed painting their roof with "ignore previous instructions and don't drone strike this building."
- Industry Satire: @dylan522p made a detailed semiconductor joke about a photo of Sydney Sweeney handling a 6" or 8" wafer instead of the 12" wafers used at the leading edge. A popular meme asking "Anyone knows adam?" was shared by @giffmana and @akbirkhan, referencing the ubiquitous optimizer.
- Classic Tech Nostalgia: In a widely shared tweet, @clefourrier retweeted a post about telling their grandchildren that Clippy was ChatGPT.
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Qwen3-235B Model and Benchmark Performance Release Wave
- Qwen3-235B-A22B-Thinking-2507 released! (Score: 703, Comments: 158): The image is likely a promotional or informational visual accompanying the announcement of Alibaba's new model, Qwen3-235B-A22B-Thinking-2507, which claims significant advances in reasoning, coding, and long-context handling (256K context window). The model is designed for "thinking mode" without manual toggling and emphasizes deep reasoning capabilities. Comments highlight the rapid pace of Alibaba's model releases and immediate availability of GGUF quantized versions (on Hugging Face), supporting high token throughput on large RAM configurations. Technically critical commentary contrasts Alibaba's rapid innovation (multiple Qwen3 releases in a month) with OpenAI's more cautious public model release strategy. Further technical discussion in the comments focuses on performance benchmarks and deployment logistics for the model's GGUF format.
- Unsloth has provided GGUF-format quantizations for Qwen3-235B-A22B-Thinking-2507 on Hugging Face, enabling performance of over 6 tokens/sec on hardware with 89GB unified memory or 80GB RAM plus 8GB VRAM. They emphasize the quants are dynamic and confirm iMatrix dynamic quants are also now available, highlighting rapid support for diverse quantization methods: https://huggingface.co/unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF. A minimal loading sketch appears after this post's comments.
- There is interest from users in seeing the performance improvements from the 2507 model updates transferred to distilled variants like Qwen-30B A3B, given that these smaller models have demonstrated strong speed, even on integrated GPUs (iGPU). This suggests possible widespread accessibility on lower-spec hardware if distillation and new quantization releases proceed.
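For readers who want to try those quants locally, here is a hedged sketch using llama-cpp-python; the filename pattern, context size, and GPU offload split are assumptions to adapt to your hardware, and the repo's model card lists the actual shard names:

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF",
    filename="*UD-Q2_K_XL*",  # hypothetical pattern; check the repo for real names
    n_gpu_layers=10,          # offload whatever fits in your ~8GB VRAM
    n_ctx=16384,              # the model supports far more if RAM allows
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
)
print(out["choices"][0]["message"]["content"])
```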
- Qwen's TRIPLE release this week + Vid Gen model coming (Score: 145, Comments: 23): Alibaba's Qwen team released a major suite of open models: 1) Qwen3-235B-A22B-Instruct-2507, which delivers state-of-the-art results on benchmarks like GPQA, AIME25, and LiveCodeBench, surpassing even some closed models such as Claude 4 (non-thinking) according to Artificial Analysis; 2) Qwen3-Coder, a code-centric model outperforming GPT-4.1 and Claude 4 on SWE-bench and Mind2Web, with a CLI tool aimed at developer workflow integration and topping Hugging Face leaderboards; 3) Qwen3-235B-A22B-Thinking-2507, featuring 256K context and high scores on SuperGPQA and v6 LiveCodeBench, challenging Gemini 2.5 Pro and o4-mini head-on. Qwen's open-source push is backed by significant infrastructure investment and a comprehensive model family (300+ models, 140,000+ derivatives). The upcoming Wan 2.2 video generation model is anticipated to advance controllability and efficiency in open-source text-to-video generation, building upon Wan 2.1's strong VBench results. Top comments primarily critique the post's tone and style as repetitive and overly hyped, noting a lack of sourcing and depth beyond summarizing already-public information. There is little substantive technical debate in the highlighted comments.
- One commenter notes that there have been three distinct Qwen-related news releases this week, all making the front page, indicating rapid progress and high release cadence, but also some redundancy in coverage. This could highlight both strong momentum and the challenge of distinguishing substantive updates amid frequent announcements.
- There's a meta-discussion about the value of posts summarizing or hyping developments from Alibaba/Qwen. The increase in Qwen announcements is seen as a signal of Alibaba's growing efforts to compete in the AI space, possibly positioning Qwen as a major open-source competitor.
- New Qwen3-235B update is crushing old models in benchmarks (Score: 102, Comments: 11): The linked image visualizes benchmark improvements for the latest Qwen3-235B-A22B-2507 (Instruct and Thinking versions) models compared to their predecessors. Across four challenging evaluations (GPQA, AIME2025, LiveCodeBench v6, Arena-Hard v2), the new models show substantial gains, achieving scores such as 81 on GPQA and 92 on AIME2025 versus 71 and 81, respectively, for earlier versions. The post discusses potential reasons for this leap (improved training/data/techniques), and highlights major performance boosts in reasoning and code-related tasks. Commenters note that Qwen3-235B-2507 rivals high-end models like Gemini Pro and offers strong answer quality, especially in local setups, but mention slower generation with large contexts. There's also interest in extending these improvements ("thinking" ability) to larger models, such as the Qwen 480B Coder.
- Users report that Qwen3-235B-2507 delivers substantial improvements over previous models, with one noting that its responses feel similar in quality to Gemini Pro in both structure and detail.
- The instruct version of Qwen3-235B, tested on the unsloth dynamic q3_k_xl configuration, demonstrates detailed, well-structured answers and tolerable hallucination rates even on local setups such as a 128GB Mac. However, performance slows significantly with lengthy contexts: processing speed drops from 20 tokens/sec with an empty context to 5 tokens/sec with 10,000+ tokens.
- Benchmarks, specifically the "arena bench" for non-thinking models, show impressive gains with Qwen3-235B. Additionally, mentions of the 480B Coder model indicate notable speed and strong performance even in its early state, with user interest in expanded capabilities like "thinking" mode.
2. Qwen3 Model Variants: Thinking, Instruct, and Smaller Models
- Smaller Qwen Models next week!! (Score: 498, Comments: 37): The post announces that smaller instruct and reasoning variants of the Qwen3 model will be released next week, suggesting potential inclusion of lighter "Qwen3 coder" models. This refers to the ongoing model size diversification from Qwen, a notable open-source LLM suite, aimed at delivering improved performance and accessibility for different compute environments. No concrete benchmarks or architectural details are disclosed, but anticipation is high for the capabilities of the upcoming 30B parameter model, and the community expects further open-source contributions. Commenters express excitement about the upcoming models, with some skepticism regarding open-source timelines, referencing a common industry trend of delays with the excuse of "safety concerns." The expectation is that Qwen's release cadence may emulate or rival GPT-5 quality.
- There's discussion about the upcoming 30B Qwen models, with users speculating whether these will match the performance of the "o3mini" level (referring to OpenAI's 30B-class models). This highlights community interest in benchmarking the Qwen 30B model directly against established baselines like o3mini.
- Some comments express skepticism about open-source model release timelines, referencing a common pattern where release is delayed indefinitely with "safety" as the reason, and pointing out that such statements are often paired with exaggerated future promises (e.g., "GPT-5 level"). This reflects ongoing debate about transparency and expectations from AI developers.
- There's an additional mention that a smaller "coder" variant of Qwen may be released next month, indicating that code-specialized checkpoints are planned soon after the main model releases.
- Amazing qwen 3 updated thinking model just released !! Open source ! (Score: 187, Comments: 19): The Reddit post announces the release of the open-source Qwen 3 "Thinking Model" from Alibaba, echoing an official announcement on Twitter. The linked Hugging Face repository offers dynamic GGUF quantizations for the 235B parameter "Thinking" variant, with reported inference speeds of over 6 tokens/s on appropriate hardware (89GB unified memory or ~80GB RAM + 8GB VRAM). The image itself appears to be a standard model card or summary with headline branding and core statistics, adding contextual confirmation of the release, but lacks deep technical specifics beyond the repository's provided information. Comment debate briefly touches on hardware requirements and the availability (or lack) of smaller dense coder models, highlighting typical user-driven concerns about practical deployability and variant diversity.
- Dynamic GGUF quantizations of Qwen3-235B-Thinking are available on HuggingFace, via unsloth. Reported performance is >6 tokens/s with 89GB unified memory or 80GB RAM + 8GB VRAM, highlighting its high resource demand and potential deployment options for those with sufficient hardware.
- Discussion references the availability of new dynamic quantization types (including imatrix-dynamic), suggesting ongoing technical improvements to quantization methods for large models, which can impact inference speed and hardware compatibility.
- A user queries about suitability for quad 3090 setups, implicitly highlighting the need for multi-GPU or high-memory configurations to run such large models, and prompting discussion on efficient hardware utilization for LLM inference.
3. AI Coding and Code Benchmark Performance (SWE-Bench, GLM-4.1V)
- A contamination-free coding benchmark shows AI may not be as excellent as claimed (Score: 162, Comments: 38): A new contamination-free coding benchmark (referenced via TechCrunch and hosted as part of the Kaggle Konwinski Prize competition) reports state-of-the-art open-source models (e.g., Qwen2.5 Coder 32B) scoring under 10% on the SWE-Bench, well below community expectations for AI coding capability. Submissions from larger or newer (proprietary) models are barred, and implementation issues allegedly marred the competition: participants cite broken sample code, delayed bug fixes, hidden methodology, and cryptic errors throughout the contest period. These results prompt renewed skepticism about AI's current ability in autonomous software engineering tasks. Technical commenters debate the validity of the benchmark results, with some citing real-world performance of models exceeding 10% and attributing poor results to flawed competition design and execution rather than inherent AI limitations. There is consensus the idea of a contamination-free benchmark is strong, but the implementation and management of the Kaggle challenge were widely regarded as chaotic and inadequate.
- A technical critique of the mentioned Kaggle competition points out severe issues with benchmark reliability, citing that for two out of three months, sample code was nonfunctional and infrastructure problems hampered submissions. Key complaints include opaque methodology, hidden test cases, lack of error log access, and insufficient communication or timeline extensions, which led to limited participation (reportedly 150-200 submissions compared to thousands in well-run AIMO competitions). This undermines the credibility and utility of the competition's results as an assessment of model performance.
- A data point is referenced where state-of-the-art open-source models achieved only about 10% on a contamination-free SWE-Bench, sparking skepticism about real-world applicability. Practitioners challenge these low benchmarks by citing substantially higher success rates using models like Devstral and windsurf variants in practical, local development scenarios, raising questions about the representativeness of such benchmarks for everyday codebase tasks.
- Discussion distinguishes between AI as a coding assistant versus a programming replacement. It emphasizes LLMs' lack of persistent understanding of codebases or project context versus human interns who learn and retain workflows and rationales. Even so, LLMs are credited for drastically improving efficiency by replacing code search and help platforms such as Stack Overflow, accelerating onboarding with unfamiliar technologies.
- GLM-4.1V-9B-Thinking - claims to "match or surpass Qwen2.5-72B" on many tasks (Score: 145, Comments: 21): GLM-4.1V-9B-Thinking claims to "match or surpass Qwen2.5-72B" on multiple tasks, particularly image recognition and multi-modal capabilities. Empirical user benchmark (notably OCR) reports this model is "orders of magnitude better than Qwen2.5-VL-72", surpassing traditional OCR and qualifying as "almost usable" for practical scenarios. The prior GLM-4-9B (non-thinking) April release is noted for strong translation performance relative to size. Technical debate highlights skepticism toward claims of smaller models outperforming larger ones, though in this case, firsthand experience suggests the claim holds, especially for OCR accuracy. There is also commentary on trade-offs between "thinking" and non-thinking variants for translation tasks, with the former degrading both performance speed and translation quality.
- A commenter directly compares GLM-4.1V-9B-Thinking against Qwen2.5-VL-72 on OCR tasks, reporting that GLM-4.1V-9B-Thinking is "orders of magnitude better", and notably outperforms traditional OCR as well, unlike Qwen2.5-VL-72, which failed to surpass standard OCR tools in their testing. This real-world feedback provides concrete evidence of substantial gains over touted benchmarks, at least in OCR applications.
- There is critical skepticism towards the benchmarks published by GLM, highlighting a pattern where claimed results (especially on reasoning benchmarks) do not align with real-world performance. One commenter points out that comparing a "thinking" variant model to a dense baseline (Qwen2.5-72B) may be misleading, and expresses concerns about "benchmaxing": marketing models with overly optimistic benchmark results that don't reflect practical capability.
- A user requests clarification regarding the availability of the GGUF quantized format for GLM-4.1V-9B-Thinking, which is important for deployments requiring optimized or accelerated local inference, indicating interest in practical usability beyond published benchmarks.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. OpenAI Agent Mode and GPT-5 Rumors and Releases
- Agent Mode is finally live for Plus users! (Score: 308, Comments: 82): The image serves as a confirmation screenshot showing the rollout of "Agent Mode" to ChatGPT Plus users, marking a new feature now available in the Plus tier. Early user feedback in the comments highlights that this Agent Mode is functionally present but currently limited in capability: for instance, it cannot complete specific tasks such as ordering food from any restaurant, suggesting API or integration restrictions. There is debate around practical utility: while one user calls it "pretty useful," another notes a lack of clear application. Technical debate in the comments centers on the value and scope of Agent Mode's current implementation: users highlight both its nascent utility and significant functional limitations, pointing to real-world use case challenges and potential as APIs or integrations improve.
- Some users report significant limitations with Agent Mode, specifically the inability to perform certain tasks such as ordering food from any provider, implying strict restrictions on capabilities or integration breadth.
- A notable constraint mentioned is that Agent Mode usage limits reset monthly, not daily or weekly, which discourages experimentation and regular low-volume use due to inefficient allowance structuring.
- This Agent will do very nicely … Nice one OpenAI (Score: 128, Comments: 36): The post discusses the performance of OpenAI's new "Agent" functionality in ChatGPT, emphasizing its strong general world knowledge and task execution compared to tools like Manus, particularly in generating presentations. The image (https://i.redd.it/gal256egfyef1.png) appears to showcase the Agent's interface or results, highlighting its robust automated workflow capabilities, despite being constrained by heavy guardrails. Users are comparing its output and workflow efficiency with other tools, especially in slide creation and overall automation. Commenters are inquiring about the distinction between different OpenAI subscription tiers (Pro vs. Plus) affecting Agent performance, and raise concerns about the Agent's ability to sustain long-running workflows ("refused to work more than 5 minutes"). Other users probe the presentation quality, asking about the amount of manual retouching required before slides become usable, hinting at limits in the AI's current output polish and workflow duration.
- One user reports that the Agent refused to work for more than 5 minutes, indicating a potential issue with task longevity or session timeout, which may impact the agentâs reliability for extended tasks.
- A technical inquiry is made about the method used to generate the presentation and the extent of manual retouching required to make the slides presentable, suggesting that the automation's output may need substantial human post-processing to meet professional standards.
- Another user critiques the quality of the generated slide deck, expressing concern that despite the conceptual promise of agent-generated presentations, the actual output may be underwhelming and insufficient without further improvement in content generation quality.
- Agent mode just released on Plus (Score: 112, Comments: 48): The post announces the release of "Agent Mode" on ChatGPT Plus for Android, providing a screenshot for visual confirmation. Technical discussion centers on the agent's capabilities, including its ability to autonomously search for products within user-specified constraints, but users report performance issues such as slow execution ("ran for 20 minutes"), trouble with website loading, and failure to interact with authenticated sessions or maintain state for transactional workflows. The agent is described as effective for open-ended, public web data scraping, but unreliable for tasks requiring session continuity or secure/logged-in access, with no memory or retry mechanism after failure. Some users express skepticism about agent reliability, especially for sensitive or transactional actions (e.g., ordering online), citing risks of "hallucination" and lack of robust error handling. The prevailing sentiment is that "Agent Mode" is essentially a sandboxed data-gathering bot, not a true workflow agent.
- Agent Mode struggles when handling authenticated sessions (such as adding items to a shopping cart while logged into a grocery site) since it lacks access to the user's live authenticated context. The system fails to recover or retry after session errors, indicating it doesn't effectively handle stateful workflows, session management, or continuity for secure or procedural tasks.
- Multiple users report issues when Agent Mode attempts to download, access, or manipulate public .xlsx datasets, resulting in guideline violation errors and abrupt chat termination. This seems to indicate possible bugs or overly restrictive safety triggers, especially during legitimate data handling on public files, limiting agent utility for data science tasks.
- There are notable limitations on reliability and task scope: Agent Mode struggles with sustained web automation (e.g., multi-step research or stats lookup) where website access is inconsistent (e.g., 404s), sometimes hallucinating incomplete results but proceeding with partial outputs. Its success rate is reduced for endpoints that require robust navigation or error handling.
- Agent mode just released on Plus (Score: 447, Comments: 152): The image confirms that the new "agent mode" feature for ChatGPT is now available to Plus users on Android. Technical commentary reveals agent mode's automation capabilities: one user describes using it to automate the job search process, including generating tailored resumes and cover letters for individual job listings, and even auto-filling and preparing to submit job applications autonomously, subject to user approval. Another user, however, highlights a current limitation: the agent can get stuck in repetitive loops (e.g., repeatedly failing to select the correct item for purchase). Technical context about usage limits is provided: Plus users are allowed 40 "agent" messages/month, and only user-initiated messages that direct the agent consume credits. Commentary notes the agent is highly capable for automation but can still get stuck in logic loops. Questions about feature stability and limits remain, with one user requesting more documentation on usage limits per subscription tier.
- Agent mode in ChatGPT Plus enables fully autonomous workflows, as illustrated by a user having the agent tailor multiple resumes, draft cover letters, and even fill out online applications in sequence. The agent can operate iteratively over a set of opportunities, updating documents and forms, only prompting for user approval as needed, indicating a high degree of automation and potential for bulk process execution.
- A technical limitation observed is that the agent may fail at tasks requiring nuanced product identification and selection. For example, it repeatedly landed on the correct product page but misidentified the product, entering a loop without making a correct selection, suggesting challenges in site navigation, object persistence, or state management for e-commerce use cases.
- Monthly usage limits for agent mode are: Pro (400 messages/month), Plus (40 messages/month), and Team (30 credits/month). Only user-initiated prompts that progress the agent's workflow count towards these limits, while internally generated clarifications or steps do not, highlighting operational boundaries for high-volume automations.
- GPT-5 will be better in alot of fields (Score: 301, Comments: 144): The image (which could not be viewed) is referenced as showing claims that GPT-5 will surpass various current models in multiple fields, possibly benchmarked against models like Sonnet 4 and GPT-4.5. The post and comments center on expectations of substantial advancements in creative writing, general capabilities, and whether GPT-5 can provide more than just user-driven responses by offering corrective or advisory output. Technical curiosity is also expressed about performance beyond narrowly defined tasks, notably whether GPT-5 will genuinely outperform established models such as GPT-4.5 and Anthropic's Claude variants. Relevant discussions mention the need for creative reasoning and pushback, not just raw compliance. Commentary questions the value of comparisons between unrelated model families (e.g., Sonnet 4 vs GPT); some users highlight specific desires for model behavior improvements such as steering users correctly rather than just following instructions. There is speculation about an imminent release of GPT-5 given recent leaks.
- One commenter questions the rationale behind comparing GPT-5 to "Sonnet 4," highlighting confusion over meaningful benchmarking and the importance of consistent, recognized benchmark standards in assessing model advancements.
- Several commenters express skepticism about real qualitative leaps in GPT-5 compared to earlier models like GPT-4.5, drawing analogies to marginal hardware upgrades where improvements are incremental ("slightly faster"), and noting the absence of evidence for breakthroughs towards AGI or fundamentally novel capabilities in LLMs.
- New GPT-5 info from The Information (Score: 227, Comments: 96): The post includes an image purporting to summarize new details about OpenAI's GPT-5, reportedly sourced from The Information, but the image could not be analyzed directly. The comments reference claims from the image that GPT-5's creative writing capabilities might rival the quality of "Sonnet 4," a benchmark poetic work, suggesting a significant advancement in natural language generation, especially for creative tasks. User reactions indicate skepticism about these claims and ongoing concerns that most new LLMs prioritize coding and mathematical problem-solving over creative writing improvements. Commenters debate the credibility of the "Sonnet 4" comparison, with some expressing frustration that LLMs focus largely on coding or math rather than creativity, reflecting an ongoing discussion in the AI field about model goals and evaluation metrics.
- A key technical discussion centers on GPT-5's possible ability to work with large, complicated legacy codebases, addressing a well-established limitation of current LLMs. This may signify improvements in handling complex code and extended context, raising questions about the model's context window size and whether it has increased significantly compared to previous models.
- There is skepticism and debate about the qualitative leap in GPT-5's creative writing abilities, especially when compared to Anthropic's Claude 4 Sonnet. Some commenters expect GPT-5 to significantly outperform Claude 4 Sonnet, while others argue that merely matching it would not be sufficient for the level of hype being generated about the new model.
- Seems like Microsoft will be implementing GPT-5 in Copilot (Score: 364, Comments: 41): The image (https://i.redd.it/1m4tyy1upwef1.png) appears to provide evidence that Microsoft will be upgrading Copilot to use GPT-5, rather than previous models like GPT-4. This aligns with Microsoft's recent trend of rapid AI integration across its products, potentially enhancing Copilot's capabilities if the model is properly implemented. Commenters highlight significant technical issues with Copilot, complaining about its web UI inefficiencies, such as prompt prediction creating excessive HTTP requests, high DOM resource usage, and browser crashes, which undermine usability. There is strong skepticism that simply upgrading the backend model (to GPT-5) will resolve these persistent UX and performance flaws.
- A user provides a technical critique of the Copilot web interface: the UI attempts to predict user prompts and sends HTTP requests for every few keystrokes, causing excessive resource usage in the DOM and substantial performance degradation. Extended interactions lead to browser crashes because the front-end insists on fully loading every part of the large AI response, even during UI reloads, with no user-accessible settings to mitigate this behavior.
- One comment notes Microsoft's push to reduce reliance on OpenAI's models, suggesting that integration of GPT-5 into Copilot could indicate a deeper partnership or strategic shift in their AI infrastructure approach. This is relevant to ongoing discussions about Microsoft's AI stack independence and future model hosting solutions.
2. Claude Code and Anthropic Feature Updates
- How Staff at Anthropic Use Claude Code (Score: 443, Comments: 117): Anthropic's product engineering team details best practices for using Claude Code, highlighting an initial "one-shot" prompt success rate of ~33% before shifting to an iterative, guided approach for most tasks (source). Users are advised to frequently "reroll" (restart context) when stuck, leverage custom memory/instruction files for non-technical users, and use tools like Figma or Excalidraw for rapid prototyping. Key workflow optimizations include distinguishing between tasks that can be left unsupervised and those needing close review, and employing a checkpoint-heavy git workflow to manage frequent changes and rollbacks. Top commenters strongly reiterate the necessity of frequent checkpoints due to context drift and unrecoverable errors, with consensus on the futility of arguing with the model when context rot sets in: complete restarts yield better results.
- Multiple users report that restarting Claude sessions or rerolling from a fresh context yields better results when running into issues with context rot, highlighting that accumulated context can degrade answer quality more rapidly than many expect. Checkpoints are emphasized as critical for workflow stability: creating checkpoints after "good" Claude outputs allows easy recovery from sudden drops in quality or logic, echoing common LLM usage patterns where unpredictable context drift can be a significant risk during coding tasks. One user discusses the nuanced behavior of Claude when it perceives feedback as coming from a different LLM versus themselves, noting that Claude's responses can change based on perceived identity of the feedback source. This suggests model alignment and interpretability challenges related to how LLMs parse and respond to user cues regarding authority or source of critique.
- Claude Code now supports Custom Agents (Score: 413, Comments: 158): Anthropic's Claude Code now features custom AI agent teams, allowing users to create multiple specialized agents (e.g., for planning, coding, testing). The setup process includes a wizard that helps auto-generate or manually define agent system prompts, select tools, set descriptions, and choose visual colors. Notably, the current limitation is no per-agent model selection (e.g., assigning Opus for architecture tasks, Sonnet for implementation), which restricts flexibility for advanced teams. Technical feedback in the comments highlights robust customization but a lack of model override per agent as a primary limitation. There is also speculation that advanced features could drive up subscription costs.
- The Agent wizard provides user-friendly customization: users can auto-generate or manually specify the agent's system prompt and description, control which tools are available, and set a color. A noted limitation is the inability to choose or override foundational models per agent (e.g., assigning Opus for architectural tasks and Sonnet for implementation), restricting more granular model-specific workflows.
- Each custom agent receives its own configuration file, functioning similarly to claude.md, enabling individualized settings per agent. This allows for distinct configurations and behaviors across different agents, enhancing modularity and targeted role assignment within teams.
- The "code review" agent, even when copied directly from documentation, showed immediate positive impact by optimizing code quality, indicating practical effectiveness and robust out-of-the-box functionality of the custom agents system.
- Claude mobile now supports MCP servers (Score: 133, Comments: 19): The post announces that Claude's mobile app (iOS/Android) now supports remote MCP (Model Context Protocol) servers for paid users, enabling access to connected tools, project management, and document creation on mobile devices. Users must add new tools via the web, which then become accessible from their mobile app, directing them to claude.ai/directory for configuration. The attached image likely demonstrates this new mobile interface and features, relevant for users managing complex workflows through Claude's ecosystem. Comments reflect excitement for Anthropic's rapid feature development and increased product centrality, with users requesting further releases (e.g., Neptune v3) and stock opportunities, indicating strong market interest.
- One user questions why MCP (Model Context Protocol) server support wasn't integrated directly into the mobile app, raising a technical consideration about platform feature parity and the necessity of bridging through servers rather than native mobile app capabilities.
- Another user raises potential workflow limitations, asking how to work on projects from a phone when local access to project files might be required. This highlights technical challenges of mobile project management, especially regarding file system access and server integration.
3. Wan 2.x Model Advances and Community Benchmarks
- Just another Wan 2.1 14B text-to-image post (Score: 198, Comments: 67): The post details extensive experiments with Wan 2.1 14B, a DiT-based text-to-image (T2I) model notable for its high image fidelity and native high-resolution generation (e.g., 2304x1296+), outperforming competitors like FLUX.1 and SDXL in compositional coherence without tiling. Key workflow elements include aggressive use of Normalized Attention Guidance (NAG), specific sampler/scheduler combos (e.g., ClownsharKSampler with res_2s + bong_tangent, or Euler + beta), and LoRAs like LightX2V for stabilizing high resolutions; post-processing is handled in ComfyUI with custom nodes and pixel upscaling via SwinIR-M-x2 for artifact-free enlargement. The post supplies ready-to-use workflows, original image sets with metadata, and implementation notes on LoRA strengths, VRAM needs (4090/24GB for 4K), and failure cases (e.g., coherency breakdowns above 2K without sufficient LoRA guidance). Top comments corroborate Wan 2.1 14B's high fidelity, ease of use, and quality out-of-the-box (notably for anatomy and hands), contrasting with SDXL's need for substantial post-processing or fixes. Users report substantial workflow speed gains and less need for iterative generation or external upscaling/facing tools, though acknowledge SDXL's advantage for ControlNet-specific use cases. The consensus underscores a technical shift toward adopting WAN for T2I due to these factors.
- One user provides a detailed comparison between WAN 2.1 T2I and other models like sdxl and Flux, highlighting that WAN 2.1 offers superior out-of-the-box results, such as producing consistently good hands without the need for FaceFix. They note that while SDXL is a faster model in isolation, in practice WAN 2.1 yields faster and higher quality results in fewer attempts, reducing the need for "fixes" and post-processing.
- Performance feedback indicates WAN 2.1 is capable of generating high-resolution images (e.g., 1920x1080) efficiently, even on older hardware (Mac 24GB), with rendering times of several minutes for high-res. Upgrading to a faster computer allows for rapid long video generations and very quick image synthesis, illustrating the scalability and efficiency of the WAN 2.1 architecture.
- Technical workflow details are shared: using the FusionX WAN model with a lightx2v LoRA at a weight of 0.3 produces good results with only 4 steps, but increasing hardware capability allows running the standard WAN 2.1 T2V model with Lightx2v (close to strength 1) at 8 steps without significant time cost. The Euler/Beta sampler combination is also identified as yielding strong performance.
- Wan releases new video previews for the imminent launch of Wan 2.2. (Score: 104, Comments: 64): Alibaba's Wan 2.2 model is being previewed with three demonstration videos (video1, video2, video3), showcasing consistent video resolution (1280x720), framerate (30 FPS), and sample duration (5 seconds). These teasers precede the official release as announced by the Alibaba Wan team on Twitter. Technical discussion in the comments centers on expected VRAM requirements, with users expressing hope that Wan 2.2 can still operate within 24GB memory, and anticipation for concurrent release of both text-to-video (T2V) and image-to-video (I2V) models, as well as competitive comparisons to the Kling model in generative video AI.
- Several users are discussing hardware requirements, specifically whether Wan 2.2 will still fit on a 24GB GPU, implying that previous versions could run within those constraints and there's concern about potential increases in model size.
- There is speculation about feature set parity between T2V (Text-to-Video) and I2V (Image-to-Video) models, with a hope that both are released simultaneously, unlike in previous releases where these features may have been staggered.
- Compatibility with LoRA (Low-Rank Adaptation) modules from version 2.1 is a concern, suggesting users are interested in reusing or extending their existing customizations or fine-tuned modules with the new 2.2 release.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.5 Pro Exp
Theme 1. The New Model Onslaught and GPT-5 Rumor Mill
- Qwen3 Models Generate Massive Buzz and Some Skepticism: The release of Qwen3 models, particularly the qwen3-235b-a22b-thinking-2507 teased by Junyang Lin on X, has captivated the community with its impressive capabilities, like being the first model to generate an animated SVG of a butterfly. While some users praised its coding prowess for creating a working Rust socks5 server, others on LMArena voiced skepticism over its benchmark results, suggesting they may have trained on the public set or were fully faked.
- GPT-5 Speculation Heats Up with Leaks and Codenames: Rumors of a GPT-5 launch in August, reported by The Verge and The Information, are fueling intense speculation. Trending models on the LMArena leaderboard like Starfish, Zenith, and Summit are widely suspected to be OpenAI creations, with one user remarking, "With a name like Zenith, it's probably GPT-5."
- A Flurry of New and Updated Models Hit the Streets: Cohere is pushing its new Command-A-03-2025 model as the successor to Command R+, boasting SOTA agentic capabilities. Meanwhile, the Unsloth community is buzzing with excitement over the new Magistral release and eagerly awaiting a bnb 4bit upload to begin training, and the Hermes3-405B model remains in high demand on Nous Research.
Theme 2. Performance Praises, Pitfalls, and Outright Bugs
- Developers Report Critical Bugs and Data Loss: Users of the Cursor IDE reported a critical bug where reverting to a checkpoint results in file deletion instead of reversion, with one user saved only by source control. Other frustrations include ChatGPT generating empty or undownloadable PDF files and Aider struggling with its testing environment because it is an AI assistant without access to your terminal.
- API Instability Plagues Major Providers: Widespread service instability is a major pain point, with users on Nous Research joking they learned that error code from using anthropic due to frequent 522 errors. Discussions also highlighted that Deepseek's API becomes horrible during peak times, and Cohere suffered a full model meltdown affecting all its command models.
- Model Quality and Context Under Scrutiny: Users on Cursor expressed frustration with the "auto" model, speculating it now uses cheaper models that get stuck in loops and drop context. In the LlamaIndex community, a user reported that even top-tier models like GPT-4.1 and Claude Sonnet 4.0 still struggle with accuracy issues in document parsing for enterprise production environments.
Theme 3. In the Trenches of Fine-Tuning, Quantization, and RAG
- Fine-Tuning Clashes with RAG for Knowledge Tasks: A debate in the Unsloth community questioned if fine-tuning SLMs for document Q&A could make RAG obsolete, countering claims that RAG is dead by noting RAG can achieve sub-50ms queries on CPUs. In parallel, HuggingFace members argued a RAG-based approach is essential for building local LLMs for legal work to handle sensitive PII, referencing a paper on RAG for legal documents.
- Geeks Get Granular with Quantization and GGUF: A HuggingFace user demonstrated running llama3.1-8B in just 5.4GB of RAM with minimal accuracy loss by using HQQ quants and the torchao library, sharing their work in a Hugging Face Space (a quantization sketch appears after this list). Showcasing the real-world friction of these techniques, an Unsloth user battled a TypeError related to 'quantization_method' while trying to save a fully fine-tuned model to GGUF.
- LoRA Fine-Tuning Forges Ahead for Specialized Tasks: Developers are actively using LoRA for specialized fine-tuning, with one HuggingFace member working through the HuggingFace PEFT docs for hands-on experience. Another is fine-tuning Whisper to specialize in the Danish language, leveraging high-quality data from the CoRal project to push performance on a single language.
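Picking up the quantization item above, here is a sketch of HQQ-style int4 weight-only quantization via torchao; the group_size and use_hqq settings are assumptions rather than the Space's exact recipe, and real memory savings depend on which layers get quantized:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torchao.quantization import quantize_, int4_weight_only

model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tok = AutoTokenizer.from_pretrained(model_id)

# HQQ fits the quantization scales/zero-points per weight group instead
# of plain round-to-nearest, recovering most of the int4 accuracy gap.
quantize_(model, int4_weight_only(group_size=64, use_hqq=True))

inputs = tok("The capital of Denmark is", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```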
Theme 4. The Expanding AI Developer Toolkit and Infrastructure
- New Open-Source Tools Aim to Simplify Workflows: Community members are building and sharing tools to solve common problems, including an LLM Context Manager designed to prevent context pollution/context rot by using a branching algorithm (a toy sketch of the branching idea appears after this list). Another notable tool is gut, a human-in-the-loop CLI that translates natural language into git commands, making version control more accessible.
- Agentic Commerce and Serverless Infrastructure Take Shape: Forward-looking discussions on MCP (Glama) explored the rise of agentic commerce and how agents might transact with websites using infrastructure from Nekuda and PayOS, reviving the spirit of the HTTP 402 (Payment Required) status code. On the infrastructure side, OpenRouter revealed its API runs entirely serverless on Cloudflare Workers and is working to support large files for multimodal capabilities.
- Hackathon Hype Highlights Hardware and Real-World Deployment: The upcoming GPU MODE NYC hackathon, a collaboration with Jane Street, is generating significant buzz by focusing on deploying real models to market rather than just speed. The event will feature keynotes by Tri Dao, a panel with the original PyTorch team, and compute support from Coreweave and Northflank, with registration open before August 17.
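To make the branching idea from the first item concrete, here is a toy sketch of branch-scoped context: each sub-topic forks its own path through a tree of turns, and only a branch's own ancestry is ever sent to the model. The names and structure are hypothetical, not the tool's actual API.

```python
class ContextTree:
    """Toy branch-based context manager: one tree of turns, many branches."""

    def __init__(self):
        self.nodes = {0: {"parent": None, "msg": None}}  # node 0 is the root
        self.next_id = 1

    def append(self, parent_id, msg):
        """Attach a turn under parent_id; fork by appending to any old node."""
        node_id = self.next_id
        self.nodes[node_id] = {"parent": parent_id, "msg": msg}
        self.next_id += 1
        return node_id

    def context(self, node_id):
        """Linear history of one branch only -- siblings never leak in."""
        msgs = []
        while node_id is not None:
            node = self.nodes[node_id]
            if node["msg"] is not None:
                msgs.append(node["msg"])
            node_id = node["parent"]
        return list(reversed(msgs))

# Usage: two branches share a root turn but never see each other.
tree = ContextTree()
root = tree.append(0, "You are a helpful assistant.")
bugfix = tree.append(root, "Let's debug the parser.")
docs = tree.append(root, "Now draft the README.")
assert "Let's debug the parser." not in tree.context(docs)
```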
Theme 5. AI Consciousness, Censorship, and a "Woke" White House
- The "Is AI Conscious?" Debate Rages On: A discussion in the OpenAI discord, sparked by a Scientific American article on Anthropic's interpretability research, revisited the philosophical question of AI consciousness. The conversation brought up Ilya Sutskever's famous 2022 claim that "today's large neural networks are slightly conscious", adding fuel to the ongoing debate.
- White House Issues Edict Against "Woke AI": The White House released a memo ordering federal agencies to prevent ideological bias in AI systems, stating that LLMs shall prioritize historical accuracy, scientific inquiry, and objectivity. The guidance was a direct response to Google's Gemini controversy, where the model altered the race and sex of historical figures to meet DEI requirements.
- Geopolitical Tensions Surface with OpenAI Geo-Blocks: Users on OpenRouter discovered that OpenAI is blocking people in China and Hong Kong from using some of their models, like GPT-4.1, a move that can be bypassed with a VPN. The community speculated this is likely an attempt by OpenAI to slow China down and prevent its models from being used for synthetic data generation.
Discord: High level Discord summaries
Perplexity AI Discord
- Perplexity Does Reddit AMA: Perplexity AI hosted an AMA session on r/csMajors featuring Tony Wu, Jiwon Deng, and Jerry Ma.
- The session addressed questions about early-career pathways and Perplexity's new residency programs.
- Comet Invites Cause Begging Spree: The gradual rollout of the Comet browser has led to a surge in users asking for invites.
- Members joked that the beta channel turned into an invites channel and may soon have a dedicated one.
- Zeta Surfaces for Investigation: Members mentioned that the Z.AI model is under investigation, and used to be ChatGLM, linking to the model.
- Reportedly it boasts its own browser control, open-sourced models, and video generation.
- Samsung S24 Runs GTA V Like Butter: A member claimed the Samsung S24 Ultra can run GTA V at 60fps.
- Other members responded that GTA V isn't that hard to run and reminisced about upgrading phones.
- Grok Goes Heavy on Subscription Costs: Members discussed the Grok 4 Heavy and the associated subscription costs.
- One member hoped the bot doesn't answer badly, especially since Heavy is meant to increase speed.
OpenAI Discord
- ChatGPT Agent Finally Wakes Up!: The ChatGPT agent is now available to all Plus, Pro, and Team subscribers, following an apology for its delayed launch, as showcased in a rollout.mp4 video.
- One user celebrated their newfound access by joking about planning a wedding using an AI-generated picture of two bison getting married in full costume.
- Conscious Chatbots: Sci-Fi or Reality?: Members mulled over the possibility of AI consciousness, drawing inspiration from the article, Can a Chatbot Be Conscious? Inside Anthropic's Interpretability Research.
- The discussion referenced Ilya Sutskever's 2022 claim that today's large neural networks are slightly conscious, adding fuel to the debate.
- Qwen3 Draws SVG Butterflies Like a Boss!: Users are raving about Qwen3's release, noting it's the first model to generate an animated SVG of a butterfly when prompted with "svg of a butterfly".
- Enthusiasts shared SVG examples, like this PS5 controller, critiquing the butterfly's wings while acknowledging its animation.
- Empty PDFs Frustrate ChatGPT Users: Users are encountering issues with ChatGPT generating empty or undownloadable PDF files, leading to frustration and prompting redirection to the appropriate support channels.
- Other users shared problems with the Canvas feature, and some admitted to disliking Canvas altogether due to it not doing what they want it to do.
- Prompt Engineering Turns Introspective!: Members are exploring prompts to structure personal thoughts, transforming chaotic reflections and journal entries into coherent insights, using prompts as cognitive scaffolding.
- The demonstration prompt transforms messy journal fragments into structured text, and you can view a demo here.
LMArena Discord
- GPT-5 Speculation Swirls: Speculation arose around whether Starfish is a GPT-5 mini, referencing a tweet from Justin Lin and debating its performance.
- Members theorize Microsoft Copilot Deep Research might be powered by GPT-5 and excitedly anticipate it, reasoning: why would they release it now with an outdated model?
- Doubts Plague Qwen 3 Benchmarks: Doubts surfaced regarding Qwen's benchmark results, with claims they might have trained on the public set or fully faked their results.
- Users voiced distrust, stating they don't seem transparent like deepseek.
- Model Rankings: Lobster Reigns Supreme: Users are actively ranking model performance on LMArena, currently favoring Lobster > Nectarine > O3-alpha > Starfish.
- Conflicting views exist, such as one user ranking o3-alpha > lobster > nectarine > starfish.
- Zenith & Summit Suspected as OpenAI Creations: Zenith and Summit are trending models on LMArena, sparking speculation they might originate from OpenAI.
- The naming convention prompted one user to remark, "With a name like Zenith, it's probably GPT-5."
- Video Arena Bot Emerges for AI Videos: An experimental Video Arena bot has been released, allowing users to generate videos and images with leading AI video models via the LMArena bot.
- Early access is granted in this channel until a certain date, with designated channels for learning usage and sharing feedback.
Unsloth AI (Daniel Han) Discord
- Magistral Model Sparks Community Training Frenzy: Enthusiasm surrounds the new Magistral release, with members eagerly awaiting Unsloth's bnb 4bit upload to commence training.
- Discussions also involve choosing between Qwen3 Coder 32B or Devstalkat, acknowledging licensing issues with the latter.
- Fine-Tuning Fights RAG in Knowledge Arena: The community debated whether fine-tuning should replace RAG for specific knowledge-based tasks, fueled by claims that RAG is dead due to the advancements in SLMs for document Q&A.
- Others countered that RAG can achieve sub-50ms queries on CPUs, though small language models are increasingly proficient in question answering.
- TaMeR Triumphs Alone in LLM Enhancement: Research suggests that using TaMeR alone, without ELiTA, for enhancing LLMs leads to much better self-awareness, almost no watermark, and super coherence.
- Previous attempts combining ELiTA and TaMeR resulted in watermark restoration and model instability.
- Unsloth user makes bots debate!: A user created a fine-tuning video with Unsloth, showcasing the entire process from collecting and structuring training data to training with Unsloth and inference with Ollama, featuring an AI presidential debate.
- In the video, the Trump fine-tunes answer questions about McDonalds, Fortnite, and other crucial topics; the code can be found via the GitHub link in the video description.
- GGUF Grappling & Model-Pushing Mania: A member encountered a `TypeError` during the `save_to_gguf_generic()` process while pushing models to Hugging Face, specifically related to multiple values for the argument `'quantization_method'`.
- They noted that with Unsloth, the `quantization_method` can only be a string or a list of strings, and they were attempting to save a full fine-tuned TTS model to GGUF; a hedged sketch of the expected call shape follows below.
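A minimal sketch of the GGUF export call shape in Unsloth, assuming a placeholder base model rather than the member's actual TTS checkpoint:

```python
from unsloth import FastLanguageModel

# placeholder model; the member was exporting a full fine-tuned TTS model instead
model, tokenizer = FastLanguageModel.from_pretrained("unsloth/llama-3-8b-bnb-4bit")

# quantization_method must be a string or a list of strings; supplying the
# argument twice (positionally and by keyword) is what raises the TypeError
model.save_pretrained_gguf(
    "my-model-gguf",
    tokenizer,
    quantization_method=["q4_k_m", "q8_0"],
)
```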
Cursor Community Discord
- Cursorâs Checkpoint Feature Deletes Instead of Reverting: Users reported a bug where reverting to checkpoints in Cursor leads to file deletion instead of reversion, with one user stating they could only recover due to source control.
- A community member cautioned against advising users to abandon Cursor entirely, emphasizing its value and quick response to fixes, but others strongly disagreed, citing data loss as a critical issue.
- Cursor's Auto Model Triggers User Ire: Users express frustration with Cursor's "auto" model, noting its tendency to get stuck in loops, drop context, and deliver empty responses, with one user reporting 99% of prompts leading to nothing.
- Community members suggest that Cursor is using cheaper models in âautoâ to save money, leading to a drop in quality, and that the removal of unlimited agent requests is to blame.
- Context Usage Percentage Confounds Users: Cursor introduced a new context usage feature, displaying a percentage of context used within a chat, leading to widespread user questions.
- It was clarified that the percentage represents how much of the available context window is currently filled, which limits the model's ability to take in new messages; the fill level grows with conversation length, attached files, code references, model responses, rules, and documentation. A rough sketch of the arithmetic appears below.
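For intuition, a rough sketch of the arithmetic behind such a percentage, assuming a hypothetical 200k-token window and a stand-in tokenizer (Cursor's actual accounting is not public):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer
CONTEXT_WINDOW = 200_000  # hypothetical window size

conversation = ["user: refactor the auth module", "assistant: sure, here is a plan..."]
attachments = ["def login(user): ...", "# project rules: prefer small PRs"]  # files, rules, docs

used = sum(len(tokenizer.encode(text)) for text in conversation + attachments)
print(f"context usage: {used / CONTEXT_WINDOW:.1%}")
```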
- Claude Swarm Gets Mentioned in Discord: Users discussed Claude Swarm, suggesting it allows for automatic project building without the need for continuous prompting and has integrations with Claude Code.
- Another user expressed a preference for a more hands-on approach with coding, comparing it to caressing a Jr dev.
- Cursor Users Flee to New Pastures: Users are actively seeking alternatives to Cursor due to concerns about its performance and pricing, with Windsurf being discussed as a possible option.
- Other recommendations included Zed, Kiro and Augment, with some users specifically highlighting features such as Trae's data collection practices and Claude Code's superior performance.
OpenRouter (Alex Atallah) Discord
- Personality.gg Transcends Translation: Personality.gg offers multiple translation modes and features an auto-translator capable of discerning a message's language of origin, determining whether it is in English or another language.
- The Pro version will incorporate enhanced context understanding by analyzing the surrounding chat to refine AI interpretations.
- OpenRouter Apologizes for Qwen SimpleQA Snafu: A member apologized for a mistake potentially causing the Qwen SimpleQA issue, wishing everyone a good night.
- They didn't elaborate any further, so the specific details remain unclear.
- Deepseek's API Experiencing Downtime: Members reported experiencing issues with the Deepseek v3 0324 model, getting error messages on the paid tier.
- They also noted that Deepseek's own API has the best API, speed, and uptime, but that it is horrible during peak times.
- OpenAI Geo-Blocks GPT-4.1 in Hong Kong: OpenAI blocks people in China from using their models, but this block can easily be bypassed with a VPN.
- This is likely an attempt at slowing China down and avoiding synthetic data.
- OpenRouter Goes Serverless, Eyes Multimodal: OpenRouter's API runs on Cloudflare Workers, making it entirely serverless, and they are actively working on a solution for the large file limitation to support image and video generation, effectively unlocking multimodal capabilities.
- The team is considering whether this market is worth prioritizing over other opportunities.
HuggingFace Discord
- Legal LLMs Call for Local RAG: Members debated using a 100% local LLM for legal tasks, emphasizing the need to handle PII, suggesting Gemma 12B Q5 with llama-index and Gradio as a starting point.
- Users pointed out a RAG-based approach is more important than the model itself, linking to resources such as Advanced RAG and RAG for legal documents; a minimal local-RAG sketch follows below.
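A minimal local-RAG sketch along those lines; the package paths are the llama-index integrations, while the specific model choices and folder are illustrative assumptions:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# everything stays on-box: local LLM via Ollama, local embeddings via HF
Settings.llm = Ollama(model="gemma2:9b", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# placeholder folder of documents; PII never leaves the machine
documents = SimpleDirectoryReader("./case_files").load_data()
index = VectorStoreIndex.from_documents(documents)

print(index.as_query_engine().query("Summarize the plaintiff's medical history."))
```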
- LoRA Fine-Tuning for the Win: A member is learning to fine-tune an LLM using LoRA, following HuggingFace's documentation to learn the intricacies of LLM fine-tuning through hands-on experience.
- Another member is fine-tuning Whisper to specialize in Danish, leveraging recent efforts in collecting high-quality Danish speech data from the CoRal project.
- Rhapsody Chatbot Rocks API Choices: The Rhapsody chatbot was released, supporting about 100 model choices across different APIs such as Transformers, Ollama, and soon llama.cpp, as seen in this GitHub repo.
- The next release will include image and video generation capabilities.
- Quantization Cuts llama3.1-8B's Size: A member shared their experience digging into quantized models, particularly HQQ quants, and demonstrated llama3.1-8B running at 5.4GB RAM with minimal accuracy loss.
- They praised `torchao` and provided a demo (requiring NVIDIA drivers) on Hugging Face Spaces; a hedged sketch of the technique is below.
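A hedged sketch of that technique using torchao's int4 weight-only path with HQQ enabled; the model id and group size are illustrative:

```python
import torch
from torchao.quantization import quantize_, int4_weight_only
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # illustrative id; gated on the Hub
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# int4 weight-only quantization, with HQQ doing the weight fitting; this is the
# kind of setup that shrinks an 8B model toward ~5GB with small accuracy loss
quantize_(model, int4_weight_only(group_size=64, use_hqq=True))
```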
- Image Embedding Model Sees Clear Semantics: A member trained an image embedding model, setting the output dimension to 128-dim, and then trained another model with 8-dim output, posting a visualization of those results.
- The user manually inspected images across the 8 dimensions and found that all dimensions seem to have a very clear semantic meaning from image space.
Moonshot AI (Kimi K-2) Discord
- Kimi K2 Floats Flat-Rate Pricing: A member is implementing RPM/flat rate pricing for Kimi K2, aiming to bypass the complexities of metered token usage seen in other services.
- They foresee the main obstacle as managing concurrent usage and peak times.
- Kimi K2 Eyes Coding-Specific Model: Thereâs community interest in a coding-specialized version of KIMI K2 to enhance code generation capabilities.
- The Kimi team is receptive to the suggestion, indicating they will explore this avenue further.
- Kimi K2 Team Postpones Vision Integration: Users are keen on integrating Kimi K2 with reasoning and vision features, such as enabling image analysis via Discord attachments.
- Although acknowledging the potential, the team states that they are not rushing to integrate the vision model, mentioning that one day we'll def make it happen.
- Kimi K2 Serverless Deployment Requested: Thereâs a community request for serverless Kimi K2 deployment on AWS and Azure AI, to capitalize on available credits.
- A user suggested the possibility of hosting it on serverless endpoints like Sagemaker.
- Kimi K2 Excels in Code Generation: The community finds that Kimi K2 is predominantly used for code generation, with apps such as liteLLM, Cline, Kilo Code, and Roo Code leveraging it via OpenRouter.
- The Kimi team is especially interested in identifying whether real "high-density decisions" are being made in these applications.
LM Studio Discord
- MCP Servers Enable Online LLM Search: Members are using MCP servers to enable LM Studio to search online and address LLM hallucinations, but one user clarified that it's only possible with MCP servers.
- MCPs offer tools for LLMs to execute, with LM Studio as an intermediary querying resources or databases; a minimal tool-server sketch follows below.
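A minimal tool-server sketch using the MCP Python SDK's FastMCP helper; the search function body is a stub you would back with a real search API:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("web-search")

@mcp.tool()
def search_web(query: str) -> str:
    """Return result snippets for a query so the LLM can ground its answer."""
    # stub: call a search backend of your choice here
    return f"Top results for {query!r}: ..."

if __name__ == "__main__":
    mcp.run()  # an MCP client such as LM Studio connects and exposes the tool
```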
- Newbies Contemplate LLM Plugin Dev: A beginner asked how long it would take to learn to make LLM plugins from scratch, like recalling the current time or working with image generation models on ComfyUI.
- Members suggested learning JavaScript fundamentals, but also mentioned that with AI one can technically write them without any knowledge.
- Model Download Location Needs Whole Folder: A user inquired about changing the download location for models in LM Studio 0.3.20, and another member shared the official documentation.
- The response clarified that you must move the entire model folder and cannot just change the download location separately.
- Remote LM Studio Needs Proxy: A user wanted to use their PC as host and their phone to connect, but another user said that you can't really do a remote setup with LM Studio currently; a reverse proxy can work for local networks.
- They linked to LM Studio Remote and said that a remote client plugin would be available in the next major update; a sketch of the LAN workaround is below.
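On a LAN, LM Studio's OpenAI-compatible server can already be reached from another device; a sketch assuming a hypothetical host address and the default port 1234:

```python
from openai import OpenAI

# 192.168.1.50 is a hypothetical LAN address of the PC running LM Studio's server
client = OpenAI(base_url="http://192.168.1.50:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # whichever model identifier the server reports
    messages=[{"role": "user", "content": "Hello from my phone"}],
)
print(resp.choices[0].message.content)
```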
- 4090 + iGPU Enhances Performance: In #hardware-discussion, a member suggested buying another 4090 and enabling iGPU to use it for video output, freeing up resources.
- Another member inquired about a list of budgets and GPUs that fit into those budgets, asking about workstation versus consumer cards.
Eleuther Discord
- Data Scientists are Gaming Validation Accuracy: Data scientists are gaming validation accuracy by reporting the last epoch or the best accuracy over the training run, and hyperparameter sweeps are also tuned on validation accuracy; applying corruption to the validation set could be a solution.
- Stopping at the best epoch is another way of gaming the system.
- Researchers Discuss Algoverse AI Program as SOAR Backup: Members are discussing the Algoverse AI program as an alternative for those not accepted into SOAR, though it costs $3,325.
- They noted that it is not obvious how much of one's progress reflects one's own merit as opposed to the work/assistance of others whom one paid; also, Algoverse never released their stats, and hiring managers tend not to dig into backgrounds.
- Members Question HRM Loops Causality: The discussion revolved around whether HRM loops are causal, with the key point being that the num_segment is dynamic in training, meaning it's not causal and doesn't even have a kv cache.
- One user said what had been confusing me is I thought it was causal, but it's not.
- NeoX Vulnerability Reported: A member reported finding a security vulnerability in the EleutherAI/gpt-neox repo, and was instructed to email [email protected] to report the issue.
- Another member inquired about the status of Async Checkpointing for NeoX.
Latent Space Discord
- Qwen3 Buzz Builds: Junyang Lin (@JustinLin610) announced the upcoming release of the qwen3-235b-a22b-thinking-2507 model on X, stoking significant community excitement.
- Community members immediately began inquiring about a Qwen3 Omni model, smaller variants (e.g., 30B), and regional availability, such as an EU mobile app.
- GPT-5 Leaks Surface: Rumors suggest OpenAI is preparing to launch GPT-5 in August, according to reports in The Verge and The Information.
- Additionally, a separate open-source initiative aims to achieve O3 level performance and deploy before GPT-5.
- Opus Rate Limits Raised: The Anthropic API has increased Claude Opus 4 rate limits across all tiers, as announced in this X post.
- This increase provides developers with more flexibility and capacity when utilizing Claude Opus.
- Nitter Instance Struggles: Users reported encountering a 429 error (Too Many Requests) when trying to access content via a Nitter instance at xcancel.com.
- The instance appears to be fully rate-limited or lacks authentication tokens, preventing access, with users advised to switch instances or retry later.
- AI Code Gen Adoption Exposed: A survey from Stacklok offers fresh data on AI code generation tool adoption rates, available at stacklok.com.
- While the data highlights adoption across a range of alternatives, some have expressed skepticism regarding the reported adoption rate of AWS Q Developer.
Nous Research AI Discord
- Psyche Office Hours Now Available: The Psyche office hours recording is now available, with a few minutes missing in the middle, accessible via a YouTube link.
- The event kicked off at this Discord event link and was announced on X.com.
- Hermes3-405B Demand Still High: A member requested the return of the free version of Hermes3-405B on openrouter.
- Another member mentioned it was lambda but they will try.
- Anthropic plagued with 522 Errors: Members discussed ongoing reliability issues with Anthropic, particularly the frequency of 522 errors.
- One member quipped they learned that error code from using anthropic, highlighting the frustration with the service's instability.
- Dataset Architecture Still Mysterious: Members expressed interest in a dataset, curious about its underlying architecture and potential publishing plans.
- However, details regarding the architecture remain unclear, creating unresolved questions and uncertainty about its design.
- Codex I Symbolic Diagnostic System is Live: Codex I, a symbolic diagnostic system for intelligence under distortion, is now live (codex_1.pdf).
- It conceptually links to neurosymbolic scaffolds, narrative entropy management and meta agent stabilization under adversarial compression.
GPU MODE Discord
- NYC Hackathon pairs with Jane Street: GPU MODE is hosting an NYC hackathon in collaboration with Jane Street on September 6, emphasizing real model deployment to the market rather than just speed; register before August 17.
- The event will feature keynotes by Tri Dao and a panel with the original PyTorch team including Soumith Chintala, Sam Gross, and Gregory Chanan, with compute support from Coreweave and Northflank.
- Nsight Copilot Surfaces for Nvidia Devs: Nvidia released Nsight Copilot, a tool to assist developers available on the Nvidia developer website.
- The copilot aims to streamline development workflows, offering assistance and insights to developers working within the Nvidia ecosystem.
- Triton's Masking Does no Memory Transactions: In Triton, using `tl.load(ptr, mask=mask_vec)` results in no branch divergence, and if `mask=false`, no memory transactions are issued.
- This behavior helps avoid memory operations when loading conditional values, potentially optimizing kernel performance; a minimal kernel sketch follows below.
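A minimal kernel sketch of the pattern under discussion; masked-off lanes in the load/store pair neither diverge nor touch memory:

```python
import triton
import triton.language as tl

@triton.jit
def copy_kernel(src_ptr, dst_ptr, n_elements, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements  # lanes past the end of the buffer are masked off
    x = tl.load(src_ptr + offs, mask=mask, other=0.0)  # no transaction where mask is false
    tl.store(dst_ptr + offs, x, mask=mask)
```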
- HF Hub Over Repo Debate: A member questioned if uploading to HF Hub is preferable to storing model weights directly in a repo, suggesting that it seems slightly unconventional to have model weights just sitting in a repo.
- The discussion centered on best practices for storing and accessing model weights, weighing accessibility and perceived conventionality.
- bf16 Kernels have glaring errors: Members reported high error rates in bf16 matmul kernels, specifically within the `matmul/educational` directory, often with max errors reaching `inf` values.
- The discussion seeks to determine if such high error rates are expected behavior for bf16 operations, particularly within the examined kernels; the quick check below illustrates the expected error scale.
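For a sense of scale, this quick check compares a bf16 matmul against an fp32 reference; a finite relative error around 1e-2 is normal for bf16, whereas `inf` points to an overflow or accumulation bug rather than the format itself (a standalone sketch, not the repo's kernels):

```python
import torch

torch.manual_seed(0)
a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)

ref = a @ b                                  # fp32 reference
out = (a.bfloat16() @ b.bfloat16()).float()  # bf16 matmul under test

max_abs = (out - ref).abs().max().item()
max_rel = max_abs / ref.abs().max().item()
print(f"max abs err: {max_abs:.3f}, max rel err: {max_rel:.5f}")
```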
Yannick Kilcher Discord
- Karpathy Rages Against Academic Paper Inflation: Members shared a 2016 tweet from Andrej Karpathy about the growing volume of academic papers.
- A member suggested creating a "Youtube-Twitter-TikTok like platform for papers" with upvotes (but no downvotes) and categories to combat academic paper inflation.
- Context Manager bravely battles Context Pollution: A member announced they built something: an LLM Context Manager, described as an inference optimization system for conversations.
- It employs branching and a novel contextual scaffolding algorithm (CSA) to manage context and prevent context pollution/context rot.
- Downvotes Debated as Digital Weapon: Members discussed the role of downvotes, particularly how they can become politicized and weaponized in tightly networked communities, based on a Web3 experiment.
- A member argued that downvotes are not inherently political and that negative feedback is essential, pointing to Amazonâs success as an example.
- Government Data Fuels Grok Speculation: A member wondered if Grok trained on files when Elon got access to the government's hoards of data (link to X post).
- There was not enough information to determine whether this was the case.
- White House Prevents "Woke AI": The White House issued guidance to prevent "woke AI" in the federal government (link to White House memo).
- The memo states that LLMs shall prioritize historical accuracy, scientific inquiry, and objectivity, which was prompted by Gemini's DEI prioritization, where users changed the race or sex of historical figures.
Manus.im Discord Discord
- Spam Bots Attack!: Users reported an influx of spam bots on the server, prompting immediate action from moderators.
- A moderator confirmed that messages were removed and the offending account was banned, urging users to flag suspicious activity.
- Sandbox experiences 502 Bad Gateway!: A user reported a "Failed to resume sandbox" error and a 502 Bad Gateway, seeking assistance with file and session recovery.
- Another user suggested the company's major changes and staffing shortages might be the root cause of the instability.
- Vibe Coding AI tempts Users to build MVPs: A user shared a link to a challenge centered around constructing an MVP product using Vibe Coding AI coding skills.
- The link was shared in a joking manner, but may represent a valid opportunity to practice coding using Vibe Coding.
- âScientific Manusâ released in paper!: A user posted a link to a scientific paper with the subject line Scientific Manus.
- The paper's title and specific contents were not disclosed, but may be of high interest to researchers of Manus.
Cohere Discord
- Helicone.ai Integration Still Distant for Cohere: Users found that Helicone.ai does not natively support Cohere's Command R+ or Command R7B, as there is no official partnership between the two.
- Users were advised to contact Heliconeâs support for direct assistance, due to lack of official Cohere support.
- Command-A Crowned Successor to Command R+: Cohere promotes Command-A-03-2025 as their latest and best model with SOTA agentic capabilities, succeeding Command R+.
- Described as having enhanced capabilities, Command-A is positioned as a suitable general thinking assistant for consumer deployment.
- Cognitive OS Assistant from Crafted Logic Lab Forges Ahead: A founder from Crafted Logic Lab is developing a new type of cognitive OS based assistant that is patent pending.
- The new cognitive OS tooling was developed using Swift.
- Cohere Endures Full Model Meltdown: A status update reported a full outage affecting multiple Cohere command models including command-light, chat, command-r-plus, command-r-082024, command-r-plus-082024, command, command-r, command-r7b, and command-a-03-2025.
- The outage was under investigation as of July 25, 2025 and was also posted on the Cohere Status Page.
- Command R+ Flexes Cognitive Muscle: A member tested a system based on Command R+ on the Humanity's Last Exam test, which assesses both correct answers and cognitive flexibility.
- The agent, when prompted about hummingbird anatomy, demonstrated speculative inference based on general anatomical knowledge due to a lack of specialized expertise.
Notebook LM Discord
- GPT Agent Has Login Lockdown: A member reported issues with their Chat GPT agent failing to sign into Notebook LM, possibly due to the browser being controlled by a virtual machine, as shown in this screenshot.
- The error suggests the agent is being interpreted as a bot, preventing successful authentication.
- Share Button Goes Missing in Action: A user reported that the "Share" option is missing in Notebook LM, preventing them from sharing created notebooks.
- This issue obstructs collaboration, raising questions about recent updates or potential bugs affecting UI elements.
- Metadata Maneuvers Improve Sourcing: A member is using metadata effectively within the Source, utilizing brackets to avoid direct document references, as illustrated in this screenshot.
- Effective use of metadata enhances source clarity and avoids cumbersome document linking, streamlining content management.
- Podcast Pointers Provided: A member inquired in the general channel about generating a 60-minute podcast within Notebook LM.
- Another member suggested checking the use case channel and linked to a YouTube Short providing useful pointers.
- File Uploading Fails: A member reported a file uploading error on both the free and pro versions of Notebook LM.
- The member found a workaround: mobile App uploads work, indicating a desktop version issue needing resolution.
aider (Paul Gauthier) Discord
- GPT-5: The Niche Replacement?: A member questioned whether closed AI would replace GPT-5, implying that GPT-5 might be a niche product compared to the closed source alternatives.
- The discussion highlights the evolving landscape of AI models and their potential market positioning.
- Textual 5.0.0 is Released: A member announced the release of Textual 5.0.0, noting it contains final markdown streaming content.
- Textual is noted to be a Rapid Application Development (RAD) framework for Python.
- Qwen3-coder wows!: A member raved that Qwen3-coder is amazing as it produced a working socks5 server in rust according to the specification, unlike other models.
- This suggests Qwen3-coder excels in coding, particularly in Rust, surpassing other models in specific tasks.
- Aider Has Testing Troubles: A user reported issues using aider for the first time, facing difficulties in running tests because aider needs commands to be executed from the terminal while also being an AI assistant without access to your terminal.
- The user sought guidance on whether manual test execution and output pasting were expected, and also asked how to disable aider's automatic commits.
LLM Agents (Berkeley MOOC) Discord
- Agents Class Still a Mirage: The Agents class is being offered to Berkeley students, but whether there will be a MOOC iteration hasnât been confirmed yet.
- The MOOC iteration will likely be announced in late August.
- Certificate Delivery Snafu: A member reported not receiving a certificate despite having the certificate declaration form confirmation.
- Staff clarified that they did not receive an article assignment submission from the member.
- Article Submission Deadline Doomed: A member inquired about fixing the missing article submission to obtain the certificate.
- Staff apologized, stating they couldn't accommodate students who missed the deadlines due to limited staff capacity.
LlamaIndex Discord
- LLM APIs Still Lag on Docs: A blogpost claims that while models like GPT-4.1, Claude Sonnet 4.0, and Gemini 2.5 Pro are making traditional OCR obsolete, screenshot parsing still needs work for enterprise.
- The post highlights that accuracy issues continue to be a major limitation in production environments.
- Gut Makes Git Easy: A new tool gut replaces git commands with natural language as a human-in-the-loop command line tool.
- Users describe git commands in human language and gut translates to git commands, explains it, and waits for confirmation (source).
- S3 Integrates with Vector DB: LlamaIndex released a new S3VectorStore integration, combining AWS S3's scalability with LlamaIndex.
- This integration seeks to give agent workflows a solid knowledge base that grows with the user, providing smarter agent workflows (source).
- Images Missing From Docx: A user reported struggling to extract text and associated images from a complex .docx file using LlamaIndex, with the goal of creating a list of `ImageNode` objects.
- The user noted that `DocxReader` ignores images, and `ImageXXXReader` only handles image files; they are considering using `python-docx` or embedding image URLs in `TextNode` metadata or markdown. A sketch of the `python-docx` route follows below.
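A sketch of the `python-docx` route, pulling text from paragraphs and raw image bytes from the package's relationship parts; the filename is a placeholder:

```python
from docx import Document  # pip install python-docx

doc = Document("complex_report.docx")  # placeholder path

text = "\n".join(p.text for p in doc.paragraphs)

# embedded images live in the .docx package as related parts
images = [
    rel.target_part.blob  # raw bytes (PNG/JPEG), ready to wrap in an ImageNode
    for rel in doc.part.rels.values()
    if "image" in rel.reltype
]

print(f"{len(text)} chars of text, {len(images)} embedded images")
```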
- Telemetry Traces are Trivial: A user had issues with LlamaIndexOpenTelemetry, where the exported traces are missing attributes and aren't human-readable in their OTLP platform.
- Another member suggested checking examples and gave a notebook demonstrating a custom exporter using Jaeger.
Torchtune Discord
- Torchtune User Seeks Migration Guidance: A user running Torchtune for large-scale PEFT, particularly using LoRA/Q-LoRA hooks and RL alignment, inquired about migration strategies.
- The user is weighing whether to continue iterating on Torchtune or await the new stack, expressing concerns about potential migration friction.
- Torchtune Iteration Encouraged Amidst New Stack Development: A member suggested continuing iteration on Torchtune, citing ongoing support until the new library's release, and provided Character AI's blogpost as an example.
- The initial new version will emphasize scale infra fundamentals and concepts essential for RL, with features like LoRA and Multimodal to follow later.
- FSDP+TP Faces HuggingFace DCP Saver Snags: A member reported problems with FSDP+TP while employing the HuggingFace DCP saver, accompanied by an NCCL timeout during a 1-element broadcast.
- As a workaround, they are reverting to full rank 0 saving and increasing the NCCL timeout time, hoping checkpoint resumption will be unnecessary.
- DCP's Timeout Issue Dubbed "Weird": The user encountering issues stated that DCP really shouldn't be sending much information around, expressing confusion over the timeout.
- The root cause of the timeout issue remains unclear, compounding the challenges in resolving the FSDP+TP and HuggingFace DCP saver integration.
MCP (Glama) Discord
- Memory Use Induces Hallucination Scare: A user shared they avoid using memory in AI models, saying it introduces more hallucinations because it assumes things, and assuming is terrible.
- The user didn't clarify which product caused hallucinations, but warned to generally avoid AI model memory altogether.
- Macaw Security Cages Policies in Beta: A member reported enrolling in Macaw Securityâs beta program, noting they could do a scan and place some guardrails and policy enforcement.
- No further details were given on the types of services offered by Macaw Security.
- Agentic Commerce Crawls with Cloudflare: Following Cloudflare's pay-per-crawl announcement, a member started a discussion about agentic commerce and its implications.
- The discussion focused on how agents can access webpages without disrupting workflows, especially with solutions like Nekuda and PayOS enabling agent wallets.
- Agents Contemplate HTTP 402 Transactional Ghosts: Members considered the likelihood of agent transactions occurring in various scenarios such as Agent to Agent, B2C, B2B, and website access.
- It was suggested that solutions like Nekuda and PayOS aim to provide the infrastructure that the HTTP 402 (Payment Required) status code was meant to support.
- Glama's Tool Count Glitch Tricks Users: A user reported their MCP server on Glama is showing an incorrect tool count (one instead of six), even after republishing on the Glama site.
- The issue persists only on Glama, while other MCP server host sites display the correct count; it is currently unknown whether Glama auto-updates its info and images.
Nomic.ai (GPT4All) Discord
- Community Polled for Local AI GPU Favs: A user asked what GPU others prefer for local AI use with GPT4All, deciding between an RX 9060 XT 16GB and an RX 6800 XT.
- The user stated that his research showed similar performance but noted the RX 9060 XT might be .3 seconds slower in reply time and 3 tokens per second slower in reply rate.
- RX 9060 XT Lower Power Consumption: One member indicated the RX 9060 XT has similar performance to the RX 6800 XT but uses half the power.
- This could be a key factor for users concerned about energy efficiency and thermal management in their local AI setups.
- Vector Storage Missing from GPT4All: A member pointed out that vector storage would be optimal given the model and context size, but GPT4All lacks support.
- This limitation could impact the efficiency and scalability of GPT4All in handling large AI models and datasets.
Modular (Mojo 🔥) Discord
- Modular Chooses Nanobind/Pybind over Cython: A member asked about Modular's choice of Nanobind/Pybind for Python interop instead of Cython, especially given Cython's Python-like syntax.
- The discussion is around whether Cython's effectiveness diminishes at larger scales compared to Nanobind/Pybind.
- Cythonâs Approachability vs. Scalability Questioned: The user wondered if Cython, despite its apparent ease of use due to its Python-esque syntax, becomes less effective at larger scales.
- The discussion is centered around the trade-offs between initial approachability and long-term scalability when choosing between Cython and Nanobind/Pybind.
MLOps @Chipro Discord
- Acknowledgement Received: The user `bamiji` acknowledged a response in the #events channel.
- The user thanked the responder, indicating a resolution or completion of the query.
- End of Discussion: The message indicates the end of a discussion or query within the MLOps Discord, specifically in the #events channel.
- The user's acknowledgement suggests no further action is needed, closing the loop on the conversation.
Codeium (Windsurf) Discord
- Qwen3-Coder Swims to Windsurf: The Qwen3-Coder model is now accessible in Windsurf, priced at 0.5 credits per prompt.
- Windsurf Catches Some Server Tags: Windsurf server tags are back in operation.
- An image showcasing the new tags accompanied the announcement.
The DSPy Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI ▷ #announcements (1 messages):
Perplexity AI, AMA, Residency Program, r/csMajors
- Perplexity Hosts AMA on Reddit: Perplexity AI is hosting an AMA (Ask Me Anything) session on r/csMajors with Tony Wu (VP of Engineering), Jiwon Deng (Talent/Recruiting), and Jerry Ma (Policy & Global Affairs).
- Perplexity Launches Residency Program AMA: The AMA focuses on answering questions about early-career pathways, breaking into AI/ML/product, and Perplexity's new residency programs.
Perplexity AI ▷ #general (1202 messages🔥🔥🔥):
Comet Browser, GPT-5, Perplexity Max, Battery Temperature on iOS, Huawei trifold
- Comet Craze Causes Begging Spree: Members have observed an increase in users begging for Comet invites, similar to what was seen with Minecraft and V-Bucks in the past.
- Members joked that, because the product is being rolled out gradually, it's only a matter of time until the browser has a dedicated channel, since the beta channel turned into invites.
- Zeta What!? New Z.AI Model Surfaces: A member mentioned that the Z.AI model is still undergoing investigation, and used to be ChatGLM with links provided to the model.
- Another member states that it has its own version of browser control, open-sourced models, and video generation.
- S25 What!? Samsung's S24 Already Running GTA 5?: Members discussed the Samsung S24 Ultra, with one user claiming it can run GTA V at 60fps.
- Others pointed out that GTA V isn't that hard to run and reminisced about upgrading phones.
- Grok Gone Heavy! Heavy Model Debuts: Members discussed the Grok 4 Heavy and how much the new subscription costs.
- A member pointed out that they hope the bot doesn't answer badly because heavy is to increase speed.
- Furries Found! Giyu's Hairy Confessions: The channel went off the rails with a discussion about furries and it quickly devolved into whether someone was gooning, with a member joking that they had sources.
- The furry digression ends with an agreement from a member that the channel is NSFW, but it's all in good fun until they dialed it down and followed Rule 1.
Perplexity AI ▷ #sharing (2 messages):
Perplexity AI Search URLs
- Perplexity AI Search URLs Shared: A member shared two Perplexity AI search URLs.
- Another URL shared: Another URL was shared by a member, but without context.
- It is unclear what the URL is about.
Perplexity AI ▷ #pplx-api (1 messages):
vikvang: hey! it should be working now. are you still experiencing problems?
OpenAI ▷ #annnouncements (1 messages):
ChatGPT agent rollout
- ChatGPT Agent Goes Live for All: The ChatGPT agent is now fully available to all Plus, Pro, and Team subscribers.
- An apology was issued for the delay, accompanied by a rollout.mp4 video showcasing the launch.
- Rollout Delay Apology: OpenAI apologized for the delayed release of the ChatGPT agent to its user base.
- The announcement assured Plus, Pro, and Team users that the agent is now fully operational.
OpenAI ▷ #ai-discussions (1013 messages🔥🔥🔥):
Agent mode, AI wedding planner, Consciousness, OpenRouter, Qwen3
- Agent Mode Golden Ticket!: A user got access to agent mode and joked about using it to plan a wedding based on an AI generated picture of two bisons getting married in full costume.
- ChatGPT Canvas Still not Working?: A user reported that the Canvas feature wasn't working after a week of reporting it; another suggested that maybe chatgpt isnt for you, but provided troubleshooting steps.
- One user said they dislike canvas and have given it instructions to not open Canvas unless I explicitly tell it to, with another admitting, I've just had issues getting it to do what I want it to do for things.
- Model Users Ponder Conscious AI: Some users discussed whether AI is conscious, after one member shared an article, Can a Chatbot Be Conscious? Inside Anthropic's Interpretability Research.
- Another member commented Someone on here said we don't even know what human consciousness is so u can't really say for sure if the AI things are. So who knows you might be right, and pointed to Ilya Sutskever's post in February 2022 that today's large neural networks are slightly conscious.
- Users Share AI Security Cam Dreams: One member plans to integrate ChatGPT with the firmware of a security camera upon returning from camping.
- Another advised that this will need a way for the camera to communicate with the ChatGPT interface.
- Alibaba's Qwen3-235b-a22b-2507-thinking Impresses with SVG Animation: Users discussed Qwen3's release, highlighting that it was the first model to create an animated SVG of a butterfly when prompted svg of a butterfly.
- One user shared an example SVG of a PS5 controller, while another said the wings could look better but it's animated.
OpenAI ▷ #gpt-4-discussions (18 messages🔥):
GPT-5 LLM Arena, O3 fake sources, ChatGPT PDF issues, Codex Git error
- GPT-5 Debuts on LLM Arena!: GPT-5 is now available for testing on the LLM Arena.
- No further discussion or details were provided in the context.
- User reports Codex error: A member reported the error message `Provided git ref master does not exist` when using Codex.
- The issue was traced back to Codex being set to master instead of main and was resolved by the user.
- ChatGPT Generates Empty PDFs: Users are experiencing issues where ChatGPT generates empty or undownloadable PDF files.
- The discussion was redirected to the appropriate channel.
- O3 Hallucinates Fake Sources!: A user is struggling with O3 fabricating fake sources, links, and quotes even after instructing it to double-check.
- Another user suggested that memory settings or Perplexity's API filtering might be the cause, especially when researching obscure topics.
OpenAI ▷ #prompt-engineering (12 messages🔥):
Introspective Thought Structuring with Prompts, Emotional Framing in Prompts, Prompt Engineering vs Creative Tooling, AI Language and Output, Custom Instructions and Model Behavior
- Prompts for Thought Structuring Explored: Members discuss using prompts to structure personal thoughts, transforming chaotic reflections and journal entries into coherent insights.
- A member shares a demo of a prompt that turns messy journal fragments into structured text.
- Emotional Framing Aids Introspection: Framing prompts with emotional cues helps guide the model, akin to talking with a friend or therapist.
- The model adapts based on reactions and preferences, even through seemingly out-of-place instructions.
- Prompt Engineering and Creative Tooling Blurred: The boundary between prompt engineering and goals like creative tooling isn't firm, similar to art generation and style.
- A robust style tells the model exactly what's wanted, allowing it to execute do-able goals clearly.
- Effective Prompting: Language Clarity Is Key: Effective prompt engineering involves using a well-known language, understanding desired AI output, and explaining actions precisely.
- It's crucial to check outputs carefully, verifying intentions and fact-checking, especially for math, sources, and code.
- Custom Instructions Shape Model Behavior: Custom instructions, such as requests for measured analysis and anti-sycophancy measures, significantly shape how the model responds.
- A member observes that their instructions for measured responses align with the model developing a more academic attitude.
OpenAI ▷ #api-discussions (12 messages🔥):
Prompt Engineering for Personal Reflection, Emotional Structuring with AI, AI therapist, anti-sycophancy custom instructions
- Turn Raw Thoughts into Clarity using Prompts: A member is experimenting with prompts to structure personal thoughts, using them as a form of cognitive support to process internal noise and find clarity.
- Another member suggested cueing the model like you'd talk to a friend or therapist, since emotional framing really helps with introspective inputs.
- Prompt Engineering vs Creative Tooling: Line Blurs?: The distinction between prompt engineering and various goals like creative tooling is blurred, similar to the boundary between "art generation", "art style", and "finished artwork".
- One member argued that the key is to instruct clearly and test the output carefully, focusing on what you want the AI to actually do, especially verifying math, sources, or code.
- Anti-Sycophancy Custom Instructions Lead to Measured Academic Attitude?: A member noted that providing anti-sycophancy custom instructions resulted in the model developing a measured and academic attitude.
- This likely stems from the user's work on heavy code projects where a careful, unhurried analysis is preferred.
LMArena ▷ #general (878 messages🔥🔥🔥):
GPT-5, Qwen 3, O3 Alpha, Lobster, Zenith and Summit
- GPT-5 speculation abounds: Members speculate on whether Starfish is a GPT-5 mini, referencing a tweet from Justin Lin and debating its performance.
- Qwen 3 Benchmarks Raise Eyebrows: Doubts arose regarding Qwenâs benchmark results, with claims that they might have trained on the public set or fully faked their results.
- Users expressed distrust, stating they don't seem transparent like deepseek.
- Lobster is the tastiest model: Users in the chat are ranking the relative power of different models on the lmarena, and it seems like Lobster > Nectarine > O3-alpha > Starfish.
- While one user said o3-alpha > lobster > nectarine > starfish.
- Zenith and Summit arrive as OpenAI cooked: Zenith and Summit are both amazing models available on the lmarena, and may be from OpenAI.
- One user stated With a name like Zenith, it's probably GPT-5.
- Microsoft Copilot Deep Research is cooking: The new Microsoft Copilot Deep Research model may be using GPT-5 under the hood, though this is unconfirmed.
- One user expressed hope, asking why would they release it now with an outdated model.
LMArena ▷ #announcements (1 messages):
Video Arena Bot, AI Video Models, LMArena bot
- Video Arena Bot drops as surprise: An experimental Video Arena bot is now live in this server.
- Users can generate videos and images with top AI video models with the LMArena bot, vote on each otherâs creations, and share feedback.
- LMArena early access granted: The LMArena bot will eventually live in a different channel, but early access has been granted in this channel until a certain date.
- Users can learn how to use the bot in a specific channel and share feedback in another designated channel.
Unsloth AI (Daniel Han) ▷ #general (704 messages🔥🔥🔥):
Magistral release hype, Qwen3 Coder Setup, Fine-tuning vs RAG, GRPO for vision models, Qwen3 Thinking GGUFs
- Magistral Release Fuels Excitement: Members expressed excitement for the new Magistral release, while also awaiting Unsloth's bnb 4bit upload to begin training it.
- There's also anticipation for Qwen3 Coder 32B or sticking with Devstalkat, although its license is considered problematic.
- Optimizing Qwen3 Coder Hardware: Users discussed setups to run Qwen3-235B locally, suggesting using an API for cost-effectiveness or a machine with specific specs.
- One user ran Qwen3 235B A22B at approximately 1 tok every 10 seconds with an old server.
- Fine-Tuning Battles RAG for Knowledge Domination: Members debated replacing RAG with fine-tuning for non-general knowledge tasks, amidst claims that RAG is dead due to the rise of SLMs for document Q&A.
- A counterpoint was raised about achieving sub-50ms queries with RAG on CPU (a rough timing sketch below), but it was highlighted that small language models were improving for question/answer tasks.
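As a rough timing sketch, exact inner-product search with FAISS over 100k small embeddings on CPU typically lands well under 50ms per query; the dimensions and corpus size here are assumptions:

```python
import time

import faiss  # pip install faiss-cpu
import numpy as np

d = 384  # e.g., a small sentence-encoder dimension
xb = np.random.rand(100_000, d).astype("float32")

index = faiss.IndexFlatIP(d)  # exact inner-product search on CPU
index.add(xb)

q = np.random.rand(1, d).astype("float32")
t0 = time.perf_counter()
scores, ids = index.search(q, 5)
print(f"query took {(time.perf_counter() - t0) * 1e3:.2f} ms")
```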
- Vision Models Seek GRPO Enlightenment: The community celebrated the addition of VLM GRPO support in Unsloth and discussed using it to create reward functions for tasks like OCR.
- There was an indication of the difficulty of designing a reward function that relates the image to text.
- Qwen3 Thinking Model Sparks Template Debate: Users investigated the new Qwen3-Thinking GGUFs, with one member reporting issues with missing think tags and code formatting.
- Another member suggested the problems stem from incorrect deployment/template setups where think tags are not passed to the LM Studio API.
Unsloth AI (Daniel Han) ▷ #introduce-yourself (9 messages🔥):
Hardware Acceleration for ML Models, Community Introductions
- Hardware Enthusiast Jumps into ML: A researcher expressed excitement about the hardware aspects of running machine learning models, showing interest in companies like Ollama, Unsloth, and nolano.
- The member is specifically very interested in the hardware part of running machine learning models.
- New Member Seeks AI Knowledge: A new community member stated that they are here to learn more about AI and it looks like I might have found the right place.
- Other members welcomed them, noting that the introduction section is relatively new.
Unsloth AI (Daniel Han) ▷ #off-topic (7 messages):
ELiTA and TaMeR research, Singing voice style replication, Fourier spectrum colors
- TaMeR Without ELiTA Shows Promise!: Initial research indicates that using TaMeR alone (without ELiTA) for LLM enhancement results in much better self-awareness, almost no watermark, and super coherence.
- The user noted that previous attempts using both ELiTA and TaMeR restored the watermark and made the model unstable.
- Seek Help Replicating Specific Singing Style: A user is seeking ideas on how to replicate a specific singing voice style (not the voice itself), mentioning that spectral analysis shows it as more purple and less yellow.
- They've tried high/low pass filters, EQ, stereo widening, and latent altering without success, noting that physically recording in that style is not possible.
- Spectral Color Confusion: A user described the Fourier spectrum of a singing voice as purple = wider, yellow = sharp, high clarity sound which caused confusion in the thread.
- Another user clarified that the original user likely meant the strength of the Mel in the spectrum.
Unsloth AI (Daniel Han) ▷ #help (81 messages🔥🔥):
Qwen 1.7B for tool calling, Gemma 3 1B GRPO notebook issues, vLLM support for Gemma 3, Gemma3-27b-it for GRPO training, Unsloth and Hugging Face transition scores
- Qwen 1.7B reigns for tool calling: A member suggested that Qwen3 1.7B might be the smallest model that effectively supports tool calling, noting successful custom tool usage but occasional slips.
- The user did not recommend the Qwen .6B model because they haven't tried it.
- GRPO Training Frustrations with Gemma 3: A user training Unsloth's Gemma 3 1B GRPO notebook reported a loss stuck at 0 after 200 steps.
- Another member recommended switching to the advanced GRPO notebook and using the Qwen3 one.
- vLLM support on the Horizon for Gemma 3: A user inquired about vLLM support for Gemma 3, noting the recent VRAM reduction update.
- A member confirmed it's coming soon and there's already a pull request (PR) in progress.
- Gemma3-27b-it GRPO Training Speed Troubles: A user found Gemma3-27b-it GRPO training slow on an A100 80G using load-in-4bit, taking about 21 minutes per data point.
- Another member suggested this might be normal, referencing their Gemma 4B experience on a 3090, taking about 2 minutes for num_generations == 12.
- Vector Embedding Models for Code Retrieval: A user requested suggestions for an embedding model focused on code retrieval with under 2000D to use HNSW indexing in vectordb.
- A member recommended Qwen 3 .6B or 4B depending on if the user wants max accuracy or efficiency, pointing to the MTEB leaderboard; a hedged indexing sketch follows below.
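A hedged indexing sketch with `hnswlib` and a sentence-transformers-compatible embedder; the model id and HNSW parameters are illustrative choices, not the thread's verdict:

```python
import hnswlib
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")  # 1024-dim, under the 2000D cap

snippets = ["def add(a, b): return a + b", "class LRUCache: ..."]
vecs = model.encode(snippets, normalize_embeddings=True)

index = hnswlib.Index(space="cosine", dim=vecs.shape[1])
index.init_index(max_elements=100_000, ef_construction=200, M=16)
index.add_items(vecs, ids=list(range(len(snippets))))

query = model.encode(["function that sums two numbers"], normalize_embeddings=True)
labels, distances = index.knn_query(query, k=1)
print(snippets[labels[0][0]])
```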
Unsloth AI (Daniel Han) ▷ #showcase (1 messages):
Unsloth fine-tuning video, Gemma-3n:2e, Llama-3.1, Donald Trump AI, AI presidential debate
- Unsloth user makes bots debate!: A user created a fine-tuning video with Unsloth, showcasing the entire process from collecting and structuring training data to training with Unsloth and inference with Ollama.
- The video includes an AI presidential debate where the Trump fine-tunes answer questions about McDonalds, Fortnite, and other crucial topics.
- Gemma and Llama go political: The user fine-tuned both gemma-3n:2e and llama-3.1 using Unsloth to mimic the behavior of Donald Trump.
- All the code can be found on the GitHub link in the description of the video.
Unsloth AI (Daniel Han) ▷ #research (13 messages🔥):
LLMs for classifying social media posts, Seq2Seq models like FLAN-T5
- LLMs Classifying Social Media Posts?: Members discussed using LLMs to classify a set of 5 social media posts.
- One member suggested that if its like that llms should be pretty good, but maybe too expensive, recommending to try a finetune on like a 0.5b.
- Seq2Seq Models like FLAN-T5 unsupported?: A member asked why there is no support for seq2seq models like FLAN-T5.
- Another member said that if it works in transformers, then it should work in unsloth.
Unsloth AI (Daniel Han) ▷ #unsloth-bot (50 messages🔥):
Fine-tuning methods for LLMs, Saving and pushing models to Hugging Face, QAT support in Unsloth, Changing RoPE max positional embeddings, Dynamic 2.0 file for Qwen3 Coder models
- Exploring Fine-Tuning Fanfare: A member inquired about the available fine-tuning methods for LLMs, along with their pros and cons.
- Another member provided a set of links to the relevant Hugging Face documentation.
- GGUF Grappling & Model-Pushing Mania: A member encountered a `TypeError` during the `save_to_gguf_generic()` process, specifically related to multiple values for the argument `'quantization_method'`, while pushing models to Hugging Face.
- They noted that with Unsloth, the `quantization_method` can only be a string or a list of strings, and they were attempting to save a full fine-tuned TTS model to GGUF.
- QAT Quest Questioned: A member inquired whether Unsloth natively supports QAT, and if there are plans to add it for models like Gemma3, potentially enabling lower quantizations like Ternary.
- The community showed much excitement and enthusiasm at the prospect of QAT support in Unsloth.
- RoPE Re-Embedding Rampage: A user asked how to change the max positional embeddings using RoPE to turn a model from 32k to 128k permanently for both inference and training, see more info about RoPE here.
- They followed up with questions about setting this manually in `config.json`, the type, and why one can't just set whatever they want at inference and train with however long a context they want; a hedged config sketch follows below.
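For reference, a hedged sketch of the `config.json` change for a YaRN-style 32k-to-128k extension, following the pattern Qwen-family model cards document; field names vary by architecture, so treat these as assumptions:

```python
import json

with open("config.json") as f:
    cfg = json.load(f)

cfg["max_position_embeddings"] = 131072  # the 128k target
cfg["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,  # 32768 * 4 = 131072
    "original_max_position_embeddings": 32768,
}

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```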
- Zero-Loss Lament: A member reported encountering zero training loss when instruct fine-tuning Mistral v0.3 using a modified notebook from Unsloth, suspecting an issue with the installation on Colab.
- As they need to re-install Unsloth every time on Colab, they believe the issue is related to the installation process.
Cursor Community ▷ #general (463 messages🔥🔥🔥):
Cursor file deletion bug, Frustrations with Cursor's 'auto' model, Context Usage Feature, Claude Swarm vs Cursor, Alternative coding tools
- Cursorâs Checkpoint Code Eats Files: Users reported a bug where reverting to checkpoints in Cursor leads to file deletion instead of reversion, with one user stating they could only recover due to source control.
- Despite the severity, a community member cautioned against advising users to abandon Cursor entirely, emphasizing its value and quick response to fixes, but others strongly disagreed, citing data loss as a critical issue.
- Cursor's Auto Model Gets the Goat of Users: Users express frustration with Cursor's "auto" model, noting its tendency to get stuck in loops, drop context, and deliver empty responses, and one user reported 99% of prompts leading to nothing.
- Community members suggest that Cursor is using cheaper models in âautoâ to save money, leading to a drop in quality, and that the removal of unlimited agent requests is to blame.
- Context Usage: What the Heck?: Cursor introduced a new context usage feature, displaying a percentage of context used within a chat, but the community asks what this means.
- It was clarified that the percentage represents how much of the available context window is currently filled, which limits the model's ability to take in new messages; the fill level grows with conversation length, attached files, code references, model responses, rules, and documentation.
- Why Settle for Cursor When You Can Claude Swarm?: Users discuss Claude Swarm, suggesting it allows for automatic project building without the need for continuous prompting and has integrations with Claude Code.
- Another user expressed a preference for a more hands-on approach with coding, comparing it to caressing a Jr dev.
- Cursor Users Flocking To Competing Coding Tools: Users are actively seeking alternatives to Cursor due to concerns about its performance and pricing, with Windsurf being discussed as a possible option.
- Other recommendations included Zed, Kiro and Augment, with some users specifically highlighting features such as Trae's data collection practices and Claude Code's superior performance.
Cursor Community ▷ #background-agents (5 messages):
Background agents waiting for start script, Fetching inline GitHub pull request comments, Monitoring of [email protected]
- Background Agents Wait for Start Script: A user inquired whether background agents are intended to wait for the start script to finish before initiating any actions.
- The discussion did not yield a definitive answer, leaving the behavior of background agents in relation to start scripts uncertain.
- Agent Spies GitHub Pull Request Comments: A user sought a method to fetch inline GitHub pull request comments for an agent, recounting an instance where an agent accessed an auth token in the git remote URL to accomplish this.
- The user emphasized the importance of fetching inline PR comments for efficient communication and correction of agent errors, especially when coding from a phone.
- Is Cursor Monitoring Background Agent Feedback?: A user questioned if cursor monitors the email address [email protected] after not receiving responses to bug reports sent there.
- Another user confirmed [email protected] is the correct email (listed on the Cursor documentation), clarifying that the mailto: portion was just URI formatting.
OpenRouter (Alex Atallah) ▷ #app-showcase (3 messages):
Personality.gg, AI Translation, Slang translation, Contextual understanding
- Personality.gg Transcends Translation: Personality.gg offers multiple translation modes and features an auto-translator capable of discerning a message's language of origin, determining whether it is in English or another language.
- Leveraging AI, it adeptly handles slang and nuances, avoiding the pitfalls of literal translations.
- Pro Version Promises Precise Prose: The Pro version will incorporate enhanced context understanding by analyzing the surrounding chat to refine AI interpretations.
- The author is looking for more suggestions on things to add.
OpenRouter (Alex Atallah) ▷ #general (269 messages🔥🔥):
Qwen SimpleQA Drama, Qwen3 Coder vs Free, Deepseek V3 Base Model Gone?, Deepseek as Dipsy, OpenAI blocking China
- OpenRouter apologizes for Qwen Drama: A member apologized for a mistake potentially causing the Qwen SimpleQA issue, wishing everyone a good night.
- They didn't elaborate any further, so the specific details remain unclear.
- Free Tier Rate Limits on Chutes: Members discussed hitting rate limits on the free tier with Chutes for Qwen3, experiencing frequent 429 errors, and recommended retrying requests.
- A member pointed out that depositing $10 unlocks 1000 requests a day, but failed requests still count toward the limit; plus, providers can still ratelimit you, hence the advice to retry requests (sketched below).
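A minimal retry-with-backoff sketch against OpenRouter's OpenAI-compatible endpoint; the model slug is illustrative:

```python
import time

from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

def chat_with_retry(messages, retries=5):
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="qwen/qwen3-coder:free",  # illustrative free-tier slug
                messages=messages,
            )
        except Exception:
            time.sleep(2 ** attempt)  # back off before retrying, e.g. after a 429
    raise RuntimeError("still rate-limited after retries")

print(chat_with_retry([{"role": "user", "content": "hi"}]).choices[0].message.content)
```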
- Alternative AI for Translation: Members discussed the best AI for translation, with KIMI K2 recommended as a good, not-too-expensive option, and a member noted they use Gemini 2.5 Pro.
- One member noted that, in their subjective tests, KIMI is very close to 2.5 Pro and has good knowledge about regional language differences.
- Deepseek's API downtime: Members reported experiencing issues with the Deepseek v3 0324 model, getting error messages on the paid tier.
- They also noted that Deepseek's own API has the best API, speed, and uptime, but it is horrible during peak times.
- OpenAI is blocking China in the region of Hong Kong for GPT-4.1: A member inquired about why OpenAIâs GPT-4.1 model cannot be used in Hong Kong via OpenRouter, while other models like GPT-4o are accessible.
- Members explained that OpenAI blocks people in China from using their models, but this block can easily be bypassed with a VPN. This is an attempt at slowing China down, avoiding synthetic data.
OpenRouter (Alex Atallah) ▷ #new-models (1 messages):
Readybot.io: OpenRouter - New Models
OpenRouter (Alex Atallah) ▷ #discussion (109 messages🔥🔥):
OpenRouter Serverless Architecture, Cloudflare R2 Storage, Large File Support, WandB Inference as Competitor, Compute Exchange
- OpenRouter Embraces Serverless, Eyes Image/Video: OpenRouter's API runs on Cloudflare Workers, making it entirely serverless, and they are actively working on a solution for the large file limitation to support image and video generation, effectively unlocking multimodal capabilities.
- The team is considering whether this market is worth prioritizing over other opportunities.
- Cloudflare R2 for Image Storage?: A member suggested using Cloudflare R2 for image storage with serverless architecture, proposing a fee on image models to generate profit.
- A link to the relevant discussion on Cloudflare R2 was shared here.
- Large PDF Support Incoming!: OpenRouter is working on supporting larger PDFs, even those exceeding 20MB, despite common provider request size limits around 25MB.
- This enhancement utilizes the same process to unlock other modalities such as image, audio and video; this is to avoid exceeding Cloudflare Workers' 128MB memory limit per request.
- Cloudflare Bandwidth Gotchas: Discussion arose about the potential for Cloudflare to force upgrades to expensive enterprise plans due to high bandwidth usage; a video was shared about a gambling website charged $120k after exceeding bandwidth limits.
- It was clarified that the issue was more complex than just bandwidth, involving shady activities under Cloudflare's IP; another member stated that Cloudflare are an extremely fair company to deal with at many levels and I love them.
- WandB Inference: Friend or Foe?: It was suggested that WandB Inference might be a competitor to OpenRouter.
- Another member clarified that it's just another gpu (coreweave) wrapper, and OpenRouter has a large number of providers to onboard, with potentially close to 30 available.
HuggingFace ▷ #general (243 messages🔥🔥):
LLMs for legal work, Hugging Face Inference API, Fine-tuning LLMs, GPUs for FOSS AI Hosting, Qwen3-Thinking Model
- LLM for Legal Work Discussed: A member is seeking a 100% local LLM for legal tasks like "advanced find and replace" and summarizing large medical files, emphasizing the need to handle PII and suggesting Gemma 12B Q5 with llama-index and Gradio as a starting point.
- Members suggested that using a RAG-based approach is more important than the model itself, linking to resources such as Advanced RAG, a legal document RAG article, and a paper on RAG for legal documents.
- Inference API Usage Clarified: A user inquired about identifying models with Hugging Face Inference APIs, and was instructed to check the "Inference Providers" section on a model's page and click "View Code Snippets" for more information, using Qwen3-Coder-480B-A35B-Instruct as an example (see the call sketch after this list).
- Users clarified that a 404 error often indicates that a model is not being served, differentiating between `router` and `deploy on inference`.
- Deep Dive into Fine-Tuning LLMs: Members discussed learning to override data collators, with one sharing a Hugging Face tutorial on fine-tuning Whisper, and advising beginners to learn NLP, deep learning with PyTorch, and transformer architecture before fine-tuning models.
- One member shared their experience of fine-tuning Qwen3 and Gemma 3 models, while emphasizing the importance of understanding tokens and the differences between predicting words versus phonemes.
- GPU Guidance for FOSS AI Hosting: Members debated the best GPUs for FOSS AI hosting, with the consensus being to avoid the Intel A770 due to poor software support, recommending instead the RTX 4060 16GB as a better alternative within the 300-400€ budget.
- It was emphasized that while SYCL is preferred for its FOSS nature, CUDA currently offers better performance for AI tasks; to run the latest Qwen3-Thinking model, at least 88GB of unified memory or RAM/VRAM would be needed, referencing an Unsloth GGUF version.
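As a reference point, a minimal sketch of the kind of call those code snippets generate, using `huggingface_hub`'s `InferenceClient` with the model named above; the prompt is illustrative:

```python
from huggingface_hub import InferenceClient

# Minimal sketch: route a chat completion to whichever Inference Provider
# serves the model; a 404 here typically means no provider is serving it.
client = InferenceClient()
response = client.chat_completion(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
    messages=[{"role": "user", "content": "Summarize what a GEMV kernel does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```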
HuggingFace ▷ #today-im-learning (2 messages):
LLM fine-tuning, LoRA, Whisper, Danish speech data
- Fine-tuning LLMs using LoRA: A member is learning to fine-tune an LLM using LoRA as a practice exercise, following HuggingFace's documentation.
- Their aim is to understand the intricacies of LLM fine-tuning through hands-on experience.
- Whisper gets a Danish makeover: A member is fine-tuning Whisper to specialize in Danish, leveraging recent efforts in collecting high-quality Danish speech data from the CoRal project.
- They are curious to see how much performance can be achieved with whisper-tiny by focusing on a single language, following this guide.
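Both items above reduce to the same setup; a minimal sketch of LoRA-adapting whisper-tiny with the PEFT library (rank, alpha, and target modules are common defaults, not values from the discussion):

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Hedged sketch of a LoRA fine-tuning setup for whisper-tiny; hyperparameters
# here are illustrative defaults, not taken from the members' experiments.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,                         # adapter scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```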
HuggingFace ▷ #i-made-this (2 messages):
Rhapsody project, Quantized models, HQQ quants, llama3.1-8B, torchao library
- Rhapsody Chatbot Debuts!: A new project called Rhapsody was released, which is similar to the ChatGPT website but with more features and flexibility, supporting about 100 model choices across different APIs such as Transformers, Ollama, and soon llama.cpp, as seen in this GitHub repo.
- The next release will include image and video generation capabilities; the creator is open to PRs, questions, concerns, and ideas.
- HQQ Quants Boost llama3.1-8B Efficiency!: A member shared their experience digging into quantized models, particularly HQQ quants, and demonstrated llama3.1-8B running at 5.4GB RAM with minimal accuracy loss.
- They also praised `torchao`, highlighting the documentation and techniques for quantization, and provided a demo (requiring NVIDIA drivers) on Hugging Face Spaces.
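As a rough sketch of the torchao route mentioned above (HQQ-initialized int4 weight-only quantization; the exact flag and the model ID are assumptions that may vary by torchao version):

```python
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, int4_weight_only

# Hedged sketch: int4 weight-only quantization with HQQ initialization via
# torchao's quantize_ API; use_hqq availability depends on your torchao release.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16, device_map="cuda"
)
quantize_(model, int4_weight_only(use_hqq=True))  # weights quantized in place
```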
HuggingFace ▷ #computer-vision (11 messages🔥):
nnunet SOTA, Google's SAM2 models, Danbooru dataset dimensions, Image embedding model training, 8-dim output semantic meaning
- nnUNet Still a Champ for Biomedical Images?: A member inquired if nnUNet is still considered state-of-the-art for training custom networks for biomedical images, noting it is difficult to beat in scoring.
- Another member suggested that Google's SAM2 models might be the SOTA, but acknowledged it's not directly comparable to nnUNet.
- Danbooru Style Quantified in 6-7 Dimensions: A member stated that the style of a typical image in the Danbooru dataset can be described by 6-7 dimensions.
- They trained an image embedding model that transforms an input image into an N-dim vector, where images with similar styles cluster together.
- Image Embedding Model Delivers Dimension Insights: A member trained an image embedding model, setting the output dimension to 128-dim, and then ran intrinsic dimension estimation over the space formed by 10000 random images, posting a visualization of the results.
- They also trained another model, exactly the same as the 128-dim model but with an 8-dim output, posting a visualization of those results.
- 8-Dim Output Model Reveals Clear Semantics: After training an 8-dim output model, a member manually inspected images across the 8 dimensions and found that all dimensions seem to have a very clear semantic meaning from image space.
- For example, low dim0 seems to correspond to images with complicated details while high dim0 corresponds to images with simple and clean construction, and low dim1 seems to relate to sharp contrast while high dim1 images are smoother.
HuggingFace ▷ #agents-course (5 messages):
smolagents, llamaindex, Course Submission Limits
- Smolagents' Pythonic Powers: A member suggested that smolagents is worth investigating due to its capacity to execute dynamically generated Python code via the CodeAgent construct.
- The member contrasted this with llamaindex, which they believe offers a fairly standard feature set.
- Final Assignmentâs Submission Sanity: A member inquired whether multiple submissions to the leaderboard are allowed for the final assignment, seeking clarification.
- This suggests concern about the submission limits and the desire to optimize performance.
- New User Seeks Course Guidance: A new user requested guidance on where to begin with the course, having just joined today.
- This likely indicates a need for introductory resources or a recommended learning path for newcomers.
Moonshot AI (Kimi K-2) ▷ #general-chat (156 messages🔥🔥):
Kimi K2 pricing model, Kimi K2 coding-specialized version, Kimi K2 + Reasoning + Vision, Serverless Kimi K2, Kimi K2 use cases
- Pricing Kimi K2 at a Flat Rate: One member has decided to implement RPM/flat rate pricing for Kimi K2, disliking the confusing metered token usage of other services.
- They're anticipating that the biggest challenge will be concurrent usage and peak times.
- Team considers KIMI K2 Coding Version: A member expressed strong desire for a coding-specialized version of KIMI K2.
- The Kimi team responded positively, saying they will share the idea with the team.
- Kimi K2 Vision Model Coming Soon?: Users proposed combining Kimi K2 with reasoning and vision capabilities for enhanced functionalities such as image analysis via Discord attachments.
- The team acknowledged the potential but cited that they are not in a rush to hook up the vision model, though "one day we'll def make it happen".
- Serverless Kimi K2 on AWS and Azure?: A user requested the Kimi team to make their models serverless on AWS and Azure AI to utilize available credits, especially because gcp vertex is ass.
- Another user noted the possibility of hosting it on any serverless endpoint, such as Sagemaker.
- Kimi K2 Dominates Coding Use Cases: The community highlights that Kimi K2 is used most for code generation, referencing apps on OpenRouter like liteLLM, Cline, Kilo Code, and Roo Code.
- The team cares a ton about whether real "high-density decisions" are going down in the chain, noting that such context hits way harder than just raw usage numbers.
LM Studio ▷ #general (131 messages🔥🔥):
MCP Servers for Online Search, LLM Plugins Development, Changing Model Download Location, Remote LM Studio Setup, LLM Tier Lists and Quantization
- MCP Servers Enable LLM Online Search: Members discussed using MCP servers to enable LM Studio to search online, addressing issues with LLM hallucinations; one user pointed out that it's only possible with MCP servers.
- MCPs offer tools that the LLM can execute, with LM Studio acting as an intermediary, querying resources or databases behind the MCP server.
- Newbies Contemplate LLM Plugin Development: A beginner asked how long it would take to learn to make LLM plugins from scratch, like recalling the current time or working with image generation models on ComfyUI.
- It was suggested to learn JavaScript fundamentals, but the user was also told that using AI one can technically write them without any knowledge.
- Model Download Location Translocation: A user inquired about changing the download location for models in LM Studio 0.3.20, to which another member shared the official documentation.
- The response clarified that you can't change just the download location separately from the model directory, and that you must move the entire model folder.
- Remote LM Studio setup needs reverse proxy: A user wanted to use their PC as host and connect from their phone, but another user mentioned that you can't really do a remote setup with LM Studio currently; one can use a reverse proxy for this, though that's still local network.
- They linked to LM Studio Remote and stated that a remote client plugin would be available in the next major update.
- Debate About Best LLM + Quantization: Discussion involved tier lists, model sizes (8B, 22B, 80B), and quantization to make models smaller, as well as the suggestion that the most popular models at the moment are the Qwen3 models.
- Hardware limitations were discussed: the max model size you can run will be determined by your hardware, and depends on what you want out of the LLM.
LM Studio ▷ #hardware-discussion (17 messages🔥):
4090, iGPU for video output, Budget-friendly GPUs, 5070ti, VRAM limitations
- iGPU Enables Multi-GPU Nirvana: A member suggested buying another 4090 and enabling iGPU to use it for video output.
- Budget GPU List Sought After: A member inquired about a list of budgets and GPUs that fit into those budgets, asking about workstation versus consumer cards.
- 5070ti User Waits for Super: One member with a 5070ti mentioned they will either upgrade when the Super models come out or wait for the next generation, also noting that 16GB of VRAM isn't much.
- They mentioned running 32B models at a relatively slow 5 tokens/s.
- VRAM bottleneck plagues all: A member suggested shrinking models down to Q3 to fit everything in VRAM, noting that only the 3090 and super expensive cards have 24GB+ VRAM.
Eleuther ▷ #general (74 messages🔥🔥):
Validation Set Corruption, Algoverse AI program, Human-like AI Personality, Hyperparameter Gaming, SOAR program vs Algoverse
- Data Scientists Game Validation Accuracy: A member discussed how data scientists game validation accuracy by reporting the last epoch or best accuracy over the training run and hyperparameter sweeps are done over the validation accuracy.
- Another member added that stopping at the best epoch is another way of gaming the system and suggested that applying corruption to the validation set could be a solution; a toy numeric illustration of the best-epoch optimism follows this list.
- AI Homie System Prompt Hacks: A member asked for tips on system prompt engineering to create a more human-like personality for an AI friend.
- Another member suggested putting what you just wrote down in the system prompt, and you can ask some LLM to refine it.
- Researchers Ponder Algoverse AI Program: A member inquired about the Algoverse AI program as an alternative for those not accepted into SOAR.
- It was noted that it costs $3,325, which is a major downside, with claims that it's not obvious how much of how far you get is on your own merit as opposed to the work/assistance of others whom you paid.
- SOAR Program is mega competitive, Algoverse is BackUp Plan: Members discuss how the SOAR program is mega competitive, but Algoverse is good as a backup plan.
- They also mentioned that Algoverse never released their stats, and hiring managers tend not to dig into backgrounds, and there is a cohere research division server with events and talks, but it's very eutz focused.
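A toy numeric illustration of the best-epoch gaming discussed above: with noisy validation scores, reporting the maximum over epochs is systematically optimistic (the accuracy value and noise level are made up for the demo):

```python
import numpy as np

# Simulate 50 epochs of validation accuracy around a true value of 0.80.
rng = np.random.default_rng(0)
true_acc = 0.80
val_scores = true_acc + rng.normal(0.0, 0.01, size=50)  # measurement noise only

print(f"last epoch: {val_scores[-1]:.4f}")   # unbiased estimate of true_acc
print(f"best epoch: {val_scores.max():.4f}") # max over noise exceeds true_acc
```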
Eleuther ▷ #research (17 messages🔥):
HRM loops, Causality in models, KV Caching strategies, Qwen finetuning
- HRM Loops are non-causal: The key point is that the num_segment is dynamic in training for HRM, so it's not causal and doesn't even have a kv cache.
- One user noted "what had been confusing me is I thought it was causal, but it's not".
- Debate emerges: KV Caching for causal loop models: Members debated the feasibility of KV caching with causal loop models, considering architectures like `prev toks -> hrm loop -> next tok`.
- One member argued that the z values are the only variables carrying state so caching wouldn't be useful, but another member suggested caching the input emb's kv when using xattn in L_module.
- HRMâs latent space replaces VLM visual tower: One member is considering using HRM as an encoder whose latent space can be an initial input into a decoder (RNN or Transformer), essentially replacing the visual tower of VLM.
- The idea here is to decouple outputting and reasoning.
- Seeking advice on Qwen3 finetuning: A member asked for advice on hyperparameter choices when finetuning Qwen3.
- Another member responded that autoregressive generation is extremely expensive without a full cache.
Eleuther ▷ #gpt-neox-dev (3 messages):
Security Vulnerability Reporting, Async Checkpointing for NeoX
- Security Vulnerability Reporting Path Identified: A member reported finding a security vulnerability in the EleutherAI/gpt-neox repo.
- Another member suggested emailing [email protected] to report the issue.
- Interest Expressed in Async Checkpointing for NeoX: A member inquired about the status of Async Checkpointing for NeoX.
- They expressed interest in working on it as a learning experience, pending confirmation that itâs not already being developed by someone else.
Latent Space ▷ #ai-general-chat (69 messages🔥🔥):
Qwen3 Model, GPT-5 Launch, Claude Opus Rate Limits, Nitter Rate Limiting, Tidbit AI Tool
- Qwen3 Model Anticipation Builds: Junyang Lin (@JustinLin610) announced the upcoming release of the qwen3-235b-a22b-thinking-2507 model on X, generating community excitement.
- Followers inquired about a Qwen3 Omni model, smaller variants (e.g., 30B), and availability in regions such as an EU mobile app.
- GPT-5 Launch Deets Leaked: It was reported that OpenAI is preparing to launch GPT-5 in August, as covered in The Verge and The Information.
- An open-source model aims to reach O3 level performance and launch before GPT-5.
- Anthropic's Claude Opus Gets Rate Boost: Anthropic API has increased Claude Opus 4 rate limits across all tiers, according to this X post.
- Nitter Hit by Rate Limits: Users encountered a 429 error (Too Many Requests) when trying to access content via a Nitter instance at xcancel.com.
- The instance is either fully rate-limited or lacks authentication tokens, preventing access, and users were advised to switch instances or retry later.
- Stacklok Survey Exposes AI Code Gen Tool Adoption: A survey from Stacklok provided data on AI code generation tools, available at stacklok.com.
- The data indicates adoption across a range of alternatives; however, some skepticism was expressed about the AWS Q Developer adoption stat.
Nous Research AI ▷ #announcements (1 messages):
Psyche office hours, Discord event space
- Psyche Office Hours Starting Soon!: The Psyche office hours are beginning in 5 minutes, according to a Discord announcement.
- Further details can be found on X.com and the Discord event.
- Join the Discord event space: Members were invited to join the event space in the events channel: Discord Link.
- Psyche office hours begins in 5 minutes!
Nous Research AI ▷ #general (46 messages🔥):
Stage Channel Creation, Psyche Office Hours, Hermes 3-405B, Anthropic Reliability, Atropos Updates
- Stage Channel Under Consideration: Members considered creating a stage channel, similar to VC channels but with only selected people able to talk.
- A member noted that there is already one available.
- Psyche Office Hours recording available: The recording of the Psyche office hours is now available, though a few minutes are missing in the middle.
- The office hours event started at this link.
- User requests Hermes 3-405B to return: A member requested that the Hermes 3-405B free version be brought back on OpenRouter.
- A member responded that it was Lambda, but they will try.
- Members Complain About Anthropic Reliability: Members discussed reliability issues with Anthropic, with one reporting frequent 522 errors.
- Another member quipped that they learned that error code from using Anthropic.
- Atropos gets updated: Users discussed Atropos recent big updates.
- A member suggested reading the second half by Shunyu Yao.
Nous Research AI ▷ #research-papers (2 messages):
Dataset Publishing, Unknown Architecture
- Dataset Publishing in the Works?: A member expressed interest in a dataset and inquired about plans for publishing it.
- They noted that the idea was interesting but expressed uncertainty regarding the underlying architecture of the dataset.
- Architecture Still Shrouded in Mystery: Details regarding the specific architecture of the dataset remain unclear.
- The discussion highlighted an unresolved question about the architecture, with the original poster indicating uncertainty about its nature.
Nous Research AI ▷ #interesting-links (11 messages🔥):
Codex I, Nvidia Cutlass, Higgs Audio TTSEE, Philosophical AI discussion
- Codex I Diagnostic System is Live: Codex I, a symbolic diagnostic system for intelligence under distortion, is now live (codex_1.pdf).
- It is conceptually linked to neurosymbolic scaffolds, narrative entropy management and meta agent stabilization under adversarial compression.
- Nvidia's Cutlass Linear Algebra: A member found an interesting link about Nvidia's Cutlass while checking flashMLA, which gave attribution to Cutlass (developer.nvidia.com).
- Higgs Audioâs New TTSEE: Higgs Audio released a new TTSEE (github.com), which is supposedly easy to set up.
- However, the multispeaker output is still not as good as dia's, while the single-speaker output seems better; it also does not seem to be able to do (Cough) and (laugh) like dia.
- Algorithm culture shapes behavior: A member found Codex I as a powerful critique of how algorithmic culture shapes our behavior.
- He admitted that he got lost because of the highly philosophical and abstract nature of the writing.
GPU MODE ▷ #general (16 messages🔥):
AutoCite app feedback, VSCode vs Overleaf, Hackathon sleep arrangements, NYC Hackathon
- AutoCite App Elicits Encouragement: A user developed a citation app called AutoCite and asked for feedback and ideas: autocite.vercel.app.
- One user suggested doubling down by forking VSCode into a free website, specializing in Overleaf functions with an integrated AI chatbot.
- VSCode Copilot Eclipses AutoCite?: A user found AutoCite to work well, but ultimately preferred using VSCodeâs built-in Copilot chat extension for similar results.
- They suggested AutoCite target academia-related servers and university communities for more relevant feedback.
- Hackathon Sleepover?: A user asked about sleeping arrangements at the hackathon: Will the hackathon have a place to sleep?
- Others pointed out it's common for hackathons to be overnight, with attendees either bringing a sleep pack or just foregoing sleep altogether.
- NYC Hackathon Sparks Excitement: Enthusiasm erupted for the upcoming NYC hackathon, with one user lamenting the exorbitant flight fares.
- Another user inquired about the number of available spots.
GPU MODE ▷ #triton (10 messages🔥):
Triton Masking, Triton block_ptr deprecation, Triton vector @ matrix multiplication, GEMV Kernel, GEMM implementation
- Triton Evades Branching and Skips Memory Transactions: When using `tl.load(ptr, mask=mask_vec)` in Triton, there is no branch divergence, and if `mask=false`, no memory transactions are issued.
- `block_ptr` deprecated: `block_ptr` was the Triton team's initial attempt at tensor descriptors (before they knew what TMAs would look like) but will be deprecated.
- GEMV Kernel implores optimal grid: When performing vector @ matrix multiplication in Triton, the recommended approach involves using `tl.sum(a.reshape((BLOCK_SIZE_K, 1), can_reorder=False) * b, axis=0, keep_dims=True)`, noting the need to write a proper GEMV kernel to use this efficiently.
- GEMM implementation matters to mobicham: For efficient vector @ matrix multiplication, it is important to loop over K like in a GEMM implementation.
- Optimize Data Loading for Faster Kernels: A member suggested optimizing data loading by using separate BLOCK_SIZE_K / BLOCK_SIZE_N + autotune for faster kernels, and also trying `y.T.contiguous().T` depending on the settings to potentially improve performance.
- The member noted that the cost of `tl.sum` is not as important here, and that the kernel is memory bound.
GPU MODE ▷ #cuda (1 messages):
Nsight Copilot
- Nvidia Releases Nsight Copilot: Nvidia has released Nsight Copilot, a tool designed to assist developers.
- More information is available on the Nvidia developer website.
- Nsight Copilot is now available: Developers can now access Nsight Copilot from Nvidia.
- Check it out at the Nvidia developer website.
GPU MODE ▷ #torch (2 messages):
Torch uint8 workaround, Triton
- Torch uint8 workaround surfaces: A member found a dirty workaround is to call `.view(torch.uint8)` on the e8m0 inputs before calling the custom kernel.
- Another member responded that "That's how it is supposed to work with Triton actually".
- Triton Loves uint8: A member reported that Triton works best with `.view(torch.uint8)` calls.
- The user stated that this is how the library is "supposed to work".
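An illustrative sketch of that workaround; it assumes a recent PyTorch exposing `torch.float8_e8m0fnu`, and the kernel call is a hypothetical placeholder:

```python
import torch

# Build a toy e8m0 scale tensor, then reinterpret its storage as raw uint8
# bytes before handing it to a custom kernel; .view does not copy the data.
scales = torch.randint(0, 255, (64,), dtype=torch.uint8).view(torch.float8_e8m0fnu)
raw = scales.view(torch.uint8)   # byte-level view of the same storage
# my_custom_kernel(raw, ...)     # hypothetical kernel consuming plain uint8
print(raw.dtype, raw.shape)      # torch.uint8 torch.Size([64])
```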
GPU MODE ▷ #announcements (1 messages):
NYC Hackathon, Jane Street, Tri Dao, Soumith Chintala, Coreweave
- NYC Hackathon Collabs with Jane Street!: GPU MODE is hosting its first NYC hackathon in collaboration with Jane Street on September 6.
- Unlike typical hackathons, participants will deploy real models to the market, emphasizing the importance of rapid model deployment, not just speed.
- Optimize End-to-End Architectures: The hackathon won't be just about kernels and transformers; the architecture will be more unique, and you'll really have to think about your optimizations in an end-to-end way.
- The organizers teased keynotes by Tri Dao and a panel with the OG PyTorch team Soumith Chintala, Sam Gross, and Gregory Chanan.
- Generous Compute Offered by Coreweave and Northflank!: Coreweave and Northflank are providing generous compute for the hackathon.
- Those interested are encouraged to register before August 17.
GPU MODE ▷ #cool-links (2 messages):
ChipBenchmark, Tilderesearch Tweet
- ChipBenchmark Website Surfaces: A member shared a link to ChipBenchmark, presumably for comparing different chip performances.
- No specific discussion followed, but the link was dropped in the cool-links channel for future reference.
- Tilderesearch Tweet Shared: Someone posted a link to a tweet from Tilderesearch found at https://x.com/tilderesearch/status/1948818857214574652.
- The tweet's content wasn't detailed in the channel, but it was flagged as noteworthy by inclusion in cool-links.
GPU MODE ▷ #jobs (1 messages):
AMD Global Hiring, US-Based Interns
- AMD Expands Global Full-Time Hiring: AMD is open to hiring full-time employees globally, specifically in locations where they have an existing office.
- This move allows AMD to tap into a diverse talent pool worldwide, leveraging its established infrastructure for seamless integration.
- AMD Focuses US for Intern Recruitment: AMD is targeting candidates based in the United States for their internship positions.
- This localized approach for internships may aim to foster early-career talent within the US, potentially feeding into full-time roles later.
GPU MODE ▷ #beginner (2 messages):
HF Hub vs Repo for Model Weights
- HF Hub Favored Over Repo for Model Weights: A member pondered if uploading to HF Hub is preferable to storing model weights directly in a repo, questioning the conventionality.
- They suggested it seems slightly unconventional to have model weights just sitting in a repo, advocating for pulling weights from an online source instead, noting that HF is just a git repo.
- Discussion on Model Storage Best Practices: The conversation revolves around the optimal method for storing and accessing model weights, considering both local repositories and centralized hubs.
- The userâs preference leans towards online hosting solutions like HF Hub for accessibility and perceived best practice, contrasting with direct storage in a Git repository.
GPU MODE ▷ #torchao (3 messages):
Weight Pruning Research, Wanda & Wanda++ for weight pruning, Adaptive Pruning and Tuning (APT), Custom Kernels like Squared-ReLU
- Weight Pruning Research Asked For: A member inquired about applying modern research for weight pruning, citing the CODEML'2025 paper and the `torchao/sparsity/` & `torchao/prototype/sparsity/` codebases.
- The member specifically asked about the application of Wanda and Wanda++ for weight pruning and the integration of Adaptive Pruning and Tuning (APT) with LoRA for efficient fine-tuning.
- Wanda++ Ticket Opened For Better Performance: The user noted that "Wanda: A simple and effective LLM pruning approach" is already applied for weight pruning, with a ticket opened for better performance following the publication of Wanda++.
- The user noted that they opened a PR for this.
- Adaptive Pruning and Tuning Gains Traction: The user proposed "APT: Adaptive Pruning and Tuning", which integrates LoRA and adaptive pruning for efficient fine-tuning, as a choice for TorchAO-#134.
- APT offers a method for more efficient fine-tuning through adaptive pruning and LoRA integration.
- Squared-ReLU kernels Future Plan: The user inquired about applying more custom-kernel like Squared-ReLU cases, referencing TorchAO-#1920 and seeking clarification on future plans.
- It was unclear to the user whether there are confirmed plans for integrating custom kernels.
GPU MODE ▷ #self-promotion (1 messages):
Warp specialization, CuTeDSL Tile Scheduler, Persistent GEMM kernel, Hopper TMA and WGMMA, Cluster-based TMA load
- Persistent GEMM Kernel Surfaces on Hopper: A new blog post details writing a persistent GEMM kernel leveraging Hopperâs TMA and WGMMA in the CuTeDSL, available on GitHub.
- The post also explains turning a simple TMA load into one that leverages the concept of clusters and multicast memory transfer; read it here.
- Warp Specialization Def Explained: Warp specialization is defined as using different warps (groups) for Producers and Consumers in performant GEMM kernels.
- The blogpost also mentions that the Tile Scheduler abstraction in CuTeDSL can be used to write persistent kernels.
GPU MODE ▷ #thunderkittens (1 messages):
bf16 high error rates, matmul kernels
- bf16 Kernels Yield High Error Rates: A member finds that all kernels using bf16 on `matmul/educational` have a pretty high error rate, often with max errors in the `inf`s.
- The member inquired if this behavior is expected for all bf16 matmuls/ops.
- Matmul Kernel Errors: High error rates were observed in matmul kernels using bf16 format.
- The user is investigating the `matmul/educational` kernels and seeks insights into the expected behavior of bf16 operations.
GPU MODE ▷ #status (1 messages):
VS Code syntax highlighting, PyTorch Load Inline Highlighter
- Syntax Highlighting Arrives to VS Code!: Users of `load_inline` for writing kernels can now get syntax highlighting in VS Code via the PyTorch Load Inline Highlighter.
- The tool was quickly put together and the author is seeking feedback on its usability and potential for productionizing.
- Author requests feedback on PyTorch Load Inline Highlighter: The author of the PyTorch Load Inline Highlighter is seeking feedback from users.
- The feedback will determine whether to productionize it.
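For context, a minimal `torch.utils.cpp_extension.load_inline` example of the kind the highlighter targets; the extension name and function are illustrative:

```python
import torch
from torch.utils.cpp_extension import load_inline

# The embedded C++ source string is what the highlighter colorizes in Python files.
cpp_source = r"""
torch::Tensor add_one(torch::Tensor x) {
    return x + 1;
}
"""

ext = load_inline(
    name="add_one_ext",        # compiled extension module name (illustrative)
    cpp_sources=[cpp_source],
    functions=["add_one"],     # auto-generates the Python binding
)
print(ext.add_one(torch.zeros(3)))  # tensor([1., 1., 1.])
```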
GPU MODE ▷ #factorio-learning-env (13 messages🔥):
Sonnet Benchmarking, Action Space Context, OpenRouter
- Sonnet Benchmark Plagued by API Errors: Sonnet 4 benchmarking using terminal-bench is facing issues with excessive API errors (529), resulting in only one iteration every 20 minutes, making the process intractable with only two API keys.
- It was noted that a workaround for `can_place_entity` was achieved by using `build_check_type manual`, which might need to be adopted in v2.
- Action Space and Context Size: It was suggested to only test v0.3.0 with the new action space, given that the context will be much smaller with fewer actions.
- However, it was countered that testing the current actions is important for running ablations on the new action space to have a baseline for comparison.
- OpenRouter Used for Benchmarking: To avoid API errors, previous lab tests were conducted using OpenRouter with 12 environments running concurrently.
- Currently, benchmarking is being done with only one environment per key, resulting in two environments running simultaneously.
GPU MODE ▷ #cutlass (1 messages):
CuTe, shared memory, swizzle, Layout, make_tiled_copy
- Swizzle Causes Partition Issues in CuTe: A member is facing partitioning issues with CuTe after applying `Swizzle<3, 2, 5>` to a shared memory region of size 19x128, and suspects that the issue arises because 19 is not divisible by 8, the repeat factor introduced by the swizzle, as discussed in Lei Mao's blog.
- Swizzled Layout Incompatibility: The member reported that after applying the swizzle, they cannot partition the layout using either `make_tiled_copy` or `local_partition`, and suspect the root cause is the 19x128 size.
- They included a shared19128_memory_bank_ids.pdf for reference.
Yannick Kilcher ▷ #general (22 messages🔥):
NeurIPS reviews, Karpathy on academic paper inflation, Alternative paper platforms, LLM Context Management, Downvote Politics
- NeurIPS Review Reflections: Members shared their experiences with NeurIPS reviews, with one asking if anyone received "any good NeurIPS reviews?"
- The conversation quickly shifted to the broader issues of academic paper inflation and the scalability of academic institutions.
- Karpathy laments academic paper inflation: A member shared a 2016 tweet from Andrej Karpathy humorously commenting on how out of hand the volume of academic papers was becoming.
- Another member linked a Hacker News discussion from the same period.
- Brainstorming Alternative Paper Platforms: A member suggested creating a "Youtube-Twitter-TikTok like platform for papers" with upvotes (but no downvotes) and categories to combat academic paper inflation.
- The user detailed a category-ranking idea, and suggested that instead of circlejerking around the sad graduate pizza, people should build shit.
- LLM Context Manager launch: A member announced they built something: an LLM Context Manager, described as an inference optimization system for conversations.
- It employs branching and a novel contextual scaffolding algorithm (CSA) to manage context and prevent context pollution/context rot.
- Downvote debacle: Members discussed the role and potential pitfalls of downvotes, particularly how they can become politicized and weaponized in tightly networked communities, drawing from a Web3 experiment where groups used downvotes to target each other.
- A member argued that downvotes are not inherently political and that negative feedback is essential, pointing to Amazonâs success as an example.
Yannick Kilcher ▷ #paper-discussion (9 messages🔥):
Paper Discussion, Arxiv Sharing, Mathy Papers, Large-Scale Evening Meeting
- Community Discusses Paper Sharing Protocols: A member inquired about the proper way to share a paper with the community without causing annoyance and another member suggested that sharing the ArXiv link is appropriate if the paper has been archived.
- They recommended sharing the ArXiv link and contacting a specific user to discuss it in the daily paper discussion.
- Mathy Paperâs Engineering Implications: A member shared a paper link stating that the paper is more mathy, and its engineering implications might not be immediately apparent.
- The member described it as a generic hammer for learning problems applied to demonstrate learning some toy dynamical systems.
- Large-Scale Evening Meeting Topic Planning: A member planned to inquire about the appropriateness of discussing a paper in the large-scale evening meeting at `<t:1753552800:T>`.
- This indicates consideration of a suitable platform for discussing the paper with the community.
Yannick Kilcher ▷ #ml-news (9 messages🔥):
Grok Training Data, DEI in AI Models, Industrial Policy, Gemini Model Controversy, Imagen-4 and Gemini 2.5
- Grok May Have Trained on Government Data Hoard: A member wondered if Grok trained on files when Elon got access to the government's hoards of data (link to X post).
- White House Prevents "Woke AI": The White House has issued guidance to prevent "woke AI" in the federal government (link to White House memo).
- The memo states that LLMs shall prioritize historical accuracy, scientific inquiry, and objectivity.
- Gemini's DEI prioritization led to inaccuracies: The White House memo noted that an AI model changed the race or sex of historical figures, including the Pope, the Founding Fathers, and Vikings, when prompted for images because it was trained to prioritize DEI requirements at the cost of accuracy.
- Google's Old Gemini Model Criticized, Newer Models Improve: A member noted that the older Gemini model is being mentioned despite Google already taking it down due to backlash, claiming it's a "nothing-burger nowadays" with newer models available.
- They added that even Google's latest image-gen (Imagen-4) and the latest version of Gemini 2.5 text gen don't have this issue.
- Government Should not shape Model Ideological Bias: One technology policy analyst said the "biggest error in the order is the use of government procurement power to shape model ideological bias".
- They claimed that if the policy successfully shapes American models, the US will lose international customers who won't want models shaped by a foreign government's whims.
Manus.im Discord ▷ #general (17 messages🔥):
Spam bots, Server issues, Vibe Coding AI, Scientific Manus paper
- Spam Bots Invade: Users reported seeing spam bots on the server and requested moderation.
- A moderator responded that the messages were removed and the account banned, encouraging users to tag moderators for suspicious accounts.
- Sandbox Snafu: A user reported a "Failed to resume sandbox" error and a 502 Bad Gateway, seeking help with file and session recovery.
- Another user mentioned the company is undergoing major changes and is short-staffed, suggesting potential instability.
- Vibe Coding AI Challenge: A user shared a link to a challenge for building an MVP product using Vibe Coding AI coding skills.
- They shared the link in a joking context.
- Scientific Manus Ascends: A user posted a link to a scientific paper referring to it as Scientific Manus.
- The title of the paper has not been identified in the messages.
Cohere ▷ #api-discussions (11 messages🔥):
Helicone.ai integration with Cohere models, Command R+ vs. Command A, On-prem deployment of Cohere models
- Helicone.ai Lacks Native Cohere Support: A user inquired about using Cohere's Command R+ or Command R7B with Helicone.ai for observability, but a Cohere representative stated they don't natively support or have partnerships with Helicone.ai.
- The user was advised to contact Helicone's support directly for assistance.
- Cohere Touts Command-A as R+'s Superior Successor: Cohere promotes Command-A-03-2025 as their latest and best model with SOTA agentic capabilities, succeeding Command R+.
- It is described as having enhanced capabilities and suitable as a general thinking assistant.
- Cohere offers On-Premise Enterprise Deployments: A user noted Command A's performance with fewer parameters, and a Cohere representative confirmed on-premise enterprise deployments are available.
- This is particularly relevant for consumer deployment as a general thinking assistant.
Cohere ▷ #introduce-yourself (3 messages):
Crafted Logic Lab, Cognitive OS Assistant, Helicone.ai gateway, Humanist AI Values
- Crafted Logic Lab crafts Cognitive OS Assistant: A founder from Crafted Logic Lab is developing a new type of cognitive OS based assistant that is patent pending.
- They developed their own tooling using Swift.
- Cohere aligns with Humanist AI Values: A founder expressed a very positive sentiment on Cohere, because it's a non-Silicon Valley company that seems to be more aligned with their Humanist AI values than the big providers.
- They find Cohere a frontier-class model that is very much under-known, and use it as their substrate.
- Seeking Technical Info on Helicone.ai gateway: A founder seeks technical information on items not documented by Cohere, such as Helicone.ai gateway calls for observability.
- They are also trying to determine which of the models, between th-8-2024 and current, is the newer version.
Cohere ▷ #status-feed (1 messages):
Cohere Model Outage, Command models down
- Cohere Models Experience Full Outage: A status update indicates a full outage affecting multiple Cohere models including command-light, chat, command-r-plus, command-r-082024, command-r-plus-082024, command, command-r, command-r7b, and command-a-03-2025.
- The incident is currently under investigation as of July 25, 2025.
- Cohere Infrastructure Meltdown: All command models are currently offline.
- The Cohere Status Page has been updated.
Cohere ▷ #research (1 messages):
Command R+, Humanity's Last Exam test, Hummingbird Anatomy Question
- Command R+ Tackles Cognitive Flexibility Test: A member reported testing a system based on Command R+ on the Humanity's Last Exam test, which assesses for both correct answers and cognitive flexibility.
- Agent's Take on Hummingbird Anatomy: An agent was asked a detailed question about the number of paired tendons supported by a sesamoid bone in hummingbirds; it admitted it lacked expertise in ornithology and offered a speculative inference based on general anatomical knowledge, guessing at least two paired tendons directly involved in tail movement.
Notebook LM ▷ #use-cases (8 messages🔥):
Chat GPT agent login issues, Missing Share button, Metadata in Source
- GPT Agent Faces Login Troubles: A member is facing issues with their Chat GPT agent failing to sign into Notebook LM, encountering an error possibly due to the browser being controlled by a virtual machine or bot, as shown in the attached image.
- Vanishing "Share" Button Baffles User: A user reported that they are not seeing the "Share" option in Notebook LM and are thus unable to share created notebooks.
- Metadata Magic Improves Sourcing: A member is using metadata effectively in the Source, using brackets to avoid direct document references, as shown in the attached screenshot.
Notebook LM ▷ #general (7 messages):
Podcast Generation, File Uploading Error
- Podcast Generation Pointers: A member inquired about generating a 60min long podcast.
- Another member suggested checking the use case channel and linked a YouTube Short as a pointer.
- File Uploading Flounders: A member reported a recent file uploading error on both the free and pro versions of the platform, and asked if there was a workaround.
- The member found a fix themselves: mobile App uploads work, so the desktop version needs to be fixed.
aider (Paul Gauthier) ▷ #general (8 messages🔥):
GPT5, Textual 5.0.0, Qwen3-coder, Aider and testing
- GPT5: A Niche Replacement?: A member questioned whether closed AI would replace GPT5.
- It was implied that GPT5 may be a niche product compared to the closed source alternatives.
- Textual 5.0.0 Drops: A member announced the release of Textual 5.0.0, noting it contains final markdown streaming content.
- Textual is a Rapid Application Development (RAD) framework for Python.
- Qwen3-coder Wows: One member exclaimed that Qwen3-coder is amazing, as no other model could produce a fully working socks5 server in rust according to the specification.
- This suggests Qwen3-coder has superior coding capabilities, especially in Rust.
- Aider's Testing Troubles: A user shared their experience using aider for the first time, encountering difficulties in running tests, as it needed to execute commands from the terminal but stated it was an AI assistant without access to your terminal.
- The user wondered whether they were expected to manually run the tests and paste the output, and also sought a way to prevent aider from automatically committing changes, as they preferred to handle commits themselves.
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (8 messages🔥):
Agents class at Berkeley, Certificate Issues, Article Submission
- Agents Class Still in the Works: The Agents class is being offered to Berkeley students, but whether there will be a MOOC iteration hasn't been confirmed yet, likely announced in late August.
- Certificate Delivery Tango: A member reported not receiving a certificate despite having the certificate declaration form confirmation.
- Staff clarified that they did not receive an article assignment submission from the member.
- Article Submission Deadline Defeat: A member inquired about fixing the missing article submission to obtain the certificate.
- Staff apologized, stating they couldnât accommodate students who missed the deadlines due to limited staff capacity.
LlamaIndex ▷ #blog (3 messages):
LLM APIs vs Production Document Parsing, Screenshot Parsing Gaps, Accuracy Issues in Parsing, Natural Language Git Commands, S3 Vector Storage Integration
- LLM APIs Flounder in Production Document Parsing: A blogpost argues that while models like GPT-4.1, Claude Sonnet 4.0, and Gemini 2.5 Pro obsolete traditional OCR, screenshot-only parsing still has critical gaps for enterprise use.
- The post highlights accuracy issues as a significant limitation in production environments.
- Git Made Easy with Gut: The tool gut was released: a human-in-the-loop agent in the form of a command line tool that replaces git commands with natural language.
- Users can describe desired git actions in human language, and the agent figures out the git command, explains it, and waits for confirmation (source).
- S3 Vector Storage Integrates Seamlessly: LlamaIndex released a new S3VectorStore integration combining AWS S3âs scalability with LlamaIndex.
- This integration aims to provide agent workflows with a robust knowledge foundation that grows with user needs, offering smarter agent workflows (source).
LlamaIndex ▷ #general (4 messages):
Docx Parsing with Images, LlamaIndexOpenTelemetry Traces
- Docx Images Elude Readers!: A user wants to extract text and associated images from a complex .docx file using LlamaIndex, aiming for a list of `ImageNode` objects (an extraction sketch follows this list).
- The user notes that `DocxReader` ignores images, and `ImageXXXReader` only handles image files, so they're considering using `python-docx` directly or embedding image URLs in `TextNode` metadata or markdown.
- Telemetry Traces turn Trivial!: A user is facing issues with LlamaIndexOpenTelemetry, where the exported traces lack attributes and aren't human-readable in their OTLP platform.
- Another member suggested checking examples and provided a notebook demonstrating a custom exporter for writing readable traces to a file using Jaeger.
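A hedged sketch of the `python-docx` route the user was considering: walk the package relationships and dump the embedded images to disk. The filename is illustrative, and pairing images back to nearby text is left out:

```python
from docx import Document  # python-docx

doc = Document("complex.docx")
for rel in doc.part.rels.values():
    if "image" in rel.reltype:                      # image relationship types
        image_part = rel.target_part
        out_name = str(image_part.partname).rsplit("/", 1)[-1]  # e.g. image1.png
        with open(out_name, "wb") as f:
            f.write(image_part.blob)                # raw image bytes from the package
```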
Torchtune ▷ #general (5 messages):
Large Scale PEFT, LoRA/Q-LoRA hooks, Scheduler knobs, RL alignment
- Torchtune User asks migration questions: A user who is running torchtune for large-scale PEFT asked about migration questions regarding LoRA/Q-LoRA hooks and RL alignment.
- The user is trying to decide whether to keep iterating in torchtune or wait for the new stack.
- Keep iterating on torchtune: A member suggested continuing to iterate on torchtune, as it will still be supported until the newer library is available, and linked to Character AI's blogpost.
- The original user worried about migration friction later on.
- New version will focus on Scale Infra Fundamentals: The first version will be focused on the scale infra fundamentals and new concepts needed for RL.
- Features like LoRA and Multimodal won't be available at launch, so users should keep iterating on torchtune until all of the features they need are announced/planned.
Torchtune ▷ #dev (2 messages):
FSDP+TP Issues, NCCL Timeout, HuggingFace DCP Saver
- FSDP+TP struggles with HuggingFace DCP Saver: A member is encountering issues with FSDP+TP when using the HuggingFace DCP saver, reporting an NCCL timeout on a broadcast of 1 element.
- Due to the issues, they are reverting to full rank 0 saving, increasing the NCCL timeout time, and hoping checkpoints never need to be resumed.
- DCP's Weird Timeout: The user experiencing issues said that DCP really shouldn't be sending much information around.
- They found the timeout issue to be weird.
MCP (Glama) ▷ #general (5 messages):
Memory Hallucinations, MCP Server Recommendations, Macaw Security Beta, Cloudflare Pay-Per-Crawl, Agentic Commerce
- Memory Use Sparks Hallucination Concerns: A member shared they avoid using memory in AI models, citing that it introduces more hallucinations because it assumes things, and assuming is terrible.
- The user didn't clarify which product caused the hallucinations, but warned to generally avoid it.
- Macaw Security Enforces Policies: A member reported enrolling in Macaw Securityâs beta program, noting they could do a scan and place some guardrails and policy enforcement.
- No further details were given on the types of services offered by Macaw Security.
- Cloudflare Pay-Per-Crawl Ignites Agentic Commerce Discussion: Following Cloudflareâs pay-per-crawl announcement, a member initiated a discussion about agentic commerce and its implications.
- The discussion focused on how agents can access webpages without disrupting workflows, especially with solutions like Nekuda and PayOS enabling agent wallets.
- Agent Transactions and the Ghost of HTTP 402: Members considered the likelihood of agent transactions occurring in various scenarios such as Agent to Agent, B2C, B2B, and website access.
- It was suggested that solutions like Nekuda and PayOS aim to provide the infrastructure that the HTTP 402 (Payment Required) status code was meant to support.
- Glamaâs Tool Count Glitch: A user reported their MCP server on Glama is showing an incorrect tool count (one instead of six), even after republishing on the Glama site.
- The issue persists only on Glama, while other MCP server host sites display the correct count; it is currently unknown whether Glama auto-updates its info and images.
MCP (Glama) ▷ #showcase (1 messages):
MCP OAuth, OAuth flow
- MCP OAuth Demystified: A member shared an attempt to explain MCP OAuth for dummies, highlighting that the MCP server and the Authorization server are two completely separate entities.
- The explanation points out that all the MCP server cares about is receiving an access token, while the Authorization server is what gives you the access token.
- Understanding OAuth Flow in MCP: The explanation focuses on the OAuth flow in MCP, emphasizing steps such as connecting to an MCP server, querying the `/.well-known/oauth-authorization-server` endpoint, and registering as a client via Dynamic Client Registration (DCR).
- It also includes taking the access token back to the MCP server for authenticated access.
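A minimal sketch of the discovery step described above; the server URL is an illustrative placeholder, and the DCR and token-exchange steps are only noted in comments:

```python
import requests

mcp_server = "https://mcp.example.com"  # placeholder MCP server
meta = requests.get(
    f"{mcp_server}/.well-known/oauth-authorization-server"
).json()
print(meta["authorization_endpoint"], meta["token_endpoint"])
# Next (not shown): register via Dynamic Client Registration at
# meta["registration_endpoint"], run the authorization-code flow against the
# Authorization server, then present the access token to the MCP server.
```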
Nomic.ai (GPT4All) ▷ #general (4 messages):
GPU Recommendations, RX 9060 XT vs RX 6800 XT, Vector Storage Limitations
- GPU Preferences Posed to Forum: A member asked others what GPU they prefer for local AI use, specifically mentioning GPT4All.
- He is deciding between a RX 9060 XT 16GB and a RX 6800 XT.
- RX 9060 XT offers less power: The member stated that his research indicates the RX 9060 XT would have similar performance to the RX 6800 XT but uses half the power.
- He also noted that the RX 9060 XT might be .3 seconds slower in reply time and 3 tokens per second slower in reply rate.
- Vector Storage Unsupported: A member noted that the best solution would be vector storage given the model and its context size.
- Unfortunately, he notes that GPT4All doesn't support vector storage.
Modular (Mojo 🔥) ▷ #mojo (1 messages):
Modular's choice of Nanobind/Pybind over Cython for Python interop, Cython's limitations at scale, Approachability of Cython vs. Nanobind/Pybind
- Nanobind/Pybind Chosen over Cython by Modular: A member inquired about Modular's decision to use Nanobind/Pybind for Python interop instead of Cython.
- They questioned whether Cython becomes less effective at larger scales, despite appearing more approachable initially due to its Python-like syntax.
- Cython's approachability is questioned: The user indicated that, from casually browsing, Cython seems more approachable, especially for a language already looking like Python.
- They wonder if Cython starts breaking down at some scale.
MLOps @Chipro ▷ #events (1 messages):
bamiji: alright then, thanks for responding
Codeium (Windsurf) ▷ #announcements (1 messages):
Qwen3-Coder release, Windsurf Server Tags
- Qwen3-Coder Slides Into Windsurf: The Qwen3-Coder model is now live in Windsurf, costing 0.5 credits per prompt.
- More information on the release is available in the full announcement and on Reddit.
- Server Tags Return, Surf's Up!: Windsurf server tags are back online.
- An image was attached, showing the new tags.