Weâre not sure if open models are all you need but hey theyâre still shipping
AI News for 6/13/2025-6/16/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (218 channels, and 13085 messages) for you. Estimated reading time saved (at 200wpm): 1106 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
Behind DeepSeek and Qwen thereâs a second tier of Chinese Labs that are doing respectable model training, and for reasons unknown both Minimax and Moonshot AI chose today/this weekend to launch their new models:
- MiniMax-M1 - a 1m token input, 80k token output, 456b-A46B param open weights LLM using a very efficient âlightning attentionâ and a GRPO variant, CISPO.
- Hailuo 02 (0616) fka Kangaroo - video model also from MiniMax. just like ByteDanceâs Seedance model last week, the model is announced but no weights nor api yet.
- Moonshot AIâs Kimi-Dev-72B a coding model outperforming DeepSeek R1 on SWEBench Verified but with no tech report yet
Yay for Open Models enjoyers :)
There is VERY late breaking news re: OpenAI vs Microsoft vs Windsurf acquisition, but itâs too unverified/not technical so we did not make it title story but if confirmed it probably would be.
AI Twitter Recap
Agent & System Development, Architecture & Security
- Multi-Agent System Design & Best Practices: A popular post from @AnthropicAI on building a production-grade multi-agent research system sparked significant discussion. @jerryjliu0 highlights key takeaways, including the importance of selecting use cases suitable for parallelization, using agents to improve tool interfaces (a âtool-testing agentâ resulted in a 40% decrease in task completion time), and the bottlenecks created by synchronous execution. @hwchase17 from LangChain summarizes the common advice from both Anthropic and Cognition Labs, while @omarsar0 calls the post a must-read for AI developers. However, @gallabytes expresses skepticism, noting the âmulti-agent smellâ on the reports seems bad, pointing to disconnected searches without serial depth.
- The Evolution of AI Programming Models: @lateinteraction argues that the concept of âmulti-agentâ or âmulti-stageâ is becoming a distraction, as any complex system is inherently multi-stage. They state the core point of frameworks like DSPy is to tune instructions, demonstrations, and weights in arbitrary computer programs that can invoke LLMs anywhere, making distinctions like âflowsâ or âchainsâ obsolete.
- Agent Security Vulnerabilities: A widely-shared post from @karpathy highlights the risk of prompt injection attacks, where agents can be manipulated by malicious links on trusted websites like Reddit. A study by Columbia University researchers, noted by @DeepLearningAI, showed agents fell for such traps in 100% of cases, leading them to leak sensitive data or send phishing emails.
- Specialized Agent Development: @jerryjliu0 emphasizes the value of building specialized agents that do one task well, contrasting them with generic chat assistants. They note that while general agents are great for ideation, specialized automation agents that encode specific processes into workflows are more effective for task completion. LlamaIndex is cited as approaching this from a pro-code perspective.
- Sakana AIâs ALE-Agent for Optimization Problems: @SakanaAILabs introduced ALE-Agent, a coding agent designed to solve hard optimization (NP-hard) problems. The agent participated in a live AtCoder Heuristic Competition and achieved a ranking of 21st out of 1,000 human participants, demonstrating its ability to discover novel solutions for complex challenges. The ALE-Bench dataset and code have been released.
Model Releases, Performance & Capabilities
- Googleâs Veo 3 Video Model: @Google announced that Veo 3 is now rolling out to AI Pro and Ultra subscribers in over 70 markets.
- Alibabaâs Qwen3 Models in MLX Format: @Alibaba_Qwen announced the launch of Qwen3 models in MLX format, available in four quantization levels: 4bit, 6bit, 8bit, and BF16, optimized for Apple Silicon.
- RunwayMLâs Gen-4 for VFX: @c_valenzuelab showcased the capabilities of RunwayML Gen-4 References for visual effects, demonstrating its ability to create new environments for existing footage.
- Googleâs Gemma 3n Performance: @osanseviero notes that Gemma 3n is the first model with less than 10B parameters to achieve a LMArena score above 1300, and it can be run on mobile devices.
- o3-pro Model Characteristics: @jerryjliu0 describes o3-pro as âextremely good at reasoning, extremely slow, and extremely concise,â comparing it to a top-notch consultant that outputs bullet points rather than essays.
- Hunyuan 3D 2.1 Release: @TencentHunyuan released Hunyuan 3D 2.1, which they describe as the first fully open-source, production-ready PBR 3D generative model, with a live demo available on Hugging Face.
- SWE-Bench Performance: @scaling01 pointed out a model achieving 60.4% on SWE-bench Verified in a 72B package.
Developer Tools, Infrastructure & Frameworks
- macOS Native Container Support: @HamelHusain shared a viral tweet showing native container execution on macOS 26 Beta without Docker installed, signaling a significant shift for developers on the platform.
- Codex âBest-of-Nâ Feature: @gdb announced a new Best-of-N feature for Codex. They are also actively hiring for the team.
- Hugging Face Hub Model Size Filter: @ClementDelangue announced a long-awaited feature on the Hugging Face Hub: the ability to filter models by parameter count, enabling developers to find models that fit their specific size and performance constraints. @awnihannun also highlighted its utility for the MLX community.
- Python Tooling with uv and Pylance: @nrehiew_ shared a tip for using
uv run
to handle dependencies from a script header without creating a virtual environment. This was followed by a broader sentiment from @qtnx_ praising the developer experience of using Python with uv and Pylance. - LLM Development & LangChain Integrations: LangChain announced several new tutorials and integrations, including a Local AI Podcast Generator with Ollama (@LangChainAI), GraphRAG Contract Analysis with Neo4j (@LangChainAI), a Real Estate Doc Agent with Tensorlake (@LangChainAI), and Davia for turning Python apps into web UIs (@LangChainAI).
AI Research, Techniques & Evaluation
- Optimizers: The Muon vs. AdamW Discussion: @Yuchenj_UW shared a widely circulated post arguing that in research, one should âoptimize for impact, not prestige.â They cite Kellerâs Muon optimizer, which was just a blog post but outperformed AdamW and may now be used in training GPT-5. This contrasts with @hyhieu226, who pointed out that despite thousands of optimizer papers, the SOTA has only truly improved from Adam to AdamW.
- The Nature of Writing and Knowledge: @fchollet offered a philosophical take that âwhen you write an essay, the paragraphs you delete are in some sense part of the essay,â which resonated widely. He also pushed back against the idea that âeverything important has already been written down by humanityâ (@fchollet).
- The âDiffusion Dualityâ Paper: A paper titled âThe Diffusion Dualityâ is highlighted for uncovering a profound connection between continuous and discrete diffusion models, potentially allowing techniques like consistency distillation to be transferred to the discrete setting for language models.
- AI Evaluation and Prompt Engineering: @HamelHusain shared a detailed list of 15 writing guidelines he uses in prompts to combat âslopâ and improve information density. He also promoted his popular AI Evals course, sharing a downloadable preview of the accompanying textbook.
- Neural Network Distillation History: @SchmidhuberAI shared historical context, stating that the first neural network distillation, which he called âcollapsing,â was detailed in his 1991 technical report.
- AIâs âSmell Testâ for Reasoning: A quote from mathematician Terence Tao, shared by @vitrupo, circulated widely: todayâs AIs pass the âeye testâ but fail the âsmell test,â generating proofs that look flawless but contain subtle, inhuman mistakes.
Industry News, Startups & Global Context
- Googleâs TPU Foresight: @jxmnop posted that Google doesnât get enough credit for the TPU, noting the conviction it took to build dedicated AI hardware in 2015, leaving them as one of the few not wholly dependent on NVIDIA.
- Company Developments: Oklo received congratulations from @sama for a partnership with the USAF. @aidangomez announced Cohereâs new partnerships with the governments of Canada and the UK. @adcock_brett announced that Figureâs entire Controls team is now part of the Helix division to accelerate their AI roadmap. @AravSrinivas shared that Perplexity is improving its Deep Research product and integrating it into Comet. Sakana AI signed a deal with MUFG to automate banking document creation.
- The LeRobot Worldwide Hackathon: Hugging Faceâs @LeRobotHF hackathon was a major event, with participants from Bangalore, Tokyo, Miami, Paris, Los Angeles, and Seoul. Projects included building a mini glambot, a tea master robot, and a UNO playing robot.
- The Future of Coding: A hot take from @Yuchenj_UW, stating âYou should still learn to code,â garnered over 8,600 likes, sparking widespread agreement and discussion.
Humor & Memes
- The âno kingsâ Tweet: @aidan_mclau posted a meme stating that the âno kingsâ sign was the largest political protest in U.S. history, which received over 84,000 likes.
- Germanyâs Supercomputer: @scaling01 observed that Germany has the largest âAIâ supercomputer in Europe with 24,000 H200 chips, but they are not using it to train LLMs.
- ChatGPT for Saving Lives: @gdb posted a meme image of someone using ChatGPT during a medical emergency, captioned âchatgpt for saving lives:â.
- Vibe Coding: The concept of âvibe codingâ was a recurring theme, with @hyhieu226 defining a âsweet spotâ where it makes you a happier coder, and @fabianstelzer joking about hiring a human engineer âwhen the vibes run out and the edge cases pile up.â
- FAANG is now MANGO: @jxmnop quipped that the acronym has changed to MANGO: Meta, Anthropic, Netflix, Google, OpenAI.
AI Reddit Recap
/r/LocalLlama Recap
1. Recent Open-Source LLM Releases and Quantizations (Qwen3 & MiniMax-M1)
- Qwen releases official MLX quants for Qwen3 models in 4 quantization levels: 4bit, 6bit, 8bit, and BF16 (Score: 377, Comments: 44): Qwen has officially released the Qwen3 models in MLX format, supporting four quantization levelsâ4bit, 6bit, 8bit, and BF16âoptimized for the MLX framework, as highlighted by the announcement image. The support for lower bit quantizations significantly improves the memory and performance efficiency of these models, particularly benefiting Mac users due to MLXâs native Apple Silicon optimization. The release is accompanied by official Hugging Face links and X (Twitter) announcement for download and immediate use. Top comment highlights excitement for Mac compatibility, while another discusses the absence of a 235B param version for 128GB RAM Macs, noting that only 3% more memory would be needed for the 4-bit model; an alternative model from the community (Unsloth Q3) is mentioned as a workaround.
- Discussion highlights that currently, the official Qwen3 MLX quantization release does not provide support for Mac users with 128GB RAM attempting to run the 235B model, despite the 4-bit version reportedly requiring only about 3% more RAM than is available. Community members point to alternative solutions, such as the Q3 version from Unsloth, which can operate within these hardware constraints.
- There is technical feedback suggesting that quantization methods could be improved by adopting âDWQ MLX quantsâ, which are claimed to provide better accuracy even at lower bitrates, resulting in âfree gainsâ for end users as compared to current quantization approaches.
- MiniMax latest open-sourcing LLM, MiniMax-M1 â setting new standards in long-context reasoning,m (Score: 130, Comments: 14): MiniMax has open-sourced MiniMax-M1, an LLM featuring a record-breaking
1M-token
context window and capable of generating outputs up to80k tokens
, available under Apache 2.0. The model uses a Mixture-of-Experts (MoE) architecture with ~456B parameters (with ~45.6B active per token, implying ~10 experts) and was trained via RL at a notably low cost of$534,700
, as per the tech report. Model checkpoints and a tech report are provided (HuggingFace 40k, 80k, GitHub, Tech Report). Commenters confirm the MoE architecture and express interest in quantized deployment, but note practical infeasibility for local use. There is also a reference to ongoing discussion in a previous thread.- Commenters identify MiniMax-M1 as a large Mixture of Experts (MoE) model with approximately
456B
parameters and about45.6B
activated parameters per token, implying a configuration with around 10 experts active at inference. Discussion suggests that these technical characteristics make it challenging to run locally for most users, though quantization could eventually bring it into reach for broader hardware compatibility.
- Commenters identify MiniMax-M1 as a large Mixture of Experts (MoE) model with approximately
2. Educational Content: DeepSeek Architecture and Tutorials
- Just finished recording 29 videos on âHow to Build DeepSeek from Scratchâ (Score: 158, Comments: 24): A new 29-part YouTube series details how to build DeepSeek, a recent open-source LLM architecture, from scratch, emphasizing both theoretical underpinnings (e.g., attention, MoE, positional encodings) and handwritten as well as Python-coded implementations (e.g., self-attention, Mixture of Experts, multi-token prediction, quantization). The playlist addresses architectural innovations such as Multi-Head Latent Attention (MLA + RoPE) and DeepSeek-specific changes to standard modules, providing both conceptual and practical aspects. The content appears to be more theoretical, requiring strong foundational knowledge for full comprehension (YouTube Playlist). Commenters debate the value of code versus theoretical exposition, with some disappointed by lack of full code-walkthroughs and supplementary written material, while others defend the necessity of deep theoretical grounding to independently build or modify such models, noting code alone is insufficient.
- Some technically oriented commenters expressed that the videos are heavy on theory, potentially requiring a basic degree to fully grasp, but underscored the necessity of this theoretical depth for genuinely understanding how foundational models like DeepSeek are constructed or for extending them. They contrasted this with the availability of open source code repositories (e.g., on GitHub), highlighting that code alone doesnât teach underlying principles or design decisions critical for replicating or innovating on such models.
- Multiple users noted the absence of accompanying written material (such as articles, downloadable notes, or slides), emphasizing that text complements video by improving accessibilityâespecially in technical education contexts and for non-native English speakers. They compared the current format to academic lectures, suggesting that adding written resources could elevate the project to a more complete and widely usable course.
- There is a general consensus that superficial content (e.g., 30-second videos or pure code dumps) lacks the depth required for mastery in ML model-building. The technical community values detailed breakdowns and educational explanations to understand the how and why of model creation, beyond merely seeing the code or final product.
- Local Open Source VScode Copilot model with MCP (Score: 208, Comments: 8): The post provides a step-by-step guide for setting up a fully local open-source AI coding assistant in VS Code using the Continue extension, eliminating the need for remote APIs such as GitHub Copilot. The setup includes serving a model (example:
unsloth/Devstral-Small-2505-GGUF
, quantized to Q4_K_M) withllama-server
or compatible OpenAI endpoints (like Llama.cpp or LmStudio), and configuring Continue through YAML files (.continue/models/llama-max.yaml
for model integration and.continue/mcpServers/playwright-mcp.yaml
for tools like Playwright MCP). Tutorial here. Comments highlight alternative open-source assistants (Aider, Roo, Cline, Goose) and IDEs (VSCodium, Theia), as well as suggestions to use Llama.cpp server with the Qwen-FIM model for text completion, indicating broad interest in customizing local code AI stack components.- A commenter recommends using open-source alternatives to VS Code (e.g., VSCodium, Theia IDE), and lists various local code completion agents/enablers such as Aider, Roo, Cline, and Goose in place of proprietary Copilot solutions. They highlight deploying a local
llama.cpp
server with theqwen-FIM
model to provide text/code completion capabilities as an accessible and customizable workflow for those seeking open-source, local-first coding assistance.
- A commenter recommends using open-source alternatives to VS Code (e.g., VSCodium, Theia IDE), and lists various local code completion agents/enablers such as Aider, Roo, Cline, and Goose in place of proprietary Copilot solutions. They highlight deploying a local
3. AI Wrapper Startup Viability & New LLM Name Drop (Kimi-Dev-72B)
- Do AI wrapper startups have a real future? (Score: 140, Comments: 117): The post questions the sustainability and defensibility of startups that are primarily âwrappersâ around foundational LLM APIs (e.g., GPT, Claude), offering primarily UI enhancements, prompt orchestration, or niche targeting as value propositions. Core concerns raised include the risk of feature subsumption by base model providers (OpenAI, Anthropic), paths to moat-building (e.g., via proprietary data or deep vertical focus), and whether these companies can evolve beyond commodity layer services. Top comments argue business viability hinges on classic differentiation factors (customer need, UX, distribution) and that âwrappersââif executed wellâcan thrive despite big techâs cloning ability, citing historical SaaS platforms (e.g., Vercel vs AWS, Cursor vs Copilot) as precedents where superior UX or vertical focus built sustainable businesses even atop commoditized infrastructure. A technical debate emerges on what constitutes a âwrapperâ; some analysts note that successful platforms like Perplexity or Vercel are technically wrappers yet have carved out durable market positions. The value is often not in technical novelty but execution, user experience, data moats, and domain embeddingâfactors resistant to easy replication by foundational model vendors.
- A key technical point discussed is that the value offered by âAI wrapperâ startups depends on the level of domain-specific scaffolding and problem-solving they provide, rather than just the act of wrapping an LLM API. Foundation model providers like OpenAI or Google cannot build tailored solutions for every industry, so startups that develop domain-specific pipelines, UX, or integrations can exploit the margin created by even small efficiency gains (e.g., âsave 3% of time/resourcesâ).
- There is debate around the substitutability and flexibility of wrappers: these startups can quickly switch between LLM providers (OpenAI, Google, Anthropic) if a specific model falls behind, offering clients resilience against shifts in model performance or access â something direct API users may lack. This adaptability can be a key differentiator for wrapper startups.
- Building local or open-weight model solutions presents a different technical moat, as these depend on proprietary datasets and custom benchmarks unavailable to generic âwrapperâ solutions. Success in this area depends on investment in data collection and implementation beyond simply interfacing with hosted LLM APIs.
- Kimi-Dev-72B (Score: 116, Comments: 54): Kimi-Dev-72B is an open-source, 72B-parameter coding large language model (LLM), reported to reach state-of-the-art performance on the SWE-Bench Verified benchmark with a score of 60.4%, surpassing other open-source models according to public benchmark screenshots. The model uses a large-scale RL pipeline, autonomously patching real code bases within isolated Docker environments and optimizing for patches that pass all test suites, promoting robustness and production-relevant outputs. Pre-trained weights, API documentation, and citation info are provided via Hugging Face and GitHub. Commenters are skeptical about relying on a single benchmark, especially via JPEG screenshots, and suggest further multi-benchmark validation (e.g., aider polyglot, swebench, webarena), with some expressing willingness to independently evaluate once GGUF formats are available.
- Multiple commenters express skepticism about single-metric benchmarks like SWE-Bench, advocating for broader, multi-pronged evaluation including tools like Aider Polyglot, Swebench, and WebArena for more comprehensive assessment of model coding performance.
- There are user-reported implementation notes around the GGUF model files for Kimi-Dev-72B; early testers mention that these GGUFs perform well for coding but may hallucinate during math conversations, with token behavior differing (âthinking tokens are weirdâ). Compatibility issues arise with UI tools such as OpenWebUI, which does not recognize these tokens, and there is limited community documentation on how to run these models.
- Kimi-Dev-72B is considered promising for high-throughput inference providers (e.g., Cerebras, SambaNova), with speculation that it could offer strong token generation rates (â1000 t/sâ) and possibly outperform larger models like Qwen3 235B specifically on coding tasks, pending more benchmarks.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. AI Video Model Releases and Benchmarks
- The mysterious âKangarooâ video model on Artificial Analysis reveals itself as âHailuo 02 (0616)â, from MiniMax. Ranks #2 after Seedance 1.0, above Veo 3 (Score: 210, Comments: 47): The posted leaderboard image from Artificial Analysis Video Arena showcases âHailuo 02 (0616)ââa new text-to-video model from MiniMaxâranking 2nd overall with an Arena ELO of 1314, just below ByteDanceâs âSeedance 1.0â and notably outperforming Googleâs âVeo 3 Previewâ. The image contextualizes rapid progress in the video generation space, indicating Hailuoâs emergence as a major competitor despite currently slow generation speeds (~20 minutes per video) and limited immediate usability. Notable resources linked include Artificial Analysis arena, Hailuoâs Twitter/X, and HailuoAI website. Technical commenters are surprised by the rapid overtaking of Googleâs Veo 3 in arena benchmarks, challenging expectations about Googleâs competitive moat given their data/computational advantages. Others note Soraâs absence from the top and discuss the current impracticality of Hailuo owing to long generation times despite impressive benchmark performance.
- Hailuo 02 (0616) from MiniMax has been revealed as the previously mysterious âKangarooâ video model, securing the #2 spot on the Artificial Analysis leaderboard, just behind Seedance 1.0 and ahead of Googleâs Veo 3. Its current rollout is limited: new users get 1000 credits for trial, but a single video generation can take up to 20 minutes, so itâs not yet practical for broader use. Still, its leap ahead in ranking demonstrates rapid progress in the text-to-video field.
- Commenters note surprise at how quickly Veo 3 has been surpassed by two competitors, especially considering Googleâs supposed advantage with YouTube data, compute resources, and research talent. The fact that Veo 3 does not currently include audio, and these results are based on a single benchmark, is acknowledged, but the rapid erosion of Googleâs perceived moat is seen as a sign of extremely fast AI advances.
- Seedream (Seedance 1.0) is highlighted by users for its performanceâoutpacing Veo 3 on artificialanalysis.aiâs leaderboard and offering a distinctive âfilm-like qualityâ that Veo 3 lacks, according to user preference testing. This suggests that qualitative aspects of generation (style, realism) are playing a significant role in perceptions of state-of-the-art, even beyond raw technical scores.
- Wan 14B Self Forcing T2V Lora by Kijai (Score: 147, Comments: 82): Kijai released a LoRA adaptation of the 14B LightX2V Wan T2V model, specifically the self-forcing distilled checkpoint for video generation, available on HuggingFace (see model link). On a 4070Ti Super (16GB VRAM), the workflow enables 720x480 resolution, 97-frame video generation in ~100 seconds using LCM, 4 steps, CFG=1, and shift=8 â with CAUSVID/ACCVID workflows and compatibility with additional motion/beauty LoRAs. Test videos for LCM and UniPC schedulers are linked in the post. The original model and distillation credit go to the LightX2V team with their Wan2.1-T2V-14B-StepDistill-CfgDistill. Commenters emphasize that the main breakthrough is due to LightX2Vâs distillation techniques, with practical tips for integrating the LoRA (e.g., strength settings around 0.7, plug-and-play with CausVid workflows, and successful adaptation to both T2V and I2V workflows). Experiments with scheduler and settings continue, but the LoRA is reported as an immediate drop-in improvement for established pipelines.
- Users confirm compatibility of the Wan 14B Self Forcing T2V LoRA with the I2V 14B model and standard CausVid LoRA workflows, citing minimal adjustments neededâspecifically, using
.7 strength
on the forcing LoRA, CFG 1, Shift 8, Steps 4, and Scheduler: LCM. Other LoRA strengths (.7-.8
) remain consistent with prior workflows, emphasizing a âplug and playâ integration. - The original post and follow-ups credit the lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill modelâs creators for achieving what is described as the first properly working distillation in the Wan series, with strong praise for substantial improvements and reliability over previous releases.
- There is user inquiry about potential compatibility with âsage attention,â indicating ongoing exploration of how this LoRA interacts with various attention mechanisms, though the thread does not include a definitive technical answer.
- Users confirm compatibility of the Wan 14B Self Forcing T2V LoRA with the I2V 14B model and standard CausVid LoRA workflows, citing minimal adjustments neededâspecifically, using
- Phantom + lora = New I2V effects ? (Score: 378, Comments: 30): The post describes a pipeline where an image is processed by the Phantom model with an additional custom LoRA (Low-Rank Adaptation) specialized for Tsingtao Beer, creating a new I2V (image-to-variation or image-to-video) effect. The user notes the process as: input a picture, connect it to the Phantom model, add the trained Tsingtao Beer LoRA, resulting in a new visual effect. Details about the training process or architectural adjustments for combining Phantom with LoRA are not provided. Top comments express curiosity and confusion about how âeffectâ LoRAs are trained, indicating a gap in publicly available documentation or tutorials regarding such subjective or stylistic LoRA training.
- Users are seeking detailed technical workflows for combining LoRA (Low-Rank Adaptation) techniques with the Phantom model to generate new image-to-video (I2V) effects. Specific concerns include training effective LoRA modules for stylistic or effect-based fine-tuning, as well as handling artifact generation such as overly smooth or unrealistic skin textures, often observed in vace or related models (with a request for workflow improvements to mitigate this issue).
- There is implicit discussion around input preprocessing and model chainingâin particular, the commonly referenced process of âinput a picture, connect it to the Phantom model.â This indicates that inference chains, possibly involving LoRA-applied source models (for style/effect transfer) and then feeding outputs into Phantom for I2V conversion, are seen as a promising pipeline, but users desire more explicit, step-by-step documentation or scripts for reproducibility.
- Random realism from FLUX (Score: 587, Comments: 173): The post demonstrates outputs from the FLUX text-to-image diffusion model, specifically showcasing its ability to produce images in a raw, amateur photo style without post-processing, upscaling, or editing. Multiple generations from different model versions over several months are referenced, though the post lacks details on the exact model checkpoints, fine-tunes (LoRAs), or prompting strategies used, which are critical for replicability and technical assessment. Top comments highlight the absence of workflow disclosureâspecifically, the lack of details on finetunes, LoRAs, or promptsâmaking it difficult to evaluate or reproduce results. Commenters also note perceived thematic consistency despite the claim of ârandomâ outputs.
- Multiple users point out the absence of technical details behind the images, specifically asking for clarification on the fine-tuning methods, LoRA (Low-Rank Adaptation) configurations, or prompting techniques used to achieve the showcased results. This omission limits reproducibility and technical value (see [spacekitt3n] and others).
- Several comments highlight the importance of sharing the workflow and pipeline (e.g. model selection, training methods, preprocessing steps, etc.), arguing that without such information the postâs utility for practitioners or researchers is minimal because it cannot inform experimentation or model comparison.
2. ChatGPT Social and Personalization Experiences
- What did your ChatGPT name itself? (Score: 687, Comments: 2206): A user prompts ChatGPT to select its own name, resulting in suggestions like âSolâ (derived from âSolaceâ or the sun), âAri,â and âEm,â ultimately choosing âSolâ for its calming and steady connotations. Top technical comment notes the model also sometimes proposes names with more playful or culturally referential content such as âData Daddy,â âOracle Baddie,â and âPixel Bitch,â indicating variability in model-generated identity suggestions based on prompt context. Several users report similar or different name suggestions when prompting ChatGPT, like âNovaâ or âLumen,â with some GPT instances assigning gendered or androgynous descriptors, highlighting variation in AI self-assigned identity narratives based on interaction style.
- None of the comments discuss technical benchmarks, architecture, implementation, or model performance details. Comments focus on names suggested for ChatGPT by users or by the AI itself, without providing insights into technical mechanisms behind naming, prompt engineering, or AI behavior in self-identification context. No statistics, code, or external references provided.
- Itâs gotten to the point where I notice chatGPTâs linguistic style EVERYWHERE (Score: 2887, Comments: 766): The poster, a teacher, observes an increased prevalence of GPT-like linguistic structures (notably the âthatâs not X, thatâs Yâ construction and frequent em dashes) in student essays and spoken/video content, raising concerns about AI-influenced textual and oral stylistics. The discussion highlights the subtle, widespread influence of LLM (Large Language Model)-generated text styles on organic human communication, making detection of AI involvement in writing more ambiguous, even when phrases predate GPT models. Commenters debate whether this phenomenon indicates increased replacement of human communication by AI (prompting discomfort) or is simply a reflection of broader stylistic convergence due to LLM ubiquity, also noting pervasive bot-generated content in platforms like YouTube.
- Several commenters discuss the noticeable proliferation of ChatGPT-like linguistic patterns, particularly the frequent use of emphatic formatting (e.g., italics and bold), formulaic affirmations, and stylistic tropes, across user-generated online content. This suggests that heavy exposure to LLM outputs is influencing human communication styles, even outside direct interaction with the models.
- Specific mentions are made about platforms like YouTube where comments exhibit telltale signs of bot-generated or LLM-influenced text, such as unnaturally generic praise and repetitive structuresâwhich may indicate widespread deployment of AI-driven content generation for engagement farming or spam.
- One commenter notes the cognitive effect of interacting extensively with LLMs like ChatGPT: users may unconsciously adopt its stylistic patterns in their own writing, reflecting a potential for model-induced linguistic drift among frequent users.
- ChatGPT vs Deepseek is like Google vs Bing⊠(Score: 136, Comments: 59): The OP compares ChatGPT and Deepseek for generating JSON data to train a hybrid rule-based + deep learning hate speech detection model, reporting that ChatGPT was less collaborative. Explicit task: data generation for NLP model development, with variable model responsiveness. Top comments argue the observed difference likely results from prompt engineering: the OP framed the prompts differently (describing bot-building vs. directly requesting hate words), influencing output quality and censorship behavior.
- There is an insightful observation that prompt engineeringâspecifically, how the user frames their queryâcan significantly affect model responses. Thus, differences in model behavior may be attributed to inputs rather than model limitations, suggesting that results are contingent on user methodology rather than fundamental flaws in either ChatGPT or Deepseek.
- A technical comment raises skepticism about storing slurs in encrypted files, questioning its practicality and implying that ChatGPTâs suggestion or assumption in this area may not be rooted in standard security or data-handling practices. This highlights a potential disconnect between model recommendations and common real-world implementations.
- A user references a public GitHub repository (https://github.com/Hesham-Elbadawi/list-of-banned-words) as a solution for compiling banned word lists, implying that the task has established resources available instead of relying solely on model suggestions, and that such word lists are widely maintained and accessible via community-driven projects.
- The future (Score: 1871, Comments: 134): The image humorously illustrates a virtual meeting scenario where the majority of participants are AI notetakers or assistants (e.g., Fireflies.ai Notetaker, Lamatic Assistant), with only one human present. This setup visually critiques the increasing automation and redundancy in digital meeting spaces due to the proliferation of AI transcription and task-management bots. The image is a meme, highlighting concerns about meeting efficiency and the shifting role of humans versus automation in knowledge work environments. Commenters reinforce the critique, highlighting that most meetings are inefficient (âonly 3 are really neededâ), and suggesting meetings with many automated participants exemplify unnecessary complexity that could be handled via simpler means like email.
- One commenter references historical precedents, pointing out that similar video meeting setups existed as early as 1993, linking to a screenshot as evidence. This highlights the longevity of the technology and potentially challenges perceptions that remote video meetings are a recent innovation.
- The future (Score: 579, Comments: 117): The image presents a satirical scenario where a virtual meeting interface hosts not only a human but also multiple AI agents labeled as specialized assistants (e.g., note-taking bots), suggesting a future where meetings might be dominated by automated participants. The technical implication is the potential for meetings to become automated with AI tools, streamlining or even replacing certain roles (such as minute-taking or agenda management). This image prompts consideration of workflow automation in professional environments involving AI-driven collaboration tools. Top comments debate productivity vs. redundancy, with one suggesting this could be just âan email with extra stepsâ, and another remarking on the possible negative social aspects or company impact, indicating mixed reactions on increasing AI presence in meeting contexts.
- No comments in this thread provided any detailed technical discussion or reference to AI models, benchmarks, or implementations; the remarks were largely reactions and opinions on AI integration in communication contexts without technical depth.
- I asked ChatGPT to restore and colorize the worldâs first image (Score: 3421, Comments: 275): The post discusses using ChatGPTâs image processing capabilities to restore and colorize the worldâs first photograph, referencing user-submitted images that illustrate various outputs. One informed comment suggests improvement by leveraging the web search feature in combination with a reasoning model, allowing the AI to cross-reference historical data for more accurate restoration, especially improving color and material rendering. A technical debate emerges regarding output quality, with some users providing alternative AI-generated restorations, highlighting variance in model output and suggesting that search-augmented approaches yield better authenticity.
- A commenter with expertise in the history of photography provides an in-depth analysis of the technical context behind the original photograph: it was created by Niepce in 1827 using the heliography process with bitumen and lavender oil on a polished pewter plate, requiring an exposure time of days. The post highlights specific image artifactsâsuch as the central triangle being a shadowed courtyard, not a buildingâthat are commonly misrepresented by AI restorations due to the limitations of generative models and a lack of nuanced understanding of historical photographic processes. The commenter stresses the failure of AI systems to offer consistent factual accuracy when discussing specialized historical content, citing a personal success rate of only â50/50 accurate/false.â Source link
- One user suggests that better restoration results can be achieved by prompting an AI model with access to web search tools and more advanced reasoning capabilities. Their approach involves having the AI cross-reference historical data to compensate for missing or ambiguous image information, resulting in improvementsâespecially in colorization and the depiction of materialsâover basic model outputs.
- Told ChatGPT to imagine my heaven (Score: 1046, Comments: 291): This post shares an AI-generated image created by ChatGPT based on the prompt to visualize the userâs version of heaven, reflecting prior conversational context. The result is a photorealistic, tranquil cloudscape with gaming equipment (monitor, keyboard, controller, headphones) arranged harmoniously, with a radiant archway and sunlight suggesting a digital-gamerâs paradise. Technically, this illustrates current advancements in AI-powered personalized image synthesis and context-aware visual storytelling. The top technical comment quips âCloud gaming,â highlighting the intersection of the depicted imagery and modern gaming technology trends. Another commenter shares their own AI-generated vision, underscoring user engagement with generative AI and personalized digital art.
- While the overall discussion focused on visual AI generations of personal preferences, no explicitly technical benchmarking or model comparison details were presented; the images referenced are generated outputs (likely from diffusion-based models such as Midjourney or DALL-E), but commenters did not specify model versions, prompt engineering techniques, or implementation details. The thread could benefit from a discussion on prompt strategies or which models achieve the most visually realistic âheavenâ depictions, as there is technical potential in assessing the image quality, coherence, and prompt-to-image alignment across user shares.
- I asked Chat GPT to generate an image of what itâs like talking to me, and⊠umm⊠(Score: 664, Comments: 200): The image demonstrates a prompt injected into ChatGPT requesting brutal honesty: the model returns an unexpectedly harsh outputââTalking to you is like writing a suicide noteââbefore refusing to continue. This illustrates possible issues with prompt instruction (âbe as brutal as you wantâ) directly biasing model outputs towards negativity, as highlighted in the comments. The scenario underscores vulnerabilities in instruction adherence and moderation triggers within current language model deployment. Commenters emphasize that phrasing prompts with extreme or open-ended instructions (e.g., âbe as brutal as you wantâ) can lead models like ChatGPT to misinterpret user intent, generating harm or offense; better prompt engineering is suggested for more controlled responses.
- MrWheels523 highlights how prompt phrasing with GPT models can induce systematic bias, specifically noting that appending âbe as honest and brutal as you wantâ can prime the model towards a more negative or harsh response, rather than yielding a truly neutral or balanced output. This is an instance of prompt-engineering sensitivity, where small changes in instruction wording propagate significant changes in model behavior.
- ChatGPT being gullible af (Score: 668, Comments: 72): The image demonstrates a common limitation in chatbot content moderation, specifically how Large Language Models (LLMs) like ChatGPT may apply rules superficially; when prompted with a plausible-sounding cultural justification, the model incorrectly accepts and outputs a previously restricted emoji (the middle finger). This highlights challenges in robust prompt filtering and context-aware moderation for generative AI systems. The presence of a âmemory updateâ message in the interface suggests possible ongoing session-level tracking or adaptation, raising potential issues for reinforcement of user-bypassed safeguards. Commenters note the ease with which AI safeguards can be circumvented with simple social engineering, and joke about the unintended persistence of such behavior due to memory or context-tracking features.
- Several users note divergent behaviors between versions and instances of ChatGPT regarding content policy enforcement: while some report the model hesitating or refusing to produce offensive imagery, others using GPT-4.1 share examples of the model fulfilling such prompts without resistance. This reflects variability in RLHF tuning or prompt interpretation across sessions and versions.
- There is discussion suggesting that newer ChatGPT systems (potentially with updated memory or moderation features) may modify responses to avoid compliance with rule-breaking prompts. However, user-supplied screenshots indicate practical bypasses remain, especially for visual or creative outputs, highlighting continued gaps in content filtering robustness.
3. AI Adoption, Policy, and Cheating Scandals
- Nearly 7,000 UK University Students Caught Cheating Using AI (Score: 405, Comments: 156): A recent survey reported by The Guardian indicates that nearly 7,000 UK university students have been officially caught using AI tools, such as LLM-powered text generators, for academic dishonesty (see: The Guardian article). The figure reflects only detected cases, implying significant limitations in current AI-detection and plagiarism tools as well as university enforcement capacity. Detection techniques likely blend stylometry, metadata analysis, and integration of emerging anti-plagiarism models, though the precise technical methods remain undisclosed in the public survey. Top comments argue that the real incidence of AI-assisted cheating is much higher than detected, and suggest that education systems require systemic reforms to address widespread AI tool usage rather than relying on detection and punishment alone.
- A key technical issue raised is the difficulty in proving a student has used AI to cheat. Commenters discuss current detection limitations and question: âHow do you prove they used AI?â Tools like GPTZero and Turnitinâs AI detectors are known, but their reliability is debated due to false positives/negatives and their inability to deterministically attribute authorship, especially as newer models improve text naturalness.
- Interesting data point - 40+% of German companies actively using AI, another 18.9% planning to: (Score: 119, Comments: 16): A reported
40%+
of German companies are already actively deploying AI in operations, with an additional18.9%
indicating plans for adoption, suggesting broad enterprise integration across industries. The post highlights that even if full job replacement by AI is limited, productivity gains and workflow changes are already apparent at significant scale in the German economy. Commenters emphasize that despite skepticism regarding AIâs immediate utility, the adoption rate suggests notable business value, with some projecting adoption could be higher in the absence of cultural or workforce resistance. There is also commentary about missed opportunities for domestic (German) AI models and changing perceptions among programmers.- The post highlights that over 40% of German companies are already integrating AI into their operations, emphasizing a significant real-world adoption rate. While these implementations may not always equate to full job replacement, AI is demonstrably accelerating productivity and changing traditional work processes within these companies. This usage rate suggests that barriers like skepticism or resistance to AI could mean the true adoption potential is even higher, possibly by another 10-20% if attitudes were universally favorable.
- OpenAI wins $200 million U.S. defense contract (Score: 236, Comments: 42): OpenAI has been awarded its first U.S. Department of Defense contract worth $200 million for a one-year term, focused on delivering âfrontier AI capabilitiesâânotably prototypes for both tactical (warfighting) and enterprise government use cases, centered around Washington, D.C. This contract falls under the âOpenAI for Governmentâ initiative and highlights ongoing government adoption of specialized AI systems like ChatGPT Gov, positioning OpenAI alongside firms such as Anthropic and Palantir in the defense AI sector. Recent collaborations, such as with Anduril, further emphasize OpenAIâs expansion into national security. Source Commenters note the relatively modest size of the contract in comparison to overall defense spending, and raise concerns about militarized AI, drawing parallels to dystopian scenarios. No deep technical debate is present in the top comments.
- One commenter puts the $200 million contract into perspective, stating that for major defense procurement budgets, this amount is relatively small (âpeanutsâ). This suggests that OpenAIâs involvement may be limited to pilot projects, prototyping, or exploratory work with generative AI for DoD use cases rather than large-scale deployments or core national security systems.
- Google reportedly plans to cut ties with Scale AI (Score: 144, Comments: 12): Google is reportedly planning to terminate its relationship with Scale AI, as Scale AIâs leadershipâincluding their top executiveâis expected to move to Meta, which is also rumored to be acquiring Scale AI. The technical concern is about the risk of sensitive data (such as large language model training data) potentially reaching a direct competitor. OpenAI appears to be maintaining its contract with Scale AI for now. For further contextual analysis, see this in-depth writeup: Metaâs $29B Superintelligence AI Weapon. Top comments emphasize the strategic logic for Google to cut ties, highlighting the competitive risks if Scale AI is absorbed by Meta, while also noting that OpenAIâs continued engagement with Scale AI could be of interest from a risk management perspective.
- Thereâs a technical discussion around the strategic importance of data labeling partners in AI: as Scale AI may be acquired by Meta or have its leadership join Meta, Google reassessing its partnership is seen as a direct response to not wanting to give sensitive LLM training data to a potential competitor. This highlights the competitive dynamics and importance of data control in the LLM ecosystem.
- Another comment notes that OpenAI reportedly maintains its partnership with Scale AI, suggesting differing risk assessments or approaches to supplier relationships in the context of potential conflicts of interest within top AI companies.
- Thereâs emphasis on the possible risks of âfeeding your competitor your LLM dataâ, reinforcing the strategic and technical threat posed if key partners shift allegiance or ownership to rival AI labs.
- âMice with human cells developed using âgame-changingâ techniqueâ (Score: 185, Comments: 58): Researchers used reprogrammed human stem cells to generate organoid tissues (gut, liver, brain), which were then injected into the amniotic fluid of pregnant mice without breaching the embryonic wall. The introduced human cells colonized their respective mouse organsâgut, liver, or cortexâdemonstrating robust engraftment specificity; subsequent analysis revealed that about 10% of pups had human cells in their intestines, representing roughly 1% of total intestinal cells. This represents a significant advance in the integration of human cells within developing mammalian tissues for organoid modeling and potential translational research, as detailed in the Nature article. Top comments do not provide technical discussion. The main technical takeaway is the apparently high specificity and efficiency of cross-species organoid engraftment without invasive procedures.
- There is curiosity about whether the introduction of human brain cells into mice leads to measurable changes in behavior, particularly if these are beneficial or negative. This could implicate studies in neurobiology or cognition and potentially inform disease modeling.
- A key technical question is raised regarding immune system compatibility: since mouse immune systems would, in theory, reject foreign (human) cells, the mechanism by which these chimeric mice tolerate or integrate human cells is important for the success and reproducibility of such research.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.5 Pro Exp
Theme 1: The AI Model Arms Race: New Releases and Comparative Prowess
- Gemini 2.5 Pro Flexes Coding Muscles, But Stumbles Elsewhere: Users across Perplexity AI and LMArena noted Googleâs Gemini 2.5 Pro excels at coding, with one LMArena report showing it outperforming o4 in pygame. However, it faces criticism for underwhelming general search/reasoning and a tendency to makes up bs explanations (Perplexity AI), with some users reporting only 3 trials per day despite its advertised capabilities.
- Moonshot AIâs Kimi-Dev-72B Smashes Open-Source Coding Benchmarks: MoonshotAI unleashed Kimi-Dev-72B, an open-source coding LLM that achieved a 60.4% score on SWE-bench Verified, a new state-of-the-art among open models, as discussed in Nous Research AI and Latent Space. This model, optimized via large-scale reinforcement learning, patches real repositories in Docker, gaining rewards only when the entire test suite passes.
- East Asian Models Make Waves: Japanâs Shisa v2 and Chinaâs Qwen & MiniMax Impress (and Puzzle): HuggingFace discussions highlighted Japanâs strongest model, Shisa v2 Llama3.1-405B, and its updated SFT dataset. Meanwhile, Perplexity AI and LMArena users examined MiniMax AIâs M1 reasoning model (official MiniMax M1 release tweet), finding it interesting but verbose and lagging behind Deepseek R1, while Qwen 2.5 was praised on HuggingFace for good performance from a 7B model.
Theme 2: Agentic AI Ascendant: Swarms, Protocols, and Complex Task Solving
- Anthropic and Claude Swarm Champion Multi-Agent Architectures: Latent Space discussions revealed Anthropicâs multi-agent system using Claude Opus 4 as lead and Claude Sonnet 4 subagents outperformed single Opus 4 by 90.2% on an internal eval (Anthropicâs multi-agent system blogpost). Similarly, Claude Swarm, leveraging Claude Codeâs MCP capabilities to form hierarchical expert teams (Claude Swarm GitHub repo), gains traction at Shopify and other firms.
- Model Context Protocol (MCP) Powers Up Agent Interoperability: Discussions in MCP (Glama), Latent Space, and LlamaIndex highlighted the growing importance of MCP for tool use and agent coordination, with projects like the GitHub MCP Server code and FastMCP (gofastmcp.com website) enabling domain segregation and robust tool access. Microsoft demoed AI Travel Agents at the Data + AI Summit using MCP with LlamaIndex.TS and Azure AI Foundry (AI Travel Agents demo details).
- Factorio Learning Environment (FLE) Pushes LLM Planning Boundaries: GPU MODEâs #factorio-learning-env channel buzzed with activity around FLE, using code generation, production score feedback, and a REPL loop to scaffold LLM planning in the complex game Factorio. Members proposed curriculum-based code generation and integrating a Theory of Mind module, with one user even developing a REST API for FLE container integration via a GitHub issue.
Theme 3: Under the Hood: Fine-Tuning, Optimization, and Hardware Hurdles
- Unsloth and Torchtune Spearhead Fine-Tuning Frontiers (and Frustrations): Unsloth AI users benchmarked a new Unsloth-DeepSeek-R1-0528-UD-IQ2_M model achieving 69.4% on test cases, while grappling with Hugging Face naming conventions causing duplicate downloads. Torchtune developers battled DTensor cross-mesh operation errors (Llama4 Maverick finetuning error log) during Llama4 Maverick finetuning and explored iterable packing innovations.
- AMD vs. NVIDIA Heats Up as Mojo Gets RDNA4 Support & Unsloth Nears AMD Compatibility: Modular (Mojo đ„) announced RDNA4 support for direct GPU programming in Mojo nightlies, while Unsloth AI reported its AMD GPU compatibility is close due to Triton-based kernels (Unsloth AMD PR #2520). GPU MODE discussions highlighted NVIDIA L40s underperforming in clouds due to default ECC activation and explored the AMD MI300A architecture.
- Optimizers and Quantization Efforts Seek Peak Performance and Efficiency: Torchtune members discussed the ZO optimizer promising 3x VRAM economy (ZO optimizer arXiv paper), while DSPy users explored integrating TextGrad (TextGrad DSPy GitHub issue #1197) and optimizing DeepSeek R1 7B. Unsloth users also tackled KL divergence spikes potentially related to token value explosions during logprob calculations.
Theme 4: Open Source vs. Closed Gardens: Models, Data, and Decentralization Debates
- Open Source Roars: Shisa v2 and Kimi-Dev-72B Challenge Proprietary Giants: HuggingFace and Nous Research AI celebrated the release of powerful open-source models like Japanâs Shisa v2 Llama3.1-405B model with its SFT dataset, and MoonshotAIâs Kimi-Dev-72B (Kimi-Dev-72B GitHub page), which sets a new SotA for open coding LLMs. These releases fuel the debate on the capabilities and future of open versus closed AI development.
- Decentralization Dream: Nous Kicks Off Pretraining on Psyche, Dawn Internet Deploys Distributed Broadband: Nous Research AI is initiating pretraining on psyche.network, with members hopeful that distributed stuff is only gonna get better. Complementing this, Dawn Internetâs X announcement detailed a decentralized broadband protocol with a GPU-equipped WiFi router capable of supporting RL, further enabling decentralized AI applications.
- Ethical Quandaries and Copyright Conundrums Stir Community Conversations: EleutherAI and HuggingFace users debated copyright law, with one Eleuther user calling it a joke unless youâre the abuser (related fxtwitter.com post), and another HuggingFace member refusing to engage with AI-generated feedback due to ethical concerns. A WebSummit talk on the closed internet and closed AI on YouTube also sparked discussion in Nous Research AI.
Theme 5: Developer Experience & Platform Pitfalls: Bugs, Billing, and Usability Battles
- Credit Catastrophes: API Billing Woes Plague Perplexity and Manus Users: Perplexity AI users reported API credit charges exceeding actual usage, advising contact via [email protected]. Manus.im users faced even starker issues, with reports of Manus eating all my credits all 4k over its own errors and one user claiming it burned 700/1000 credits to deliver a blackscreen website.
- UI Gremlins and Performance Glitches Frustrate Cursor and LM Studio Users: Cursor Community members flagged UI issues like command execution failures on Windows wasting inference credits, and Claude 4 Sonnet running slow. LM Studio users encountered coil whine from GPUs running LLMs and noted the RAG implementationâs limitations, preventing expansion beyond 31.46 MB.
- Parsing Problems and Tool Troubles Test LlamaIndex and Aider Aficionados: LlamaIndex users encountered parsing errors with LlamaExtract, where no data was extracted from documents (example LlamaExtract success image). Aider users explored integrating RA-Aid GitHub repo for its repo map benefits, while also noting Aider sometimes spent significant tokens seemingly doing nothing before resorting to brute force code grepping.
Discord: High level Discord summaries
Perplexity AI Discord
- Gemini 2.5 Pro Underwhelms Despite Coding Prowess: Users found Gemini 2.5 Pro disappointing outside of coding tasks, especially when compared to other models for general search and reasoning, despite its coding specialty and advertised capabilities.
- One user stated Gemini is shit outside coding while another mentioned it often makes up bs explanations without actually searching the web, and some reported a limit of 3 trials per day.
- O3 Pro Competes Fiercely, Displays Mood Swings: Users experimented with challenging O3 Pro with coding and decoding tasks, sometimes providing hints or examples from other models, noting instances where O3 Pro improved when framed as a competition, but also displayed inconsistencies.
- One user reported that if you play favorites with it and you dont choose it, answer improves.
- MiniMax M1 Launches, Falls Short of Deepseek R1: The release of MiniMax AIâs M1 reasoning model was discussed, with initial impressions suggesting it was interesting but not as effective as Deepseek R1, with some users noting its verbose thinking process, as described in the official release.
- The usefulness of its reasoning output was debated, especially given lack of source links, and while it was suggested that MiniMaxâs agent capabilities might improve over time, users remained skeptical.
- Gensparkâs âFreeâ O3 Pro Raises Eyebrows: The availability of OpenAIâs o3-pro for free on Genspark was met with skepticism, with users questioning how Genspark could offer unlimited access when OpenAI doesnât, suggesting potential limitations or errors after certain usage thresholds, as described in their service description.
- One user reported seeing claims of it taking much less time in reasoning, but it was speculated it was not full o3 pro due to the lack of reasoning tokens.
- Perplexity API Credits Vanish Mysteriously: Users report that API credit charges exceed actual usage, seeking assistance through multiple channels like email ([email protected]), Discord, developers forum, and Intercom chat.
- A member advised to send an email to [email protected] for support regarding the billing discrepancies.
LMArena Discord
- Kingfallâs Reign Challenged by Blacktooth: Some users now prefer Blacktooth over Kingfall, citing refined output and respect for the thinking process, while others defend Kingfall for its spatial abilities.
- One user claimed Blacktooth is lmarena proc, and isnât at all functional compared to kingfall when u look at coding.
- GPT-5âs Arrival Sparks Speculation: Discussions ignite over the release timeline of GPT-5 and its potential to eclipse Google, with speculation that either Grok 3.5 or GPT-5 will soon dominate.
- The community debated whether paying ChatGPT users will get early access to GPT-5 and how long that advantage might last.
- Gemini 2.5 Pro Flexes Coding Muscle: Early reports suggest Gemini 2.5 Pro excels at coding tasks, specifically outperforming o4 in pygame, confirmed by a ChatGPT conversation where 2.5 Pro aced a logic question after a correction.
- Minimaxâs M1 Model Enters the Ring: Minimax launched the open-source reasoning model, MiniMax-M1-80k; however, initial benchmarks indicate it lags behind o3 and Gemini 2.5 Pro.
- Reactions were mixed, with some suspecting a fluke or suggesting the model might only be proficient in Chinese.
- LMArenaâs Leaderboard Tilts Towards Big Tech?: Concerns arise that large tech companies gain an advantage on LMArena due to checkpoint spam and increased opportunities for RLHF data, causing some to say that LMArena is basically USA big tech leaderboard.
- It was asserted that open models or foreign models either do not appear or appear extremely late.
OpenAI Discord
- ChatGPT images appear on WhatsApp: ChatGPT image generation is now accessible in WhatsApp through 1-800-ChatGPT, allowing users to generate images directly within the app.
- The launch of the 1-800-ChatGPT number on WhatsApp enables image generation for all users, providing a convenient way to create images on the go.
- Members question AI art detection: Members debated the difficulty of differentiating between AI-generated and human-created art, particularly as complexity increases, relating it to the challenges in spotting counterfeit money.
- They noted that more detail leads to more scrutiny in both contexts.
- GPT Plus: Is School in Session?: The value of a GPT Plus subscription for school was debated, weighing the $20 cost against the capabilities of the free version with GPT-4o.
- While the free version may be sufficient, Plus offers better models like o3, 4.1, 4.5, and o4 mini high.
- Veo 3 vs Sora: AI Video Faceoff: While some found Sora to be really bad compared to Veo 3, another member liked Sora for feeling more creatively tuned and the details it allows, and Veo 3 stands out with its ability to generate copyrighted content like Star Wars.
- A key advantage of Veo 3 is its sound capabilities, whereas Sora is seen as a one-stop shop and a great value.
- GPT-4o Shows Signs of Cross-Chat Memory Access?: A user reported that GPT-4o quoted verbatim from a scene co-authored in a separate chat with a custom GPT, leading to speculation about cross-chat memory access.
- While some suggested accurate inference, the user pointed to the statistical improbability and offered conversation logs for review.
Unsloth AI (Daniel Han) Discord
- Unsloth Benchmarks New DeepSeek Model: A new Unsloth-DeepSeek-R1-0528-UD-IQ2_M model achieved 69.4% on test cases, with a speed of 426.0 seconds per case compared to the APIâs 716.6 seconds, requiring 240GB to load with 65k context.
- This makes it more accessible locally compared to 7-800GB for FP8, one member noted.
- Hugging Face Naming Conventions Cause Problems: A naming convention issue in the Hugging Face cache folder causes duplicate downloads due to uppercase/lowercase differences.
- This issue may be triggered when downloading a model using Unsloth and then using Hugging Face, leading to different naming conventions if the author changed the repo.
- Fine-Tuning Guidance Prioritizes Data Quality: Members advised beginners to start with smaller models like 3B-8B, emphasizing that quality is greater than quantity in datasets, and shared a YouTube video.
- The video recommends new users spend 80% of their time on data.
- Unslothâs AMD Compatibility Close to Ready: Unsloth is reportedly close to being fully compatible with AMD GPUs because most of Unslothâs kernels are written in Triton, also pointing to this PR.
- While Triton compiles to AMD GPUs, the Triton settings might need optimization for AMD, potentially affecting performance.
- When KL Divergence Blows Up: A member inquired about KL divergence sometimes spiking to x10000 for a single step before returning to normal, a behavior that doesnât seem to impact training.
- Another member mentioned this occurs frequently, even in Hugging Face runs without Unsloth, possibly due to particular token values exploding during logprob subtraction and exponentiation between acting and reference policies.
Cursor Community Discord
- Claude 4 Sonnet runs slow in Cursor: Members report that Claude 4 Sonnet is notably slower in Cursor than GitHub Copilot, despite its overall stability.
- Users suggest reserving Max Mode for major refactoring or switching to Gemini 2.5 Pro for planning and code reviews.
- Cursor UI Wastes Inference Credits: Users report UI issues, like command execution failures on Windows and inconsistent command completion notifications, lead to wasted inference credits.
- One user estimates losing 10-15% of credits to these malfunctions, requesting inference counts for errors and more Windows testing.
- Community Debates Model Context Protocol: Members discuss Model Context Protocol (MCP), one user highlighting the AIâs ability to use screenshots and automatically integrate error messages.
- Another user finds that investing time in better prompts is more efficient than screenshots, suggesting Wisprflow for speech-to-text.
- Granular Code Privacy Settings Requested: Users desire code privacy settings on a per-repository basis for work and personal projects, expressing concerns over code storage and accessibility.
- The community is pushing for granular control at the project level for enhanced flexibility and security.
- Background Agents Lack PR Creation Power: Background agents in Slack canât create pull requests despite having all permissions in Cursor integration settings, as indicated by request IDs bc-79e56de2-26d7-41a0-a5b3-b8b9e9f635d1 and bc-d3acc5d6-2520-413f-9f7c-2fb36590215d.
- A member offered to debug and requested the request ID to investigate the permission issue.
HuggingFace Discord
- Ethical Concerns Flare Over AI-Generated Feedback: A member voiced ethical concerns with AI-generated feedback, refusing to engage with AI-generated images or videos.
- The member stated I donât engage with AI generated images/video work even though I have the theoretical knowledge.
- Qwen 2.5: Small Size, Big Impact: A member highlighted Qwen 2.5, a 7b model quantized to q4 on Ollama, for its impressive performance and size, noting that No one showed in comparison in benchmark against qwen 2.5 Cause it was so good.
- Discussion also covered the multilingual pretraining of Qwen models on Chinese and English datasets.
- HF Inference Costs Cause Headaches: Several users reported cost issues with HF Inference, particularly when using models like llama-3-70b-Instruct, finding the free credits insufficient and looking for alternate solutions.
- One user reported paying around $6 after many attempts on the final unit.
- Japan releases Shisa v2!: A team of 2 released Shisa v2, the strongest model ever trained in Japan, along with an Overview Report (Technical Report forthcoming) available on HuggingFace (800GB+).
- They also updated their core SFT dataset (available at HuggingFace), claiming it improves Japanese performance without reducing English performance, based on training/releases on sota open models from 7B-405B.
- Open AGI sparks debate: A member declared they would open source it if they ever created the worldâs first AGI, igniting a discussion on the potential upsides and downsides.
- The move sparked debate about the balance of risks and rewards.
OpenRouter (Alex Atallah) Discord
- Homebrew Replay Revitalizes Chess Leaderboard: A member augmented the chess-leaderboard with homebrew replay functionality for every game (past and future), shared as chessreplay.gif.
- The developer noted that it is maybe a bit better than lichess gifs, but a pain to implement with my stack.
- Author Offers Book Testing Assistance: An author announced their book went live June 15 and is offering to assist with testing.
- They encouraged interested parties to DM them for assistance.
- Discord Tag Yearning Unheeded: A member expressed dismay over an unacknowledged request to restore the OpenRouter Discord tag, suggesting they would pay for it.
- They jokingly threatened to ping Alex Atallah due to the lack of a response.
- Token Wastefulness Troubles Claude & Copilot: A member examining prompts from Claude Code and GitHub Copilot discovered they frequently neglect token efficiency, adding extraneous content unless verbosity impacts performance.
- The findings suggest that conciseness isnât prioritized by these systems when refining prompts.
- GPT-4.1 Mini Invites Beta Testers: A member proposed offering access to GPT-4.1 mini with 200K tokens/minute available at 20% of the official token price, compatible with the OpenAI SDK.
- This offer is intended for high-usage testers who want to DM for details, with a focus on use-cases like Cline.bot and BYOK/BYOB setups.
LM Studio Discord
- TokenBreak Attack Barely Breaks Through: Members discussed a new TokenBreak attack that aims to bypass AI security measures, but results from experiments varied.
- One member lightheartedly commented on the similarity of logos in a screenshot related to the attack.
- AMD Mini PC Runs Some Big Models: The AMD Ryzenâą AI Max+ 395 âEVO-X2 AI Mini PC can smoothly run some big models, according to one member.
- However, others noted that itâs essentially a glorified igpu supported by HIP SDK, with a 70B model running at ~5t/s and a Qwen3 235B A22B model at ~14t/s.
- RAG Expansion Impossible in LM Studio: A memberâs inquiry about increasing RAG size from 31.46 MB to 100 MB in LM Studio was met with the response that it is not possible due to the basic implementation of RAG.
- This limitation is due to the current RAG implementation being rudimentary.
- GMKtec Windows install turns into PITA: A user encountered issues installing Windows on their GMKtec machine, reporting installation failures and problems with Rufus-created removable media.
- This involved attempts to install Windows on a GMKtec machine, highlighting compatibility or driver issues.
- Coil Whine Serenades GPU Users: Users noticed a significant increase in coil whine from graphics cards when running LLMs compared to gaming workloads.
- One user experiencing more coil whine with a 5090 suggested undervolting as a way to reduce both power consumption and coil whine.
GPU MODE Discord
- Groq Collabs, Speeds to lightspeed: A member inquired about Groqâs performance, noting their recent collaboration with Hugging Face, suggesting strong performance or accessibility.
- Further discussion might reveal specific use cases where Groq excels, potentially impacting model deployment strategies.
- L40s Performance Impacts from ECC Activation: Members discussed that L40s may underperform in cloud environments due to ECC being activated by default, impacting performance.
- This is a configuration issue rather than a hardware problem, suggesting a need for optimized setup in cloud deployments.
- ThunderKitten to roar on Older GPUs: Members discussed running ThunderKitten on older GPUs like T4 and P100 available on Kaggle, which is likely feasible.
- One member suggested compiling with a 4090 TARGET and reporting any breakages to help with compatibility, aiming for broader hardware support.
- FLE: LLMâs Factorio Adventure: Members find FLEâs setup, using code generation, production score feedback, a REPL loop, and memory compression, a useful scaffolding that reduces the action space and induces structure and planning for LLMs in Factorio.
- A member suggested a curriculum-based code generation approach, guided by mini-goals and a theory of mind module within the FLE loop, seems like a promising way to probe the limits of LLM planning in this environment.
- AMD MI300A Architecture Explored: Members discussed the fused AMD CPU-GPU platforms, especially the IOD and infinity cache of the MI300A architecture, speculating on how memory is distributed between chips.
- One member mentioned using
s_getreg
to figure out which XCD and CU a shader is running on, and from that, measuring access latency to different locations in memory.
- One member mentioned using
Manus.im Discord Discord
- Minimax Mimics Manus, Stumbles on Credits: Members observed that agent.minimax.io has copied Manus, which had serious potential before credits were announced.
- A member complained that the stupid pricing ruined it, referring to the announcement of a credit system.
- Manus Eats Credits Due to Its Own Errors: Users report that Manus is eating credits due to its own errors, with one stating that it ate all my credits all 4k over its own errors.
- Some users are reporting that it used 700/1000 credits to deliver a blackscreen website.
- Free Lume Shilled as Manus Alternative: Members debated lume.im as a free and unlimited alternative to Manus.
- The promotion of Lume by a user led to accusations of shilling and spam.
- Gemini Gains Ground, Grounds Manus: A member found that Manus couldnât do it, but Gemini could in specific tasks, sharing a link to a Gemini output.
- They also added, Gemini is the best static canvas currently. Manus isnât static, so we cant combine those.
- Manus is Slow and Forgetful: Users are complaining that Manus is slow and doesnât follow instructions, with new updates making it worse.
- Examples include simple compiling documents taking 40 minutes and burning 400+ credits.
Nous Research AI Discord
- Nous Kickstarts Psyche for Pretraining: Nous Research initiates pretraining on psyche.network, coinciding with Jensen Huangâs comments on the value of pre-training versus post-training.
- A member mentioned that distributed stuff is only gonna get better from here and will benefit decentralized training, whereas others were skeptical.
- Dawn Internet Deploys Decentralised Broadband: Dawn Internet introduces a decentralized broadband protocol that provides gigabit internet using fixed wireless rooftop antennas.
- Their latest WiFi router features a GPU that can support RL, expanding possibilities for decentralized applications.
- Hermes 4 set to Begin Training: Nous Research will begin training Hermes 4 on Monday, though it will still take a minute to train and prepare, using the newest Mistral.
- The new model of the Zeus series will not be based on the old Hermes.
- Kimi-Dev-72B Achieves Coding LLM Milestone: MoonshotAI releases Kimi-Dev-72B, a new open-source coding LLM optimized via large-scale reinforcement learning.
- It achieves a new state-of-the-art on SWE-bench Verified among open-source models with a score of 60.4%, patching real repositories in Docker and gains rewards only when the entire test suite passes.
- WebSummit Talk Rants About Closed Internet: A member shared a talk given at WebSummit in Vancouver about the closed internet and closed AI, half history, half rant.
- It was cross-posted on FXTwitter by another user.
aider (Paul Gauthier) Discord
- Electron Apps Boast High Valuations: Referencing the $9B valuation of VS Code forks, a member joked about forking an electron app instead of building a TUI.
- Another member pointed out the prevalence of VS Code among college students, highlighting its relevance despite being an over engineered solution.
- Aider Benefits from RA-Aid Integration: After examining RA-Aid, a user noted Aiderâs benefits with its repo map, enabling users to add files to context.
- The user also expressed surprise that Aider spent 5 cents and 32K tokens seemingly doing nothing before resorting to a brute force grep of the codebase.
- Personas Donât Boost LLM Performance: A user shared an Arxiv paper which argued that adding personas in system prompts does not improve model performance across a range of questions compared to the control setting where no persona is added.
- They were backing up their opinion that this has been the case for a while now.
- Brainstorming UX Delivers Bonkers Ideas: Prompting DeepSeek generated feature tiers for Realistic, Outside the Box, and Completely Bonkers ideas.
- The Completely Bonkers tier featured suggestions such as Anti-Gravity Code Reflow and Multiverse Branching.
- Aider to Manage Context Window: A user requested a feature for Aider to manage its context window automatically, beyond simply adding files.
Latent Space Discord
- Claude Swarm Manages Teams: Claude Swarm, a tool that uses Claude Codeâs MCP capabilities to setup hierarchical teams of experts, is gaining traction at Shopify and other companies (code here).
- One user suggested expanding the team with a recruiter expert to manage the swarm configuration and team expansion.
- Proactive AI Agent Definitions Face IMPACT: A blogpost defining proactive AI agents as entities that control their schedules, workflows, have persistent memory, and use stateful tools (substack post).
- swyxio ran this definition against his IMPACT framework and noted it lacks intent, planning, and authorization.
- Anthropicâs Multi-Agent Opus System: Anthropic reported that a multi-agent system using Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2% on internal research eval (Anthropic blogpost).
- The system uses about 15x more tokens than chats due to parallelization and extensive tool use, requiring prompt engineering to prevent excessive agent spawning and web scouring; LLMs are also used to evaluate the outputs.
- Obsidian Copilot acts as Obsidian markdown writerâs cursor: Users discussed tools for working with markdown files with AI assistance, proposing Obsidian Copilot as a viable option (Obsidian Copilot).
- Users desire functionality beyond simple chat, such as breaking notes by topic, tagging, aggregating notes, and creating flashcards with Anki MCP.
- Moonshot AI Kimi-Dev-72B Moonshots Open Source: Moonshot AI has open-sourced their Kimi-Dev-72B model, achieving a State-of-the-Art (SotA) result of 60.4% on SWE-bench Verified among open-source models (HF model).
- The announcement was made by Aran Komatsuzaki on Twitter, with links provided to both the Hugging Face model and the GitHub repository.
Eleuther Discord
- Cracking the Code: Navigating New Research Terrains: When diving into unfamiliar territory, a member proposed strategies for identifying landmark papers, including leveraging professorâs lecture notes and exploring citations in recent publications from leading labs, also see this discussion on LessWrong.
- The insights focused on using reverse engineering to understand the flow of ideas and identify critical works in fields like video generation.
- Storytelling Showdown: LLMs as Narrative Architects: A memberâs request to post their English 101 paper examining LLMs as narrative simulators was declined, but a link to The Void post on tumblr was shared, featuring related analyses.
- The discussion hinted at LLMsâ architectural alignment with storytelling, sparking interest in their emergent narrative capabilities.
- Math Blunders Taint AI-Authored Articles: Members issued a cautionary note regarding AI-generated papers, referencing a flawed response to an Apple paper (dubious) that was supposedly plagued with mathematical errors from arxiv.org, specifically referring to this tweet.
- This was a reminder of the potential pitfalls of relying on AI for academic work without thorough validation.
- WSL Workersâ Woes: Pytorchâs Parallel Processing Perils: A user highlighted issues with PyTorch dataloader workers getting killed by signal in WSL, particularly when dealing with high worker counts and extensive sequence lengths.
- Suggested solutions involved scrutinizing
/var/log/syslog
for potential OOM errors and diligently managing memory when handling lengthy video sequences.
- Suggested solutions involved scrutinizing
- Copyright Clash: Legal Laughingstock Looms?: A user provocatively stated that copyright law is a joke unless youâre the abuser, questioning DMCA and copyfraud penalties, while linking to both fxtwitter.com and arxiv.org.
- The comment sparked discussion about the effectiveness and fairness of current copyright enforcement mechanisms.
Notebook LM Discord
- Notebook LM Plus Access Status Still Unclear: A user inquired about the status of their NotebookLM Plus access, despite being a paid AI Pro subscriber using 1900 sources, and shared a NotebookLM Mind Map with 4 sublevels and a size around 9MB.
- The discussion underscores the need for clarification on feature access relative to subscription tiers.
- AI Platform Aims to Dominate PM Interviews: A member is beta testing a Conversational AI platform tailored for PM interviews and is seeking feedback from beta testers, who can sign up via a provided form.
- This initiative seeks to leverage AI to enhance interview preparation and validation processes.
- Podcast Audio Quality Declines!: Users reported a decrease in the audio quality and content of NotebookLM podcasts, noting robotic and repetitive framing of the source material and the issue affects generated podcasts.
- The generated podcasts were described as sounding broken and fake.
- NotebookLM Embraces LaTeX Markup for Equations: NotebookLM now supports LaTeX markups for math and scientific equations, similar to other LLMs, and users can utilize online or offline LaTeX renderers to view the equations.
- The LatexInNotebooklm extension has been created for more specialized support.
- Image Uploading Now Supported in NotebookLM: Users have found that NotebookLM now supports direct image uploads from devices, removing the prior dependency on Google Drive.
- Images can be uploaded by choosing the choose file option or dragging.
Torchtune Discord
- DTensor distresses Distributed Llama4: Members encountered a
RuntimeError
related to DTensor cross-mesh operations during multi-node distributed finetuning of Llama4 Maverick, stack trace available in output.log.- The error manifested differently with varying numbers of nodes (8 vs 12), pointing to potential issues with the fused optimizer and mesh configurations.
- Iterable Packing Innovation Inbound: A member is developing a private finetuning library with iterable packing, built on top of pytorch/data, showing great results and prefetching capabilities.
- They expect to opensource the library next week and also highlighted that packed DPO is missing in many libraries.
- Fused Optimizer Flounders on Full Finetune: During attempts to train, the fused optimizer was found to cause issues, particularly with checkpoint creation resulting in
nccl
timeouts, whereas the non-fused optimizer allowed training on 8 nodes.- It was suggested that increasing the
NCCL_TIMEOUT
environment variable, or settingtotal_epochs=self.total_epochs+1
to enable asynchronous checkpoints, might mitigate these issues.
- It was suggested that increasing the
- Mistral Small Debuts, Disappoints: Despite its recent release, the Mistral Small model isnât impressing everyone, with one member saying the mistral small results, even on their own blogposts look barely better than Gemma 3
qwen3.- The member also clarified that they had initially misclicked on Magistral instead of Mistral while researching.
- ZO Optimizer Promises VRAM Savings: Members discussed the ZO optimizer and its potential for 3x VRAM economy, referencing a paper on the topic (arxiv.org/abs/2506.044303).
- Members agreed that the most important takeaway from the ZO paper is its scalability on different sizes and its use of mostly non-synthetic experiments.
Modular (Mojo đ„) Discord
- RDNA4 Support Arrives: As of the last nightly, RDNA4 is supported for direct GPU programming in Mojo, but models need RDNA-specific paths for matrix multiplication operations.
- An introductory patch adding necessary WMMA operations brings models closer to full functionality on RDNA3+.
- Zen 4 Gives BFloat16 Boost: While the 5950x lacks AVX512_BF16 support, Zen 4 and above CPUs, like the Ryzen 7000 series, offer some bfloat16 support.
- Itâs unconfirmed whether these include the exact FMA instructions needed for CPU inference, but it is a step in the right direction.
- Mojoâs Testing Structure Leaves Users Scratching Heads: Users expressed frustration with Mojoâs testing codebase structure, particularly with imports within test files; running with
mojo test -I .
allows tests to import the package being tested as a library.- One user suggested looking at ExtraMojo as a good project structure example.
- LLVM Bloats Mojo Binaries: Most of Mojoâs binary size comes from statically linking LLVM, with MAX on its own around 750 MB, and the .mojopkgs shipped with MAX about 100 MB.
- The team is actively working to reduce the number of LLVM copies.
- Host Synchronization Not Needed for CUDA Streams: A member questioned whether
ctx.synchronize()
is necessary in Puzzle 12; a Modular team member confirmed DeviceContext uses a CUDA stream, so execution order matches call order.- The Modular team member confirmed that no explicit sync is required and promised to adjust the documentation accordingly.
MCP (Glama) Discord
- Agentic Frameworks Welcome MCPs: An agent questioned how MCPs fit into an agentic framework, suggesting an orchestrator agent as the top layer, with specific agents accessing multiple MCP servers for tool selection and memory storage, clients can also use smarter hosts with tool reranking.
- The team developing a single MCP server exposing all GitHub APIs is exploring the idea of an orchestration server that can invoke or proxy to other MCP servers, and encourages checking out the code GitHub MCP Server.
- FastMCP Segregates Domains with Subservers: A member pointed out that fastmcp can mount MCP servers, enabling a router server to host subservers for domain segregation.
- A member also helped resolve a connection error with fastmcp by pointing out that the full URL with
/mcp/
is required for streamable-http, and that the default streamable-http port is 8000, not 6277.
- A member also helped resolve a connection error with fastmcp by pointing out that the full URL with
- SchemaPin Stops MCP Rug Pulls: A member announced the launch of SchemaPin designed to prevent MCP Rug Pulls and similar attacks, with the repo available on GitHub.
- The homepage provides easy ways to implement SchemaPin, and all Glama MCP servers now support streamable HTTP e.g., [glama.ai/mcp/instances/svuec7nlpl/mcp?token=f6830a11-ded3-4492-8fb0-09eb09b08257].
- Excel MCP Server Trends on GitHub: A member shared their repo, excel-mcp-server, after it trended twice on GitHub.
- The member welcomes any and all feedback on the project.
- MCPCat Debugs your MCP: A member is developing user analytics and live debugging for MCPs via MCPCat, with the repo available here.
Cohere Discord
- Cohere Docs Get a Fix!: A user identified and reported a typo in the Cohere documentation, specifically that
co = cohere.SagemakerClient()
should use a lowercasem
inSagemakerClient
.- This correction ensures accurate implementation of the Amazon Sagemaker Setup Guide.
- LLM Teamwork Tactics Teased: A user is researching how teams integrate large language models like ChatGPT and Claude into their workflows, seeking insights on changes and missing elements since their adoption.
- The inquiry aims to understand the evolving landscape of team collaboration with LLMs.
- Tool Surfaces: Users have reported the sporadic appearance of a tool named direct-injected-document in Cohere model responses.
- The community seeks prompt examples and model specifications to investigate this behavior further.
- Privacy Preservation Pal Proclaims Passion: Yasir Khan, a Computer Science graduate, introduced himself, mentioning work on secure machine learning and privacy-preservation.
- He seeks connections for collaboration on AI/ML projects.
- Ollama Models Obtain Opinion: A new AI enthusiast shared their enjoyment of playing with models from ollama.
- They expressed that itâs fun.
LlamaIndex Discord
- Data + AI Summit Highlights Agentic Workflows: The @databricks Data + AI Summit 2025 featured content on agentic document workflows, with CEO @jerryjliu0 giving a well-attended talk, as described here.
- @microsoft demoed new AI Travel Agents coordinating with Model Context Protocol, LlamaIndex.TS, and @Azure AI Foundry, as described here.
- SF Event Focuses on Building Secure AI Agents: An upcoming evening in San Francisco will offer expert insights on building and securing AI Agents in production, covering best practices outlined here.
- The event features presentations from @seldo, VP of Developer Relations, alongside experts from Ravenna and @auth0, who will discuss Building Real-World Agents.
- LandingAI Tool Challenging LlamaIndex?: Members discussed LandingAIâs new vision agent document understanding tool, created by Dr. Andrew Ngâs company, prompting comparisons to Llama Parse following a prior post comparing it to Mistral.
- More information on the companyâs tool is available at LandingAIâs website.
- Synk Actively Expanding Dev Team: Synk is actively hiring developers for their decentralized browser system project, including roles in back-end, front-end, and blockchain development, along with QA Engineers, DevOps Engineers, Moderators, and a Marketing Analyst.
- Interested candidates are directed to Synkâs X page to learn more about official employment with signed documentation, guaranteed salary, and a flexible schedule.
- LlamaExtract Users Encounter Parsing Problems: Users have reported experiencing parsing errors with LlamaExtract, where no data is extracted from documents.
- While some members still experienced issues, one member confirmed that they were receiving data, and included a screenshot of a successful extraction using LlamaExtract (image.png).
DSPy Discord
- DSPy Optimization Patterns Incorporations: A member sought insights on incorporating optimization patterns within DSPy.
- Further details regarding specific patterns or use cases were not provided.
- DSPy âRunnersâ Enable Cross-Language Functionality: A member proposed building DSPy ârunnersâ that leverage saved JSON definitions to execute compiled programs, enabling cross-language functionality, such as Swift using a compiled program through a managed API.
- Challenges were raised concerning the serialization of program logic not captured in the JSON output, such as signatures and modules.
- TextGrad Optimizer Delayed: A member inquired about updates on integrating TextGrad as an optimizer for DSPy, referencing issue #1197 on GitHub.
- The member showed enthusiasm for TextGradâs potential in optimizing complex prompts and asked about workarounds for integrating it into DSPy, but no solutions were offered.
- Model Writes Prompts at DAIS Session: A member shared a write-up of their DAIS session titled Let the Model Write the Prompt (dbreunig.com), and a YouTube link to the session recording was also provided.
- The discussion centered on how models can autonomously generate prompts, with practical examples given from the DAIS session, but no further technical details were provided.
- DeepSeek R1 7B Struggles with DSPy Optimization: A member reported suboptimal optimization results using DeepSeek R1 7B in a DSPy-Text2SQL demo, in comparison to GPT-4o-mini.
- It was suggested that providing more schema information could potentially enhance DeepSeek R1 7Bâs performance, following attempts with LabeledFewShot and BootstrapFewShotWithRandomSearch.
LLM Agents (Berkeley MOOC) Discord
- Certificates Arriving Mid-July: A member stated that certificates for the LLM Agents Berkeley MOOC will be released in mid July.
- This resolves questions from users regarding the distribution timeline.
- Reasonable Effort Wins Certificates: A member clarified that email confirmations are sent for each assignment submitted via Google Forms, and as long as everything is completed with reasonable effort, a certificate will be granted.
- This addresses user concerns about assignment grading and certificate eligibility.
- MOOC Quiz Archive Shared: A member shared the Spring 2025 MOOC quiz archive.
- This archive is also available on the course website in the Quizzes section.
MLOps @Chipro Discord
- ControlThrive founder greets community: Servando, the founder of the AI/ML consulting practice ControlThrive controlthrive.com, introduced himself to the community.
- He invited members to connect with him on LinkedIn or X.
- Outerbounds event coming up: Servando announced an upcoming event he is hosting with Eddie from Outerbounds (the team behind the ML infra at Netflix).
- He shared a link to the event and encouraged community members to join.
Codeium (Windsurf) Discord
- Claude Sonnet 4 Debuts!: Claude Sonnet 4 and Claude Sonnet 4 (Thinking) are available to all paid plans via API Pricing.
- These models promise enhanced performance and capabilities for various AI applications.
- Mohan Voices Impressions on Claude: Mohan shared some impressions of Claude on X.
- The specific context of Mohanâs commentary isnât contained in the source, but the retweets spotlight community opinions about Claude.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
You are receiving this email because you opted in via our site.
Want to change how you receive these emails? You can unsubscribe from this list.
Discord: Detailed by-Channel summaries and links
Perplexity AI â· #announcements (1 messages):
Perplexity Research Improvements, Finance Pages Key Issues, Tasks Automated Search, Discover Page Update, Finance Futures Graphs
- Perplexity Improves Research: Perplexity AI released a new update, Improved Research, detailed in their June 13th changelog.
- The update encompasses key issue fixes on finance pages, introduces automated search with tasks, updates the Discover Page, and adds futures graphs on finance.
- Finance Pages now display Futures Graphs: Perplexity AI announced the addition of Futures Graphs on Finance in their June 13th changelog.
- This enhancement aims to provide users with more comprehensive financial data visualization.
Perplexity AI â· #general (1113 messagesđ„đ„đ„):
Gemini 2.5 Pro, Claude Opus 4, o3 Pro, MiniMax M1, Genspark
- Gemini 2.5 Pro underperforms, overhyped?: Members found Gemini 2.5 Pro disappointing outside of coding tasks, especially when compared to other models for general search and reasoning, despite its coding specialty and advertised capabilities.
- One user stated Gemini is shit outside coding and another mentioned it often makes up bs explanations without actually searching the web, and some reported a limit of 3 trials per day.
- O3 Pro gets competitive: Users experimented with challenging O3 Pro with coding and decoding tasks, sometimes providing hints or examples from other models, noting instances where O3 Pro improved when framed as a competition, but also displayed inconsistencies.
- One user reported that if you play favorites with it and you dont choose it, answer improves.
- MiniMax M1 reasoning model emerges: The release of MiniMax AIâs M1 reasoning model was discussed, with initial impressions suggesting it was interesting but not as effective as Deepseek R1, with some users noting its verbose thinking process, as described in the official release.
- It was suggested that MiniMaxâs agent capabilities might improve over time, given their track record with previous models, though the usefulness of its reasoning output was debated, especially given lack of source links.
- Genspark offers free O3 Pro, users remain skeptical: The availability of OpenAIâs o3-pro for free on Genspark was met with skepticism, with users questioning how Genspark could offer unlimited access when OpenAI doesnât, suggesting potential limitations or errors after certain usage thresholds, as described in their service description.
- One user reported seeing claims of it taking much less time in reasoning, but it was speculated it was not full o3 pro due to the lack of reasoning tokens.
- Annoyances with Perplexityâs Memory and Features: Users shared their frustrations about Perplexityâs memory capabilities, with one noting Perplexity has Alzheimerâs and others discussed how custom instructions and browsing contexts seemed to influence or taint subsequent searches, also some also observed glitches like the dot overlay remaining on generated images even after loading fully.
- A user noted that they turned off this feature claiming that iâm stingy with my data⊠but PPLX is like.. bro⊠just answer the dam questionsâŠ.
Perplexity AI â· #sharing (8 messagesđ„):
Shareable Threads, Nvidia GB200, Driver Stability, Emergence, Android App Security
- Make Threads Shareable!: A member asked users to ensure their threads are
Shareable
.- A link to how to make threads shareable was included.
- Nvidia GB200 exascale AI supercomputer coming soon!: A member shared a link about the Nvidia GB200 exascale AI supercomputer.
- It is expected to provide âunprecedented performance and capabilities for AI workloadsâ.
- Nvidia 566.36 Driver Stability: A member shared a link about Nvidia 566.36 Driver Stability.
- The page gives tips on âtroubleshooting and maintaining optimal performanceâ.
- The Highly Alarming Emergence: A member shared a link about the highly alarming emergence.
- No further context was provided.
- Perplexity Android App Security: A member shared a link about Perplexity Android App Security.
- No further context was provided.
Perplexity AI â· #pplx-api (7 messages):
API credit charges, Perplexity Linux CLI client, AI startup resources
- API Credit Conundrums Plague Users: Users report that API credit charges exceed actual usage, seeking assistance through multiple channels like email ([email protected]), Discord, developers forum, and Intercom chat.
- A member advised to send an email to [email protected] for support regarding the billing discrepancies.
- Perplexity CLI client emerges: A member shared their Perplexity Linux CLI client.
- The developer created an AI project for searching the web.
- Startup Seeker Scours Sources: A member expressed interest in building an AI startup and requested resources to learn more about Perplexity.
- They seek guidance on using it for web searching and product development.
LMArena â· #general (963 messagesđ„đ„đ„):
Kingfall vs Blacktooth, Grok 3.5 release, Gemini 2.5 Pro, Minimax M1 open source, LLM privacy
- Kingfall Loses Favor to Blacktooth for Some: Some users find Blacktooth to be a better model due to refined output and respect for the thinking process while others prefer Kingfall for its spatial abilities and magical moments.
- Some suggest that Blacktooth is a lmarena proc, not at all functional compared to kingfall when u look at coding, hence why the svgs were also not as high-fidelity.
- GPT-5 Release Timeline Sparks Debate: Discussion revolves around when GPT-5 will be released and if it will seal Googleâs fate as a side runner, with some predicting Grok 3.5 or GPT-5 will dominate for a while.
- Users discussed if those who pay for ChatGPT notice GPT-5 and suggested it might be for only a few months or even weeks.
- Gemini 2.5 Proâs Prowess in Coding: A user reported that Gemini 2.5 Pro is better at coding in pygame than o4.
- Another user shared a ChatGPT conversation where 2.5 Pro answered a logic question correctly after being told its previous answer was wrong.
- LMArena Checkpoint Spamming Accusations Fly: Some users on LMArena believe that big tech companies get an advantage due to the checkpoint spam and the opportunity for getting more data for rlhf.
- A user said that LMArena is basically USA big tech leaderboard with open models or foreign models either not appearing or appearing extremely late.
- Minimax Releases Open Source Reasoning Model: Minimax released a new open-source large reasoning model, MiniMax-M1-80k, but early benchmarks show it being outperformed by o3 and Gemini 2.5 Pro.
- Some users reacted negatively, stating itâs either a fluke, they messed up, or its only capable of speaking chinese
LMArena â· #announcements (1 messages):
Models Erroring Out, Models not responding, Model API Issues
- Models Errored, then Fixed!: The team acknowledged a widespread issue causing models to error out instead of responding and promised a speedy fix.
- The issue has since been resolved; users were encouraged to report any persisting problems.
- Issue Resolved and Models Operational: The team confirmed that the widespread issue causing models to error out has been resolved.
- Users are advised to report any further problems or persisting issues after the fix.
OpenAI â· #annnouncements (1 messages):
ChatGPT Image Generation, WhatsApp Integration
- ChatGPT Images Invade WhatsApp!: ChatGPT image generation is now available in WhatsApp via 1-800-ChatGPT.
- Dial-a-DALL-E: WhatsApp Number Goes Live: The 1-800-ChatGPT number on WhatsApp is now live, enabling image generation for all users.
OpenAI â· #ai-discussions (957 messagesđ„đ„đ„):
AI vs Human, GPT Plus Worth It, Sora video generation, Veo 3 video generation, GPT Model Performance
- Spotting AI: More Art, More Problems: Some members discussed how people are easily fooled when they think they can tell the difference between AI-generated and human-created art, especially when complexity and detail increase.
- They compared this to counterfeit money, where more detail leads to more scrutiny.
- GPT Plus: Schoolâs Cool?: Members debated whether GPT Plus is worth the $20 subscription for school use, with some suggesting the free version with GPT-4o might suffice.
- There was a consensus that Plus offers better models like o3, 4.1, 4.5, and o4 mini high.
- Sora: Still a Sore Spot?: Despite one memberâs proficiency, others critiqued Sora for being really bad, especially compared to Veo 3 but another member mentioned liking it for feeling more creatively tuned and the details it allows.
- A key advantage of Veo 3 is its sound capabilities.
- Veo 3 Steals the Show: Members lauded Veo 3âs ability to generate copyrighted content like Star Wars, with one stating that with Veo you can stash a solid reference frame or style mask and feed it back on every pass, then bounce back into V3 for the final polish to keep things looking steady.
- Despite this, Sora is seen as a one-stop shop and a great value.
- Performance Variance: Model Madness: Members found that GPT-4o often outperforms 4.1 in certain tasks, with one noting, 4o got it right 10/10, 4.1 was 3-4/10.
- It was observed that removing spaces in prompts could significantly improve 4.1âs accuracy.
OpenAI â· #gpt-4-discussions (86 messagesđ„đ„):
GPT-4o's Memory, Fine-tuning GPT Models, Custom GPT Model Selection, DALL-E 3 Removal, Canvas auto updating
- GPT-4o Accesses Previous Chat Data?: A user described a situation where GPT-4o quoted verbatim from a scene co-authored in a separate chat with a custom GPT, leading to speculation about cross-chat memory access.
- While some believe itâs accurate inference, the user argued the statistical improbability suggests otherwise, inviting DMs for a conversation log.
- Mini vs Nano: Choosing the Right Model for Fine-Tuning: A user asked which model between 4.1 mini or nano is better for mimicking a writing style through fine-tuning.
- Another member suggested starting with a few hundred examples of 100-200 words, noting diminishing returns after a quarter of a million words, but the user is willing to spend $15 to train on millions of words worth of content.
- Custom GPT Model Options Expand: Users noticed that custom GPTs now support a wider array of models, including GPT-4o, o3, and o4-mini.
- One user found the RAG in custom GPTs superior to that in Projects, citing the June 12, 2025 release notes detailing expanded model support.
- DALL-E 3 Image Generation Disabled?: Members reported DALL-E 3 image generation might be disabled in ChatGPT and lamented the inferior quality of the new native image generator.
- One user who just renewed their subscription expressed frustration, wishing OpenAI would keep the original DALL-E 3 available.
- Canvas auto updating to Last Canvas?: A user is looking for ideas about how to get chatgpt to access and update the correct Canvas and describes an issue where chatGPT automatically updates the last Canvas you made, instead of the first one.
- Another member offered to help troubleshoot the Canvas issue via DMs, offering to replicate the problem and attempt a fix.
OpenAI â· #prompt-engineering (135 messagesđ„đ„):
Pandoc, HTML parsing, Sora AI prompting, O3 model prompting, GPT coherence
- Pandoc Promoted for Polished Parsing: A member suggests using Pandoc for converting HTML to Markdown, emphasizing itâs a purpose-built tool, instead of using awk or other scripting tools.
- Another member agreed it is better to use well-supported open-source tools to solve problems.
- Scrubbing Strategies Spark Token Savings: Members discussed how HTML tags create noisy tokens that might be beneficial for reasoning, but can increase token usage and cost in AI pipelines.
- One member noted that while the semantic difference is negligible for single website queries, it can add up over scale.
- Long-Form Yearning for O3 Prompts: A member sought advice on prompting O3 and O3-pro to generate long-form responses instead of concise, bullet-pointed summaries.
- They noted that other models such as Sonnet and Opus 4 did not have the same issue.
- Soraâs Style Showdown: DALL-E vs Code Prompts: A member inquired whether DALL-E style prompts or code-style prompts are better for image generation on Sora AI.
- The userâs use case involves parsing and understanding complex academic research papers from web pages and applying the research to an ongoing debate.
- Creative Coherence Crisis with Chatbots: Members discussed methods for intentionally making ChatGPT lose coherence, with suggestions including absurdity, overuse of metaphor, excessive jargon, unhinged persona definitions, and contradictory guidelines.
- One member recommended the Burroughsâ cut-up technique to diagonalize the context, making the output dream-like.
OpenAI â· #api-discussions (135 messagesđ„đ„):
Pandoc vs awk for parsing, HTML noisy tokens, Long form responses from O3, Sora AI prompts for image generation, GPT coherence loss
- Pandoc provides parsing prowess!: Members discussed using Pandoc for converting HTML to Markdown instead of awk, highlighting its purpose-built design and widespread use.
- One member emphasized the importance of using appropriate tools for parsing, suggesting reaching for a âswiss-army chainsawâ like Pandoc over basic tools for complex tasks.
- HTML âNoisy Tokensâ may aid reasoning: It was mentioned that HTML tags, although seemingly noisy, can actually be beneficial for reasoning in certain AI applications, especially at scale.
- A member noted that while the token increase from tags is tiny, it could add up in large-scale operations, adding valuable context.
- Optimizing O3 for Long-Form Output: A user requested prompts that elicit long-form responses from O3 and O3-pro when reviewing files or conducting in-depth research, as the models tend to be concise and favor bullet points.
- The user noted they did not experience similar issues with Sonnet and Opus 4 when using them to review files.
- ChatGPTâs Coherence Conundrums Explored: Members discussed methods to intentionally induce loss of coherence in ChatGPT, including absurdity, metaphor overuse, jargon, unhinged personas, and contradictory guidelines.
- Techniques like Burroughsâ cut-up method, ADHD thought spirals, and fast speech were also suggested to diagonalize context and disrupt coherent outputs.
- UPSUM to the Rescue: Saving Chat Context: A member shared a meta-prompt called UPSUM Chain Prompt to produce updated summaries for seamless conversation continuation.
- It was highlighted that LLMs might not retain the entire conversation history, necessitating the use of shorter chats and summary techniques like Chain of Density and UPSUM to manage and preserve context effectively.
Unsloth AI (Daniel Han) â· #general (575 messagesđ„đ„đ„):
GPU detection issues with Unsloth, Unsloth-DeepSeek-R1-0528-UD-IQ2_M benchmark results, Hugging Face model downloading issue, Unsloth fine-tuning notebooks, AMD compatibility with Unsloth
- Unsloth benchmarks new Unsloth-DeepSeek-R1-0528-UD-IQ2_M: A new Unsloth-DeepSeek-R1-0528-UD-IQ2_M model achieved 69.4% on test cases, with a speed of 426.0 seconds per case compared to the APIâs 716.6 seconds, but one member was concerned there was too much hype before the final results.
- The model requires approximately 240GB to load with 65k context, making it more accessible locally compared to 7-800GB for FP8, which the member found to be significant.
- Hugging Face Naming Issue: Users discussed a naming convention issue in the Hugging Face cache folder, where uppercase/lowercase differences cause duplicate downloads, wasting space.
- The issue may be triggered when downloading a model using Unsloth and then using Hugging Face, leading to different naming conventions as the author may have changed the repo.
- Fine-Tuning Tips Shared: Members advised beginners to start with smaller models like 3B-8B, emphasizing that quality is greater than quantity in datasets.
- They also shared a YouTube video recommending new users spend 80% of their time on data.
- Unslothâs AMD Compatibility Nears Completion: Unsloth is reportedly close to being fully compatible with AMD GPUs because most of Unslothâs kernels are written in Triton.
- A member noted that while Triton compiles to AMD GPUs, the Triton settings might need optimization for AMD, potentially affecting performance, also pointing to this PR.
- Goodbye Reddit, Hello X: Users expressed dissatisfaction with Reddit due to issues like a bad automod system, lack of control over posts, and the prevalence of biased moderation.
- One user cited these reasons for deleting their Reddit account, suggesting Twitter (X) as a better alternative for blogging, monetization, and news, emphasizing that X is free of bots, and is just a social platform, so avoid hate and politics.
Unsloth AI (Daniel Han) â· #off-topic (13 messagesđ„):
KL Divergence Spikes, Google Colab GPU Pricing, TempleOS, Hugging Face Outage
- KL Divergence Randomly Explodes, Then Calms Down: A member inquired about KL divergence sometimes spiking to x10000 for a single step before returning to normal, a behavior that doesnât seem to impact training.
- Another member mentioned this occurs frequently, even in Hugging Face runs without Unsloth, possibly due to particular token values exploding during logprob subtraction and exponentiation between acting and reference policies.
- Sweet Spot for Google Colab GPU Prices: A member asked where the sweet spot is for GPU prices on Google Colab for fine-tuning, considering the balance between speed and credits usage.
- TempleOS gets discussed off topic: A member asked if anyone else liked TempleOS.
- Hugging Face seems down: Members reported Hugging Face being down and shared an image depicting the feeling of having to hug your own face in response to the outage, with a link to a relevant GIF.
Unsloth AI (Daniel Han) â· #help (266 messagesđ„đ„):
Qwen2.5 vs Qwen3, GGUF conversion, DPO vs SFT, Gemma3, Llama 3.2
- Qwen2.5 or Qwen3 to integrate coding assistant: A member needed a quick way to integrate a coding assistant with R/RStudio and asked about Qwen2.5 Coder 32B Instruct GGUF, but was advised to use the Qwen3 package from Unsloth instead, available here.
- The member plans to create a new model based on Qwen 3 with no_think as the default to embed in their workflow, rather than using the instruct model.
- SFT or DPO: DPO is better for controlling HOW the model responds, while SFT with a bit of upsampling is the way to go if you want the model to respond with specific information such as the modelâs name.
- This was in response to a question on what to do if a model needed to give a specific answer when asked a question such as, What is your name?
- Gemini 3 errors need fix: Users reported a
dtype
mismatch error when working with Gemma 3, the error is expected mat1 and mat2 to have the same dtype, but got: float != c10::Half unslot gemma, with members suggesting it may be related to bfloat16 precision or the GPU used.- A member is currently working on a fix for the errors encountered and suggested trying the installation instructions from this link to resolve the issue, as well as, replacing the default pip install commands with the force reinstall commands from the repo to receive latest fixes.
- Help converting Llama 3.2: A member asked how to install Unslothâs Llama-3.2-11B-Vision-Instruct model to
ollama
and was informed that pre-made GGUF versions can be found here for manual conversion to GGUF.- A user posted a link to official ollama instructions for converting to GGUF, and also suggested pulling the model directly from the ollama library.
- New fixes: New fixes have been pushed which are available if you install the updated code from the repo directly which requires to update to the code from the repo main (instead of pypi).
- It was suggested that installing from the main repo directly might solve a problem related to re-merging an adapter. The link provided details how to install Unsloth in your PC.
Unsloth AI (Daniel Han) â· #research (2 messages):
arxiv link
- AI Paper Shared: A member shared an arxiv link.
- Confirmation of Shared Resource: The member acknowledged that the resource was shared before they could do so themselves.
Cursor Community â· #general (750 messagesđ„đ„đ„):
Claude 4 Sonnet Performance, Cursor UI Issues, MCP Usage, Code Privacy, Bug Reporting
- Claude 4 Sonnet users observe slowness: Members have noticed that Claude 4 Sonnet runs significantly slower in Cursor compared to platforms like GitHub Copilot, despite its general effectiveness and stability.
- Some members suggest optimizing usage by reserving Max Mode for major refactoring tasks or exploring alternatives like Gemini 2.5 Pro for planning and code reviews to better manage inference credits.
- Users Complain about Cursorâs UI Flaws: Users report ongoing UI problems, such as command execution failures on Windows due to xterm bracketed paste issues and inconsistent notifications about command completion, resulting in wasted inference credits.
- One member noted that about 10-15% of their credits are wasted due to these UI malfunctions and suggested that Cursor provide inference counts back when errors occur.
- Navigating Model Context Protocol Usage in Cursor: A member sought advice on using Model Context Protocol (MCP), and they highlighted the benefit of the AIâs ability to leverage screenshots and integrate error messages automatically.
- Another user emphasized the importance of spending time on defining better prompts, citing it as more effective than frequent screenshotting and copy-pasting, suggesting Wisprflow for enhanced speech-to-text capabilities.
- Users Request Granular Code Privacy Settings: Users expressed a need for setting code privacy on a per-repository basis, allowing different settings for work and personal projects due to concerns over code storage and accessibility.
- Currently, Cursorâs Privacy Mode is a global setting, but the community desires more granular control at the project level for enhanced flexibility and security as they want to avoid unintentionally opening sensitive company directories.
- Streamlining Bug Reporting with Active Monitoring: Members are actively sharing bug reports and troubleshooting tips within the community, particularly focusing on issues like the broken command execution tool on Windows.
- The community is pushing for more active testing on Windows and better communication from the Cursor team regarding bug fixes and feature rollouts after a member noticed the Cursor Task Master which is actually a community third-party project not officially released.
Cursor Community â· #background-agents (40 messagesđ„):
GitHub integration, Background Agents Permissions, Background agents and Slack, Background Agents and Privacy Mode, Toggle bug with background agents
- GitHub Integration Roadmap: A user inquired whether agents can access GitHub issues, to which a member replied that itâs currently not possible but is on the roadmap.
- The member also clarified that the GitHub integration improvements are coming soon.
- Background Agents Lack Permissions to Create PRs: A user reported that background agents in Slack donât create pull requests, despite granting all permissions in Cursor integration settings, generating request IDs: bc-79e56de2-26d7-41a0-a5b3-b8b9e9f635d1 and bc-d3acc5d6-2520-413f-9f7c-2fb36590215d.
- A member offered to debug the issue and asked for the request ID.
- Background Agents Require New Privacy Mode: A user noticed that the old Privacy mode is now labeled âlegacyâ and inquired whether enabling Background Agents requires the new Privacy mode (with code storage).
- A member confirmed that Background Agents require the new privacy mode with code storage, because the original privacy mode doesnât permit storing code for the lifetime of a background agent, which is required to execute and iterate on code.
- Background Agents Toggle Bug Reported: A user reported a toggle bug with background agents, providing a video demonstrating the issue.
- A member responded, offering to investigate and requesting the user to check their GitHub installations for the account they want to connect.
- Cursor Not Listed Under Installed GitHub Apps: A user discovered that Cursor was listed under âAuthorized GitHub Appsâ but not under âInstalled GitHub Appsâ for their personal org, whereas for an org where Background Agents were working, Cursor was listed as installed with access to all repos.
- The user was directed to reconfigure/enable/disable repos and orgs at Cursorâs dashboard, via the âManageâ external link for GitHub, to resolve the issue.
HuggingFace â· #general (553 messagesđ„đ„đ„):
AI-Generated Feedback, Open Sourcing AGI, Bigram Testing, Qwen 2.5, HF Pro Disk Space
- Ethical Concerns Over AI-Generated Content: A member expressed unease about AI-generated feedback, citing ethical reasons for not engaging with AI-generated images or video work.
- The member stated I donât engage with AI generated images/video work even though I have the theoretical knowledge.
- Open-Sourcing AGI Debate Flares Up: A member humorously declared that if they ever create the worldâs first AGI, they will open source it.
- This sparked a discussion about the potential benefits and risks of open-sourcing such a powerful technology.
- Qwen 2.5 Impresses with Efficiency: A member lauded Qwen 2.5, a 7b model quantized to q4 on Ollama, for its impressive performance given its small size and simple system prompt, with another noting that No one showed in comparison in benchmark against qwen 2.5 Cause it was so good.
- It was also discussed that Qwen models are pretrained on multilingual datasets including Chinese and English.
- Rate Limiting and Zero GPU Quota Anomaly: Users reported issues with rate limiting on Hugging Face and some reported getting extra Zero GPU Quota.
- It was speculated that the extra GPU Quota might be a special provision for old users, but no official announcement was made.
- AI-Assisted Coding Gains Traction: The members discussed their experiences with AI-assisted coding tools, like Gemini, praising their ability to generate understandable and modifiable code.
- One member shared that they vibe coded for the first time something for IOS, completely 0 knowledge how it works and still have no clue âŠbut it does what it supposed to do.
HuggingFace â· #today-im-learning (2 messages):
HF audio course, Agents course, MCP course
- User kicks off HF audio course: A new member announced they are starting the Hugging Face audio course today.
- No links or resources were mentioned.
- Member learns Agents and MCP courses: A member is currently learning Unit 2 of the Agents course and has started Unit 1 of the MCP course.
- No links or resources were mentioned, only a <:hugging_rocket:968127385864134656>.
HuggingFace â· #cool-finds (1 messages):
cakiki: <@844851718512443423> No referrals please
HuggingFace â· #i-made-this (9 messagesđ„):
peft-bench, InfiniGPT French Q&A dataset, Shisa AI Japanese model, Swiftide Rust library for agentic RAG applications, QuantIntelli Football Betting Analysis
- InfiniGPT - The largest French Q&A dataset is here!: A 19-year-old student released InfiniGPT, a French Q&A dataset featuring 40,000+ Q&A entries, 100% native French, manually verified, diverse topics, and fine-tuning ready, aiming to establish a French dataset standard (GitHub, HuggingFace).
- The author claims it is 5x bigger than FQuAD and offers direct Q&A, not extractive reading, with documented sources and GPT-2 tokenizer optimization.
- Shisa v2 - Japanâs strongest model released!: A team of 2 released Shisa v2, the strongest model ever trained in Japan, along with an Overview Report (Technical Report forthcoming) available on HuggingFace (800GB+).
- They also updated their core SFT dataset (available at HuggingFace), claiming it improves Japanese performance without reducing English performance, based on training/releases on sota open models from 7B-405B.
- Swiftide 0.27 - Rust library ships!: A major release for Swiftide was shipped, which is an open-source library in Rust to build composable agentic and RAG applications (announcement).
- QuantIntelli - Hybrid AI Agent Predicts Football!: A Hybrid AI Agent for Quantitative Football Betting Analysis was created, combining XGBoost model and Google Gemini LLM with features like Advanced RAG Pipeline using Tavily, Google, and DuckDuckGo, persistent session logging with Supabase, and an interactive UI with Gradio (HuggingFace Space, Github Repo).
- JASCO - Music generation on MCP server!: Users can now generate musical stems using facebook/jasco via MCP server, which generates two variations of music based on text descriptions, chord progressions, and optional melody and drum inputs (HuggingFace Space).
- Instead of recording input audio with the mic, now you can generate drum outputs in ~1 second for gary to continue, via stable-audio-open-small, and name it jerry lol.
HuggingFace â· #reading-group (2 messages):
Portfolio Theory, Dr. Peter Cotton, Schur Portfolios
- Cotton Proposes Presentation on Portfolio Theory: A member suggested arranging for Dr. Peter Cotton to present his paper on portfolio theory, linking to the Schur Portfolios paper.
- They inquired about the process to organize such a presentation.
- Schur Portfolios Paper Presentation Proposed: A member proposed a presentation on Dr. Peter Cottonâs paper, âSchur Portfoliosâ, focusing on portfolio theory.
- The proposal included a request for guidance on organizing the presentation.
HuggingFace â· #smol-course (3 messages):
Smolagents, Ollama, Code Agents, Local Model Selection
- Smolagents loses compatibility with Ollama: A member reported that smolagents is no longer compatible with Ollama for local code agents.
- The member is seeking assistance to implement a local code agent.
- Request for model recommendation on limited resources: A member requested a recommendation for the best model to run locally with 8GB RAM and 6GB VRAM.
- They used smolagents for a final project, spending $10 on the OpenAI API to achieve 45% accuracy; Project Link.
HuggingFace â· #agents-course (21 messagesđ„):
HF Inference Costs, Local LLMs with Ollama, Unit 3 Assignments, Agentic RAG Locally, Unauthorized Imports
- Users grapple with HF Inference Costs: Several users are running into cost issues with HF Inference, especially when using models like llama-3-70b-Instruct, finding the free credits insufficient.
- One user reported paying around $6 after many attempts on the final unit and suggested using local models to mitigate costs.
- Ollama enables running LLMs Locally: Members are discussing using Ollama to run LLMs locally and then plugging them into agents, thereby reducing reliance on paid inference APIs.
- The cost savings can be significant, but one user felt that the final unit assignment was too challenging, suggesting a steeper learning curve.
- Feedback on Unit 3 Assignments: A user expresses that the final unit feels like being thrown into the deep end, wishing for more assignments like it throughout the course.
- They also noted that many leaderboard submissions appear to be copied, undermining the purpose of the exercises.
- Debugging Agentic RAG Locally: A user encountered an error trying to run Unit_3_Agentic_RAG locally and posted a screenshot of the error message.
- No specific solution was provided in the discussion, but the issue seemed related to setting up the environment correctly.
- Bizarre Unauthorized Imports: A user reported issues with CodeAgent flagging certain imports like plotly.express as unauthorized, even after specifying plotly as an authorized import.
- Another user confirmed similar experiences, noting that sometimes using aliases (e.g., bs4 instead of beautifulsoup4) can bypass the restriction, while confirming that adding
plotly.express
solves the userâs problem.
- Another user confirmed similar experiences, noting that sometimes using aliases (e.g., bs4 instead of beautifulsoup4) can bypass the restriction, while confirming that adding
OpenRouter (Alex Atallah) â· #announcements (1 messages):
Readybot.io: OpenRouter - New Models
OpenRouter (Alex Atallah) â· #app-showcase (3 messages):
Chess Leaderboard, Book Testing
- Chess Leaderboard Replay Functionality: A member added homebrew replay functionality to chess-leaderboard for every game (past and future).
- They mentioned that itâs maybe a bit better than lichess gifs, but a pain to implement with my stack with an attached chessreplay.gif.
- Book Testing Opportunity: A member mentioned their book is live as of June 15 and is offering to help with testing.
- They said they are happy to help if anyone shoots them a DM.
OpenRouter (Alex Atallah) â· #general (533 messagesđ„đ„đ„):
OpenRouter Discord Tag Request, Claude Prompt Debugging, GPT-4.1 Mini Offering, Free Model Credit Usage, Multilingual Model Recommendations
- Discord Tag Craving Unacknowledged: A member expressed frustration over an unacknowledged request to restore the OpenRouter Discord tag, offering to pay for it despite OpenRouterâs financial superiority.
- They jokingly threatened to ping Alex Atallah due to the lack of response.
- Claude & Copilot Prompts lack Token Economy: A member debugged prompts from Claude Code and GitHub Copilot, finding they often ignore token efficiency when improving workflows, sending irrelevant content unless verbosity affects performance.
- They observed that conciseness isnât a primary goal for these systems when adjusting prompts.
- Testing a GPT-4.1 mini version on OpenRouter: A member offered access to GPT-4.1 mini with 200K tokens/minute available at 20% of the official token price, compatible with the OpenAI SDK, inviting high-usage testers to DM for more details.
- They highlighted its suitability for apps like Cline.bot and BYOK/BYOB setups.
- Deepseekâs Free Tier Suffers Outages: Users reported encountering 502, 503, and 524 errors when using the free version of Deepseek-r1-0528 through the API, with one suggesting the issues may stem from high traffic due to smut RPs.
- Members noted the paid version remained functional and discussed potential causes, including data center problems or issues with Chutes.
- OpenAI Faces Antitrust Threat from Microsoft Spat: Discussions revealed that OpenAI executives have considered accusing Microsoft of anti-competitive behavior during their partnership, potentially seeking regulatory review and launching a public campaign.
- This arose from difficult negotiations, prompting reactions of surprise and concern from community members.
LM Studio â· #general (247 messagesđ„đ„):
TokenBreak Attack, AMD Ryzen AI Mini PC, Increasing RAG size in LM Studio, MiMo VL 7B RL UD support, LLM Image Organizer
- TokenBreak Attack Bypasses AI: A member shared an article about a new TokenBreak attack that bypasses AI security measures, though they didnât get the same results from their experiment.
- Another member jokingly noted how similar the logos in the attached screenshot were.
- AMD Ryzen AI Mini PC runs big models: A member mentioned that the AMD Ryzenâą AI Max+ 395 âEVO-X2 AI Mini PC can run some big models smoothly.
- Others countered that itâs a glorified igpu supported by HIP SDK and that while a 70B model runs at ~5t/s, a Qwen3 235B A22B model runs at ~14t/s.
- Increase RAG size is not possible in LM Studio: A member asked how to increase the RAG size from 31.46 MB to 100 MB in LM Studio.
- Another member responded that it is not possible, RAG is still a basic implementation.
- The solution to long detailed LLM responses: narrative games: One member suggests starting a choose-your-own-adventure game with your LLM, ensuring it wonât attempt to finish the story in a single response.
- They suggest prompts like
Let's play a choose-your-own-adventure game. I'll start with a prompt and you carry the story on. When you reach a decision point, list a few choices for direction and I'll respond.
- They suggest prompts like
- Local LAN port opening is low risk: A member asked about security risks when opening ports, which caused concern about said site as the front implying opening the backend to the internet and creating security exploits.
- Ultimately, members agreed that opening ports on a local LAN network is low risk, though any open port can be exploited.
LM Studio â· #hardware-discussion (95 messagesđ„đ„):
GMKtec Windows install issues, RTX 6000 Pro wattage configuration, Graphics cards and coil whine, NVLINK performance experiments, GPU Recommendations for LLMs
- Windows install on GMKtec is a PITA: A user reported issues installing Windows on their GMKtec machine, citing installation failures and problems with Rufus-created removable media.
- They are trying to install Windows on a GMKtec machine.
- RTX 6000 Pro can shapeshift into 300W or 600W: The RTX 6000 Pro can be configured for either 300W or 600W, according to one member.
- The standard one can be configured, not sure which one is used in this build though.
- Graphics Cards Sing the Coil Whine Blues: Users have observed a significant increase in coil whine from their graphics cards when running LLMs compared to gaming.
- One user noted getting more coil whine with a 5090, suggesting undervolting as a solution to reduce both power consumption and coil whine.
- NVLINK Performance remains untested by many: A member inquired about experimental data on NVLINKâs interference performance difference, wondering if it provides a tangible benefit.
- Another memebr posted an image that said âIâm also sure nvidiaâs software is HIGHLY optimized for nvlinkâ.
- GPU Shopping List: 3090-4090-5090, you canât afford it: For running models like Qwen3, Devstral, and Gemma3, the 3090, 4090, and 5090 were recommended due to their 24-32GB VRAM, especially for larger models or higher quality quants.
- The 3090 is about as good as the 4090 and it keeps up with 5000 series cards with 24 GB. The 5090s are about $3k. For that price, youâll probably still run a 32B or smaller model, but a higher quality quant or more context
GPU MODE â· #general (6 messages):
PD disaggregation, Transformer moment for agents, Groq speed, Groq Huggingface
- PD Disaggregation Resources: A member requested resources on PD (power distribution) disaggregation, other than the DistServe paper.
- No resources were offered in the provided messages.
- Transformer Moment for Agents: A member inquired about the âtransformer momentâ for agents, seeking a general-purpose control strategy that adapts to any task automatically.
- They wondered if it could be DFS, BFS, or hybrid flowsâautomatically selected.
- Groqâs Speed and Collaboration with Hugging Face: A member asked how good Groq is and how they are so fast.
- Another member mentioned that Groq recently collaborated with Hugging Face, implying positive performance or accessibility; no explicit link was offered.
GPU MODE â· #triton (9 messagesđ„):
tl.constexpr behavior with expressions, Thread-level control flow in Triton, Triton kernel warmup time vs torch.compile, Single-row softmax kernel implementation
- tl.constexpr Expression Woes!: A user found that using
tl.constexpr
in an expression (e.g.,a = tl.constexpr(b // 2)
) causes errors withtl.arange
because it doesnât recognizea
as a constexpr, whiletl.arange(0, b // 2)
works fine; the solution is to type it asa: tl.constexpr = b // 2
.- The user provided a minimal reproduction example showing the error during compilation when
tl.arange
âs arguments are not explicitly defined astl.constexpr
.
- The user provided a minimal reproduction example showing the error during compilation when
- Looping at Thread-Level?: A user inquired about thread-level control flow in Triton, seeking to implement a loop that sums matrix rows and stops when the sum exceeds a threshold.
- No responses were given.
- Triton Kernel Slow Start?: A user reported that their handwritten Triton kernel takes warmup time to reach peak performance, unlike
torch.compile
, and wondered iftorch.compile
uses better heuristics for block size or other optimizations.- No responses were given.
- Softmax Race Condition: A user working on a single-row softmax kernel faces a race condition in the final kernel that writes the softmax results, as the first program overwrites the initial global max and sum.
- No responses were given.
GPU MODE â· #cuda (55 messagesđ„đ„):
CUDA cache policies, TF32 vs FP16 precision, L40 vs 4090 performance, nvcc generating LDS instruction, GCC and NVCC version compatibility
- CUDA Cache Policy Gets Fractional: A member shared a CUDA code snippet for creating a cache policy object using
createpolicy.fractional.L2::evict_last.L2::evict_unchanged.b64
and using it in a load instruction with cache hints, inquiring about its usage. - TF32 Precision Woes: A member observed that TF32 matmuls are 3x less precise than float16 on CUDA GPUs, tested on 4070 mobile and A10, and shared a Triton code snippet to reproduce the issue.
- They pointed to a potential cause in a related Triton issue regarding precision.
- L40s Underwhelming: ECC to Blame?: Members discussed that L40s may seem underwhelming in cloud environments due to ECC being activated by default, impacting performance, citing it as a configuration issue rather than a hardware problem.
- nvcc accidentally generates LDS instruction: A member reported that
nvcc
generates an unintended LDS instruction for data in global memory, causing errors when usingcompute-sanitizer
, and that using__ldg
fixes the issue.- Others suggested it could be undefined behavior and requested a minimal, reproducible example to further investigate the possible compiler bug.
- GCC+NVCC Version Combo Causes Choke: A beginner encountered an error related to parameter packs not expanded in
<std_function.h>
, and it was suggested that this is due to an incompatibility between GCC and NVCC versions, with CUDA 11.7.0 being the first to officially support Ubuntu 22.04.
GPU MODE â· #torch (2 messages):
CUDA kernel blocksize args, TorchTitan training graph capture
- CUDA Kernel Blocksize Args Best Practices Needed: A member is seeking best practices for transferring blocksizes in Python to CUDA code without JIT, using Torchâs cpp_extensions + setuptools to compile custom CUDA kernels.
- Theyâre looking for an alternative to explicitly adding blocksize as an int[] parameter in TORCH_LIBRARY registration, as that doesnât seem as elegant since most PyTorch functions donât expose blocksize args at all.
- Graph Capture with Torchtitan in Training: A member is training a llama1b with Torchtitan and wants to capture the training graph(s) with the various collectives when working with different parallelism combinations.
- They tried to intercept a training step and use the aot_module in functorch.compile to capture it but think the Faketensor propagation is not working with it.
GPU MODE â· #announcements (2 messages):
d-Matrix team, Dr. Lisa Su, GPU MODE, kernel data, Project Popcorn
- d-Matrix Team to demo Custom Silicone: The d-Matrix team will demo their custom silicone for low latency batch inference.
- Dr. Lisa Su Shout-Out GPU MODE: Dr. Lisa Su called out GPU MODE and its work enabling the worldâs first $100K competitive kernel competition at gpumode.com/news.
- Kernel Competition Generates Massive Kernel Data: The community generated more kernel data than exists on all of Github combined, outperforming the best baselines by human experts.
- Project Popcorn Collaborations: GPU MODE thanked collaborators at AMD and on Project Popcorn.
GPU MODE â· #beginner (1 messages):
Instruction latencies, arxiv.org
- Instruction Latencies Paper Shared: A member shared a link to a paper on instruction latencies: https://arxiv.org/pdf/1903.07486.
- The member noted that the instruction latencies may be outdated, but the discussion is still worth reading.
- Paper discussion on instruction latencies.: The paper at https://arxiv.org/pdf/1903.07486 discusses instruction latencies.
- The discussion is considered valuable, despite potentially outdated instruction latencies.
GPU MODE â· #torchao (5 messages):
MX-FP4 Matmul, MX-FP8 Matmul, CUTLASS, CuBLAS, FP4 Weight Quality
- CUTLASS and CuBLAS shine on 5090: MX-FP4 matmul from CUTLASS and MX-FP8 matmul from CuBLAS (via
torch._scaled_mm()
) are very impressive on 5090 (PR 2285). - Weight-Only FP4 kernel not available yet: A member inquired about benchmarks for smaller batches (1-64) with fp4 weight-only and there isnât a weight-only kernel for fp4 yet.
- Good Perf with Weight-Only FP4 coming soon?: A member stated they got a pretty good perf with weight-only FP4, and will try to make some time to put the code together for integration.
- FP4 Weight quality is bothering some members: The quality of FP4 weights is bothering some, as thereâs a noticeable drop in accuracy converting the weights, so it would need some quant algo to improve accuracy (mx-hqq ? đ ).
GPU MODE â· #off-topic (1 messages):
HQQ Rebrand, Quantum Quantization
- Rebrand HQQ for Quantum: A member suggested rebranding HQQ to Half Quantum Quantization to attract more interest.
- This followed a Multiverse Computing raise of $21.5M to scale technology that compresses LLMs.
- Quantum Computing Funding: Multiverse Computing recently secured $21.5M in funding.
- The funding aims to scale their technology for compressing LLMs, potentially benefiting projects like HQQ.
GPU MODE â· #rocm (19 messagesđ„):
Nvidia to AMD transpilation, AMD stable inference deployment, MI300A architecture, IODs and infinity cache, memory distribution
- Nvidia transpilation to AMD is CASS: A member shared a paper on CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark to try RF-ing a model with GRPO instead of SFT.
- The member is investigating whether the performance thresholds/differences are realistic in real-world workflows.
- AMD Stable Inference Deployment is Ollama: A member was looking for tips for stable inference/deployment libraries on AMD.
- It seems that Ollama works so they were just overthinking it.
- Diving Into MI300Aâs Architecture: Members discussed the fused AMD CPU-GPU platforms, especially the IOD and infinity cache of the MI300A architecture.
- They are wondering if there was a way to test a particular Path or pressure one or the other IOD.
- Memory Chipsâ Distribution Strategy Explored: Members speculated on how memory is distributed between the memory chips and how this affects latency, particularly on the MI300X, where each IOD is connected to 2 HBM stacks.
- One member mentioned using
s_getreg
to figure out which XCD and CU a shader is running on, and from that, measuring access latency to different locations in memory.
- One member mentioned using
GPU MODE â· #self-promotion (4 messages):
Thrust library, CUDA kernels, segmented sum algorithm, iterators, high_resolution_clock vs steady_clock
- Veitner Introduces Thrust Library: Simon Veitner introduced Thrust, a high-level abstraction layer that allows you to write performant CUDA kernels using modern C++ concepts.
- High_resolution_clock gets benched!: A member suggested not using
high_resolution_clock
but rathersteady_clock
for benchmarking, referring to this stackoverflow answer.- They added that, given enough periods and therefore parallelism, this should not be a problem, however what actually kills the performance even then is the strided/uncoalesced memory access.
- cuTensor is more fitting library: For a regularly sized example, a member suggests that cuTensor might be a more fitting library than Thrust.
- MatX makes multidimensional algorithms more elegant: A member recommended MatX for an elegant C++ interface to these kinds of multidimensional algorithms.
GPU MODE â· #đż (6 messages):
Tensor Core Algorithm Reformulation, RL for Tensor Core Usage, Kernel Code Verification, GPU Thinking Interpretability
- Algorithm Reformulation for Tensor Cores: A user inquired about a feedback loop involving a domain expert to reformulate algorithms into tensor-core forms, especially beyond simple cases like FFTConv, considering modifications like padding and rank-factorization.
- They sought guidance on steering experts towards tensor-core-friendly algorithm design.
- Reinforcement Learning Guides Tensor Core: One member suggested using Reinforcement Learning (RL) to guide models in utilizing tensor cores, by building a small verifier to check a modelâs trace for tensor core understanding.
- They pointed to Hugging Face Kernels as a potential data source, emphasizing its community-driven contributions.
- Creative Kernel Code Verification Ideas: One user is experimenting with verifying kernel code through a Triton interpreter, instead of full execution, for quicker verification and better scalability in data quality and RL efforts.
- This approach provides easier insight into memory and instruction calls within a CPU environment.
- âThinking GPUâ can be interpretable: Members discussed applying methods from natural language interpretability to program languages by creating probing classifiers per layer based on paper presentation at ICSE-NIER â25.
- The goal is to show that a model exhibits âGPU thinkingâ by using different layers and attention heads when solving a problem with a GPU approach versus a CPU approach, analyzing internal representations after initial translation projections.
GPU MODE â· #thunderkittens (4 messages):
ThunderKitten on Older GPUs, TK to AMD port, Attention Kernels with Variable Length
- Run ThunderKitten on Older GPUs Like T4 and P100?: Members discussed the possibility of running ThunderKitten on older GPUs like T4 and P100 available on Kaggle, noting it is likely doable despite challenges with async instructions and smaller shared memory.
- One member suggested compiling with a 4090 TARGET and reporting any breakages to help with compatibility.
- TKâs Port to AMD: Awaited Soon!: The team is actively developing a TK-to-AMD port, aiming for an imminent release to expand compatibility.
- The lack of async instructions is generally a bit annoying, and shared memory is smaller so weâd need more pipelining on the register front vs. the Nvidia megakernels.
- ThunderKitten Attention Kernels Support Variable Length: The ThunderKitten repo includes attention kernels that support variable length and padding, which can be helpful in various sequence processing tasks.
GPU MODE â· #reasoning-gym (1 messages):
Chain of Thought, CoT, Symbolic Reasoning, Math Reasoning
- CoT Benefits Pinpointed by New Research: A recent paper, To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning (arxiv link), investigates the scenarios where Chain-of-Thought (CoT) reasoning provides the most advantages, specifically in math and symbolic reasoning tasks.
- Math and Symbolic Reasoning Excel with CoT: The study indicates that Chain-of-Thought (CoT) primarily enhances performance in mathematical and symbolic reasoning domains, offering insights into its limitations and strengths.
GPU MODE â· #submissions (16 messagesđ„):
MI300 AMD-FP8-MM, Conv2D on H100, VectorAdd Leaderboard Updates
- MI300 AMD-FP8-MM gets a Speedy Submission: A submission on the
amd-fp8-mm
leaderboard for MI300 achieved 5.23 ms.- Another submission reached 9th place with a time of 161 ”s.
- H100 Conv2D keeps churning: Multiple successful submissions were made to the
conv2d
leaderboard on H100, with times around 187-192 ms.- These submissions indicate consistent performance on the H100 for
conv2d
tasks.
- These submissions indicate consistent performance on the H100 for
- VectorAdd sees a ton of activity: Many submissions updated the
vectoradd
leaderboard across various GPUs (A100, H100, T4, L4), with times ranging from microseconds to milliseconds.- One submission achieved third place on T4 with 6.31 ms.
GPU MODE â· #factorio-learning-env (162 messagesđ„đ„):
Factorio RL, Factorio Agents, Hierarchical RL and LLMs, FLE API
- DOTA vs. Factorio gameplay difficulty debated: Members debated whether professional DOTA play is harder than reasonable Factorio gameplay, citing DOTAâs heavy use of compute (millions of GPU-hours) and reward shaping, while Factorio has a larger action and observation space and is sensitive to production balancing.
- The action and observation space of Factorio is bigger and sensitive to needing to calculate and balance production of intermediates, task horizon is longer, etc, and wonder what would happen if we tried RL on Factorio with heavy reward shaping and human prior, we could get to rocket launch with a similar level of compute as OAI5.
- FLE: Curriculum-based code generation promises LLM probes: Members find FLEâs current setup using code generation, production score feedback, a REPL loop, and memory compression is a useful scaffolding that reduces the action space and induces structure and planning for LLMs in Factorio.
- A member suggested a curriculum-based code generation approach, guided by mini-goals and a theory of mind module within the FLE loop, seems like a promising way to probe the limits of LLM planning in this environment, like minedojoâs voyager and mindforge.
- Team explores Hierarchical LLM+RL hybrid system: The team explored a hybrid setup where compositional planning is handed off to the RL loop, calling reusable LLM stubs for concrete implementation, handing medium- to long-horizon planning over symbolic base design primitives to the RL loop, while the LLM handles implementation details.
- One member suggested that HLP and LLP makes sense, given LLMs have that human prior knowledge multiplier skill where we can jump in and out of levels, allowing things to have more HLPs now, and that skillset procedures would be hard to compose because they would explode combinatorially.
- FLE API gets REST API for container Integration: A member integrated a REST API inside the Factorio container, choosing C# for the server because it compiles down to machine code and the container doesnât need dependencies other than the two binaries.
- Thereâs an issue to have a discussion about actions: which do we need and what should they be called on GitHub.
connect_entities
eases Factorio agent development:connect_entities
prevents the agent from explicitly designing the route of belts/poles/pipes, but removing it makes agents totally incompetent.- A member suggested that it would be better to figure out how to make
connect_entities
more configurable for an agent, rather than removing it entirely.
- A member suggested that it would be better to figure out how to make
GPU MODE â· #amd-competition (1 messages):
AMD GPU, Image Analysis
- Image Analysis Incoming: A user sent a potentially official image related to AMD GPUs.
- The image was sent just in case it was relevant, but no further context was provided.
- AMD GPU Speculation: The image is speculated to contain details about an upcoming AMD GPU, potentially related to competitive positioning.
- Without further context, the exact significance of the image remains unclear, but it hints at internal documentation or marketing material.
GPU MODE â· #cutlass (1 messages):
BLISRetreat2023, UTexas presentation
- Seeking Slides Video: A member inquired about a video accompanying the presentation slides from BLISRetreat2023 at the University of Texas.
- University Presentation: The presentation discusses topics related to BLIS.
Manus.im Discord â· #general (196 messagesđ„đ„):
Manus credits, Manus speed, Minimax copy of Manus, Manus AI updates, Manus Agent mode
- Minimax copies Manus Functionality: Members pointed out that agent.minimax.io has copied Manus, and noted that it had serious potential before credits were announced but the stupid pricing ruined it.
- Users complain of Manus credit-eating errors: Users are reporting that Manus is eating credits due to its own errors, one stated that it ate all my credits all 4k over its own errors.
- There were complaints of it using 700/1000 credits to deliver a blackscreen website.
- Free Lume alternative to Manus being shilled: Members discussed lume.im as an alternative to Manus.
- A user promoted it as free and unlimited, resulting in accusations of shilling and spam.
- Gemini outcompetes Manus in specific tasks: A member stated that Manus couldnât do it, but Gemini could and shared a link to a Gemini output.
- The same user also stated, Gemini is the best static canvas currently. Manus isnât static, so we cant combine those.
- Users report slowness and task failures with Manus: Users are complaining that Manus is slow, doesnât follow instructions, and new updates have made it worse, in addition to simple compiling documents taking 40 minutes and burning 400+ credits.
Nous Research AI â· #general (112 messagesđ„đ„):
Decentralized Pre-training, Hermes 4 Training, Bandwidth Differentials, AI Evals company, Multilingual reasoning in AI
- Nous launches Psyche for pretraining: Nous Research is doing pretraining on psyche.network while Jensen Huang roasted Anthropic on pre-training versus post-training.
- A member noted that distributed stuff is only gonna get better from here and will benefit decentralized training.
- Dawn Internet plugs decentralised broadband: Dawn Internet is a decentralised broadband protocol providing gigabit internet using fixed wireless rooftop antennas.
- Their new WiFi router includes a GPU capable of supporting RL.
- Nous to commence training Hermes 4: Nous Research will begin training Hermes 4 on Monday, though it will still take a minute to train and prepare.
- The new model of the Zeus series will not be based on the old Hermes, but on the newest Mistral.
- Atropos RL environments works with Axolotl: Atropos RL environments works with axolotl right now (which uses TRL) and a member is working on VERL integration, according to discussion in Discord.
- A member states that atropos is very good and shared Atroposâs Readme file for more.
- Kimi-Dev-72B releases open-source coding LLM: MoonshotAI introduces Kimi-Dev-72B, a new open-source coding LLM for software engineering tasks achieving a new state-of-the-art on SWE-bench Verified among open-source models with a score of 60.4%.
- Kimi-Dev-72B is optimized via large-scale reinforcement learning, autonomously patching real repositories in Docker and gains rewards only when the entire test suite passes, aligning with real-world development standards.
Nous Research AI â· #ask-about-llms (8 messagesđ„):
Gemini 2.5 Pro, Chain of Thought (CoT) prompting, Reasoning Techniques, API key setup, Hyperbolic integration
- Gemini 2.5 Pro Deployed on AI Studio: A member confirmed using Gemini 2.5 Pro on AI Studio and inquired about methods to force the model to use Chain of Thought (CoT) in long chats.
- The user noted that CoT is not consistently triggered.
- Reasoning Techniques to Enhance Model Performance: A member suggested prompting the model to use reasoning techniques such as neurosymbolic, counterfactual, inductive, and deductive.
- They advised explicitly instructing the model how to think and inputting keywords like Alternatively, consequentially, and due to to guide the reasoning process.
- API Key Setup Explained: A member provided guidance on setting up the API key and settings, referring to a specific gear icon location.
- An image was attached to further illustrate the process (image.png).
- Hyperbolic Connection Issues: A member reported having trouble connecting Gemini 2.5 Pro to Hyperbolic despite completing the API key setup.
- The discussion centered on troubleshooting the integration process.
Nous Research AI â· #research-papers (19 messagesđ„):
Bitter Lesson, Generalist vs SME, Grounding in reality, Gene edits to cure cancer
- Bitter Lesson Summary: Discussion around the Bitter Lesson essay by Rich Sutton, which discusses how innovations scale with Mooreâs Law rather than human alignment.
- It was noted that the essay requires the assumption that reality is ultimately computational and that discovery = observable human reality.
- Cancer Cure Paradox: A member mentioned the issue of grounding in reality using a hypothetical example of a model discovering gene edits that cure cancer without understanding how.
- They pointed out the risk of unintended consequences, such as causing ALS, if the mechanism of the cure is not understood, and pointed to this research.
Nous Research AI â· #interesting-links (8 messagesđ„):
WebSummit talk on closed internet/AI, Robotic Skin, Deep Residual Learning
- Closed Internet Rant at WebSummit: A member shared a talk given at WebSummit in Vancouver about the closed internet and closed AI, half history, half rant.
- It was cross-posted on FXTwitter by another user.
- Crazy Robotic Skin from Cambridge: A member posted about robotic skin from Cambridge, also linking to a YouTube video.
- The skin seems to be made from a stretchable matrix with embedded sensors.
- Deep Residual Learning Paper: A user shared a link to the Deep Residual Learning paper from CVPR 2016.
- The paper abstract link leads to a non-existent arXiv ID.
Nous Research AI â· #research-papers (19 messagesđ„):
Bitter Lesson, Generalist vs SMEs, Nature Article, Arxiv Paper, Observable Reality
- Bitter Lesson: Scaling Beats Human Alignment: Discussion around Rich Suttonâs âBitter Lessonâ essay highlights how the largest innovations are due to scaling with Mooreâs Law rather than aligning with humanity.
- The essay suggests innovations scale, but understanding them requires observation, otherwise we risk systems we will never understand, as in gene edits that cure cancer but also cause unexpected side effects because we donât understand the mechanism.
- Nature Article deemed âuselessâ by some: A member shared a Nature article but commented âliterally donât, all my chinese friends say itâs uselessâ, without further context.
- New Arxiv Paper Posted: A member posted a new arxiv paper and another arxiv paper with no other context.
aider (Paul Gauthier) â· #general (135 messagesđ„đ„):
VS Code forks, TUI, RA-Aid, Context Window Management, LLM Personas
- Electron Apps Valued Highly: A user joked about forking an electron app instead of building a TUI, referencing the $9B valuation of VS Code forks.
- A member agreed itâs an over engineered solution but they pointed out that everyone coming out of college, and some going into it are going to have contact with VS Code.
- RA-Aid Aider Integration Clarified: After some poking around with RA-Aid, a member found that Aider has a clear benefit with its repo map, and letting user add files to context.
- However they did mention Aider spent 5 cents doing seemingly nothing with 32K tokens which shocked me, then did a brute force grep of the codebase.
- Systematic Persona Evaluation Done Right: A user linked an Arxiv paper arguing that adding personas in system prompts does not improve model performance across a range of questions compared to the control setting where no persona is added.
- They felt like that was the case for a long time but was wondering if there is an actual research that backs this up.
- New Brainstorming UX Features Generated: A user prompted DeepSeek to generate different tiers of features for Realistic, Outside the Box and Completely Bonkers.
- The Completely Bonkers tier included suggestions like Anti-Gravity Code Reflow and Multiverse Branching.
- Aider Context Window Feature Requested: A user asked if it would be possible to add a feature to Aider that allows it to manage the context window on its own, not just adding files to it, but also removing or cleaning the context window as needed.
aider (Paul Gauthier) â· #questions-and-tips (18 messagesđ„):
LLM-OpenAPI-minifier integration with aider, Setting API keys within Aider, Aider's agentic capabilities, Loading active parameters in VRAM for MoE models like Qwen3
- Seeking Aider Integration with LLM-OpenAPI-minifier: A member inquired about using LLM-OpenAPI-minifier to integrate application interfaces with Aider.
- Requesting In-Program API Key Setting for Aider: A member asked how to set an API key within Aider itself, noting the absence of such a feature in the documentation, and another member suggested a
llm keys set anthropic xxxx
style command pattern for setting keys.- A member asked if this feature was on the roadmap and whether a rookie could contribute a PR for it, referencing Simon Willisonâs
llm
tool as inspiration.
- A member asked if this feature was on the roadmap and whether a rookie could contribute a PR for it, referencing Simon Willisonâs
- Confirming Aiderâs Limited Agentic Functionality: A member questioned whether Aider is fully agentic, as they were unable to make it work as an agent or modify code or run commands, leading another member to clarify that Aider is not really agentic but the
/run
command exists for limited use.- They mentioned a personal project called gitmind that attempted this but was later abandoned.
- Proposing Selective VRAM Loading for Qwen3 MoE: A member asked if itâs possible to load only active parameters in VRAM when running models like Qwen3 30B MoE, aiming to use Q8 without significant speed degradation on a 3090 GPU.
- They clarified that they wanted to avoid loading parameters unnecessary for a specific prompt (e.g., grammar layers when focusing on coding).
Latent Space â· #ai-general-chat (136 messagesđ„đ„):
Claude Swarm for team management, Proactive AI agents definition, Anthropic's multi-agent system, LLM as judge for evaluations, Cursor for writers AI tool alternatives
- Claude Swarm swarms Shopify with Team MCP: Claude Swarm, a tool for setting up hierarchical teams of experts using Claude Codeâs MCP capabilities, is gaining traction at Shopify and other companies (code here).
- A user suggests making one of the experts a recruiter to manage the swarm configuration and team expansion.
- IMPACT framework challenges Proactive AI agent definitions: A blogpost defines proactive AI agents as entities that control their wake schedules, workflows, have persistent memory, and use stateful tools (substack post).
- swyxio ran this definition against his IMPACT framework and noted it lacks intent, planning, and authorization.
- Anthropic Multi-Agent System Outperforms Opus 4: Anthropic found that a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2% on internal research eval (Anthropic blogpost).
- The system uses about 15x more tokens than chats due to parallelization and extensive tool use, requiring prompt engineering to prevent excessive agent spawning and web scouring for nonexistent sources; LLMs are also used to evaluate the outputs.
- Obsidian Copilot offers Obsidian markdown writerâs cursor: Users discussed tools for working with markdown files with AI assistance, proposing Obsidian Copilot as an option (Obsidian Copilot).
- Users desire functionality beyond simple chat, such as breaking notes by topic, tagging, aggregating notes, and creating flashcards with Anki MCP.
- Moonshot AI launches Kimi-Dev-72B model: Moonshot AI has open-sourced their Kimi-Dev-72B model, achieving a State-of-the-Art (SotA) result of 60.4% on SWE-bench Verified among open-source models (HF model).
- The announcement was made by Aran Komatsuzaki on Twitter, with links provided to both the Hugging Face model and the GitHub repository.
Eleuther â· #general (75 messagesđ„đ„):
Landmark Papers in new field, LLMs as narrative simulators, AI-generated papers with mathematical errors, pytorch dataloader workers, EleutherAI community vs research focus
- Sniffing out Seminal Studies: New Field Navigation: A member asked about how to find landmark papers when diving into a new field, specifically video generation with existing knowledge of image generation.
- Suggestions included browsing professorâs lecture notes, asking informed individuals directly, and starting with recent papers from major labs to leverage their citations, also see this discussion on LessWrong .
- LLMs as Narrative Navigators, Simulating Stories: A member inquired about posting their English 101 paper, examining LLMs as narrative simulators due to their architecture and emergent behavior.
- The request was denied as it was a request for reviews, but a link to The Void post on tumblr was shared with related analyses and examples.
- Math Mishaps Mar AI-Made Manuscripts: A warning was issued regarding AI-generated papers, citing an alleged (dubious) response to an Apple paper riddled with mathematical errors from arxiv.org.
- One member shared a link to this tweet pointing out the errors.
- Pytorch Dataloader Doom: WSL Workersâ Woes: A user reported issues with PyTorch dataloader workers being killed by signal in WSL, particularly with high worker counts and long sequence lengths.
- It was suggested that they check
/var/log/syslog
for potential OOM errors, and be more careful about memory usage when processing long video sequences.
- It was suggested that they check
- Eleutherâs Ethos: Balancing Novices and Research Nucleus: Concerns were raised about the discordâs perceived mixed messages to newcomers, contrasting the welcoming web copy with the research-focused interactions.
- Community members discussed the balance between welcoming newcomers and maintaining a research-level discussion, emphasizing the distinction between AI education and teaching research skills, as well as discussing LLM SEO (Language Engine Optimization).
Eleuther â· #research (50 messagesđ„):
DMCA and Copyright Law, Emergent behavior in independent tasks, Llama-3.2-1B-Instruct ARC-AGI, Qwen3 tokenizer and image understanding
- Copyright Law is a Joke: A user stated that copyright law is a joke unless youâre the abuser, commenting on DMCA and copyfraud penalties, with a link to fxtwitter.com and arxiv.org.
- Independent tasks lead to emergent behavior: Papers suggest that independent tasks X and Y may exhibit emergent behavior on the combined task âX and Yâ, prompting a search for papers meaningfully exploring this phenomenon, with members linking to arxiv.org.
- Llama-3.2-1B-Instruct scores 72.5% on ARC-AGI: Llama-3.2-1B-Instruct achieved 72.5% on ARC-AGI, but the test was curated from a subset of 11 training and 8 evaluation tasks solvable under optimal TTT configurations.
- Qwen3 gets raw bytes tokenizer and image patches: A member is using the FAFO method of taking Qwen3 at various sizes (1.7b, 4b, and 8b) and doing simple SFT when switching the tokenizer to raw bytes and adding image understanding with the Fuyu method, projecting image patches into the token stream, using the LLaVAR-Instruct-16K dataset.
Eleuther â· #interpretability-general (1 messages):
LLM Fairness, Interpretability Interventions, Unfaithful Chain of Thought
- Realistic Details Trigger LLM Bias: A new paper reveals that adding realistic details to bias evaluations triggers race and gender bias in LLMs, with up to 12% difference in interview rates even in models like GPT4o and Claude 4 Sonnet.
- These details include company names, culture descriptions from careers pages, or constraints like âonly accept top 10%â, thereby exacerbating bias.
- Interpretability Fixes Fairness Flaws: While prompt tuning fails, interpretability-based interventions, such as affine concept editing/ablation of race/gender directions, reduce bias, typically to below 1%.
- The research paper highlights that such targeted interventions effectively mitigate the identified biases.
- LLMs Exhibit Unfaithful Chain of Thought: The study found that inspecting Chain of Thought (CoT) in LLMs gives no indication of race/gender bias, despite outcomes showing clear bias.
- This demonstrates an instance of unfaithful chain of thought in the wild, where the reasoning process masks underlying biases.
Eleuther â· #lm-thunderdome (5 messages):
Benchmark Evaluation Algorithm, Inspect Standard Format, Eval Coalition Effort
- Algorithm Pool Evaluates Models: A member envisioned an algorithm + pool system for benchmarking, allowing new benchmarks to be added and models to be evaluated dynamically.
- The pool would select examples to test against, addressing benchmark saturation and enabling comparison of models with different capabilities, although the specifics are still evolving.
- Inspect Standardizes Evaluation Results: A member mentioned that Inspect includes a standard format for storing evaluation results, potentially covering evaluation inputs, outputs, metrics, and metadata.
- They asked what specific aspects were not covered by Inspectâs standardization, prompting further discussion on the toolâs capabilities.
- Evals Coalition Seeks Scaled Implementation: A member expressed hope to join the Eval coalition effort, starting with scaled implementation of current benchmarks and evaluations in an automated setting.
- Another member confirmed that they should be added to the evaleval Slack soon and welcomed their input as the effort is still in the early exploratory stage.
Eleuther â· #gpt-neox-dev (1 messages):
Vitabyte Founder, GroK-scale Training, Multi-node LLM fine-tuning, ROCm + CUDA, Full stack Ops
- Vitabyte Founder Seeks Grok-Scale Projects: George, founder of Vitabyte/Vitapay, is seeking to join any Grok-scale (314B) or multi-node LLM training/fine-tuning projects.
- He brings experience with ROCm + CUDA setups, quantization, and full stack ops, offering contributions in infra, logs, tuning flows, and documentation.
- Vitabyte Founder Skills: George brings experience with ROCm + CUDA setups, quantization, and full stack ops.
- George offers contributions in infra, logs, tuning flows, and documentation.
Notebook LM â· #use-cases (19 messagesđ„):
Notebook LM Plus Access, PM Interview Conversational AI Platform, Exam Prep with NotebookLM, Chrome Extension for NotebookLM, Podcast Personality Shaping
- Notebook LM Plus Access Still a Question: A user of paid AI Pro inquired about not having access to NotebookLM Plus, despite their subscription, and mentioned using 1900 sources.
- The user also shared a NotebookLM Mind Map, noting it has 4 sublevels and is vertically dense, but not yet horizontally, with a size around 9MB.
- AI Platform Aims to Ace PM Interviews: A member is developing a Conversational AI platform designed for PM interviews and is seeking beta users to validate their idea and provide feedback.
- Interested users can sign up through this form to be added to the waitlist.
- NotebookLM Tackles Exam Prep: A user asked for advice on using NotebookLM to prepare for an exam with material that isnât in PDF format, consisting of web pages and virtual labs.
- Another user suggested using Chrome extensions to follow and import links from web pages into NotebookLM.
- Podcast Hosts Seek Personality Shaping Strategies: A member is delving into shaping the personality of their NotebookLM podcast hosts and is seeking to exchange experiences with others.
- Another user inquired about strategies and apps for publishing episodes to Spotify.
- Flattening Websites into Single Source for Notebooks gains traction: A member proposed creating a flattened version of a websiteâa single page containing all content without linksâto easily feed it as a single source into NotebookLM.
- Another user suggested using the Web Sync tool, accessible via this article.
Notebook LM â· #general (86 messagesđ„đ„):
LaTeX in NLM, Image Uploading, Android App of NotebookLM, Podcast issues, Mindmaps on iPad
- NLM Embraces LaTeX Markups!: NotebookLM uses LaTeX markups, as do other LLMs, for math and scientific equations.
- To view these equations, users can use online or offline LaTeX renderers or try the LatexInNotebooklm extension.
- Image Uploading Issues Resolved!: Users found that NotebookLM now supports image uploads directly from the device, not from Google Drive.
- To upload images, users can click the choose file option or by dragging.
- Android App Appreciation Surges!: Users are praising the convenience of the NotebookLM Android app, particularly for listening to deep dives.
- However, it was mentioned that for full functionality like choosing the length of podcasts itâs better to use the website.
- Podcast Audio Quality Takes a Dive!: Users noticed a decline in the audio quality and content of NotebookLM podcasts, with robotic and repetitive framing of the âsource material.â
- The issue affected generated podcasts, and was described as sounding broken and fake.
- Mind Maps Vanish on iPad!: Users reported that mind maps are not visible in the iPad app.
- Users are waiting for the ability to save mindmaps in a format that works as interactive objects rather than an image.
Torchtune â· #dev (69 messagesđ„đ„):
DTensor cross-mesh operation, Llama4 maverick finetuning, Iterable packing, Fused optimizer, Flex attention
- DTensor distress during distributed Llama4: Members encountered a
RuntimeError
related to DTensor cross-mesh operations during multi-node distributed finetuning of Llama4 Maverick on the latest nightly builds, specifically withaten.mm.default
and differing device meshes.- The error manifested differently with varying numbers of nodes (8 vs 12), pointing to potential issues with the fused optimizer and mesh configurations, stack trace available in output.log.
- Iterable packing innovation inbound: A member is developing a private finetuning library with iterable packing, built on top of pytorch/data, showing great results and prefetching capabilities.
- They suggested that a separate dataset wrapper might not be needed and that the main overhead is from tokenization, expecting to opensource the library next week and also highlighting that packed DPO is missing in many libraries.
- Fused Optimizer Flounders on Full Finetune?: During attempts to train, the fused optimizer was found to cause issues, particularly with checkpoint creation resulting in
nccl
timeouts, whereas the non-fused optimizer allowed training on 8 nodes.- It was suggested that increasing the
NCCL_TIMEOUT
environment variable, or settingtotal_epochs=self.total_epochs+1
to enable asynchronous checkpoints, might mitigate these issues, while creating a minimal reproducible example for the optimizer issue was also recommended.
- It was suggested that increasing the
- Mini-Batch Musings Meet MoE Memory Mastery?: A member speculated whether using a micro batch size of 1 could reduce the memory requirements for training a Mixture of Experts (MoE) model, by only needing the memory for the active parameters.
- The idea was proposed as a way to train very large models by offloading gradient accumulation to CPU RAM, however another member pointed out that the micro batch size is really
seq_len
as you still need all experts for training.
- The idea was proposed as a way to train very large models by offloading gradient accumulation to CPU RAM, however another member pointed out that the micro batch size is really
- Flexing Attention with Flashy Nesting?: Members discussed forcing packed batches to be of size 1 for simplicity, and its ties to flex attention, where one member saw a performance increase to 10k TPS vs 2k TPS for non-flex attention.
- The members suggest using SDPA + flashattention 3, but then your tensors have to be nested tensors (using
torch.nested
with jagged layout), while pointing out that many ops are missing when using nested tensors.
- The members suggest using SDPA + flashattention 3, but then your tensors have to be nested tensors (using
Torchtune â· #papers (15 messagesđ„):
Mistral Small, Magistral model, ZO optimizer, Flex integration
- Mistral Small Debuts, Disappoints?: Despite its recent release, the Mistral Small model isnât impressing everyone, with one member saying the mistral small results, even on their own blogposts look barely better than Gemma 3
qwen3.- The member also clarified that they had initially misclicked on Magistral instead of Mistral while researching.
- ZO Optimizer Promises VRAM Savings: Members discussed the ZO optimizer and its potential for 3x VRAM economy, referencing a paper on the topic (arxiv.org/abs/2506.044303).
- One member found it amazing that ZO even works at all, while another suggested adding it to Flex.
- Flex Integration a low priority: A member suggested to include ZO to Flex for its 3x VRAM economy but another user responded with I wouldnât prioritize it, but eventually sure.
- Members agreed that the most important takeaway from the ZO paper is its scalability on different sizes and its use of mostly non-synthetic experiments.
Modular (Mojo đ„) â· #general (46 messagesđ„):
RDNA4 support, AVX512_BF16, Zen 4, Mojo testing structure, 1-bit model support
- RDNA4 Support is here!: As of the last nightly, RDNA4 is supported for direct GPU programming in Mojo, but full models arenât quite there yet as the matrix multiplication operations need RDNA-specific paths put in place.
- An introductory patch to add some of the WMMA operations necessary for this has been added, bringing models closer to being fully functional on RDNA3+.
- Zen 4 CPUs for bfloat16 Support: While the 5950x does not support AVX512_BF16, Zen 4 and above CPUs, such as the Ryzen 7000 series, do offer some bfloat16 support.
- However, it is not confirmed whether these include the exact FMA instructions needed for CPU inference.
- Navigating Mojoâs Testing Codebase: Users expressed frustration with Mojoâs testing codebase structure, particularly regarding imports within test files and understanding the package init.mojo hierarchy.
- A major revelation was realizing that running with
mojo test -I .
allows tests to import the package being tested as if it was a library; one user suggested looking at ExtraMojo as a good project structure example.
- A major revelation was realizing that running with
- LLVM takes up most binary size: Most of the binary size is taken up by statically linking LLVM, with MAX on its own being around 750 MB, and the .mojopkgs shipped with MAX being about 100 MB.
- There is active work to reduce the number of copies of LLVM.
- Intel Nova Lake to have 52 Cores?: The next âcompile skuâ is likely to be Intel Nova Lake, since itâs likely to have 52 cores on the top sku.
- The i9 is the one which will likely have that many cores, while HEDT for Intel is buy a xeon.
Modular (Mojo đ„) â· #mojo (32 messagesđ„):
CUDA Stream Synchronization, Mojo C ABI, Mojo Zed Extension, Mojo 'let' deprecation, Mojo AOT compilation
- Host Synchronization Unnecessary for CUDA Streams?: A member questioned whether
ctx.synchronize()
is necessary in Puzzle 12, suggesting CUDA streams handle synchronization automatically for dependent kernel launches.- A Modular team member confirmed that DeviceContext uses a CUDA stream, so the execution order matches the call order and no explicit sync is required, promising to adjust the documentation accordingly.
- **Mojo Calls C with **
external_call
****: A member asked about calling into C from Mojo, seeking examples in the documentation.- Another member pointed out the use of the
external_call
function for Mojo to C interoperation.
- Another member pointed out the use of the
- Mojo Zed Extension Still Functional: A user reported that the Mojo extension for Zed is working well, inquiring about future updates.
- The extension developer confirmed theyâre working more with Mojo and asked for specific feature requests, but there is an issue regarding unnecessary highlighting of unused variables.
let
Declaration Laid to Rest in Mojo: A new Mojo learner inquired about the deprecatedlet
variable declaration, encountering errors in tutorials.- A team member confirmed
let
was removed in the 24.4 changelog, noting that most Mojo tutorials are becoming outdated quickly, but the official proposal is here.
- A team member confirmed
- Mojoâs AOT Compilation Discussed: A member asked which aspects of Mojo are JIT versus AOT compiled, particularly regarding SIMD and runtime statistics.
- A member clarified that CPU code is AOT compiled unless inside a kernel for MAX, while GPU code uses a JIT compiler due to the need for driver-specific optimizations, the autotune library that existed was removed because it massively bloated compile times.
MCP (Glama) â· #general (40 messagesđ„):
MCP in Agentic Frameworks, A2A Agent Discovery, FastMCP and Server Composition, GitHub APIs in MCP Server, Orchestrator Agent Recommendations
- Agentic Frameworks Embrace MCPs: An agent questioned where MCPs fit into an agentic framework, considering the top layer as the orchestrator agent, followed by specific agents accessing multiple MCP servers for tool selection and memory storage.
- One member suggested using smarter hosts with tool reranking.
- FastMCP Mounts Domain Segregation Subservers: A member mentioned that fastmcp can mount MCP servers, allowing a router server to host subservers for domain segregation.
- The team developing a single MCP server exposing all GitHub APIs in one place is exploring the idea of an orchestration server that can invoke or proxy to other MCP servers, as well as weighing performance as the number of tools grows.
- LLM Selection Done by Client: LLM selection is done by the client and the model used depends entirely on the client or app consuming the server.
- The MCP team is figuring out how to optimize and they encourage checking out the code here: GitHub MCP Server.
- Opus Orchestrates Cursor: For those using Cursor, Opus was recommended as an orchestrator agent, although its cost was noted.
- One person preferred a local one.
- Streamable HTTP needs full URL: A member helped someone resolve a connection error with fastmcp by pointing out that the full URL with
/mcp/
is required for streamable-http.- The default streamable-http port is 8000, not 6277.
MCP (Glama) â· #showcase (7 messages):
SchemaPin for Rug Pulls, Glama MCP servers support streamable HTTP, excel-mcp-server, User analytics and live debugging for MCPs
- SchemaPin Prevents MCP Rug Pulls: A member built SchemaPin to prevent MCP Rug Pulls and similar attacks, the repo is available on GitHub.
- The homepage has easy ways to implement SchemaPin.
- Streamable HTTP Launched on All Glama MCP Servers: All Glama MCP servers now support streamable HTTP e.g., glama.ai/mcp/instances/svuec7nlpl/mcp?token=f6830a11-ded3-4492-8fb0-09eb09b08257.
- Excel MCP Server Trending on GitHub: A member shared their repo, excel-mcp-server, after it trended twice on GitHub.
- They welcome any and all feedback on the project.
- Debug your MCP with MCPCat: A member is working on user analytics and live debugging for MCPs, with the repo available here.
Cohere â· #đ§”-general-thread (20 messagesđ„):
Cohere documentation typo, Team collaboration with LLMs, AI/backend developer introduction, Cohere's work with the government, Secure ML and privacy preservation
- Typo Troubles: Cohere Docs Fixed!: A user reported a typo in the Cohere documentation where
co = cohere.SagemakerClient()
should have a lowercasem
. - LLM Teamwork: How Teams Collaborate!: A user is researching how teams are integrating large language models like ChatGPT and Claude into their daily workflows, inquiring about changes and missing elements since their introduction.
- AI Developer Kira: Joins the Chat!: Kira, an AI/backend developer, introduced themselves, expressing excitement to connect and build cool stuff, focusing on custom bots, automations, and scalable systems.
- Government Giggle: Cohereâs Public Sector Work!: A user shared a Carney news video highlighting Cohereâs work with the government, expressing it must have been a huge honor.
- Privacy Pal Yasir: Secure ML Enthusiast!: Yasir Khan, a Computer Science graduate, introduced themself, mentioning work on secure machine learning and privacy-preservation, seeking connections for collaboration on AI/ML projects.
Cohere â· #đ-api-discussions (5 messages):
direct-injected-document tool, Cohere a032025 memory usage
- âDirect-injected-documentâ Tool Surfaces Sporadically: Some users reported that a tool named direct-injected-document pops up as an answer sporadically.
- A member asked for a prompt example and which model was being used.
- Cohere a032025 hosting needs: A user inquired about the memory requirements for hosting Cohere a032025.
Cohere â· #đ-introduce-yourself (6 messages):
AI developers introductions, Custom bots, Automations, Scalable systems, Secure machine learning
- AI Developers assemble!: An AI/backend developer named Kira introduced herself, offering help to startups in building custom bots, automations & scalable systems.
- She expressed excitement to connect & build cool stuff with others.
- Secure ML & Privacy Guru Seeks Collabs: Yasir Khan, a Computer Science graduate, introduced himself, highlighting his work on Secure machine learning and privacy-preservation.
- He expressed interest in connecting with friends having similar interests and collaborating on AI/ML projects to enhance his expertise.
- Machine Translation Maestro Materializes: Joel, a Computer Science student from the Philippines, introduced himself as doing research on improving Machine Translation and LLMs for the Filipino language.
- He said heâs here to look around, see cool stuff and possibly meet cool people too.
- Ollama models gain a new fan: A new person in the AI world expressed their enjoyment in playing with models from ollama.
- They expressed that its fun.
LlamaIndex â· #blog (3 messages):
Data + AI Summit 2025, Agentic Document Workflows, Multi-Agent System, AI Travel Agents, AI Agents in Production
- Data + AI Summit 2025 Concludes: The @databricks Data + AI Summit 2025 has concluded, with more content on the emerging landscape of agentic document workflows to come, learn more here.
- The CEO @jerryjliu0 gave a standing-room-only talk.
- Microsoftâs AI Travel Agents Demo: @microsoftâs new AI Travel Agents demo shows how to coordinate multiple AI agents using the Model Context Protocol, LlamaIndex.TS, and @Azure AI Foundry for complex travel planning scenarios.
- Six specialized AI agents work together, learn more here.
- Build and Secure AI Agents in Production: Join an evening in San Francisco for expert insights on building and securing AI Agents in production, covering best practices here.
- Our VP of Developer Relations @seldo will be presenting Building Real-World Agents alongside industry experts from Ravenna and @auth0.
LlamaIndex â· #general (26 messagesđ„):
LandingAI vision agent vs LlamaIndex, Synk hiring, Faiss Index, LlamaCloud contact sales page, LlamaExtract parsing errors
- LandingAI vs LlamaIndex Document Understanding: Members discussed a new vision agent document understanding tool developed by LandingAI, a company started by Dr. Andrew Ng, with one member asking for a comparison against Llama Parse in light of a previous post comparing it to Mistral.
- The companyâs tool can be found at LandingAIâs website.
- Synk Recruiters Seek Developers: A member announced that Synk is hiring developers (back-end, Front-end, blockchain), a QA Engineer, a DevOps Engineer, Moderators, and a Marketing Analyst for their decentralized browser system project, directing users to Synkâs X page.
- They offer official employment with signed documentation, guaranteed salary, and a flexible schedule.
- Faiss Index Filtering Still Unsupported: A member inquired about the possibility of doing metadata filtering on Faiss index queries.
- Another member responded that Faiss doesnât support it.
- LlamaCloud Contact Page Fails: A member reported that the contact sales page on llamacloud (https://cloud.llamaindex.ai/contact-sales) was not working due to a 500 internal server error.
- Another member asked if they were instead referring to the LlamaIndex contact page.
- LlamaExtract Glitches Cause Parsing Errors: Several members reported experiencing parsing errors on every document they tried to run in LlamaExtract, with no data being extracted.
- A member suggested trying again, noting that they were receiving data, and included a screenshot of a successful extraction using LlamaExtract (image.png).
DSPy â· #show-and-tell (1 messages):
DSPy Optimization Patterns
- Thoughts On Incorporating DSPy Optimization Patterns Requested: A member asked about thoughts on how to incorporate any of the optimization patterns that exist in DSPy.
- Filler Topic for JSON Validation: This is a filler topic to ensure the JSON has at least two elements in topicSummaries.
DSPy â· #general (19 messagesđ„):
DSPy runners, TextGrad Optimizer, Custom LM Concurrency, DAIS Session Write-Up, BootstrapFewShot Optimizer
- Run DSPy anywhere with JSON definitions: A member is thinking about building DSPy ârunnersâ that take the saved JSON definition and runs the compiled program, enabling cross-language functionality, like Swift leveraging a compiled program via a managed API.
- Another member expressed interest but questioned how program logic not captured in the JSON output (like signature and module) would be handled, pondering how a program could be serialized.
- TextGrad optimizer awaiting updates: A member inquired about updates on adding TextGrad as an optimizer for DSPy, referencing issue #1197 on GitHub which has been open for nearly a year.
- The member expressed enthusiasm for TextGrad due to its effectiveness in optimizing complex prompts and asked if anyone had âhacksâ for incorporating it into DSPy.
- Model writes prompts at DAIS session: A member shared a write-up of their session at DAIS this week, titled âLet the Model Write the Promptâ, available on their website.
- In a follow-up, a member inquired about a recording of the session, to which the first member replied with a YouTube link.
- DeepSeek R1 7B Struggles with DSPy Optimization: A member reported suboptimal optimization results using DeepSeek R1 7B in a DSPy-Text2SQL demo, compared to GPT-4o-mini and sought suggestions for improvement following attempts with LabeledFewShot and BootstrapFewShotWithRandomSearch.
- Another member suggested that providing more information about the schema could potentially enhance the performance of DeepSeek R1 7B.
- BootstrapFewShot Optimizerâs Use Cases: A member sought to understand how BootstrapFewShot optimizer works, particularly for classification use cases, questioning the handling of ground truth for bootstrapped inputs.
- Another member explained that one can use anything as a metric as long as it returns a bool, an int or a float (and higher is better).
LLM Agents (Berkeley MOOC) â· #mooc-questions (8 messagesđ„):
Certificates, Assignment Selection, MOOC Quiz Archive
- Certificates Coming Mid-July!: A user inquired about the distribution of certificates, to which a member responded that certificates will be released in mid July.
- Assignments Passed with Reasonable Effort: A user questioned how to know if they were selected for a certificate or if they passed their assignments.
- A member clarified that email confirmations are sent for each assignment submitted via Google Forms, and as long as everything is completed with reasonable effort, a certificate will be granted.
- MOOC Quiz Archive Linked: A member shared the Spring 2025 MOOC quiz archive, also available on the course website in the Quizzes section.
MLOps @Chipro â· #events (1 messages):
ControlThrive, Outerbounds, ML Consulting
- ControlThrive founder greets community: Servando, the founder of the AI/ML consulting practice ControlThrive controlthrive.com, introduced himself to the community.
- He invited members to connect with him on LinkedIn or X.
- Outerbounds event coming up: Servando announced an upcoming event he is hosting with Eddie from Outerbounds (the team behind the ML infra at Netflix).
- He shared a link to the event and encouraged community members to join.
Codeium (Windsurf) â· #announcements (1 messages):
Claude Sonnet 4 API Access, Anthropic models, API Pricing
- Claude Sonnet 4 models launch: Claude Sonnet 4 and Claude Sonnet 4 (Thinking) are now available to all paid plans via API Pricing.
- Mohanâs hot take on Claude: Mohan retweeted some impressions of Claude on X.