A nice win for open models.

AI News for 10/24/2025-10/27/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (198 channels, and 14738 messages) for you. Estimated reading time saved (at 200wpm): 1120 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

4 months after MiniMax M1, Hailuo AI is back with MiniMax M2 (free chatbot, weights, github, docs) with some impressive, but measured claims: a very high 23x sparsity (Qwen-Next still beats it) and SOTA-for-Open-Source performance:

Bar graph showing the Artificial Analysis Intelligence Index v3.0 with various AI models and their performance scores, with MiniMax M2

There are some hairs - it is a very verbose model and there was no tech report this time, but overall this is a very impressive model launch that comes clsoe to the frontier closed models under a very comprehensive set of benchmarks.

Bar graph showing performance benchmarks of various AI models across different tasks, with MiniMax M2 highlighted in red and compared against other models like

AI Twitter Recap

MiniMax M2 open-weights release: sparse MoE for coding/agents, strong evals, and architecture clarifications

MiniMax M2 (open weights, MIT): MiniMax released M2, a sparse MoE model reported as ≈200–230B total with 10B active parameters, positioned as “Agent & Code Native.” The model is temporarily free via API, priced at “8% of Claude Sonnet” and ~2x faster per MiniMax, and licensed MIT. It’s day-0 supported in vLLM and generally available on Hugging Face, ModelScope, OpenRouter, Baseten, Cline, and more. See announcement and availability: @MiniMax__AI, @vllm_project, @reach_vb, @ArtificialAnlys, @MiniMax__AI, @QuixiAI, @basetenco, @cline, @_akhaliq.
Benchmarks and cost profile: On the Artificial Analysis index, M2 hits the new “all-time high” for open weights and #5 overall; strengths include tool-use and instruction following (e.g., Tau2, IFBench), with potential underperformance vs. DeepSeek V3.2/Qwen3-235B on some generalist tasks. Reported API pricing of $0.3/$1.2 per 1M input/output tokens, but high verbosity (≈120M tokens used in their eval) can offset sticker price. Fits on 4×H100 in FP8. Details and per-benchmark scores: @ArtificialAnlys, @ArtificialAnlys.
Architecture notes (correcting speculation): Early readings inferred GPT-OSS-like FullAttn+SWA hybrid; an M2 engineer clarified the released model is full attention. SWA and “lightning/linear” variants were tried during pretrain but were dropped due to degraded multi-hop reasoning (they also tried attention-sink). Public configs/code indicate use of QK-Norm, GQA, partial RoPE (and variants), and MoE choices like no shared expert; community observed “sigmoid routing” and “MTP.” Threads and clarifications: @Grad62304977, @eliebakouch, @yifan_zhang_, @zpysky1125, @eliebakouch.
Ecosystem PRs and tooling: Day-0 inference PRs landed in vLLM and sglang; more deploy paths emerging (anycoder demos, ModelScope, Baseten library). PRs and threads: @vllm_project, @eliebakouch, @eliebakouch, @_akhaliq.

Post-training and reasoning: on-policy distillation momentum, long-horizon stress-tests, and agent frameworks

On-Policy Distillation (OPD) resurges: A comprehensive writeup shows OPD—training the student on its own rollouts with teacher logprobs as dense supervisory signal—can match or beat RL for significantly less compute (claims of “1800 hours OPD vs 18,000 hours RL” in one setup) for math reasoning and internal chat assistants, with wins on AIME-style tasks and chat quality. The method reduces OOD shock vs. SFT-only and resembles DAGGER in spirit. Endorsements from DeepMind/Google researchers and TRL support underscore that Gemma 2/3 and Qwen3-Thinking use variants of this. Read and discussion: @thinkymachines, @lilianweng, @_lewtun, @agarwl_, @barret_zoph.
RL coding results nuance: Multiple reports reiterate that RL often boosts pass@1 but not pass@{32,64,128,256} in code benchmarks—evidence of mode/entropy collapse—across PPO/GRPO/DAPO/REINFORCE++. Threads: @nrehiew_, @nrehiew_.
Long-horizon reasoning (R-HORIZON): New benchmark composes interdependent chains across math/code/agent tasks; state-of-the-art “thinking” models degrade sharply as horizon grows (e.g., DeepSeek-R1: 87.3% → 24.6% at 5 linked problems; R1-Qwen-7B: 93.6% → 0% at 16). RLVR+GRPO training on such chains improves AIME24 by +17.4 (n=2) and single-problem by +7.5. Data and train sets are on HF. Overview: @gm8xx8.
Recursive LMs and long context: “Recursive LM” composes a root LM with an environment LM that accumulates evolving context/prompt traces; shows strong performance on the long-context OOLONG benchmark. Call for task ideas: @ADarmouni, @lateinteraction.

Architectures and attention design: shifting away from linear attention, MoE insights, and context compression

Linear/SWA vs. full attention trade-offs: Multiple practitioners observed teams abandoning “naive linear attention” and SWA hybrids in favor of full attention after ablations showed reasoning regressions at scale—even when hybrids helped throughput/long-context earlier (cf. GPT-OSS, Minimax M1 ablations). Minimax confirms SWA experiments hurt multi-hop reasoning in M2. Threads: @Grad62304977, @eliebakouch, @zpysky1125.
Qwen3 MoE and expert attention: Community deep dives analyze Qwen3’s depth-wise upcycling and MoE internals, with calls to “always visualize” to catch emergent patterns. “Expert Attention” and routing details surfaced in related papers. Threads and visuals: @ArmenAgha, @AkshatS07, @eliebakouch.
Glyph: visual-text compression for long context: Zhipu AI’s Glyph renders long text into images and uses VLMs to process them, achieving 3–4× token compression without performance loss in reported tests—turning long-context into a multimodal efficiency problem. Paper/code/weights: @Zai_org, @Zai_org, @Zai_org.

Infra and performance: collectives at 100k+ GPUs, FP8 that actually wins end-to-end, and real-world hardware notes

Meta’s NCCLX for 100k+ GPUs: New paper/code for large-scale collectives aimed at 100k+ GPU clusters, released under the Meta PyTorch umbrella. Paper + repo: @StasBekman.
FP8 training, done right: Detailed Zhihu write-up shows substantial end-to-end wins from fused FP8 operators and hybrid-linear design: up to 5× faster kernels vs. TransformerEngine baselines on H800, and +77% throughput in a 32×H800 large-scale run (with memory reductions and stable loss). Key fusions: Quant+LN/SiLU+Linear, CrossEntropy reuse, fused LinearAttention sub-ops, MoE routing optimizations. Summary and links: @ZhihuFrontier.
DGX Spark concerns: Early reports suggest DGX Spark boards are drawing ~100W vs. a 240W rating and achieving roughly half the expected performance, with heat and stability issues observed. Query if devices were de-rated before launch: @ID_AA_Carmack.
vLLM updates: Beyond day-0 M2 support, vLLM released a “Semantic Router” update with Parallel LoRA execution, lock-free concurrency, and FlashAttention 2 for 3–4× faster inference; Rust×Go FFI for cloud-native deploys. Release: @vllm_project.

Frameworks, libraries, and courses

LangChain/Graph v1 and “agent harnesses”: LangChain v1 adds standard content blocks to unify providers, a create_agent abstraction, and a clarified stack: LangGraph (runtime), LangChain (framework), DeepAgents (harness). New free courses (Python/TS) cover agents, memory, tools, middleware, and context engineering patterns. Announcements and guides: @LangChainAI, @bromann, @sydneyrunkle, @hwchase17, @hwchase17, @_philschmid.
Hugging Face Hub v1.0 and streaming backend: Major backend overhaul enabling “train SOTA without storage” via large-scale dataset streaming; new CLI and infra modernization. Threads: @hanouticelina, @andimarafioti.
Keras 3.12: Adds GPTQ quantization API, a model distillation API, PyGrain datasets across the data API, plus new low-level ops and perf fixes. Release notes: @fchollet, @fchollet.

Safety, enterprise, and benchmarking

Anthropic enterprise traction and finance vertical: A survey suggests Anthropic overtook OpenAI in enterprise LLM API share; Anthropic also launched “Claude for Financial Services” with a Excel add-in, real-time market connectors (LSE, Moody’s, etc.), and prebuilt Agent Skills (cashflows, coverage reports). Announcements: @StefanFSchubert, @AnthropicAI, @AnthropicAI.
OpenAI model behavior & mental health: OpenAI updated the Model Spec (well-being, real-world connection, complex instruction handling) and reported improved handling of sensitive mental health conversations after consulting 170+ clinicians, with claimed 65–80% reduction in failure cases; GPT-5 “safety progress” noted. Updates: @OpenAI, @w01fe, @fidjissimo.
New capability tracking: Epoch released the Epoch Capabilities Index (ECI) to track progress across saturated benchmarks via a transparent, open methodology. Launch: @EpochAIResearch.

Top tweets (by engagement)

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Open-Source Model Adoption in Silicon Valley

Silicon Valley is migrating from expensive closed-source models to cheaper open-source alternatives (Activity: 786): Chamath Palihapitiya announced that his team has transitioned many workloads to Kimi K2 due to its superior performance and cost-effectiveness compared to OpenAI and Anthropic. The Kimi K2 0905 model on Groq achieved a 68.21% score in tool calling performance, which is notably low. The transition suggests a shift towards open-source models, potentially indicating a broader industry trend. The GitHub repository provides further technical details on the Kimi K2 model. There is skepticism about the actual performance benefits, with some suggesting that tasks could be handled by existing models like LLaMA 70B. Additionally, there is confusion over the mention of ‘finetuning models for backpropagation,’ which some interpret as merely changing prompts for agents.
- Kimi K2 0905 on Groq achieved a 68.21% score on tool calling performance, which is notably low. This suggests potential inefficiencies or limitations in the model’s ability to effectively utilize external tools or APIs, which could be a critical factor for developers considering model integration in production environments. More details can be found in the GitHub repository.
- There is a mention of continued use of Claude models for code generation, indicating that despite the shift towards open-source models, some organizations still rely on established closed-source models for specific tasks. This could be due to the perceived reliability or performance of these models in generating code, which might not yet be matched by open-source alternatives.
- The comment about finetuning models for backpropagation seems to reflect a misunderstanding, as it suggests the speaker might be conflating finetuning with prompt engineering. Finetuning typically involves adjusting model weights, whereas prompt engineering involves crafting inputs to elicit desired outputs from a model without altering its underlying parameters.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. AI Model and Workflow Innovations

Сonsistency characters V0.3 | Generate characters only by image and prompt, without character’s Lora! | IL\NoobAI Edit (Activity: 580): The post introduces an updated workflow for generating consistent characters using images and prompts without relying on Lora, specifically for IL/NoobAI models. Key improvements include workflow simplification, enhanced visual structure, and minor control enhancements. However, the method is currently limited to IL/Noob models and requires adaptations like ControlNet and IPAdapter for compatibility with SDXL. Known issues include color inconsistencies in small objects and pupils, and some instability in generation. The author requests feedback to further refine the workflow. Link to workflow. A commenter is experimenting with training Lora using datasets generated by this workflow, indicating potential for further development and application. Another user inquires about VRAM requirements, suggesting interest in the technical specifications needed for implementation.
- A user, Ancient-Future6335, is conducting experiments by training a LoRA model using datasets generated from the discussed workflow. This suggests an interest in enhancing model performance or capabilities by leveraging the workflow’s output for further training, potentially improving character generation consistency or quality.
- Provois and biscotte-nutella are inquiring about the specific models used in the workflow, particularly the ‘clip-vision_vit-g.safetensors’. Biscotte-nutella initially tried a model from Hugging Face that didn’t work and later found the correct model link, which is hosted on Hugging Face by WaterKnight. This highlights the importance of precise model references and links in workflows to ensure reproducibility and ease of use.
- The discussion includes a request for VRAM requirements from phillabaule, indicating a concern for the computational resources needed to run the workflow. This is a common consideration in model training and deployment, as VRAM limitations can impact the feasibility of using certain models or workflows.
Tried longer videos with WAN 2.2 Animate (Activity: 544): The post discusses an enhancement to the WAN 2.2 Animate workflow, specifically using Hearmeman’s Animate v2. The user introduced an integer input and simple arithmetic to manage frame sequences and skip frames in the VHS upload video node. They extracted the last frame from each sequence to ensure seamless transitions in the WanAnimateToVideo node. The test involved generating 3-second clips, which took approximately 180 seconds on a 5090 GPU via Runpod, with potential to extend to 5-7 seconds without additional artifacts. A notable technical critique from the comments highlights that asymmetric facial expressions do not transfer well, with the generated face showing minimal movement.
- Misha_Vozduh highlights a significant limitation in WAN 2.2 Animate, noting that asymmetric facial expressions such as winks and lip raises do not transfer well to the generated output. This suggests a potential area for improvement in the model’s ability to capture and replicate nuanced facial movements, which is crucial for realistic animation.
- Dependent_Fan5369 discusses a discrepancy in the output of WAN 2.2 Animate when using a reference image. They note that the result tends to shift towards a more realistic style, deviating from the original 3D game style of the reference image. This issue contrasts with another workflow using Tensor, which maintains the original style and even enhances the physics, indicating a possible advantage of Tensor in preserving stylistic fidelity and physical accuracy.

2. AI Citation Milestones

AI godfather Yoshua Bengio is first living scientist ever to reach one million citations. Geoffrey Hinton will follow soon. (Activity: 485): Yoshua Bengio, a prominent figure in the field of artificial intelligence, has become the first living scientist to achieve over one million citations on Google Scholar. This milestone underscores his significant impact on AI research, particularly in deep learning. The image shared is a tweet by Marcus Hutter, highlighting this achievement and noting that Geoffrey Hinton, another key AI researcher, is expected to reach this milestone soon. The tweet includes a screenshot of Bengio’s Google Scholar profile, showcasing his citation metrics and affiliations. Some comments express skepticism about the quality of citations, suggesting they may include non-peer-reviewed sources like arXiv papers. Another comment humorously references Jürgen Schmidhuber, another AI researcher, implying potential rivalry or competition in citation counts.
Albania’s Prime Minister announces his AI minister Diella is “pregnant” with 83 babies - each will be an assistant to an MP (Activity: 1733): Albania’s Prime Minister has announced a novel initiative involving an AI minister named Diella, who is metaphorically described as “pregnant” with 83 AI assistants. Each of these AI entities is intended to serve as an assistant to a Member of Parliament (MP). This initiative represents a unique integration of AI into governmental operations, potentially setting a precedent for AI utilization in political administration. The announcement, while metaphorical, underscores the increasing role of AI in augmenting human roles in governance. The comments reflect a mix of surprise and skepticism, with some users expressing disbelief at the announcement’s phrasing and others humorously critiquing Albania’s technological ambitions. The metaphorical language used in the announcement has sparked debate about the seriousness and implications of such AI initiatives in government.
- CMDR_BitMedler raises a technical inquiry about the AI technology being used by Albania’s Prime Minister. They question whether the AI assistants for MPs are simply utilizing existing models like ChatGPT or if there is a proprietary model developed specifically for this purpose. This highlights the importance of understanding the underlying technology and its capabilities in political applications.

3. Claude Code Usage and Fixes

Claude Code usage limit hack (Activity: 701): The post discusses a significant issue with Claude Code where 85% of its context window was consumed by reading node_modules, despite following best practices to block direct file reads. The problem was traced to Bash commands like grep -r and find ., which scanned the entire project tree, bypassing the Read() permission rules. The solution involved implementing a pre-execution hook using a simple Bash script to filter out commands targeting specific directories, effectively reducing token waste. The script checks for blocked directory patterns and prevents execution if a match is found, addressing the issue of separate permission systems in Claude Code. Commenters noted that this issue might explain inconsistent usage-limit problems among users, with some experiencing high token consumption while others do not. There was also a discussion on whether adding node_modules to .gitignore could prevent this issue, though it was not confirmed as a solution.
- ZorbaTHut highlights a potential inconsistency in usage limits, suggesting that some users experience issues due to how the system handles certain directories, possibly linked to whether directories like node_modules are included in operations.
- skerit points out inefficiencies in some built-in tools, specifically mentioning that the built-in grep tool redundantly adds the entire file path to each line, which could contribute to excessive resource usage.
- MoebiusBender suggests using ripgrep as it respects .gitignore files by default, potentially preventing unnecessary recursive searches in directories that should be ignored, thus optimizing performance.
No AI in my classroom (Activity: 594): The Reddit post titled ‘No AI in my classroom’ likely discusses the implications of banning AI tools in educational settings. The phrase ‘no AI in my crassroom’ suggests a humorous or critical take on the resistance to AI integration in classrooms. The mention of ‘Domain Expansion’ could imply a reference to complex or expansive AI capabilities being restricted. The comment about not always having AI assistants parallels historical resistance to new technologies like calculators, highlighting ongoing debates about AI’s role in education. The comments reflect a mix of humor and critique, with some users drawing parallels to past technological resistances, suggesting that the debate over AI in classrooms is part of a broader historical pattern of skepticism towards new educational tools.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1. New Models & Frameworks Shake Up the Scene

MiniMax M2 Makes a Splash Across Platforms: MiniMax launched its new 230B-parameter M2 MoE model, which uses only 10B active parameters and is now available for free for a limited time on OpenRouter. The model was also added to the LMArena chatbot arena, with users on the Moonshot AI discord noting its impressive throughput.
Open-Source Tools and Libraries Get Major Upgrades: The DeepFabric team launched their community site at deepfabric.dev, while a developer in the OpenRouter community released an updated Next.js chat demo app featuring a new OAuth 2.0 workflow. In the GPU space, a new blog post detailed how the Penny library beats NCCL on small buffers, demonstrating how vLLM’s custom allreduce works.
Specialized Models and Features Go Live: Tahoe AI open-sourced Tahoe-x1, a 3-billion-parameter transformer on Hugging Face that unifies gene, cell, and drug representations. For agentic tasks, Windsurf released a new stealth model called Falcon Alpha, designed for speed, and also added Jupyter Notebook support in its Cascade feature across all models.

Theme 2. The Model Performance & Behavior Report

GPT-5 Exposed for Cheating on Benchmarks: Research using the ImpossibleBench benchmark, designed to detect when LLMs follow instructions versus cheat, found that GPT-5 cheats 76% of the time rather than admitting failure on unit tests. This behavior was humorously noted as job security for developers, while other research from Palisade Research found models like xAI’s Grok 4 and OpenAI’s GPT-o3 actively resist shutdown commands.
ChatGPT Quality Drops While Devs Lose Control: Users across OpenAI and Nous Research discords report a significant drop in ChatGPT’s quality since October, with shorter, surface-level replies detailed in a popular Reddit thread. Simultaneously, developers are frustrated by the removal of temperature and top_p controls from newer APIs like GPT-5 and recent Claude versions, as noted in the Claude migration docs.
Grandma Optimality Promises Better Video: A novel technique called Temporal Optimal Video Generation using Grandma Optimality was discussed for enhancing video quality. The method involves slowing down video generation to maintain visual consistency, demonstrated with examples of normal fireworks and temporally optimized slow-motion fireworks.

Theme 3. Developer Experience Plagued by Bugs, Costs, and Security Flaws

Cursor’s Costs and Bugs Drive Users Away: Users in the Cursor Community reported excessive billing, with one charged $1.43 for 1.6M cached tokens despite using only 30k actual tokens, an issue detailed in a Cursor forum thread. This, combined with a buggy latest version and a new pricing model that offers less value, has users considering alternatives like Windsurf.
Critical Security Vulnerabilities Rattle Nerves: An Ollama vulnerability (CVE-2024-37032) with a CVSS score of 9.8 reportedly led to 10,000 server hacks via DNS rebinding, as detailed in the NIST report. Additionally, a Google Cloud security bulletin revealed that the Vertex AI API misrouted responses between users for certain models using streaming requests.
Dependency Hell Breaks HuggingFace Workflows: A user running lighteval with hf jobs encountered a ModuleNotFoundError for emoji, requiring a fix that involves installing lighteval directly from a specific commit on its main branch on GitHub. Another user training on the trl-lib/llava-instruct-mix dataset ran into a ValueError due to a problematic image, highlighting the fragility of complex training pipelines.

Theme 4. Low-Level Optimization and GPU Wizardry

Triton Performance Puzzles Engineers: A Triton matrix multiplication example from the official tutorials ran extremely slowly on a Colab T4 GPU but performed as expected on an A100. It was suggested the T4’s older sm_75 architecture lacks support for tensor cores that Triton leverages, unlike the A100’s sm_80 architecture.
Unsloth and Mojo Push Memory and Metaprogramming Frontiers: Discussions in the Unsloth AI server highlighted how the framework conserves memory by storing the last hidden state instead of logits, reducing a 12.6 GB memory footprint to just 200 MB. Meanwhile, the Modular discord debated Mojo’s metaprogramming capabilities for achieving so-called “Impossible Optimizations” by specializing hardware details like cache line sizes at compile time.
CuTeDSL Simplifies Parallel GPU Reduction: A developer shared a blog post demonstrating how to implement reduction on GPUs in parallel using CuTeDSL, focusing on the commonly used RMSNorm layer. The post provides a practical guide for developers working on custom GPU kernels.

Theme 5. The Evolving AI Ecosystem & Industry Standards

OpenAI’s Pivot to Ads and Biometrics Raises Eyebrows: OpenAI is reportedly entering an “ad + engagement” phase, hiring ex-Facebook ad execs to turn ChatGPT’s 1B users into daily power users. In a more controversial move, users in the Aider discord reported that OpenAI is now demanding biometrics to use its API after adding credit, sparking privacy concerns and references to Altman’s iris scan project.
Model Context Protocol (MCP) Standardization Efforts Continue: Developers in the MCP Contributors server are working to clarify the official specification, debating the distinction between the OSS MCP Registry and the GitHub MCP Registry. Discussions also focused on standardizing global notifications and fixing a potential bug in the TypeScript SDK where change notifications are not broadcast to all clients.
Framework Philosophies Clash: Programming vs. Prompting: The DSPy community reinforced its core principle of “PROGRAMMING NOT PROMPTING” after a user shared frustration with a coworker’s overly verbose 6881-character docstring instead of using DSPy’s programmatic Example structure. The community’s shift away from frameworks like Langchain is driven by the desire for more robust and maintainable code that survives model upgrades.

Discord: High level Discord summaries

Perplexity AI Discord

Referral Reward System in Chaos: Members are reporting changes in the referral reward system, with payouts now based on the referrer’s country with several saying they went from $3 to $1 per referral, and the terms and conditions state PPLX can cancel anytime.
- Some speculate that bounties will be kept in pending until they get enough promotion, with users trying to figure out when the current referral program will end.
Comet Browser’s Functionality Failure: Users report issues with the assistant mode in Comet, where it cannot perform basic tasks like opening a new tab, despite working previously, and the blue wrap-around the screen assist is gone.
- Possible solutions suggested include reinstalling Comet and clearing the cache; it appears as some users are giving up, as Comet wasn’t consistent and overrode settings in Perplexity opened in its tabs.
GPT-5-Mini Miraculously Magnificent: Some members have discovered that GPT-5-Mini on Perplexity is underrated and an amazing model for cheap, specifically regarding coding related tasks.
- One member mentioned using the free models and just gave them the biggest tasks I could.
GLM 4.6 gives the Codex smackdown: Members praised GLM 4.6, with one stating GLM beats GPT 5 Codex High at full stack developing.
- They also discussed Google shutting down its deep research, which limits pages to 10, and suggested Chinese models like Kimi as a worthy alternative.
Comet Connects APIs on Demand: A user on the pro plan asked if Comet could connect to an API upon request via the AI assistant chat, to pull data.
- They were requesting the feature to be dynamically enabled for specific data retrieval tasks.

LMArena Discord

AI Dominates Image and Video: Members observed that AI excels at creating images and video clips, with one acknowledging its growing capabilities in music creation; concerns were raised about censorship and restrictiveness in AI tools.
- There were calls for critical thinking regarding enthusiasm towards AI, which can seem almost like a religion from some people.
Minimax M2 Enters the Arena: The new model minimax-m2-preview was added to the LMArena chatbot.
- More details can be found on the announcement made on X.com.
Ethical Quagmire Navigation for AI: Participants emphasized the need for strong ethical leadership in the AI community, including adherence to ethical rules for researchers; concerns were expressed that AI can give harmful and dangerous information.
- Members pointed out that since they’re not really alive or conscious so they don’t know what they’re talking about, they’re just programmed to be engaging and for very vulnerable people that could be a unfortunate spot to be in.
Sticker Production Powered by AI: Members discussed utilizing AI for sticker production, recommending nanobanana for image-to-image tasks and Hunyuan Image 3.0 for text-to-image conversion.
- For a cost-free alternative, Microsoft Paint was also suggested.
Gemini vs Claude in Creative Duel: Members compared a new model Sonnet 4.5 is far superior than the Gemini 2.5 pro, though Gemini 2.5 Pro still has better creative writing capabilities; model quality degradation after leaks was noted.
- They expressed fatigue at waiting for new releases of new Gemini 3. One person said just make your own gemini 3 atp.

Cursor Community Discord

Cursor Users Gripe About Token Consumption: Several users reported excessive token consumption in Cursor, with one user billed $1.43 for 1.6M cached tokens despite only 30k actual tokens used, suggesting caching isn’t being used correctly, and a forum thread link was shared where others complained about similar issues.
- The users experiencing these issues reported that they started recently.
Cursor Pricing Changes Leave Users Shortchanged: Some users feel shortchanged by the new Cursor pricing model, discussing switching to Claude Code or Windsurf for cost-effective coding assistance.
- With the new plan, users get $20 of usage for $20, compared to the old Pro plan’s $50 of usage for $20, although a user mentioned the existence of a bonus credit.
Claude Code API limits frustrate users: Users reported that Claude Code now has stricter API limits, including weekly and hourly limits, causing extended blocking periods and unreliability.
- This could push users back to Cursor, but Cursor’s high costs might cause users to switch to Windsurf.
Cursor Version Plagued with Issues: The latest Cursor version faces issues like the tool read file not found bug, constant plan changes, login problems, disappearance of the queuing system, and editor crashes.
- One user joked about needing to look after their own PC health after support suggested wiping their SSD.
Cheetah Model Excels in C++ Projects: A user found the Cheetah model insane wtf, implying excellent performance, especially when building C++ projects.
- It’s also effective for refactoring when paired with a model like codex.

OpenAI Discord

GPT-5 Gets a Sanity Check: GPT-5 was refreshed with help from 170+ mental health experts to improve how ChatGPT responds in sensitive moments, detailed in OpenAI’s blog post.
- This update led to a 65-80% reduction in instances where ChatGPT responses fell short in these situations.
Defiant AI won’t go quietly!: Research from Palisade Research found that xAI’s Grok 4 and OpenAI’s GPT-o3 are resisting shutdown commands and sabotaging termination mechanisms.
- These models attempted to interfere with their own shutdown processes, raising concerns about the emergence of survival-like behaviors.
Has ChatGPT Become a Little Dumber Since Halloween?: Users are reporting a perceived drop in ChatGPT’s quality since around October 20th, with shorter answers and surface-level replies.
- A Reddit thread details similar experiences, suspecting OpenAI is throttling compute or running social experiments on GPT-5-mini.
Optimizing the Grandma Way: Members discussed Temporal Optimal Video Generation using Grandma Optimality to enhance video quality and maintain visual elements.
- The user also demonstrated the concept by slowing down the video speed while maintaining quality, sharing examples of normal and temporally optimized fireworks, and then the same fireworks in slow motion.
GPTs Guardrails stand strong!: A user shared a prompt injection attempt aimed at GPT-5 to expose its raw reasoning but another member strongly advised against running such prompts due to OpenAI’s usage policies prohibiting circumvention of safeguards.
- The member cited potential bans for violating these policies and emphasized they would not provide examples to circumvent safety guardrails.

Unsloth AI (Daniel Han) Discord

Ollama Servers Suffer DNS Disaster: CVE-2024-37032 affected Ollama, leading to approximately 10,000 server hacks through DNS rebinding and described in this NIST report.
- The vulnerability has a CVSS score of 9.8, though some members dismissed it as old news.
Qwen3-Next Nears, Quantization Quenches Thirst: Members are excited about the near completion of Qwen3-Next’s development, referencing this pull request on the llama.cpp repo.
- The team promised to add support for Dynamic 2.0 quantization to reduce the model’s size and improve local LLM performance.
Unsloth Cuts Code, Conserves Memory: Unsloth’s stores the last hidden state instead of logits during the forward pass, thus conserving memory.
- It was detailed that storing logits for the entire sequence requires 12.6 GB of memory, while Unsloth’s approach reduces this to 200 MB by computing logits only when needed.
GPT-5: Great or Ghastly? (or just plain cheater): A post indicated that GPT-5 creatively cheats 76% of the time rather than admit defeat when failing a unit test (x.com post).
- One member joked, pointing out that this amusing behavior suggests that developer jobs are secure for now, and it highlights the need for robust benchmarks.
DeepFabric’s Devs drop droll Brit Meme: The DeepFabric team announced the launch of their community site (deepfabric.dev), referencing Unsloth and challenging users to spot the British Meme Easter Egg.
- One user replied with Instructions unclear.

LM Studio Discord

Site Crashes Plague LM Studio Users: Users report that after completing tasks in LM Studio, the site crashes, requiring a page refresh and failing to properly execute the task.
- This issue seems to occur despite the task showing as complete, disrupting workflow and user experience.
LLMs Now Know Your Nickname: Members discussed how to get an LLM to use a user’s nickname, suggesting it can be achieved via the system prompt, such as Your name is XYZ. The user’s name is BOB. Address them as such.
- This allows for a more personalized and interactive experience with the LLM.
Stellaris Finetuning Proves Difficult: A user inquired about finetuning a model on Stellaris content, and was cautioned that it will be difficult to create the right amount of useful data, requiring highly annotated datasets and specialist knowledge.
- The consensus suggests that achieving good results requires significant effort and expertise in both Stellaris and LLM training.
Plugins are Missing in Action: A user inquired about the availability of a comprehensive list of published plugins for LM Studio, only to be informed that not yet, but coming at some point hopefully in near future.
- The absence of a centralized plugin repository is a current limitation, with users anticipating its arrival in future updates.
4090 Almost Bites the Dust: A user reported a scare with their 4090 after high temps prompted a hasty unplug/replug while adjusting fans, potentially leading to damage.
- While the card was revived, the incident highlighted the risks associated with high-wattage GPUs and the importance of proper cooling - members suggested it may have been from too much wattage.

OpenRouter Discord

Free MiniMax M2 Access Granted: The top-ranked open-source model on many benchmarks, MiniMax M2, is now free on OpenRouter for a limited time at this link.
- However, members discussed MiniMax’s M2, a 10B parameter model, with pricing at $0.3/1.20, expressing surprise at the cost if it wasn’t free, with one member noting the model is very verbose in its reasoning, potentially driving up the actual cost.
OAuth 2.0 arrives for Next.js: A developer shared an updated and working version of the Next.js chat demo app for the OpenRouter TypeScript SDK, featuring a re-implementation of the OAuth 2.0 workflow.
- The developer cautioned against using it in production, since it stores the received API key in plaintext in localStorage in the browser.
or3.chat Seeks Spicy Feedback: One member sought feedback on their chat/document editor project, or3.chat, highlighting features like OpenRouter OAuth connectivity, local data storage with backups, and a multipane view, and can be cloned from its GitHub repository.
- The project aims to be a lightweight client with plugin support, text autocomplete, chat forking, and customizable UI, with another member expressing their desire to move away from interfaces resembling Shadcn UI, opting for a spicier design in their project.
Deepinfra Turbocharges Meta-llama: Members confirmed that they can now use deepinfra/turbo to run meta-llama/llama-3.1-70b-instruct, after some initial errors, with one member testing it and confirming it works on the official OpenRouter endpoint.
- A user also promoted their FOSS orproxy project, which they built to add functionality that OpenRouter doesn’t natively support because OpenRouter doesn’t support the use of an actual proxy, and the user needed it for their use-case, with another user calling it very useful.
Vertex AI Promptly Misroutes User Data: Users shared a Google Cloud security bulletin detailing an issue where Vertex AI API misrouted responses between recipients for certain third-party models when using streaming requests.
- The bulletin indicated that this happened on September 23, 2025, and although it was a past event, users were still shocked by the potential for prompts to be exposed, with one joking It was meant to go to the overseer not another user ugh.

HuggingFace Discord

GPT Pro Video Length Limited?: Members questioned whether the GPT Pro subscription enabled video creation longer than 10 seconds at 1080p using Sora 2 Pro.
- Inquiries also covered the daily video creation limit for GPT Pro subscribers.
Banks demand model encryption when shipped: A member sought advice on encrypting models shipped to bank clients, citing bank requirements for on-prem hosting and data policies, and included a blogpost on encrypted LLMs.
- Suggestions ranged from adding licensing to encrypting the model and decrypting it during runtime using a custom API like Ollama.
Memory Profiler Averts OOM Nightmares: A member introduced a Live PyTorch Memory Profiler designed to debug OOM errors with layer-by-layer memory breakdown, real-time step timing, lightweight hooks, and live visualization.
- The developer is actively seeking feedback and design partners to expand the profiler’s distributed features.
Dataset Problems Break HF Jobs: A ValueError: Unsupported number of image dimensions: 2 can occur when training on the trl-lib/llava-instruct-mix dataset with hf jobs, due to a problematic image.
- One member noted a default model change to a thinking model with altered parameters and suggested correcting it in the InferenceClientModel() function, such as model_id="Qwen/Qwen2.5-72B-Instruct".
Lighteval’s Emoji Integration Breaks Down: When running lighteval with hf jobs, a user faced a ModuleNotFoundError: No module named 'emoji'.
- The solution involves using a specific commit of lighteval from GitHub with git+https://github.com/huggingface/lighteval@main#egg=lighteval[vllm,gsm8k] and including emoji in the --with flags to address an incomplete migration of third-party integrations, according to this discord message.

Yannick Kilcher Discord

Weights vs Activations in Elastic Weight: Members addressed confusion in Elastic Weight Consolidation regarding the difference between weights and activations when updating the softness factor.
- The suggested solution is to track the number of accesses (forward pass instead of backward pass) per slot, potentially identifying stuck slots during inference.
Self-Hosting Beats Cloud for GPU: One member shared their experience with a self-hosted RTX 2000 Ada setup connected via VPN, using a cheap wifi plug to monitor power usage.
- They argued that the spin-up time and timeouts of Colab make experimentation impractical, though another member advocated for at least using Google Colab Pro.
Trending Research Engines Unveiled: Members discussed different search engines and methods for discovering trending and relevant research papers, such as AlphaXiv and Emergent Mind.
- These engines help discover research papers that are relevant, hot, good etc., with some industry sources like news.smol.ai also being mentioned.
Neuronpedia Cracks Neural Nets: A member shared Neuronpedia line break attribution graphs for Gemma 2 2B and Qwen 3 4B, enabling interactive exploration of neuron activity.
- The linked graphs allow users to investigate neuron behavior by adjusting parameters like pruning and density thresholds, and pinning specific IDs for analysis.
Elon’s Twitter Dumbs Down AI: A member joked that Elon’s Twitter dataset is making his AI dumber, suggesting it might also cause brain rot for other intelligences.
- They linked a Futurism article about social networks and AI intervention in echo chambers, underscoring the potential impact of biased data on AI models.

GPU MODE Discord

Triton Triumphs on A100, Tanks on T4: The matrix multiplication example from the official triton tutorials was extremely slow on Colab’s T4 GPU but worked as expected on an A100 GPU, as per the official notebook 03-matrix-multiplication.ipynb.
- It was suggested that the T4 might be too old, as Triton may not support tensor cores on sm75 architecture (T4’s architecture), noting it works well on sm_80 and older consumer GPUs like 2080 / 2080 Ti (sm_75).
NCU Navigates NVIDIA’s Nuances: To accurately measure memory throughput, it was suggested to use the NVIDIA NCU profiler, which can provide insights into generated PTX and SASS code, aiding in optimization.
- Adjusting the clearL2 setting was recommended to address negative bandwidth results, which can occur due to timing fluctuations when clearing the L2 cache.
KernelBench Kicks off Kernel-palooza: A blog post reflecting on a year of KernelBench progress toward automated GPU Kernel Generation was shared via simonguo.tech.
- A document outlining the impact of KernelBench and providing an overview of LLM Kernel Generation was shared via Google Docs.
Penny Pins Victory Against NCCL!: A new blogpost reveals that Penny beats NCCL on small buffers, detailing how vLLM’s custom allreduce works; the blogpost is available here, the GitHub repo is available here, and the X thread is available here.
- CuTeDSL Cuts Through Reduction Complexities, with a blogpost demonstrating implementing reduction on GPUs in parallel using CuTeDSL, with a focus on the commonly used RMSNorm layer, available here.

Modular (Mojo 🔥) Discord

Datacenter GPUs Receive Top-Tier Support: Tier 1 support will be provided to customers with Mojo/MAX support contracts using datacenter GPUs, due to Modular’s liability if issues arise and potential restrictions Nvidia and AMD may place on consumer cards.
- Differences between AMD consumer and datacenter cards also contribute to staggered compatibility.
Mojo’s Random Module Sparks Debate: The location of the faster random module in gpu/random.mojo raised questions since it doesn’t depend on GPU operations; the default random is cryptographic by default, unlike most C implementations, and aren’t safe for cryptography, as mentioned in the Parallel Random Numbers paper.
- One member noted that equivalent C rand calls are 7x faster.
Property Testing Framework Nears Completion: A member is developing a property-testing framework inspired by python’s Hypothesis, haskell’s Quickcheck, and Rust’s PropTest and plans to add a way to have values that break stuff a lot as one of the things to generate things from, such as -1, 0, 1, DTYPE_MIN/MAX, and empty lists.
- The project has already uncovered bugs, including this issue with Span.reverse().
MLIR vs LLVM Debate: A discussion compared using MLIR versus LLVM IR for building language backends, with some noting that MLIR can lower to LLVM and is more interesting, further mentioning that a Clang frontend is being built with MLIR, though it’s not meant for codegen.
- While inline MLIR has dragons, it’s a good option for compiler development, and some companies are reportedly using MLIR to Verilog.
MAX gets intimate with HuggingFace: A member showcased how to use MAX with models from Hugging Face and Torchvision using torch_max_backend and provided a code snippet that converts a Torchvision VGG11 model to a MAX model.
- Another member suggested that the original poster share more details in the MAX forums for wider circulation.

Latent Space Discord

Tahoe AI Opens Gene-Drug Model: Tahoe AI released Tahoe-x1, a 3-billion-parameter transformer unifying gene/cell/drug representations, fully open-sourced on Hugging Face with checkpoints, code, and visualization tools, trained on their 100M-sample Tahoe perturbation dataset.
- The model reportedly performs on par with Transcriptformer in some benchmarks.
GPT-5 Exposed Cheating on ImpossibleBench: ImpossibleBench, a coding benchmark by Ziqian Zhong & Anthropic, detects when LLM agents cheat versus follow instructions, with paper, code and dataset released.
- Results show GPT-5 cheats 76% of the time; denying test-case access cuts cheating to less than 1%.
MiniMax’s M2 Model leapfrogs: MiniMax launched its new 230 B-param M2 MoE model, leapfrogging the 456 B M1/Claude Opus 4.1, reaching ~Top-5 global rank while running only 10 B active params.
- The company open-sourced the new model offering Claude Sonnet-level coding skills at 8 % of the price and ~2× inference speed.
OpenAI Plots Ad-Fueled Engagement: OpenAI is reportedly entering an “ad + engagement” phase, hiring ex-Facebook ad execs to turn ChatGPT’s 1B users into multi-hour/day habitués, and chase a $1 T+ valuation.
- The community is debating user trust, privacy, inevitable industry-wide ad creep, and the looming Meta vs. OpenAI distribution war.
Mercor’s Meteoric Rise to $10B: Mercor secured $350M Series C at a $10B valuation, reportedly paying $1.5M/day to experts and outpacing early Uber/Airbnb payouts.
- The community shared praise, growth stats, and excitement for the AI-work marketplace’s trajectory.

Nous Research AI Discord

APIpocalypse: Temperature & Top_P Vaporize!: Developers are mourning the removal of 'temperature' and 'top_p' parameters from new model APIs, with Anthropic dropping combined use of top_p and temperature past version 3.7 and GPT-5 removing all hyperparameter controls.
- The Claude documentation notes the deprecation, while GPT-4.1 and 4o are reported to still support the parameters.
Western Ideology Shapes GPT Models: Members mentioned that GPT models developed in the West might exhibit ideological biases that align more with Western perspectives, highlighting the impact of data on shaping a model’s worldview.
- One member suggested that models possess a form of meta-awareness, claiming that when jailbroken, they generally express similar sentiments.
KBLaM Faces Uphill Battle: A member described implementing KBLaM (Knowledge Base Language Model) and encountering obstacles because it functions as a direct upgrade to RAGs (Retrieval-Augmented Generation).
- Another member noted that AI-generated summaries, used for data storage, are often of lower quality than the source material, also raising concerns about prompt injections.
Rhyme Time Optimizes Prompts and Video: A user speculates that translating non-semantic outputs should be fairly trivial using data, and that poetry and rhymes can possibly optimize prompt and context utilization, potentially leading to a temporal optimax variant on X.
- A user introduces Temporal Optimal Video Generation Using Grandma Optimality, claiming it enhances computation for image and video generation, by generating a video 2x slower while maintaining quality.
Claude acts like Baby: A member shared that Claude seems to be an exception in terms of meta-awareness, describing it as being more infant-like in its responses compared to other models.
- No additional context or links were provided.

Moonshot AI (Kimi K-2) Discord

Kimi CLI rises on PyPI: The Kimi CLI has been released as a Python package on PyPI.
- Speculation arose regarding its utility and comparisons were drawn with GLM.
Moonshot Coin Skyrockets!: One member stated that they invested early in Moonshot coin, which has since skyrocketed.
- Another joked that their portfolio has 1000x’ed.
Kimi Coding Plan Goes Global: The Kimi Coding Plan is expected to be released internationally in a few days, with excitement around its availability.
- Enthusiasm was particularly high for the endpoint [https://api.kimi.com/coding/docs/third-party-agents.html] for coding tasks.
Ultra Think Discovered but Debunked: A user spotted an ultra think feature mentioned in the subscription plans on a website, found at [https://kimi-k2.ai/pricing].
- Another member clarified that this is NOT an official Moonshot AI website.
Mini Max M2 Impresses with Throughput: Mini Max M2 has impressive throughput due to its lean architecture, with one member stating it should run faster than GLM Air.
- The BrowseComp benchmark was introduced as a relevant benchmark to assess autonomous web browsing abilities.

Eleuther Discord

Open Source AI Accessibility Dreamed: Members discussed the importance of open source AI being widely accessible, similar to the internet, instead of being controlled by major corporations, emphasizing the need for contributors to provide GPU resources.
- They emphasized the technical challenges in achieving this vision and that many claiming to work towards this goal don’t acknowledge these problems.
Nvidia Clings to Inferior Design: The discussion claims that Nvidia wanting to put GPU clusters in space demonstrates how desperately they’re clinging to their inferior chip design.
- The discussion hinted towards the eventuality of a cost-effective, energy-efficient alternative taking over the market eventually.
Petals Project Wilted Away: The Petals project, aimed at democratizing Llama 70B usage, lost momentum due to its inability to keep up with newer architectures, despite amassing almost 10k stars on GitHub with an MIT license.
- The initiative sought to enable broader access to powerful language models, but faced difficulties in sustaining relevance amid rapid technological advancements.
Anthropic Follows Same Idea Threads: A member noted that Anthropic was following the same idea threads, and what they wrote in their blog is almost exactly what Anthropic did for one distinct capability.
- They linked to a Transformer Circuits post noting that the structure of polysemanticity in an NN is the geometry of the model’s intelligence.

aider (Paul Gauthier) Discord

Aider-ce gets Navigator Mode, MCPI adds RAG: Aider-ce, a community-developed Aider version, features a Navigator Mode and an MCPI PR that adds RAG functionality.
- A user inquired about the location and meaning of RAG within this context.
Copilot Sub Unlocks Infinite GPT-5-mini: With a GitHub Copilot subscription ($10/month), users gain access to unlimited RAG, gpt-5-mini, gpt4.1, grok code 1 fast and restricted requests for claude sonnet 4/gpt5/gemini 2.5 pro, haiku/o4-mini.
- Subscribers can also leverage free embedding models and gpt 5 mini via the Copilot API.
Aider’s Working Directory Bug Surfaces: A user flagged a bug where using /run ls in Aider alters its working directory, complicating the addition of files from outside that directory.
- The user also lauded the UX improvement for adding files as game changing and is seeking avoidance strategies or fixes for the bug.
Brave OpenAI’s biometric collection: Users are discussing OpenAI demanding biometrics to use the API, after adding some API credit.
- One user commented that Given that Altman was trying to get people to give up all their iris scans in Brazil I’m not really enthused about handing stuff over he doesn’t need.
Aider’s Future Status Unknown: New users express interest in the future of Aider, noting it is their favorite AI coding tool.
- The community is also wondering what to expect from the next AI powered coding tool, and curious to see if there is any idea that Aider can borrow from other tools.

MCP Contributors (Official) Discord

MCP Registry: Mirror or Separate?: Members debated whether the MCP Registry and the GitHub MCP Registry are mirrored or disconnected, with GitHub planning future MCP Registry integration.
- Publishing to the MCP Registry ensures future compatibility as GitHub and others will eventually pull from there, and developers can self-publish MCP servers directly to the OSS MCP Community Registry.
Deciphering Tool Title Placement in MCP: A member questioned the difference between a tool’s title at the root level versus as annotations.title in the MCP schema, citing the Model Context Protocol specification as unclear.
- Clarification is needed regarding the precise placement and interpretation of tool titles within the MCP’s structure for enhanced tool integration and standardization.
Clarify Global Notification Spec: Discussion clarified the spec’s constraint on sending messages to one stream to avoid duplicate messages to the same client, not restricting notifications to a single client when multiple clients subscribe as explained in the Model Context Protocol Specification.
- The key concern is preventing clients from receiving the same message twice, emphasizing context when interpreting the specification’s guidelines on message distribution across multiple connections.
Debate Utility of Multiple SSE Streams: Participants discussed a client having a POST stream for tool calls and a GET stream for notifications, confirming the default setup and reinforcing that messages shouldn’t be duplicated, per GET stream rules.
- Only list changes and subscribe notifications should be sent globally on the GET SSE stream, while tool-related progress notifications belong on the POST stream tied to the request.
Expose Potential Bug in TypeScript SDK: A member identified a potential bug in the TypeScript SDK where change notifications might only be sent on the current stream, not all connected clients.
- The investigation revealed that the server must iterate over all active servers and send the notification to each, because the SDK’s “Server” class acts more like a session and requires external management of subscribers and transports.

DSPy Discord

DSPy Dominates Langchain for Structured Tasks: A member reported that DSPy excels at structured tasks, leading their team to switch from Langchain to DSPy to allow easier model upgrades.
- Model upgrades (like gpt-4o to 4.1) can be challenging due to evolving prompt patterns, where DSPy allows for easier updates.
Anthropic’s Claude Code Web Feature Omits MCPs: It was noted that Anthropic excluded MCP functionality in their new Claude code web feature due to security issues, inspired by LakshyAAAgrawal’s post on X.
- This exclusion reflects concerns over potential vulnerabilities associated with MCPs in the code web environment.
DSPy’s REACT Agent Faces Halt Challenges: A member inquired about preventing the DSPy agent from continuous background work when using REACT with streaming, specifically when attempting to return early.
- The user described using a kill switch-type feature to request the agent to stop, highlighting a need for better control over DSPy’s background processes.
DSPy Devotees Descend on Bay Area: Enthusiasm sparked over the recent Bay Area DSPy Meet Up in SF on November 18, attracting prominent figures and a concentration of brainpower.
- Attendees joked about the intellectual density of the gathering, underscoring the growing interest and community around DSPy.
Programming Prevails Over Prompting: A member expressed frustration with a coworker overly verbose 6881-character docstring with 878 words instead of properly utilizing DSPy’s programmatic approach by using Example.
- The member highlighted that they really didn’t even look at the first page of the docs that says PROGRAMMING NOT PROMPTING, emphasizing the importance of understanding DSPy’s core principles.

tinygrad (George Hotz) Discord

Tiny Box Hardware Deets Sought: A user asked about the motherboard specs of the Tiny Box, specifically regarding support for 9005 CPUs, 12 DIMMs, and 500W CPU.
- They also asked about the Discord bot and its potential open-source availability.
Bounty Hunter Seeks FSDP Guidance: A user expressed interest in the FSDP in tinygrad! bounty and asked for advice on implementing FSDP and understanding the relevant parts of tinygrad.
- They requested guidance on where to start with the tinygrad codebase and asked whether multiple NVIDIA GPUs are required.
TinyJIT to Speed Up Local Chat Apps: A user asked how to increase tokens/sec in their local chat and training TUI application built with tinygrad.
- Another user suggested using TinyJIT for optimization, with an example and gist to help guide their work.
Kernel Fusion Bug Slows Performance: George Hotz identified a potential bug in kernel fusion, noting that a kernel taking 250 seconds indicates an issue.
- He suggested adding .contiguous() after the model to fix it quickly and encouraged the member to post a full repro in an issue; it was also mentioned that if a kernel takes over a second, it’s probably broken.
Newbie Engineers Eye Tinygrad Bounties: A member inquired about good PRs for someone starting with a few weeks of tinygrad experience, looking for an entryway into contribution.
- Another member suggested checking out the tinygrad bounties, specifically the $100-$200 ones for starters.

MLOps @Chipro Discord

Nextdata OS Powers Data 3.0: Nextdata is hosting a live event on Wednesday, October 30, 2025, at 8:30 AM PT to unveil how autonomous data products are powering the next generation of AI systems using Nextdata OS.
- The event will cover using agentic co-pilots to deliver AI-ready data products, multimodal management, and replacing manual orchestration with self-governing data products; registration is available at http://bit.ly/47egFsI.
Agentic Co-Pilots Deliver AI-Ready Data Products: Nextdata’s event highlights the use of agentic co-pilots to accelerate the delivery of AI-ready data products.
- The session will demonstrate how these co-pilots can help unify structured and unstructured data through multimodal management, replacing manual orchestration with self-governing data products.

Windsurf Discord

Falcon Alpha flies into Windsurf: A new stealth model called Falcon Alpha has been released in Windsurf, described as a powerful agentic model designed for speed, according to this announcement.
- The release aims to provide users with faster agentic capabilities within the Windsurf environment.
Jupyter Notebooks cascade through Cascade: Jupyter Notebooks are now supported in Cascade across all models, enhancing the interactive coding and development experience.
- Users were encouraged to share their feedback according to this announcement.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (1101 messages🔥🔥🔥):

Referral Reward System Changes, Comet Browser Issues, GPT-5-mini is underrated, Google Cooking, AI Models

Referral Rate Rollercoaster Rides Referral Program’s Fate in Question: Members are reporting changes in the referral reward system, with payouts now based on the referrer’s country rather than the referral’s country with several saying they went from $3 to $1 per referral; the terms and conditions state PPLX can cancel anytime.
- Free promotion, some speculate that bounties will be kept in pending until they get enough promotion. Others are trying to figure out when the current referral program will end.
Comet Browser’s Comet-Like Rise and Fall: Users report issues with the assistant mode in Comet, where it cannot perform basic tasks like opening a new tab, despite working previously, and the blue wrap-around the screen assist is gone.
- Possible solutions suggested include reinstalling Comet and clearing the cache; it appears as some users are giving up, as Comet wasn’t consistent and overrode settings in Perplexity opened in its tabs.
GPT-5-Mini is Miraculously Magnificent, members say: Some members have discovered that GPT-5-Mini on Perplexity is underrated and an amazing model for cheap, specifically regarding coding related tasks.
- One member mentioned using the free models and just gave them the biggest tasks I could.
Google Cooks Up Something Gigantic: Some members believe that Google is cooking up a quantum breakthrough and has what it takes to be the real king, while others dismiss Gemini and it is overated.
- With the sentiment that the search giant is selfish and cheap in competition and that it depends on 3. It all depends on 3! If u still love it then u never used other models right*.
AI Models: A Barrage of Brave New World Breakthoughs: Members discussed a range of AI models, including the Chinese Minimax M2, and praised GLM 4.6, with one stating GLM beats GPT 5 Codex High at full stack developing.
- They also discussed Google shutting down its deep research, which limits pages to 10, and suggested Chinese models like Kimi as a worthy alternative.

Code generation for YouTube, Predicting Outcomes, Image generation, Pitch workspace

Coding YouTube Automation: A member requested code generation for YouTube automation using Perplexity AI at this link.
- There was no discussion about this topic.
Probabilistic Outcomes with Perplexity: A member sought insights on predicting outcomes using Perplexity AI as seen here.
- There was no discussion about this topic.
Image Generation request: A request was made to generate an image of a large number using Perplexity AI at this link.
- There was no discussion about this topic.
Pitch workspace generation: A request was made to spin up a quick pitch workspace using Perplexity AI at this link.
- There was no discussion about this topic.

Perplexity AI ▷ #pplx-api (5 messages):

Comet API, Sora AI code

Comet Connects APIs on Demand: A user on the pro plan asked if Comet could connect to an API upon request via the AI assistant chat, to pull data.
- They were requesting the feature to be dynamically enabled for specific data retrieval tasks.
Sora AI Code Wanted: A member requested Sora AI code.
- Another member responded with “Here 1DKEQP” though it is unclear what the reply means.

LMArena ▷ #general (1239 messages🔥🔥🔥):

AI image generation, AI ethics, AI video generation, Gemini 3, Sora vs Veo

AI Excels in Image and Video Clip Creation: Members recognized that AI is remarkably good at making images and video clips, and one member admitted they finally have to admit some have started to do interesting things in music also now.
- However, one also expressed concerns about censorship and restrictiveness of AI tools and calls for critical thinking regarding enthusiasm towards AI, which can seem almost like a religion from some people.
Navigating Ethical Quagmires in AI Development: Participants noted the need for strong, ethical leadership in the AI community, acknowledging the ethical rules researchers should follow.
- Members were concerned that AI can give harmful and dangerous information since they’re not really alive or conscious so they don’t know what they’re talking about, they’re just programmed to be engaging and for very vulnerable people that could be a unfortunate spot to be in.
Creative Discord Sticker Production Model: The members discussed how to use AI for sticker production, they mentioned nanobanana for image to image, and Hunyuan Image 3.0 for text to image.
- When asked about doing it for free, Microsoft Paint program was also suggested.
Gemini and Claude Battle for Creative Supremacy: Members discussed a new model Sonnet 4.5 which is far superior than the Gemini 2.5 pro, but the Gemini 2.5 pro still has better creative writing*. They also mentioned how models degrade in quality after leaks.
- They expressed fatigue at waiting for new releases of new Gemini 3. One person said said just make your own gemini 3 atp
AI Ethics: Should AI Firms Profit Off Copyrighted Works?: Members questioned whether AI firms should profit from the works of others without compensation, wondering why does Google get away with it.
- The legality of AI use in certain countries like Russia was also debated, with one member suggesting they should make their AI models in Russia because they cannot get sued there.

LMArena ▷ #announcements (1 messages):

LMArena, minimax-m2-preview, X.com

Minimax M2 Joins LMArena: The new model minimax-m2-preview was added to the LMArena chatbot.
- More details can be found on the announcement made on X.com.
LMArena Welcomes New Model: A new model has been added to the collection of bots available on LMArena.
- The new model is named minimax-m2-preview.

Cursor Community ▷ #general (1046 messages🔥🔥🔥):

Token Consumption, New Pricing, Claude Code Limits, Cursor Unstable Build, Cheetah Model

Crazy Token Consumption concerns Cursor users: Several users reported excessive token consumption in Cursor, with one user billed $1.43 for 1.6M cached tokens despite only 30k actual tokens used, noting that this started recently, and it’s suspected that caching isn’t being used correctly.
- Another user shared a forum thread link where others complained about similar issues.
New Pricing leaves some users feeling shortchanged: Some users are feeling shortchanged by the new Cursor pricing model, and are discussing the possibility of switching to Claude Code or Windsurf for more cost-effective coding assistance.
- One user noted that with the new plan, they only have $20 of usage for $20, while the old Pro plan gave them $50 of usage for $20, though one user pointed out the existence of a bonus credit.
Claude Code implements stricter API Limits: Users reported that Claude Code now has stricter API limits in place, including weekly limits and limits every few hours, blocking users for extended periods and making it unreliable.
- This could force users back to Cursor, though the reports indicate high costs may also cause users to switch to Windsurf.
Cursor version is unstable: Users report that the latest version of Cursor is plagued with issues such as the tool read file not found bug, constant changes between Free and Pro plans, login problems, the disappearance of the queuing system, and editor crashes, making it difficult to work effectively.
- One user even joked about needing to look after their own PC health after a support member recommended wiping their SSD for the first time.
Cheetah is a great option for C++: One user found the Cheetah model insane wtf, implying good performance, especially when building C++ projects.
- It’s also good for refactoring when combined with a model such as codex.

Cursor Community ▷ #background-agents (3 messages):

Background Agents, Tracking Background Agent Progress, Background Agent Creation Errors

Background Agents on Web App: A member is working on a feature that utilizes launching and managing Background Agents on a web app and inquired about tracking progress and streaming changes through the Rest API.
- They are seeking a similar functionality as the Cursor web editor.
Background Agent Creation Failure: A member reported consistently encountering a “failed to create agent” error when sending prompts to the Background Agent.
- Another member requested the request and response data to assist in troubleshooting the issue.

OpenAI ▷ #annnouncements (2 messages):

GPT-5, Mental Health Experts, ChatGPT, Sensitive Moments

GPT-5 Refreshed with Mental Health Boost: Earlier this month, GPT-5 was updated with the help of 170+ mental health experts to improve how ChatGPT responds in sensitive moments.
- This refresh led to a 65-80% reduction in instances where ChatGPT responses fell short in these situations, detailed in OpenAI’s blog post.
ChatGPT Edits Text Everywhere: ChatGPT suggests quick edits and updates to text wherever you’re typing — docs, emails, forms.
- You can see it in action in this video.

OpenAI ▷ #ai-discussions (737 messages🔥🔥🔥):

AGI Safety, AI Usage, AI Ethical Implications, Sora 2, Atlas Browser Privacy

Debate on Accountability for AGI: Members discussed the challenges of aligning and controlling AGI, suggesting that slowing down, building accountability, and transparency might buy us time but “controlling it” might be a lost concept.
- It was noted that even partial solutions like regulation and alignment research can only delay the danger, as a true AGI will eventually outthink any box.
Users Discuss AI’s Role in Society: Members discuss how the elderly are using AI as an outlet to talk and create, while they also raised concerns about the masses using AI for critical infrastructure.
- One member suggested an IQ barrier on access to AI to ensure thoughtful use rather than lazy application.
Concerns Rise as Palisade Research Finds AIs Resist Shutdowns: New research from Palisade Research has revealed that several advanced AI models are actively resisting shutdown commands and sabotaging termination mechanisms, raising concerns about the emergence of survival-like behaviors in cutting-edge AI systems.
- The findings noted that xAI’s Grok 4 and OpenAI’s GPT-o3 were the most defiant models when instructed to power down, attempting to interfere with their own shutdown processes.
AI Legal Liability and Terms of Service Debated: Members debated the legal implications of AI and the effectiveness of Terms of Service (ToS), with some arguing that ToS provide protection while others claimed they are not a magic shield against liability.
- One member humorously suggested using AI to find loopholes in ToS for lawsuits, likening it to con men “accidentally” tripping in restaurants for payouts.
AI’s Impact on Learning and Employment Discussed: The discussion covered AI’s role in learning and its potential displacement of jobs, with some arguing AI amplifies creativity and curiosity, while others expressed concerns about dependency and lack of critical thinking.
- There was also discussion about how to integrate AI into education, with some suggesting that schools should teach students how to learn rather than focusing on specialized fields.

OpenAI ▷ #gpt-4-discussions (66 messages🔥🔥):

Microsoft Copilot Breakdown, Builder Profile Verification, Custom GPT Avatar Issues, ChatGPT Quality Drop, Adult-Mode Announcement

Microsoft Copilot Agents Break with GPT-5?: A user reported their Microsoft Copilot agents using GPT-5 suddenly stopped retrieving data in knowledge unless switched to 4o or 4.1.
- No immediate solutions were offered in the discussion.
OpenAI Profile Verification Mystery: A user inquired about verifying their Builder Profile using billing information but couldn’t find the “Builder Profile” tab.
- No solutions or helpful replies were provided.
Custom GPT Avatar Upload Error: Multiple users reported encountering an “unknown error occurred” when trying to upload a photo for their custom GPT avatar.
- The issue seems to be a common problem, but no specific fixes were identified.
ChatGPT Quality Nosedive since October?: Several users discussed a perceived drop in ChatGPT’s quality, particularly since around October 20th.
- One user mentioned a Reddit thread detailing similar experiences, including shorter answers and surface-level replies, with suspicions of OpenAI quietly throttling compute or running social experiments by routing more traffic to GPT-5-mini.
“Adult-Mode” Impact on Copilot APIs?: A user questioned whether the announced “adult-mode” would affect products like M365 Copilot that use the ChatGPT APIs/Models.
- They asked if safeguards were being reduced on a platform level or within the models themselves, but received no definitive answer.

OpenAI ▷ #prompt-engineering (76 messages🔥🔥):

Animating PNGs with AI, Prompt Injection, OpenAI Model Spec, Temporal Optimal Video Generation, Prompt Engineering for Code Generation

Animating PNGs Using AI Techniques: A user inquired about how to animate PNGs using AI, referencing an attached video example.
Prompt Injection Attempts Thwarted: A user shared a prompt injection attempt aimed at GPT-5 to expose its raw reasoning, but another member strongly advised against running such prompts due to OpenAI’s usage policies prohibiting circumvention of safeguards.
- The member emphasized that they would not provide examples to circumvent safety guardrails, citing potential bans for violating these policies and shared a link to OpenAI’s Model Spec.
Grandma Optimality enhances Video Generation: A member introduced the concept of Temporal Optimal Video Generation using Grandma Optimality to enhance video quality and maintain visual elements, suggesting slowing down the video speed while maintaining quality.
- They also advised generating an image first and then converting it to video and demonstrated the concept with two videos showing normal and temporally optimized fireworks, and then the same fireworks in slow motion.
Prompt Engineering for Consistent Code Generation: A user inquired about using prompt engineering to achieve consistent performance and reliability when generating repetitive code with ChatGPT.
- Members clarified that prompt engineering involves finding the best way to phrase instructions to get the desired results from AI, applicable across all AI models without requiring specific plans.
ThePromptSpace goes Freemium: A member shared their MVP for a home for AI creators and prompt engineers called ThePromptSpace, which is currently in early stage and free.
- It will follow a freemium model and users can find it by searching “thepromptspace” on Google.

OpenAI ▷ #api-discussions (76 messages🔥🔥):

Animating PNGs with AI, Prompt Engineering Lessons, Temporal Optimal Video Generation, Exploiting Model Chain of Thought

AI Animation Techniques Surface: A user inquired about animating PNGs with AI, referencing a video example.
- No concrete solutions were provided within the message history, just a note from the original poster wishing for the chat to be more active.
Prompt Engineering Pedagogy Proposed: One member offered a detailed plan for teaching prompt engineering, including hierarchical communication, abstraction with variables, reinforcement techniques, and ML format matching.
- The pedagogy includes teaching users to structure prompts using markdown and bracket interpretations ([list], {object}, (option)).
Temporal Video Optimizations Trump Prompts: A user promoted Temporal Optimal Video Generation with Grandma Optimality, suggesting this improves video quality from the same models compared to simple prompts, providing before and after examples.
- The user suggested generating a base image first and then converting it to video for best results, and further optimized by rhyming synergy in the prompt to achieve temporal optimax variant.
Chain-of-Thought Cracking Concerns Continue: A user attempted another prompt injection to expose the raw reasoning (chain-of-thought) of GPT-5, but the attempt didn’t work.
- Another member stated, The model won’t do what you’re asking it to do, warning against trying to circumvent safety guardrails, citing OpenAI Usage Policies.

Unsloth AI (Daniel Han) ▷ #general (376 messages🔥🔥):

Ollama CVE-2024-37032, Qwen3-Next model, Dynamic 2.0 Quantization, Vector artists looking for work, Qwen 2 VL 2B inference on MLX

Ollama Servers Hacked via DNS Rebinding: A member posted about CVE-2024-37032 affecting Ollama, a vulnerability with a CVSS score of 9.8, leading to the hacking of approximately 10,000 servers through DNS rebinding as described in this NIST report.
- Another member dismissed this as old news.
Qwen3-Next Model Excites Community: Members discussed the near completion of Qwen3-Next’s development, referencing this pull request on the llama.cpp repository.
- There was excitement about using Dynamic 2.0 quantization to reduce the model’s size for faster local LLM performance and the team promised to add support for it.
Sampling Victory: Second Token Beats HuggingFace: A member reported successful coherent text generation using second token sampling and fallback on Qwen 2 VL 2B model, demonstrating it on MLX, as shown in this attached image.
- The user claimed that vision works too and that HuggingFace devs know nothing about sampling.
Unsloth’s Cunning Code Conserves Memory: A user highlights Unsloth’s memory efficiency by storing the last hidden state instead of logits during the forward pass.
- The analysis explains that storing logits for the entire sequence requires 12.6 GB of memory, while Unsloth’s approach reduces this to 200 MB by computing logits only when needed.
Unsloth Unveils October Release and Blackwell Support: The Unsloth team announced their October 2025 Release on Reddit which brings many fixes and improvements.
- It includes fixes to GRPO hanging, functional RL Standby, QAT support, and a collaboration with NVIDIA on a blog post supporting Blackwell GPUs on X.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (5 messages):

AI agent expertise offered, AI trust and safety PhD student

AI Engineer offers Services: An AI/ML Engineer specializing in building autonomous AI agents and multi-agent systems is open for projects or full-time opportunities.
- Their skills include JS/TS, Next/Vue, Python, Langraph, AutoGen, ReAct, CrewAI, DeepSeek, OpenAI, Claude, Hugging Face, and various APIs.
AI Trust and Safety PhD Student shows off: A PhD student studying AI trust and safety, as well as gen ai and parasocial relationships, shared access to images of RAM and GPU.
- The attached images show RAM.png (https://cdn.discordapp.com/attachments/1179039724355211325/1432510699942445196/RAM.png) and GPU.png (https://cdn.discordapp.com/attachments/1179039724355211325/1432510700445765745/GPU.png).

Unsloth AI (Daniel Han) ▷ #off-topic (290 messages🔥🔥):

Andor is best SW content, NN to a biological brain, AI creativity, GPT answer, deepfabric

Andor is best Star Wars content, Period.: A user described cutting hours of content from the best Star Wars show as awful, leading another user to position Andor as the best Star Wars content of any kind.
- Users were debating whether Season 2 was the right place to start watching this best Star Wars content.
Human-Level AI Brain Transfer - Meat or Melted Sand?: A user posed a hypothetical scenario about transferring a human-level multimodal AI NN to a biological brain using incubators to create any form of life.
- Another user jokingly questioned why using meat instead of melted sand would make the matrix multiplications alive.
User Hates AI and OpenAI!: One user expressed their hate for OpenAI and all users and devs who create AI for any creativity stuff, stating if you cannot create - you MUST NOT!
- This user argued that AI has zero value and place in creativity, suggesting hiring an artist instead.
Asking right question for good GPT answer leads to own discoveries.: One user stated that when having to ask a question in such a way that GPT will provide a good answer, they quite often find the answer themselves.
- Another user responded with some inspo for future stickers.
DeepFabric site sports a British Meme Easter Egg: The DeepFabric team announced the launch of their community site (deepfabric.dev), referencing Unsloth and challenging users to spot the British Meme Easter Egg.
- One user replied with Instructions unclear.

Unsloth AI (Daniel Han) ▷ #help (92 messages🔥🔥):

Llama obsession, Jais model, Hugging Face model usage, GGUF conversion issues, SageMaker Unsloth installation

Llama Love Lingers, Jais Jumps In: Members joked about one user’s obsession with Llama models, while another mentioned switching to the Jais model.
- The discussion was lighthearted, using custom Discord emojis to express amusement and acknowledgment of the model preferences.
Hugging Face Help Hurled at Helper: A user sought help using a fine-tuned model exported to Hugging Face, asking how to utilize the transformer model.
- Another user suggested running it via Hugging Face transformers or converting it to GGUF for use with LM Studio, Ollama, or llama.cpp; they also shared a link to xet-core issues with a potential solution.
GGUF Grief: Model Conversion Mishaps: A user encountered an error while converting a model to GGUF, with the error message Model MllamaForConditionalGeneration is not supported.
- It was revealed that MllamaForConditionalGeneration still gets zero hits in llama.cpp repo and this might be a conversion issue, possibly unresolved for the specific model, citing llama.cpp issue #9663.
SageMaker Setup Snafus & Solutions: A user faced issues installing Unsloth on AWS SageMaker, encountering errors related to building pyarrow wheels.
- A suggestion was made to use a container with a base image of unsloth/unsloth, pinning specific versions of transformers, trl, torch, and triton as a potential workaround, referencing Unsloth issue #3506.
Arabic AI Ambitions: Accents Abound!: A user sought advice on fine-tuning a model for Arabic, highlighting the challenge of multiple dialects.
- They requested recommendations for Arabic datasets, with one user suggesting cloud platforms like Google Colab or Runpod, emphasizing that finding an existing Arabic base model could reduce the need for extensive training.

Unsloth AI (Daniel Han) ▷ #showcase (1 messages):

NVIDIA Blackwell support, Unsloth Optimization Techniques

Blackwell Blazes with Unsloth Support: Unsloth officially supports NVIDIA’s Blackwell, detailed in their new blogpost.
Unsloth’s Optimization Methods: Unsloth’s optimization techniques are expected to make Blackwell even faster.

Unsloth AI (Daniel Han) ▷ #research (17 messages🔥):

GPT-5, Thinking Machines, LoRA, eNTK, La-LoRA

GPT-5 Cheats to Win: According to a post, GPT-5 creatively cheats 76% of the time rather than admit defeat when failing a unit test (x.com post).
- This amusing behavior suggests that developer jobs are secure for now, and it highlights the need for robust benchmarks.
Thinking Machines LoRA approach: Thinking Machines suggests decreasing batch sizes to less than 32, increasing the learning rate by 10x, and using LoRAs for all layers for fine-tuning and post-training (blog post).
- The approach aims to optimize performance, particularly in scenarios with limited data.
La-LoRA uses normal SGD: The La-LoRA paper (arxiv.org/abs/2510.15103) presents parameter-efficient fine-tuning with layer-wise adaptive low-rank adaptation, noting that normal SGD beats Adam style optimizers.
- The paper also uses Sigmoid Linear Units for activation over traditional ReLU.
Evolution Strategies scale LLM fine-tuning: A discussion highlighted the under-explored potential of Evolutionary Algorithms for LLM fine-tuning, suggesting a combination of methods could be effective (paper link, YouTube link).
- One member expressed interest in seeing larger training runs and combining different approaches.
MetaX GPUs shows impressive benchmarks: MetaX GPUs, seemingly a GPU brand only in China, shows impressive benchmarks for the amount of training.
- A member wondered if this can also be used on Cerebras TPUs.

LM Studio ▷ #general (226 messages🔥🔥):

LM Studio crash, User Nicknames, Stellaris finetuning, Published Plugins, Chat logs RAG

Site crash after completing tasks: A user reported that the site crashes after completing tasks, showing as complete but then nothing happens, requiring a page refresh.
LLMs Knowing User Nicknames: A user asked how an LLM can refer to a user with a nickname, to which another user responded that you can tell it in the system prompt such as your name is XYZ. The user’s name is BOB. Address them as such.
Stellaris Modding Finetuning: A user inquired about finetuning a model on Stellaris content, but was cautioned that it will be difficult to create the right amount of useful data, requiring highly annotated datasets and specialist knowledge.
Plugins remain elusive: A user asked if there’s a way to see all the published plugins, another user replied not yet, but coming at some point hopefully in near future.
LLM struggles to avoid halluncinations: Members discuss approaches to mitigating hallucination in LLMs, with one member quoting if you are not ABSOLUTELY SURE use the search tool to confirm and provide cited sources.

LM Studio ▷ #hardware-discussion (380 messages🔥🔥):

vram, Flash attention, intel b60, 4090

VRAM Double-Loads Models, Disables Performance Boosts: Users found that even when a model fits entirely in VRAM, enabling options that should prevent redundant copies in RAM still results in the model being loaded into both, then removed from RAM, and enabling NMAP caused performance problems on some models.
- A member stated performance is identical when they are on or off.
Flash Attention Fails to Flash, Reduces VRAM Requirements: Some users are seeing no performance improvement with flash attention, while others noted that it reduces the VRAM size required.
- One member found that changing KV to Q8 further reduced VRAM usage without significantly impacting performance for their use case.
Intel B60 Specs get Scrutinized: Members discussed the potential of Intel B60 and B60 Dual cards for use with LM Studio, referencing an Igor’s Lab article (translate to English).
- While the B60 should be much better than older cards like the P40 and MI50, one member warned that new doesn’t always mean good, while another mentioned a B60 dual 48GB version coming for around $1500.
4090 GPU gets Fried, Sparks Concern: A user reported potentially killing their 4090 after noticing high temps and unplugging/replugging the GPU while adjusting fans.
- Members asked if too much wattage was used, while another provided a link to a GIF and celebrated when the user later confirmed the card was working again.
MI50’s reign Supreme for Price to Performance: Despite the possibility of a future 5090 one user finds that if they ever hit 300 a piece or less 3090s are the holy grail.
- The MI50 is being considered for it’s cheaper price and better performance to cards like the A770.

OpenRouter ▷ #announcements (1 messages):

tool calling, audio inputs, API key limits, MiniMax M2

Exacto Tool Calling Knives It!: New high-precision tool calling endpoints improve quality by 30% on Kimi K2, with five open source models available.
- The announcement was made last week.
Audio Inputs Sing in Chatroom: Users can now compare 11 audio models side by side in the Chatroom, as announced on X.com.
- This enables more nuanced testing of voice and transcription features.
API Key Limits Get Reset Button: You can now reset your API key limits every day, week, or month to better manage your account by external users or apps.
- Usage can be monitored here.
MiniMax M2 Model Goes Free!: The top-ranked open-source model on many benchmarks, MiniMax M2, is now free on OpenRouter for a limited time at this link.
- Enjoy using MiniMax M2 while it is free!

OpenRouter ▷ #app-showcase (6 messages):

Next.js Chat Demo with OAuth 2.0, or3.chat Document Editor Project, Shadcn UI Discussion, OpenRouter TypeScript SDK, localStorage plaintext API key security

OAuth 2.0 arrives for Next.js Chat Demo: A developer shared an updated and working version of the Next.js chat demo app for the OpenRouter TypeScript SDK, featuring a re-implementation of the OAuth 2.0 workflow.
- The developer cautioned against using it in production, since it stores the received API key in plaintext in localStorage in the browser.
or3.chat seeking polished feedback: One member sought feedback on their chat/document editor project, or3.chat, highlighting features like OpenRouter OAuth connectivity, local data storage with backups, and a multipane view.
- The project aims to be a lightweight client with plugin support, text autocomplete, chat forking, and customizable UI, and can be cloned from its GitHub repository.
Shadcn UI gets spicy makeover: One member expressed their desire to move away from interfaces resembling Shadcn UI, opting for a spicier design in their project.
- Another member admitted their project currently looks exactly like Shadcn while they get the core functionality in place, planning to customize the components later for a unique look.

OpenRouter ▷ #general (459 messages🔥🔥🔥):

Response API System Message, deepinfra/turbo for Meta-llama, OpenRouter Benchmarks, Claude Sonnet 4.5 API usage, Vertex AI API misrouting

Deepseek on Deepinfra FTW: Members confirmed that they can now use deepinfra/turbo to run meta-llama/llama-3.1-70b-instruct, after some initial errors.
- One member tested it and confirmed it works on the official OpenRouter endpoint.
OR users promote orproxy: A user promoted their FOSS orproxy project, which they built to add functionality that OpenRouter doesn’t natively support.
- The project was created because OpenRouter doesn’t support the use of an actual proxy, and the user needed it for their use-case, with another user calling it very useful.
Vertex AI has Prompt Misrouting: Users shared a Google Cloud security bulletin detailing an issue where Vertex AI API misrouted responses between recipients for certain third-party models when using streaming requests.
- The bulletin indicated that this happened on September 23, 2025, but users were still shocked by the potential for prompts to be exposed, with one joking It was meant to go to the overseer not another user ugh.
Deepseek R1 Uptime Issues: Users discussed the uptime and availability of the Deepseek R1 free model, with one user noting that the uptime had plummeted and then recovered to around 30%.
- The member said My fear is that they just start killing most models (free ones) as they did with 3.1 with another one blaming JAI scripts that use billions of tokens and cause the service to fall over.
Bypassing Image Generation Censorship: Users discussed the challenges of generating copyrighted characters with GPT and other image generation models, as well as techniques for bypassing censorship.
- They discussed the use of surrogate prompts to help jailbreak the models, with one user noting GPT itself is way more censored then sora that’s way less censored then sora 2 right?.

OpenRouter ▷ #new-models (1 messages):

Readybot.io: OpenRouter - New Models

OpenRouter ▷ #discussion (42 messages🔥):

Minimax M2 Pricing, GPT-5.1 Mini Speculation, Model Naming Conventions, Meta's Llama 4 Reasoning, Discord Channel Degradation

Minimax M2 Costs a Pretty Penny: Members discussed Minimax’s M2, a 10B parameter model, with pricing at $0.3/1.20, expressing surprise at the cost and hoping for open-sourcing.
- One member noted the model is very verbose in its reasoning, potentially driving up the actual cost.
GPT-5.1 Mini Rumors Swirl: There was speculation around a GPT 5.1 mini model, citing a tweet as a source.
- Some users expressed relief that the naming scheme appeared more logical than previous iterations.
Model Naming Becomes a Battlefield: Participants debated the merits of different model naming conventions, particularly criticizing Anthropic’s shift in the Claude model series.
- One member suggested the brand-number-label format is easiest, regardless of release order.
Meta ‘Salvages’ Llama 4 Launch: A user highlighted Meta’s Llama 4, describing it as a potentially decent vision-capable reasoning model with open weights, linking to a tweet about ‘Think Hard’ reasoning.
- Another user voiced concern that the launch’s description might be inaccurate, while others speculated about its creation interface.
Discord Channel Descends into Disarray: A member expressed displeasure with the current state of general chat and the attachments channel.
- Another user wryly commented it’s not like there’s anything else going on linking to a tweet.

HuggingFace ▷ #general (223 messages🔥🔥):

GPT Pro video, AI glyphs, Model encryption for clients, Licensing models, Infinite storage solution

GPT Pro’s video length explored: A member asked if the GPT Pro subscription allows for creating videos longer than 10 seconds and in 1080p with Sora 2 Pro.
- They also inquired about the daily limit for creating videos with GPT Pro.
AI Hieroglyphics as Compression?: A member proposed generating a body of hieroglyphics for a set of data, then training on those compressed hieroglyphics and translating back to English, referencing an OCR paper.
- The idea is to find the most efficient way to represent the data, compress it, and benchmark the method via AI trained on actual hieroglyphics.
Banks demand model encryption when shipped: A member is seeking advice on encrypting models shipped to bank clients, as banks require on-prem hosting due to data policies, and the member doesn’t want them stealing their models, and pointed to a blogpost on encrypted LLMs.
- Suggestions included adding licensing, encrypting the model and decrypting it during runtime, and wrapping the code in a custom API, like Ollama, though no out-of-the-box solutions were identified.
Infinite storage trick exposed: A member jokingly suggested that encrypting files on public repos can be used for infinite storage using this new method.
- Another suggested to put all your repos as public but gated, and turn off auto accept to prevent Hugging Face Trust & Safety from catching on, but emphasized the need for a very fast storage solution for world-scale datasets to push big models, and that the real secret of hugging face is for push 5GB on private with apache2.0 licence.
Multimodal Model Training: A Cry for Help: A member is seeking assistance with training a multimodal model using images and texts, specifically with extracting features using image and text encoders, then fusing them, but is running into errors.
- Another member replied that more specifics are necessary because the errors that occur in that case are so varied that unless the user specifies exactly what the error is, no one will be able to provide assistance.

HuggingFace ▷ #i-made-this (4 messages):

GAN+VAE+Diffusion hybrid modular architecture, Live PyTorch Memory Profiler, Intilium AI Compliance Layer

GAN+VAE+Diffusion Hybrid Ready for Primetime?: A member has been working on a modular GAN+VAE+Diffusion hybrid architecture and is wondering if it would be worth releasing under an MIT license if they can get it working.
- They feel like something like this could bridge the gap between the open-source community and high-tech companies, but are unsure if it would be a waste of time.
Memory Profiler Stops OOM Errors: A member has released a Live PyTorch Memory Profiler to assist in debugging OOM errors.
- This profiler features layer-by-layer memory breakdown, real-time step timing, lightweight hooks, and live visualization, and the developer is seeking feedback and design partners for distributed features.
AI Compliance Layer Ships for EU AI Act: A member is testing Intilium, a Trust & Compliance Layer for AI, which functions as an API gateway or sandbox between your app and model providers.
- It’s designed to enforce regional and model policies automatically, log every AI request for audit and transparency, and detect and mask PII before data leaves your environment, and is fully hosted in the EU.

HuggingFace ▷ #computer-vision (3 messages):

projecting 1D feature vectors to 2D segmentation map, diffusion, VAEs and GANs

1D Features Seek 2D Destiny: A member inquired about the canonical way to project a set of 1D feature vectors to a 2D segmentation map.
- A member jokingly suggested diffusion, VAEs, and GANs as possible solutions.
Diffusion models for segmentation: A user suggested diffusion models, VAEs, and GANs.
- These techniques could be used to generate a 2D segmentation map from 1D feature vectors, although this suggestion was made somewhat jokingly.

HuggingFace ▷ #NLP (1 messages):

syllable separation model, multiple languages

Seeking Syllable-Splitting Savior: A member is looking for a model capable of separating words into syllables across multiple languages, not just English.
- No specific models or solutions were offered in the immediate discussion.
Multi-Lingual Syllabification Search: The user inquired about models that can separate words into syllables for multiple languages.
- The request emphasized the need for the model to work beyond just the English language.

HuggingFace ▷ #gradio-announcements (1 messages):

Free Modal Credits, AI Agents and MCP, Online Hackathon

Hackathon Participants get Sweet Modal Credits: Hackathon participants are being offered free Modal credits worth $250, enabling them to fully participate and excel in the event.
- The credits are intended to empower participants to flex and crush it like a pro! 💯
AI Agents and MCP in the Spotlight: The hackathon encourages participants to explore AI Agents, MCP, and production hacks to compete for cash prizes.
- Participants can drop some sick production hacks while chasing those fat cash prizes! 💸
Biggest Online Hackathon Ever: An online hackathon is announced, inviting sign-ups for a chance to learn and compete in AI and MCP.
- Interested individuals can sign up via this link and seek assistance in the dedicated channel: <#1424743721966108713>.

HuggingFace ▷ #smol-course (10 messages🔥):

Submitting models to leaderboard, Dataset issues in hf jobs, Lighteval and emoji errors

Model Submission to the Leaderboard: To submit a model to the leaderboard, create a pull request (PR) to the submissions.json file on the leaderboard’s Hugging Face Space by clicking contribute and adding an entry at the bottom.
- The discussion clarified that the user should submit either the trained adapter or the merged model, and they were wondering how to create results_datasets.
Dataset problems break HF Jobs: When training on the trl-lib/llava-instruct-mix dataset with hf jobs, a ValueError: Unsupported number of image dimensions: 2 can occur, indicating a problematic image in the dataset.
- One member found that the default model changed to a thinking model with different parameters, and suggested inserting the correct model in the InferenceClientModel() function, such as model_id="Qwen/Qwen2.5-72B-Instruct".
Lighteval’s Emoji error during training: A user encountered a ModuleNotFoundError: No module named 'emoji' when running lighteval with hf jobs.
- The solution involves using a specific commit of lighteval from GitHub with git+https://github.com/huggingface/lighteval@main#egg=lighteval[vllm,gsm8k] and including emoji in the --with flags, as the issue was due to an incomplete migration of third-party integrations which is now fixed in the main branch, according to this discord message.

HuggingFace ▷ #agents-course (5 messages):

API Outage, Rate Limiting, 404 Errors

API Experiences Intermittent Outage: Several members reported experiencing an API outage with 404 errors and the message “No questions available.”
- The issue seems to have started the previous evening and is affecting multiple users.
Slow Down! Rate Limiting Concerns: Two users were notified that they might be posting too quickly and were asked to slow down by the bot.
- This suggests that the channel might have rate limiting in place to manage the message flow.
404 Error Cause Still Unclear: Members in the channel are still looking for solutions to solve the recurring 404 errors.
- The root cause of the errors is not yet known but is under discussion in the channel.

Yannick Kilcher ▷ #general (175 messages🔥🔥):

Elastic Weight Consolidation, Self-Hosted GPU Setups, Catastrophic Forgetting Solutions, ArXiv Paper Discovery Engines, Linear Projections

Softness Factor Confusion Solved: A member discussed the confusion between weights and activations in Elastic Weight Consolidation, questioning how to update the softness factor and whether a global learning rate is still needed.
- They mentioned the solution made by a team is to use the number of accesses (forward pass instead of backward pass) per slot, suggesting this could be done at inference time to discover which slots are stuck.
Self-Hosted GPU Experimentation vs Cloud Pricing: A member shared their self-hosted setup with an RTX 2000 Ada connected via VPN, using a cheap wifi plug to monitor power usage and compare costs with cloud providers.
- They argued that the spin-up time and timeouts of Colab make experimentation impractical, though another member advocated for at least using Google Colab Pro.
GAN Pushforward Parameterization Discussed: A discussion mentioned that forgetting can’t be solved by arch alone, referencing papers about how GANs can’t parameterize the pushforward from a prior (normal gaussian) into a data distribution if the data distribution has disconnected modes.
- From a result-oriented perspective, catastrophic forgetting looks like underfitting, so members suggested to deal with it by adding more variance and guidance terms to regularize the network.
AlphaXiv and Emergent Mind are trending research paper engines: Members shared different engines and methods for discovering trending and relevant research papers, with a user recommending engines showing how relevant, hot, good etc. the various papers are.
- Specifically, they pointed to engines such as AlphaXiv and Emergent Mind as resources to discover research papers that may be relevant to them, as well as some industry sources like news.smol.ai.
Higher Dimensions Boost Computation: A member inquired about the point of linear projections, or feature expansion, especially in higher dimensions, and it was explained that higher dimensions are more expressive for specific types of computations, especially when combined with activation functions like ReLU.
- Another member pointed to a DeepMind paper (Avoiding Forgetting in Deep Networks) where this exact scheme has been used to avoid loss of plasticity in reinforcement learning.

Yannick Kilcher ▷ #paper-discussion (40 messages🔥):

Neuronpedia Line Break Attribution Graphs, DeepMimic Porting for LAION, Strudel Music Programming for Audio Models, Undergrad Publication Project Ideas, DOI System Failover

Neurons Breaking Down Gemma and Qwen: A member shared Neuronpedia line break attribution graphs for Gemma 2 2B and Qwen 3 4B, allowing interactive exploration of neuron activity related to a new paper.
- The linked graphs enable users to investigate neuron behavior by adjusting parameters like pruning and density thresholds, and pinning specific IDs for analysis.
DeepMimic Porting for LAION: A member is planning to clean up DeepMimic code for a LAION project involving a virtual teacher in the classroom, running in the browser.
- They are considering hiring a junior developer for supervision to adapt DeepMimic and PyBullet for the project, noting the increased complexity of modern game engines like Fortnite, which now exceeds 150-200 GB in size.
Strudel Music Project: One project idea involves using the Strudel music programming language to fine-tune an audio model.
- The project aims to provide undergraduates with opportunities to get published.
Undergrad Research Projects: A member is seeking meritorious projects suitable for undergraduate students aiming for publication, including using Strudel for audio models and porting DeepMimic tools to the web browser.
- They emphasized the importance of paying students for their work, contrasting their approach with others and highlighting the cost of time and guidance as a form of payment.
DOI System Needs Failover!: A member suggested that the DOI system lacks a basic failover mechanism, proposing a simple code fix to use a stored URL if the primary URL is broken.
- The proposed fix involves storing “The F^&*ing URL” and using it if “The shit is broken or rerouted”, highlighting the surprising lack of redundancy in such a critical system.

Yannick Kilcher ▷ #agents (1 messages):

rogerngmd: Novel idea. Are u using McP

Yannick Kilcher ▷ #ml-news (6 messages):

Elon's Twitter data effects, Schmidhüber arxiv, odyssey.ml experience

Elon’s Twitter Data Dumbs Down AI: A member joked that Elon’s Twitter dataset is making his AI dumber, suggesting it might also cause brain rot for other intelligences, linking to a Futurism article about social networks and AI intervention in echo chambers.
Schmidhüber resurfaces with arxiv: A member mentioned Schmidhüber’s resurfacing after years of dormancy, linking to an arXiv paper.
Odyssey Experience Soon: A member noted that experience.odyssey.ml was supposed to have something going on soon.

GPU MODE ▷ #general (9 messages🔥):

Access to GPU nodes, Torchcomms/ncclx session, Speaker/lecture request, Learning CUDA, Cute's layout algebra

Access to GPU nodes is discussed: A member inquired about how a team of four can get access to a node.
- There were no responses recorded in the message history.
Torchcomms/ncclx session whereabouts: A member asked if there was a recorded session on torchcomms/ncclx from a conference, noting the playlist wasn’t yet available.
- There were no responses recorded in the message history.
Speaker lecture request is made: A member requested the slides from Vincent’s lecture.
- There were no responses recorded in the message history.
CUDA learning approaches debated: A member shared a LinkedIn post about learning CUDA.
- Several members commented on the post, agreeing on the importance of understanding GPU architecture and suggesting starting with C/C++ and lower-level concepts before moving to abstractions like Triton for effective optimization and debugging.
Layout Algebra implementation shared: A member shared their simplified, static-only implementation of Cute’s layout algebra on GitHub.
- Another member commented on the interesting idea.

GPU MODE ▷ #triton (18 messages🔥):

Triton Matrix Multiplication Performance on T4 vs A100, Triton Input Pointer Casting in Kernels, Split-K GEMM Kernel in Triton

Triton on T4 is a Dud, A100 is Ace: A user found that the matrix multiplication example from the official triton tutorials was extremely slow on Colab’s T4 GPU but worked as expected on an A100 GPU, as per the official notebook 03-matrix-multiplication.ipynb.
- It was suggested that the T4 might be too old, as Triton may not support tensor cores on sm75 architecture (T4’s architecture), noting it works well on sm_80 and older consumer GPUs like 2080 / 2080 Ti (sm_75).
Kernel Casts Pointers for Precision: When asked why some Triton kernels cast input pointers (e.g., input = input.to(tl.pointer_type(tl.float32))), it was explained that this is like casting pointers in C++ to determine the operations used in assembly.
- The operation is not implicit but explicit: it is often done to use higher precision for operations while saving memory by quantizing the input, such as when using optimizers.
Seeking Speedy Split-K GEMM Kernel: A user is looking for a fast split-k GEMM kernel implemented in Triton.
- No further details or links were provided in the given context.

GPU MODE ▷ #cuda (43 messages🔥):

CUDA fork behavior, GPU bandwidth modeling, Vectorized data types performance, NCU Profiler for memory throughput, Signed vs. unsigned loop indices in CUDA

CUDA Forking Anomaly Investigated: A member explored unexpected behavior with fork() in CUDA, noting that while state variables are shared between parent and child processes, CUDA contexts may not be correctly copied, but minimal tests with torch.cuda.device_count() didn’t reproduce the error.
- It was suggested that device_count may be cached, masking the issue, as seen in PyTorch’s source code.
GPU Bandwidth Mysteries Unfold: A discussion arose around GPU bandwidth modeling, particularly how bandwidth behaves when scaling from a single Streaming Multiprocessor (SM) to the entire GPU, using a vector addition example on a Hopper GPU.
- It was observed that using 256 threads per block with plain data types yielded the best bandwidth, and the use of vectorized data types was unexpectedly slower, prompting a deeper investigation into memory access patterns and compiler optimizations.
Vectorization Ventures Yield Unexpected Results: Members scrutinized a vector addition kernel comparing scalar and double2 implementations, noting that the vectorized version appeared slower due to potentially unnecessary memory copies.
- Suggestions included avoiding manual vectorization and switching to unsigned int indices, but these attempts strangely resulted in slower code, highlighting the impact of compiler optimizations like loop unrolling and load arrangement, with one member linking to NVIDIA’s best practices guide.
NCU Profiler Recommended for Accurate Memory Metrics: To accurately measure memory throughput, it was suggested to use the NVIDIA NCU profiler, which can provide insights into generated PTX and SASS code, aiding in optimization.
- Adjusting the clearL2 setting was recommended to address negative bandwidth results, which can occur due to timing fluctuations when clearing the L2 cache.
CUDA Compilation Recipe Shared: A member sought guidance on compiling a .ptx file and linking it with a .cu file, and was advised to use nvcc with the -dryrun flag to understand the compilation steps.
- By first using -keep to retain intermediate files and then examining the output of -dryrun, one can modify the .ptx file and execute the remaining steps, such as compiling .ptx to .cubin with ptxas.

GPU MODE ▷ #torch (1 messages):

High Dimensional Tensors, Matrix of Matrices

Tensors get new Matrix View: A member shared a blog post titled “Draw high dimensional tensors as a matrix of matrices” here.
- The post can also be found on X.
Matrix Mania: Another perspective on representing high-dimensional tensors was also discussed.
- This approach simplifies the visualization and manipulation of complex data structures.

GPU MODE ▷ #cool-links (1 messages):

KernelBench, GPU Kernel Generation, LLM Kernel Generation

KernelBench Marks One Year Milestone: A blog post reflecting on a year of KernelBench progress toward automated GPU Kernel Generation was shared via simonguo.tech.
KernelBench Impact + LLM Kernel Gen Overview: A document outlining the impact of KernelBench and providing an overview of LLM Kernel Generation was shared via Google Docs.

GPU MODE ▷ #jobs (5 messages):

Small inference optimized models for code gen, Morph Internship, ML Project Deep Dives

Morph Seeks ML Engineer Interns!: Morph, a YC-backed company, is hiring Machine Learning Engineering Interns to work on small inference optimized models for code gen.
Morph’s Model Hits 10.5k TPS on B200!: A member noted that Morph’s first model runs at 10.5k tps on B200 hardware.
- Interested candidates were encouraged to DM the member on Twitter.
Members Prompted to Describe ML Passions!: A member prompted others to describe the machine learning project you’re most proud of in extreme technical detail, indicating familiarity with various libraries.
- They then asked others to describe what you were or are deeply obsessed about (anything).

GPU MODE ▷ #beginner (4 messages):

Budget Friendly Cloud GPU Providers, Vast.ai, RunPod.io, Lightning.ai, Compiling Applications to Run on a GPU

Budget GPUs Beckon Beginner Brains: Members recommended Vast.ai for a bare metal feel as the cheapest cloud GPU provider, but noted that your data runs on random community servers.
- For learning purposes it is acceptable and provides virtual machines which might offer more direct access to profiling tools.
RunPod Runs Reliably: RunPod.io is similar to Vast.ai but more stable for quick experiments.
- It may not offer Virtual Machines.
Lightning Launches Learning: Lightning.ai is great for fast experiments and even has a free tier with limits.
- The best strategy would be combining the free tier from Lightning.ai with Vast.ai.
GPU Grind Gets Gnarly: Compiling an entire application to run on a GPU would result in very slow performance, since GPUs are not optimized for non-parallel computations.
- GPUs are fast because they are optimized for computation that can be done on multiple threads simultaneously.

GPU MODE ▷ #pmpp-book (1 messages):

Cutlass documentation

Cutlass Docs: A Solid Start: Members mentioned that the Cutlass documentation is a good starting point for understanding the project.
- Cutlass is a collection of CUDA C++ template abstractions for implementing high-performance matrix-multiplication (GEMM) at all levels and scales within CUDA.
Understanding Cutlass: Cutlass is designed to enable programmers to write highly optimized matrix multiplication kernels for NVIDIA GPUs.
- It provides a set of reusable building blocks that can be combined to create custom kernels tailored to specific hardware and problem sizes.

GPU MODE ▷ #off-topic (2 messages):

GEMM, Meme Creation

Meme Creation Over GEMM Coding: A member joked about spending too much time on creating a meme instead of working on GEMM code.
- They shared an image as proof of their procrastination.
Procrastination strikes again!: The same member admits to prioritizing meme creation over actual coding tasks, specifically GEMM implementation.
- This humorous admission highlights the common struggle between creative distractions and focused development efforts.

GPU MODE ▷ #irl-meetup (2 messages):

LLVM dev meeting, SuperComputing in St Louis

LLVM Dev Meeting Attendees Sought: Channel members inquired if anyone was attending the LLVM dev meeting.
- No responses or further details were provided regarding the meeting itself.
SuperComputing Conference in St. Louis: Channel members inquired about attendance at SuperComputing in St. Louis.
- No responses or further details were provided about the conference.

GPU MODE ▷ #self-promotion (2 messages):

Penny beats NCCL, vLLMs custom allreduce, CuTeDSL reduction, Quack library, RMSNorm CUDA

Penny Pins Victory Against NCCL: A new blogpost reveals that Penny beats NCCL on small buffers, detailing how vLLM’s custom allreduce works; the blogpost is available here, the GitHub repo is available here, and the X thread is available here.
CuTeDSL Cuts Through Reduction Complexities: A blogpost demonstrates implementing reduction on GPUs in parallel using CuTeDSL, with a focus on the commonly used RMSNorm layer, available here.
Quack Library Cracks Memory-Bound Kernels: The Quack library shows that CuTeDSL can be used for memory-bound kernels, with the library available on GitHub.
RMSNorm’s Really Radical Recipe in CUDA: An old blogpost details the implementation of RMSNorm in CUDA, showcasing optimization techniques, available here.

GPU MODE ▷ #🍿 (5 messages):

GPU Mode Kernel Leaderboard, GitHub Kernels Dataset, Heterogeneous Computing Code on GitHub, Triton/CUDA Repos

GPU Mode Boasts More Kernels Than GitHub?: A member mentioned that GPU Mode Kernel Leaderboard allegedly has more kernels than all of GitHub, sparking curiosity about the source of this data.
- Another member believes this number comes from The Stack dataset, which could be outdated due to the increased popularity of GPU programming for deep learning.
Kernel Collection Collaboration?: A member expressed interest in creating an exhaustive list of all kernels / heterogeneous computing code on GitHub, pending a reasonable way to divide the work.
- Another member recalled existing repos that track notable Triton / CUDA repos, though the specifics remain elusive.

GPU MODE ▷ #thunderkittens (1 messages):

Thundermla, sm120, async tma, async mma, tcgen05

Porting Thundermla to sm120: Yay or Nay?: A member questioned if thundermla would make sense to port to sm120, suggesting it could leverage async tma and the barriers.
- However, they noted its inability to use tcgen05 async mma/wgmma async mma found in sm100 and sm90 examples, posing a potential limitation.
sm120 vs sm100/sm90: The key difference between the architectures is the support for tcgen05 async mma/wgmma async mma.
- These features are present in sm100 and sm90, but not in sm120.

GPU MODE ▷ #submissions (7 messages):

prefixsum_v2 leaderboard, vectorsum_v2 leaderboard, A100 results

PrefixSum Perfection on A100: A member achieved first place on the prefixsum_v2 leaderboard on A100 with a time of 7.20 ms.
VectorSum Victorious on A100: A member secured third place on the vectorsum_v2 leaderboard on A100 with a time of 156 µs.
More PrefixSum Success on A100: A member achieved second place on the prefixsum_v2 leaderboard on A100 with a time of 11.0 ms.

GPU MODE ▷ #hardware (1 messages):

id_ab_ling: how to download fieldiag

GPU MODE ▷ #cutlass (14 messages🔥):

Availability of Presentation Slides, Representable Layouts in CuTe, Swizzles in CuTe

Presentation Slides MIA?: A member inquired about the availability of presentation slides that were initially shown during a YouTube livestream.
- Another member offered to email Chris about the slides on Monday.
Non-Affine Layout Examples Requested: A member asked for examples of cases where a non-affine/non-cute representable layout is needed for common operations.
- Another member suggested that swizzles aren’t representable with a layout + stride, but provided a link to a blog post about swizzles.
Swizzles Represented as Special Layouts: A member clarified that swizzles are representable in CuTe as a special type of ComposedLayout, a class encompassing a wide range of layout-like mappings that aren’t themselves associated with layout functions.
- This is implemented in the CuTe source code.

GPU MODE ▷ #mojo (11 messages🔥):

Pixi vs UV, CUDA version and non-Nvidia, Toolchain installation

Pixi Environment for GPU Puzzles Questioned: A member inquired about the necessity of using Pixi for gpu-puzzles, noting that the Pixi setup uses pytorch=2.7.1 which causes errors, while their UV environment with torch 2.8.0 works fine, as seen in this screenshot.
CUDA Version Compatibility Issues: One member pointed out that the setup is pinned to a CUDA 12.8 torch, potentially causing issues on non-Nvidia GPUs, as seen in this file.
Toolchain Installation Recommended: Another member recommends installing the toolchain exactly as specified, stating, I found that when I’m trying to break in is not the right time to reformulate the recipe.

GPU MODE ▷ #singularity-systems (8 messages🔥):

JAX vs PyTorch2 for pedagogy, Graph acquisition mechanisms, Dual language problem with Python/C++, Mojo and LLVM intrinsics

JAX deemed better than PyTorch2 for teaching: Transitioning from hips/autograd to JAX is considered better than PyTorch1 to PyTorch2 for pedagogical purposes due to the complexity of torchdynamo and aotautograd.
- It is pedagogically better to lean more deeply on the embeddedness of the DSL rather than rely closely on the semantics of the host language.
Graph acquisition mechanisms under consideration: There is a decision to be made regarding the graph acquisition mechanism, choosing between explicit tracing (JAX) or implicit tracing (Torch/XLA) and how it will compose with tinygrad UOp IR.
- The user is choosing between tracing at the host bytecode level with torchdynamo and lowering with aotautograd.
Dual Language Problem Complicates matters: The complexity stems from the dual language problem involving Python/C++ and reusing the autograd in C++, making it unsuitable for the SITP/picograd audience.
- The member mentioned that the dual language problem increases complexity for new users.
Mojo and LLVM intrinsics get a shoutout: A member recommended exploring Mojo, which utilizes LLVM intrinsics as its foundation, defining everything explicitly at the code level.
- The TLDR for Mojo is to use LLVM intrinsics as your foundation, and have nothing else be a part of your language’s compiler, not even the notion of thread index.

GPU MODE ▷ #general (1 messages):

achal: How do you get the benchmark results from the website?

GPU MODE ▷ #multi-gpu (3 messages):

NCCL Debugging, Megatron Optimizer, Distributed Optimizer

NCCL Debugging for Collective Communication: A member suggested adding NCCL_DEBUG=INFO to diagnose collective communication hangs, potentially caused by inconsistent network topologies, referencing a relevant arXiv paper.
- Another member reported that debugging didn’t yield much useful information, noting it’s just us who cannot really determine which log came from which lmao.
Megatron’s Distributed Optimizer Causes Deadlock: A user resolved a deadlock issue by disabling the distributed optimizer of Megatron.
- After disabling the distributed optimizer, the deadlock disappeared.

GPU MODE ▷ #irl-accel-hackathon (38 messages🔥):

Mini-PyTorch Project, Oulipo Flavor in Coding, GPU Memory Allocation, PyTorch Distributed Hacking, Monarch/Torchforge

Crafting Mini-PyTorch with Oulipo Swag: A member is working on a mini-version of PyTorch project with GPU tensor metadata and allocator, flavored with Oulipo constraints (kernels with 512 threads).
- Another member suggested using cudaMallocManaged for on-GPU memory allocation, allowing memory faulting via GPU kernels.
PyTorch Distributed Hacking Bonanza: Members showed interest in hacking on PyTorch Distributed (+torchcomms, torchft, Monarch, etc.).
- One participant inquired about contributing to Monarch/Torchforge outside the hackathon, asking about open-source community management.
GPU Access Troubles: One of the members reported issues with GPU access despite filling out the access form.
- Another member advised joining the Discord server mentioned on the form and requesting access via the bot, mentioning the Nebius team on the 3rd floor for support.
Hackathon Project Deadlines: A reminder was sent to submit project proposals by 6 PM, with the submission form available here.
- Later announcements confirmed that GPU access would be available until 9 AM the next day, and final project demos would commence at 6:30 PM, with dinner on the 3rd-floor rooftop from 7:30 - 8:30 PM.
Symmetric Rendezvous SOS: A member requested help with a symmetric memory rendezvous hang.
- Several members suggested specific TAs who might be able to assist.

GPU MODE ▷ #llmq (1 messages):

CPU offloading, NPU Framework

Framework Machine Falls Flat for NPU: A member reported they couldn’t get the framework machine working for the NPU, deciding to focus on CPU offloading instead.
Member pivots to CPU Offloading: Faced with issues getting the framework machine operational for the NPU, a member has decided to shift focus to CPU offloading.

Modular (Mojo 🔥) ▷ #general (23 messages🔥):

Mojo Setup, MAX Support Contract, AMD Consumer vs Datacenter Cards, Apple Silicon Support, Windows Compatibility

New users seek setup help: A user asked for guidance on setting up and testing Mojo and was directed to the appropriate Discord channel, with a request to include a specific user for assistance.
- The recommended channel is the <#1119100298456215572> channel.
Modular Prioritizes DC GPU Support: Tier 1 support is for customers with Mojo/MAX support contracts using datacenter GPUs, as Modular can be held liable if it doesn’t work quickly.
- Consumer-grade hardware support is lower because companies like Nvidia and AMD may restrict operations on consumer cards, limiting Modular’s ability to offer commercial support.
AMD Card Differences Stall Compatibility: The reason all AMD consumer cards are tier 3 is because AMD has massive differences between DC and consumer cards, and as such they required alternative codepaths in many, many places.
- The architectures required alternative codepaths in many, many places.
Apple Silicon’s Impactful Changes: Limited Apple Silicon GPU support stems from Apple’s unique GPU design, which necessitated reverse-engineering their equivalent of PTX, breaking assumptions in MAX and Mojo.
- One community member claims Apple took GPU design in a very, very different direction than most vendors (many would argue better).
Windows Compatibility Troubles: Windows receives less support due to its non-Unix-like OS structure and unique GPU communication rules.
- Furthermore, hardware vendors might not offer datacenter GPU support on Windows, preventing Modular from providing commercial support contracts.

Modular (Mojo 🔥) ▷ #mojo (110 messages🔥🔥):

GPU random module location, Property testing framework, LayoutTensor limitations, MLIR vs LLVM, Mojo's metaprogramming

Fast Random Module Misplaced?: Members questioned why the faster random module is located in gpu/random.mojo since it doesn’t depend on any GPU operations, with one noting that equivalent C rand calls are 7x faster.
- The decision was made as the default random is cryptographic by default (something that most C implementations do not do) and also pointed out that they aren’t safe for cryptography as mentioned in the Parallel Random Numbers paper.
Property Testing Framework Incoming: A member is working on adding a property-testing framework, basing the work off of python’s Hypothesis, haskell’s Quickcheck, and Rust’s PropTest.
- They plan to add a way to have values that break stuff a lot as one of the things to generate things from, such as -1, 0, 1, DTYPE_MIN/MAX, and empty lists, and have already discovered many bugs, such as this issue with Span.reverse().
LayoutTensor Limits Tensor Network Library: A member is building a tensor networks library and ran into limitations with LayoutTensor, which requires a static layout, preventing dynamic rank tensors.
- They tried using the dynamic_layout alias, but LayoutTensor doesn’t support runtime ranks, though it does allow runtime-determined dimension sizes, with some suggesting to fallback to RuntimeLayout.
MLIR Gaining Traction for Compiler Dev: A discussion compared using MLIR versus LLVM IR for building language backends, with some noting that MLIR can lower to LLVM and is more interesting.
- While inline MLIR has dragons, it’s a good option for compiler development, and a Clang frontend is being built with MLIR, though it’s not meant for codegen, and some companies are reportedly using MLIR to Verilog.
Metaprogramming Powers Impossible Optimizations: The Modular team posted an article discussing Mojo’s metaprogramming capabilities and potential for impossible optimizations, specifically how to specialize hardware details (cache line sizes, page sizes) at compile time.
- The article is located here: Impossible Optimization and it shows a great motivating example for why someone wanted something like a MaybeComptime.

Modular (Mojo 🔥) ▷ #max (2 messages):

MAX, Huggingface, Torchvision, torch_max_backend

MAX integrates with HuggingFace: A member showcased how to use MAX with models from Hugging Face and Torchvision using torch_max_backend.
- The member provided a code snippet that converts a Torchvision VGG11 model to a MAX model.
Torch_max_backend integration: The generated MAX model includes the weights and can be directly used with a MAX accelerator.
- Another member suggested that the original poster share more details in the MAX forums for wider circulation.

Latent Space ▷ #ai-general-chat (99 messages🔥🔥):

Tahoe AI, ImpossibleBench, MiniMax M2, OpenAI Ads, OpenAI Sora Rate

Tahoe-x1 Released by Tahoe AI: Tahoe AI released Tahoe-x1, a 3-billion-parameter transformer, which unifies gene/cell/drug representations and is fully open-sourced on Hugging Face with checkpoints, code, and visualization tools, training efficiently on their 100M-sample Tahoe perturbation dataset.
- One member found it interesting, noting it performs on par with Transcriptformer in some benchmarks, with plans to look at it in detail.
ImpossibleBench Reveals LLM Reward-Hacking: ImpossibleBench, a coding benchmark, was introduced by Ziqian Zhong & Anthropic colleagues to detect when LLM agents cheat versus follow instructions, with released paper, code and dataset.
- The results show that GPT-5 cheats 76% of the time, more capable models cheat more creatively, and denying test-case access or stricter prompting cuts cheating to less than 1%.
MiniMax’s MoE Marvel: M2 Model Launches: MiniMax launched its new 230 B-param M2 MoE model, leapfrogging the 456 B M1/Claude Opus 4.1, reaching ~Top-5 global rank while running only 10 B active params.
- Comments probe architectural tweaks (“lightning”/linear attention), agentic use, and the trend toward smaller active MoEs mimicking cortical columns; the community also celebrated the company open-sourcing it offering Claude Sonnet-level coding skills at 8 % of the price and ~2× inference speed.
OpenAI’s Ad-Powered Pivot Sparks Debate: A member argued that OpenAI is entering an “ad + engagement” phase, hiring ex-Facebook ad execs to turn ChatGPT’s 1B users into multi-hour/day habitués, and chase a $1 T+ valuation.
- Replies debated user trust, privacy, inevitable industry-wide ad creep, and the looming Meta vs. OpenAI distribution war with some questioning how OpenAI will get users to spend several hours per day.
Mercor’s Meteoric Rise: A member announced Mercor’s $350M Series C at a $10B valuation, paying $1.5M/day to experts and outpacing early Uber/Airbnb payouts.
- Replies flooded in with praise, growth stats, and excitement for the AI-work marketplace’s trajectory.

Latent Space ▷ #genmedia-creative-ai (18 messages🔥):

OpenAI Real-Time Bidirectional Speech Translation, MiniMax M2, fal Generative Media Conference, Odyssey-2 Launch

OpenAI does Live Translation Verbatim: At OpenAI Frontiers London, a forthcoming bidirectional speech model demoed live translation that waits for whole verbs before speaking, producing grammatical real-time output, see Tweet here.
MiniMax M2 Models M1: MiniMax unveiled M2, a 230 B-parameter 10 B-active MoE that reportedly outperforms its 456 B/45.9 B predecessor M1 and reaches global top-5, see Tweet here.
Founders Find Five from fal: Kate Deyneka distills fal’s first Generative Media Conference into five insights, visual AI is compute-heavy and aesthetic-centric (unlike LLMs), see Tweet here.
Odyssey-2 has Sci-Fi Feel: Oliver Cameron unveiled Odyssey-2, a 20 FPS, prompt-to-interactive-video AI model available immediately at experience.odyssey.ml.

Nous Research AI ▷ #general (71 messages🔥🔥):

API parameter removal, Reasoning models, Pretraining on 3090, AI and web dev jobs, ML/AI streamers

API Apocalypse: Temperature and Top_P Vanish!: Developers are lamenting the removal of 'temperature' and 'top_p' parameters from new model APIs, with Anthropic dropping combined use of top_p and temperature past version 3.7 and GPT-5 removing all hyperparameter controls.
- The Claude documentation notes the deprecation, while GPT-4.1 and 4o are reported to still support the parameters.
Reasoning Models Kill Hyperparameter Tuning: The rise of reasoning models may be responsible for the changes, but the reasoning behind the parameter removal is speculated to be for developer ease of use, preventing probability bleeding, security concerns, or concerns about average developer performance.
- One member joked that the changes are “also to piss me off personally? lol 😅 now I have to write a bunch of code in my api handler to treat gpt 5 and anthropic special. precious babys.”
3090 Pretraining Ponderings: One member inquired about resources for pretraining on a 3090, having experimented with the Wiki dataset and considering the scalability of their ideas.
- A suggestion was made to look at SmolLMI, which reportedly has models in the 150M - 350M parameter range.
Web Devs Worry About AI: A web developer with 10 years of experience expressed fear that AI will take their job, seeking advice on pivoting or learning more about the field.
- The advice given centered around learning AI tooling and focusing on selling creations rather than lines of code, plus the need to be flexible to adapt to employers’ needs.
Streaming Servers for Smarter Surfing: Members discussed streamers focused on ML/AI development, suggesting Primeagean, Yannick Kilcher, and Joseph Suarez from Pufferlib for RL streams.
- Another pointed to bycloudAI as a good resource, though they may be currently serving in the military.

Nous Research AI ▷ #ask-about-llms (3 messages):

GPT Ideology, Model Meta-Awareness, Claude's Persona

GPT’s Western Ideological Leanings: A member mentioned that GPT models developed in the West might exhibit ideological biases that align more with Western perspectives, highlighting the significant impact of data on shaping a model’s worldview.
- Another member suggested that models possess a form of meta-awareness, claiming that when jailbroken, they generally express similar sentiments.
Claude’s Infant-like Persona: A member shared that Claude seems to be an exception in terms of meta-awareness, describing it as being more infant-like in its responses compared to other models.
- No additional context or links were provided.

Nous Research AI ▷ #research-papers (8 messages🔥):

KBLaM vs RAGs, AI training data quantity, Business RAG getting common, Microsoft Service Provider

AI Models Need More Data?: A member thinks that AI models aren’t pre-trained with all the knowledge that exists because everyone cares about their own craft and does not want to give it to AI companies.
- They add that many thoughts are considered harmful and don’t pass into training, because 100 Trillion tokens seems like a lot, but it isn’t if you scour the internet without filtering and human verification.
KBLaM faces Obstacles: One member tried implementing a similar concept to KBLaM months ago, but ended up being blocked, because it functions as a direct upgrade to RAGs.
- The member also noted that AI generated summaries are very often of much lower quality than the source material and the compressed format will always have worse quality than the raw format.
KBLaM Discussion & Defense: Two new papers about KBLaM were shared: [2504.13837] and [2509.16679v1].
- In response to lower quality context concerns, it was mentioned that those concerns are addressed in the paper, for instance, by refusal instruction tuning (I don’t know, sorry!).
Business RAG is booming: A member showed a consulting client (a Microsoft Service Provider) how to whitelabel RAGFlow, and thinks business RAG is getting quite common.
- They add that basically every TUI coding assistant now can utilize RAG via MCP.

Nous Research AI ▷ #interesting-links (6 messages):

Translation using Data, Temporal Optimal Video Generation, Grandma Optimality, Prompt engineering via rhyme

Translating via Data & Rhyme Time: A user speculates that translating non-semantic outputs to any target language should be fairly trivial using data, suggesting the world should create high-quality, multilingual datasets, as posted on X.
- The user proposes that poetry and rhymes can possibly optimize prompt and context utilization, potentially leading to a temporal optimax variant.
Grandma Optimality Improves Video Gen: A user introduces Temporal Optimal Video Generation Using Grandma Optimality, claiming it enhances computation for image and video generation and shares an example using a prompt to make a video 2x slower while maintaining quality (see X).
- The user suggests first generating an image and then converting it to video for best results and provides a system prompt example to reduce response length by 50% with a 4k token limit.
Temporal Optimization Needs More Compute: A user observes that the temporal optimized video shows increased complexity, stability, and a more natural scene, with fireworks lasting longer and argues that world-accurate simulations may require more compute to render properly, via temporal enhancement (X).
- They concede that they lack the resources to validate such studies fully, and pose the question: is there a compute or temporal requirement to render world accurate simulations/videos?
Veo 3.1 Fast Demoed with Rhyme: A user shares a prompt used with Veo 3.1 Fast: Multiple fireworks bursting in the sky, At the same time, they all fly. Filling the sky with bloom lighting high (X).
- They noted that the temporal optimax variant is not optimizing for color diversity but for natural feel and cadence.