OpenAI may be all you need.
AI News for 10/3/2025-10/6/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (196 channels, and 20085 messages) for you. Estimated reading time saved (at 200wpm): 1496 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
OpenAI's GPT-5 did such a good job at summarizing OpenAI's own dev day event today that we simply reused the title it came up with for today's email. We'll leave further analysis to tomorrow's Latent Space podcast with the team, but otherwise here are the link drops you should not miss:
OpenAI DevDay Product/API/SDK Launches
- 50min Opening Keynote with Sama: https://www.youtube.com/watch?v=hS1YqcewH0c&t=1382s
- 1hr OAI podcast with Andrew Mayne: https://www.youtube.com/watch?v=QIdUllqmuls
- Website: https://openai.com/devday
- Introducing apps in ChatGPT and the new Apps SDK (blog): https://openai.com/index/introducing-apps-in-chatgpt
- Apps SDK (docs): https://developers.openai.com/apps-sdk
- 1min youtube trailer: https://www.youtube.com/watch?v=2C4Cs6503gw
- Introducing AgentKit (blog): https://openai.com/index/introducing-agentkit
- 5min Agent Builder intro: https://www.youtube.com/watch?v=44eFf-tRiSg
- Agents (docs): https://platform.openai.com/docs/guides/agents/agent-builder
- ChatKit Studio (app): https://chatkit.studio/ (playground, widget builder, demo)
- ChatKit docs: https://platform.openai.com/docs/guides/chatkit
- ChatKit Python: https://openai.github.io/chatkit-python/
- ChatKit JS: https://openai.github.io/chatkit-js/
- Guardrails (docs): https://guardrails.openai.com/docs
- Evals (docs): http://platform.openai.com/docs/guides/evaluation-getting-started
- Codex is now generally available (blog): https://openai.com/index/codex-now-generally-available
- Codex SDK (docs): https://developers.openai.com/codex/sdk
- Service Health dashboard: https://platform.openai.com/settings/organization/service-health
- GitHub projects: https://github.com/orgs/openai/repositories?q=apps-sdk+OR+chatkit+OR+guardrails (apps, chatkit, guardrails)
New models released
- gpt-5 pro (model): https://platform.openai.com/docs/models/gpt-5-pro
- gpt-realtime-mini-2025-10-06 (model): https://platform.openai.com/docs/models/gpt-realtime-mini (70% cheaper)
- gpt-audio-mini-2025-10-06 (model): https://platform.openai.com/docs/models/gpt-audio-mini
- gpt-image-1-mini (model): https://platform.openai.com/docs/models/gpt-image-1-mini (80% cheaper)
- Video generation with Sora (docs): https://platform.openai.com/docs/guides/video-generation
- Sora 2: Prompting Guide (cookbook): https://github.com/openai/openai-cookbook/blob/16686d05abf16db88aef8815ebde5c46c9a1282a/examples/sora/sora2_prompting_guide.ipynb#L7
- sora-2 (model): https://platform.openai.com/docs/models/sora-2
- sora-2-pro (model): https://platform.openai.com/docs/models/sora-2-pro
AI Twitter Recap
OpenAI DevDay: Apps SDK, AgentKit, Codex GA, GPT-5 Pro and Sora 2 APIs
- OpenAI turned ChatGPT into an application platform. The new Apps SDK (built on MCP) lets partners embed full, interactive apps directly in ChatGPT with custom UI, actions, and forthcoming monetization. Early partners include Canva, Figma, Zillow and Coursera. See the launch and live demos from OpenAI's keynote: apps inside ChatGPT @OpenAI, SDK preview @OpenAIDevs, and "DevDay ships" roll-up @edwinarbus.
- AgentKit is OpenAI's end-to-end agent stack (visual Agent Builder, ChatKit UI, Guardrails, Evals, and Connectors) for building, deploying, and hardening production agents. Live onstage, OpenAI built a working agent in under 8 minutes @gdb. Docs and announcement: AgentKit, blog. Notably, the built-in prompt optimizer aligns with community best practices (e.g., GEPA) @dbreunig.
- Codex is now GA with an SDK, Slack integration and enterprise controls/analytics for code reviews and CLI/IDE workflows (GA post). Live demo showed speech- and controller-driven coding with Codex @gdb. Teams credit Codex for shipping velocity (80% of PRs authored in some internal builds) @stevenheidel.
- New models/APIs and scale stats:
- GPT-5 Pro is in the API for heavier reasoning; pricing shared onstage and by community observers: $15 input / $120 output per 1M tokens @OpenAIDevs, @scaling01.
- gpt-realtime-mini offers speech-to-speech at ~70% lower cost than gpt-realtime @juberti.
- Sora 2 and Sora 2 Pro are now API-accessible (with sound, remixing, duration control). Pricing examples: $0.10/s (720p) for Sora 2; $0.30/s (720p) / $0.50/s (1024p) for Pro @scaling01; a quick cost sketch follows this list. Mattel is already using Sora 2 for sketch-to-concept loops @gdb.
- Platform metrics: 4M developers, 800M weekly ChatGPT users, >6B tokens/min via API @kevinweil, @nickaturley. New service health dashboard and a priority tier with ~40% faster GPT-5 responses @OpenAIDevs.
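For scale, here is back-of-envelope cost arithmetic from the prices quoted above (as reported onstage; verify against the live pricing pages before budgeting, and note the token counts are a made-up workload):

```python
# Rough cost math from the DevDay prices quoted above.
gpt5_pro_in, gpt5_pro_out = 15.00, 120.00      # $ per 1M tokens (input / output)
prompt_toks, completion_toks = 20_000, 5_000   # illustrative workload
call = prompt_toks / 1e6 * gpt5_pro_in + completion_toks / 1e6 * gpt5_pro_out
print(f"one GPT-5 Pro call: ${call:.2f}")      # $0.90

sora2, sora2_pro_1024 = 0.10, 0.50             # $ per second of video
print(f"60s Sora 2 clip (720p): ${60 * sora2:.2f}")           # $6.00
print(f"60s Sora 2 Pro (1024p): ${60 * sora2_pro_1024:.2f}")  # $30.00
```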
Compute and inference infra: OpenAI × AMD, NVIDIA stacks, and vLLM
- OpenAI and AMD announced a multi-year plan to deploy 6 GW of Instinct GPUs, with AMD issuing OpenAI a warrant for up to 160M shares vesting on deployment/price milestones. AMD stock jumped on the news; OpenAI emphasized this is incremental to ongoing NVIDIA purchases @sama, @LisaSu, @TheRundownAI, @gdb.
- NVIDIA's B200s are now available on Hugging Face Inference Endpoints @ClementDelangue. NVIDIA's TensorRT-LLM hit v1.0 with a PyTorch-native core, CUDA Graphs, speculative decoding, and GB200 support, now serving Llama3, DeepSeek V3/R1, Qwen3, etc. @ZhihuFrontier.
- vLLM continues to underpin cutting-edge RL loops (e.g., PipelineRL with in-flight weight updates and stale KV cache mixing) @vllm_project, @DBahdanau.
Chinese model surge: Qwen3-VL, GLM-4.6, Hunyuan
- Qwen released Qwen3-VL-30B-A3B (Instruct & Thinking): MoE with ~3B active params, 256k-1M context, multilingual (32 languages), aiming at GPT-5-Mini/Claude Sonnet parity and shipping FP8 variants. Artifacts across chat, GitHub/cookbooks, API, ModelScope, Hugging Face, plus a live HF space @Alibaba_Qwen, HF demo. Day-0 MLX support highlighted by Nexa @nexa_ai.
- Zhipu's GLM-4.6 now ranks as the top open model in LMArena and #4 overall, with a strong showing even without "style control" @arena, @jietang. Production status: brief z.ai outage due to a CPU server attack (now resolved) @Zai_org. Practitioners note GLM-4.5/4.6 as a high-value Claude-style alternative with generous limits and low cost @Tim_Dettmers.
- Tencent's HunyuanImage 3.0 jumped to #1 overall and #1 open-source on the T2I Arena, displacing prior leaders @arena, @TencentHunyuan. Hunyuan Vision 1.5 Thinking entered to tie for #3 in Vision Arena @arena.
RL and post-training: LoRA wins, abstractions, pretraining with RL signals
- LoRA for RL keeps winning mindshare. John Schulman highlighted multiple reproductions where LoRA rank=1 closely matches full fine-tuning across RL setups; TRL shipped a reference "LoRA without regret" reproduction @johnschulman2, @ClementDelangue (a minimal rank-1 setup is sketched after this list). Threads dissect why RL updates live in low-dimensional subspaces (good for LoRA) @nrehiew_.
- RLAD (Reinforcement Learning with Abstraction and Deduction) separates "how to reason" (short natural-language hints) from "how to answer." Reported uplifts include +11% AIME 2024 and +9% AIME 2025 versus long-CoT baselines, and ~44% gains over standard long-chain methods, at constant or lower sequential budgets @TheTuringPost.
- NVIDIA's RLP (Reinforcement as Pretraining) treats chain-of-thought as actions with verifier-free dense rewards during pretraining, reporting sizable gains on math/science: +24% (Qwen3-1.7B-Base) and +43% (Nemotron-Nano-12B-Base) across 8 benches @ahatamiz1.
- Infrastructure for RL is evolving fast: vLLM-powered PipelineRL's in-flight updates with KV reuse @vllm_project; algorithmic variance-reduction applied to matrix optimizers (MARS-M on Muon) @YIFENGLIU_AI. Curations of RL trends and foundations: TD learning explainer @TheTuringPost, emerging RL trends list @TheTuringPost, GAIN-RL data-curriculum speedups @DeepLearningAI.
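To make the rank=1 claim concrete, here is a minimal sketch of that configuration using Hugging Face PEFT; the base model and target modules are illustrative, and the RL loop itself (rollouts, rewards, optimizer) is out of scope:

```python
# Minimal rank-1 LoRA setup in the spirit of the reproductions above.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")  # illustrative base model
cfg = LoraConfig(
    r=1,                                                      # the rank-1 setting under discussion
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, cfg)
model.print_trainable_parameters()  # a tiny fraction of the full parameter count
```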
Agents, evaluation and tooling beyond OpenAI
- Anthropic open-sourced Petri, a scenario-driven alignment auditing toolkit used internally for 4.5 alignment testing (sycophancy, deception) and adapted by AISec Inst. for external assessments @AnthropicAI, @sleepinyourhat.
- Google DeepMind's CodeMender agent has already upstreamed 72 accepted security fixes to major OSS repos; research details forthcoming @GoogleDeepMind, @ralucaadapopa.
- LangChain shipped a curated LangGraph.js gallery and an agentic tutorial with SingleStore integration @LangChainAI, @LangChainAI. Comet continues to push "AI browser" workflows for long-form media analysis @AravSrinivas.
- Platform notes: Yupp added GPT-5 Pro and Qwen3-VL-30B-A3B with "Help Me Choose" eval summaries @yupp_ai.
Embodied AI and video generation
- Tesla's Optimus continues rapid capability gains (now "learning Kung Fu"), with leadership hinting at unified self-driving and humanoid stacks @elonmusk, @aelluswamy. Figure reports five months of 10-hour/day humanoid operations on BMW's X3 line (video claims of 2050-level demos teased; operations update @adcock_brett).
- Long-video diffusion scaling: ByteDance's Self-Forcing++ reaches up to 4m15s videos without long-video teachers, preserving fidelity/consistency @HuggingPapers. Synthesia 3.0 pitches interactive "video agents" with avatars/voice sync for training/support @lax97981.
- Sora 2 safety controls: cameo-owner restrictions, clearer watermark, and moderation tweaks; account unlinking fixes landed @billpeeb, @turtlesoupy. Sora 2/2 Pro now in the API (see above).
Top tweets (by engagement)
- Tesla Optimus "learning Kung Fu" demo @elonmusk
- "You can now chat with apps in ChatGPT" @OpenAI
- Sora update thread (Sam Altman) @sama
- Figure: 5 months on BMW X3 production line @adcock_brett
- Anthropic's pop-up line for hats/books @signulll
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Community Provider Appreciation Post (Image)
- Biggest Provider for the community for at moment thanks to them (Activity: 2530): Non-technical meme post praising China-based LLM providers (GLM from ZhipuAI/THUDM, Alibaba's Qwen, and DeepSeek) as the current biggest contributors to the community by offering capable models and low-cost access, contrasting with more closed, higher-cost Western offerings. Context from comments frames these groups as democratizing AI access versus OpenAI's historically opaque approach and productization; no benchmarks or implementation details are provided in the post. Top comments laud GLM/Qwen/DeepSeek as "gifts to mankind," argue OpenAI prioritized secrecy under the banner of safety, and claim that without these providers, developers would be paying significantly more for GPT-like access.
- Commenters highlight Chinese open-weight model families (GLM from THUDM/ZhipuAI, Qwen from Alibaba, and DeepSeek) as current community workhorses thanks to weight releases, detailed model cards, and competitive benchmarks. They're frequently cited on community leaderboards (MMLU/GSM8K/HumanEval) as strong open alternatives to closed APIs; see the Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard and model hubs for GLM (https://github.com/THUDM/ChatGLM3), Qwen (https://github.com/QwenLM/Qwen2.5), and DeepSeek (https://huggingface.co/deepseek-ai).
- A recurring technical theme is cost and deployment: self-hosting 7B-14B models (often 4-/8-bit quantized) can run on consumer GPUs with ~8-24 GB VRAM using runtimes like vLLM (https://github.com/vllm-project/vllm) or llama.cpp (https://github.com/ggerganov/llama.cpp), avoiding per-token API charges. This enables predictable TCO, offline/edge deployments, and customized guardrails/fine-tuning pipelines that would be cost-prohibitive on proprietary tiers (e.g., GPT-3.5/4); a minimal vLLM sketch follows this list.
- There's a technical critique of OpenAI's closed release practices (limited training details since GPT-3) contrasted with these groups' openness (weights, training/eval recipes, inference stacks), which enables independent benchmarking and reproducibility. References: Qwen docs/papers and model cards (https://huggingface.co/Qwen), DeepSeek releases (https://github.com/deepseek-ai), and GLM/ChatGLM resources (https://huggingface.co/THUDM).
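As a reference point for the self-hosting theme, a minimal vLLM sketch (the model name is illustrative; a 7B instruct model fits roughly 24 GB VRAM at bf16, less when quantized):

```python
# Minimal local inference with vLLM; cost is your own hardware, not per-token fees.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")   # illustrative; pick any model that fits your VRAM
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain KV caching in one paragraph."], params)
print(outputs[0].outputs[0].text)
```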
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. DeepMind Codemender and Gemini 3 Tool-Use Updates
- Google DeepMind introduces new AI agent for code security - Codemender, automatically finds and fixes code vulnerabilities, and has already submitted 72 high quality fixes in major open source projects (no public access yet, but it is coming) (Activity: 396): Google DeepMind announced CodeMender, an AI code-security agent that autonomously detects vulnerabilities in repositories and proposes/submits fixes, with a claim of 72 high-quality patches contributed to major open-source projects; public access is not yet available (post). The provided link content contains only site navigation, so specifics about model family, training/evaluation datasets, supported languages, vuln classes, CI/CD integration, review workflow, and safety/guardrail mechanisms are not disclosed in this input. Top comments question whether such agents will run continuously in org workflows, note limits against human-factor risks (e.g., credential hygiene), and raise concerns about overreach/false positives (e.g., deleting .env/secrets), implying a need for strict scopes, guardrails, and review-gated automation.
- Participants note that code-fixing agents don't address broader operational security risks: even flawless static/dynamic remediation won't stop credential leaks via poor practices (e.g., passwords on sticky notes, webcam exposure). This underscores the need for defense-in-depth beyond code (secret hygiene, least-privilege access, endpoint hardening, and user training) alongside any automated vuln-fixing agent.
- Skepticism about auto-modifying repositories highlights the need for strict guardrails: agents should avoid touching sensitive artifacts (e.g., .env) via deny/allow lists, integrate secret scanning before/after patches, and require protected branches, mandatory code review, dry-run diffs, and easy rollback to prevent destructive changes or accidental secret exposure.
- On âalways-onâ deployment, commenters implicitly raise operational concerns: continuously running agents should be integrated at PR-time and/or scheduled scans (nightly/weekly) with rate limits, cost/governance controls, scoped tokens, and detailed audit logs to maintain supply-chain integrity while minimizing noise and repo churn.
- Gemini 3 will be able to call tools (Activity: 533): The post claims "Gemini 3 will be able to call tools," i.e., support structured function/tool calling to invoke external APIs and consume their outputs, enabling retrieval, code execution, and other agent-style actions. Technically this is table-stakes parity with existing LLM ecosystems (function-calling interfaces with typed arguments/JSON-like schemas and tool selection), improving reliability and integration compared to pure free-text prompting. Commenters mostly note this is baseline ("isn't it basically a must these days"), with some skepticism about the post's seriousness, implying the announcement is trivial rather than novel.
- "Calling tools" refers to LLM-initiated function calls where the model selects a tool name and emits structured args (typically JSON) that a runtime executes (e.g., web search, DB query, code run), returning results the model incorporates in subsequent turns, akin to ReAct-style loops; a minimal round trip is sketched after these bullets. This is the same capability exposed as function calling / tools in other stacks like OpenAI (docs) and Google Gemini (docs), and Anthropic Claude "Tool Use" (docs). It enables grounding, fresh data access, and precise operations beyond pure text generation.
- Tool use is increasingly a baseline for production LLM apps because it powers RAG (retrieval/search), calculators/coders, and integrations (apps/APIs), significantly reducing hallucinations and extending capabilities. Most competitive stacks (GPT-4/4o Assistants API tools, Claude 3.5 Tool Use, and open-source agent frameworks) treat tool calling as first-class, so lack of reliable tool use is a competitive handicap. In practice this hinges on robust schema adherence, tool selection/routing, and multi-step planning/execution fidelity.
- Several commenters allude to Gemini's weaker historical reliability with tool use versus peers, pointing to issues like schema-mismatched arguments, incorrect tool choice, and brittle multi-turn plans leading to failures (e.g., 4xxs from strict APIs). Practitioners often mitigate with tighter JSON schemas, validation, tool-choice gating, and decomposed plans, but out-of-the-box success rates reportedly lag GPT-4o/Claude 3.5 in community tests. If "Gemini 3" improves tool routing, argument formation, and iterative planning, it could close this gap in agentic workloads.
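For readers newer to the mechanics, here is one full round trip in the OpenAI-style shape these bullets compare against; the tool name and schema are illustrative:

```python
# One function-calling round trip: the model emits structured args, the runtime executes.
import json
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                       # illustrative tool
        "description": "Current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Oslo?"}],
    tools=tools,
)
call = resp.choices[0].message.tool_calls[0]   # the model's tool choice
args = json.loads(call.function.arguments)     # validate and execute, then send the result
print(call.function.name, args)                # back as a "tool" message for the final answer
```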
- Gemini 3 (Activity: 503): Post titled "Gemini 3" shares an image (not viewable) that, based on discussion, centers on Gemini's tool-use capabilities and platform integration. Commenters highlight the need for Gemini to invoke a broader set of external tools/APIs beyond the current Gemini Apps sandbox and emphasize stability/consistency as a top priority, suggesting gaps in reliability and ecosystem breadth. Notable debate clarifies what "tools" means (on-device actions on a phone vs. cross-device/desktop command execution), with at least one user reporting Gemini can execute commands on a laptop, implying uneven or context-dependent tool support.
- Demand for broader tool-use and open interoperability: commenters request support beyond the current limited "Actions" in Gemini Apps, with one noting "MCP compatibility is necessary atp". Adopting the Model Context Protocol (MCP) would enable vendor-agnostic tools, standardized discovery/schemas, and permission/logging flows across assistants, letting Gemini tap the same third-party capabilities supported by other ecosystems (modelcontextprotocol.io, github). This would reduce plugin fragmentation and make it easier to bring in filesystem, HTTP, code, and custom enterprise tools as uniform servers.
- Cross-device execution parity and OS constraints: one user reports Gemini can "execute commands on my laptop" but is constrained on phone, highlighting platform differences. Desktop agents can leverage native apps, shell, or browser extensions, while mobile OSes restrict background tasks and inter-app automation; bridging this gap likely requires deep integration with Android Intents, iOS App Intents/Shortcuts, foreground services, and a local RPC bridge to expose device capabilities as tools with explicit user permissions. Clear permissioning and sandboxing are essential for safe on-device actions while preserving reliability.
- Reliability/consistency as the top engineering priority: commenters emphasize "consistency and reliability above everything else," which maps to concrete targets like tool-call success rate, end-to-end action completion rate, and deterministic planning. Techniques include schema-constrained function calling, temperature=0 for planning/tool selection, retries with exponential backoff and timeouts, idempotency tokens, and structured error handling/logging for auditability (a small backoff sketch follows). Robust evals (e.g., action success across devices) and caching/stability of prompts can materially reduce variance and user-visible flakiness.
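A minimal version of the retry pattern named above, using only the standard library; retry counts and delays are placeholders to tune per tool:

```python
# Bounded retries with exponential backoff around a flaky tool or planning call.
import time

def call_with_backoff(fn, retries=4, base_delay=1.0):
    for attempt in range(retries):
        try:
            return fn()                            # e.g., one tool call made at temperature=0
        except Exception:
            if attempt == retries - 1:
                raise                              # surface the error after the last attempt
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```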
2. TTS Voice Accent Complaints (Scottish accent)
- Didn't even try a Scottish accent. (Activity: 494): From the title and comments, the post appears to showcase an AI voice-clone rendering a line associated with a Scottish accent (likely Braveheart's "Freedom") in a Michael Jackson-style voice; the system preserves MJ's timbre and idiolect (e.g., "hee-hee", "shamone") but fails at accent transfer. This highlights a common limitation of zero-shot TTS/voice cloning: models often optimize for speaker identity and prosody but provide weak control over regional accent without accent-conditioned training or fine-tuning (see multi-speaker TTS like YourTTS or VALL-E). Top comments note the MJ affectations ("Free-hee-hee-dom" and "The SHAMOOOONE"), implying the model captured stylistic mannerisms while missing the Scottish accent, which readers find amusing rather than problematic.
- Didn't even try a Scottish accent. (Activity: 492): The linked media on Reddit (v.redd.it/qu1ek8f8jetf1) returns an HTTP 403 Forbidden block page, indicating Reddit's CDN/gateway requires authenticated access (OAuth or developer credentials) to retrieve the asset. From the title and comment cues, the clip likely features an AI-generated Michael Jackson-style voice (voice cloning/VC or TTS) applied to a Scottish-accent context, but no model, pipeline, or quality metrics are disclosed and the content cannot be verified due to access restrictions. Commentary is largely non-technical, expressing amusement at an "AI MJ" voice and referencing MJ vocal tics (e.g., "hee-hee," "shamone"), with no substantive technical debate.
3. AI Community Sentiment: Memes, Vibes, and Moderation Debate
- The mood right now (Activity: 416): The referenced Reddit post is inaccessible: the request returned HTTP 403 Forbidden ("client is blocked by network security") and indicates authentication is required; the content appears to be a video hosted at v.redd.it (https://v.redd.it/mgss3gugtitf1). Without access, no technical details, benchmarks, or implementation notes can be extracted; remediation paths include logging in via Reddit Login, using a developer token, or filing a support ticket. Top comments are non-technical: interest in seeing "this" on a farm, amusement ("I'm LMAO"), and a quip that it could push "boomer FB slop" to the next level, implying concern about scaling low-quality autogenerated content.
- Gonna be a dank future, boys and girls! (Activity: 629): The linked media at v.redd.it/hjy60pum5jtf1 returns HTTP 403 Forbidden, indicating access control enforced at Reddit's authentication layer or edge (CDN/WAF), requiring user login or a developer token to retrieve the asset. With the content itself inaccessible, the only concrete technical takeaway is around delivery and gatekeeping on Reddit's media CDN rather than any model, benchmark, or implementation details of generative AI. Commenters speculate that generative AI usage will skew heavily toward entertainment/NSFW content ("50% dank memes and 50% porn") and raise a question about historical uniform accuracy ("Is Hitler wearing a British uniform?"), implying potential deepfake/stylization; another expresses doomer sentiment ("Welp it's been a fun ride"). No technical evidence or benchmarks are provided.
- IF ChatGPT is planning on forcing us to prove our identity, they better remove the SFW "guard rails" if they verify we're over 18 (Activity: 527): Users report a recent tightening of ChatGPT's SFW safety filters that now block or lock existing threads containing consensual, fictional adult content; the model often asks for clarification and then refuses, disrupting previously workable creative-writing workflows. The OP proposes that if OpenAI adds identity/age verification (KYC) for ChatGPT, verified 18+ users should be allowed to bypass NSFW guardrails, at odds with the current OpenAI sexual-content safety spec, which disallows explicit sexual content generation regardless of user age (see: https://platform.openai.com/docs/guides/safety-specifications/sexual-content). Commenters push for a rollback or an 18+ "bypass" mode; one calls the default "GPT-5" behavior overly aligned and "lame," and another argues the restrictions aren't about protecting minors. The substantive concern is elevated false positives in NSFW detection breaking backward compatibility with prior threads and hindering legitimate adult-fiction use cases.
- Multiple users report a recent tightening of NSFW safety filters that now trigger multi-step "clarification" prompts followed by refusal, even for explicitly adult, fictional characters. This behavior is breaking iterative writing workflows (e.g., post-project "what-if" scene generation) and even locking users out of existing conversation threads, suggesting a change in the platform-level moderation wrapper rather than a limitation of the base model itself.
- One commenter characterizes "GPT-5 in its default state" as extremely constrained, implying the default system prompt/safety layer is the bottleneck for creative/NSFW outputs rather than raw model capability. This highlights the distinction between the underlying model and the deployed, policy-enforced configuration, where the default safety scaffolding can significantly neuter generation quality.
- An alternative is proposed: use Mistral's "Le Chat," claimed to be similar to GPT-4o in capability while operating under a different (looser) policy regime, thus avoiding OpenAI's guardrails. References: Le Chat (https://chat.mistral.ai) and OpenAI's GPT-4o overview (https://openai.com/index/hello-gpt-4o/).
- Biggest Provider for the community thanks (Activity: 1034): The image is a meme "thank you" post implying Chinese labs are the current "biggest provider" to the open-source AI community via open-weight releases, e.g., Alibaba/Qwen (HF), 01.AI/Yi (HF), DeepSeek (HF), and InternLM (HF). Technical nuance from comments: these are "open weights," not fully "free/open-source"; licenses can include usage restrictions, and downstream ecosystem support (LoRA finetunes, tooling) lags popular Western bases like PonyXL/Illustrious. Commenters argue "it's not free" (license nuance) and that China gains mindshare by releasing weights while US/EU firms stay closed; others note few community LoRAs for Chinese models, likely due to hardware limits.
- Several commenters clarify that "free" access and open-weights releases are distinct: open weights allow downloading and local use/fine-tuning but still impose compute costs and may have licensing constraints, unlike fully FOSS code. Practically, open weights enable offline inference, quantization, and LoRA training (benefits closed APIs don't provide) while shifting costs to users' hardware and electricity. This nuance affects ecosystem health by enabling community benchmarks and reproducibility, even if the model isn't cost-free to run.
- On model adoption, a key blocker for Chinese open models is the LoRA ecosystem: users note fewer community LoRAs versus PonyXL or Illustrious, likely due to hardware limits for XL-scale fine-tuning. SDXL-class LoRA training commonly pushes consumer GPUs; many hobbyists on 8-12 GB VRAM must use tiny batch sizes or aggressive memory optimizations, whereas smoother training often benefits from >16-24 GB. This reduces the volume/quality of community LoRAs compared to well-backed ecosystems; tools like kohya-ss (https://github.com/bmaltais/kohya_ss) help, but XL models remain more resource-intensive than SD1.5.
- Brett Adcock: "This week, Figure has passed 5 months running on the BMW X3 body shop production line. We have been running 10 hours per day, every single day of production! It is believed that Figure and BMW are the first in the world to do this with humanoid robots." (Activity: 1153): Brett Adcock reports that Figure humanoid robots have been operating on a BMW X3 body shop production line for ~5 months at ~10 hours/day on each production day, claimed as a first sustained humanoid deployment in automotive manufacturing. The post provides no quantitative details on task scope, MTBF/uptime, error rates, safety events, or throughput impact; the referenced clip is access-restricted (video, Figure). Technically minded commenters question the choice of a humanoid form factor for a repetitive station versus purpose-built or wheeled platforms and simpler end-effectors, and note that 10 h/day is modest compared to typical industrial robot duty cycles (often continuous/24×7), suggesting this may still be a limited or pilot deployment. Others speculate on broader adoption timelines (e.g., robots "everywhere by 2035").
- Form factor debate (humanoid vs wheeled/specialized): commenters question why a humanoid is needed for a repetitive, looped task when a fixed robot cell or a wheeled mobile manipulator could be simpler and more reliable. The technical trade-off highlighted is that a humanoid can be a drop-in fit for human-designed workcells (reach envelopes, tool geometries, fixtures) and use human tools without retooling, but legs and 5-finger hands add complexity, cost, and potential failure modes; many factory tasks can be handled by wheeled bases with 2- or 3-finger grippers or parallel-jaw end effectors. The implicit optimization problem is dexterity/coverage vs uptime/MTBF and integration cost, with humanoids offering flexibility at the expense of simpler, higher-reliability dedicated automation.
- Uptime and duty cycle skepticism: the claim of 10 hours/day for 5 months prompts discussion that industrial robots commonly target multi-shift or 24/7 operation, so limiting to 10 hours likely reflects integration/safety constraints, human shift alignment, charging/thermal limits, or reliability burn-in. Technically minded readers point to metrics like OEE/MTBF/MTTR and mean cycles between intervention as more meaningful than calendar time, suggesting the need for data on intervention frequency, recovery time, and autonomous error handling to assess production-readiness.
- Comparison with existing industrial armbots: with massive 6-DoF armbots already in the cell, commenters ask what unique value the humanoid adds. The technical theme is that fixed armbots excel at highly constrained, fixtured tasks (e.g., welding, material transfer) but struggle with unstructured or variable subtasks (ad-hoc handling, tool pickup, inspection, cable routing) where a humanlike reach, posture, and multi-contact manipulation can reduce custom fixturing and changeover time. The trade is throughput and simplicity of dedicated cells versus reconfigurability and lower retooling for edge cases or high-mix/low-volume work.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. Sora 2 Video Gen: Demos, Restrictions, and Reactions
- Duckumentary Dazzles, Memes Multiply: OpenAI premiered a 30-second Sora 2 short, The Quack: Part 1 (OpenAI on X), fueling buzz around Sora 2's creative fidelity and the invite code FRIYAY shared alongside the teaser. The drop spotlights rapid progress in AI video generation with a polished, meme-friendly vignette that tests prompt-to-video consistency in a tightly scoped scene.
- Community members celebrated fast-rising gen-video quality and shared comparisons with other recent Sora 2 clips like the low-gravity "horse on astronaut" gag noted by @elder_plinius (Sora 2 Pro's leap). Many framed the release as a visible step toward more reliable storyboard adherence and cinematic timing.
- Sora Slams the IP Door Shut Overnight: Creators reported sudden prompt rewrites and outright bans on copyrighted content in Sora 2, citing Andrew Curran on X after anime tests that had initially looked strong. The shift curtails direct references (e.g., named franchises) and forces descriptive workarounds for protected characters and worlds.
- Users described the experience as "speedrunning enshittification" while noting the model now aggressively sanitizes prompts and output, shrinking the creative envelope for fan-style video. Discussion centered on how these policy changes impact production pipelines and whether style-only descriptors still survive moderation.
2. Local/Edge Inference: LM Studio Compatibility and DIY Throughput
- LM Studio Speaks OpenAI v1 Responses: LM Studio 0.3.29 added OpenAI /v1/responses compatibility, letting apps that expect the standard OpenAI API format plug directly into local models. The release also debuted the CLI helper lms ls --variants to list local model variants, simplifying multi-variant dev workflows.
- Engineers reported smoother drop-in integration with OpenAI-style clients and faster iteration thanks to robust variant discovery in the terminal. This narrows the gap between local experimentation and production prototypes that assume /v1/responses semantics; a drop-in client sketch follows.
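A sketch of what that compatibility enables: the official OpenAI Python SDK pointed at LM Studio's local server (port 1234 is LM Studio's default; the model name is whatever you have loaded locally):

```python
# Use LM Studio's /v1/responses endpoint through the standard OpenAI SDK.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is unused locally
resp = client.responses.create(
    model="qwen/qwen3-4b",   # illustrative: any locally loaded model identifier
    input="Summarize KV caching in two sentences.",
)
print(resp.output_text)
```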
- Wi-Fi Farm Feeds GLM at 23 tok/s: One setup ran distributed inference across 3 nodes with 8x RTX 3090s over Wi-Fi, hitting ~5.5k prompt processing and ~23 tok/s on GLM 4.5 Air at 8-bit, using the model available via GLM 4.5 Air (free) on OpenRouter/Z.ai. The operator plans to re-balance to 2 nodes (4/4) to roughly double throughput once parts arrive.
- The report underscores how careful sharding, precision choices, and interconnects can cheaply push local cluster throughput. It also spotlights GLM 4.5 Air as a reliable, rate-limit-friendly baseline for stress-testing distributed serving.
3. OpenRouter Access: iFlow Hacks, Model Swaps, and Seed LLMs
- iFlow Flip Unlocks Free GLM-4.6 Calls: A member reverse engineered iFlow to route free GLM-4.6 requests from any OpenAI-compatible client by simply running the Python script. The technique avoids Docker and reportedly works with Qwen/Gemini stacks.
- Chat focused on wiring this into existing OpenAI SDK flows while cautioning about reliability and Terms-of-Service risks. The hack shows how adapter layers can be exploited to piggyback on third-party model endpoints.
- DeepSeek Dies; GLM Air Abides: After the provider dropped hosting for deepseek-v3.1-base (HTTP 404), users pivoted to alternatives like GLM 4.6 and a free GLM 4.5 Air tier via OpenRouter/Z.ai. The swap stabilized downstream apps that depended on an OpenAI-style API.
- With Grok 4 Fast no longer free, GLM's free tier became the go-to for avoiding rate-limit headaches during testing. Threads compared latency, token pricing, and reliability between stopgaps to keep prototypes unblocked.
- Seed Models Tease Frontier on the Cheap: Builders asked OpenRouter to add ByteDance's Seed LLMs (e.g., Seed 1.6), citing strong results and bargain pricing around $0.11/$0.28 mtok, hosted via Volcengine Ark. Interest stems from a mix of frontier-like performance and aggressive cost curves.
- Despite concerns about a China-hosted control plane, the room favored experimenting to validate quality-to-price ratios. The consensus: Seed could pressure mainstream pricing if access and policy clarity improve.
GPU Systems: New dtypes, Multi-GPU Compilers, and NVLink Insights
- Arm Arms AI with 6-Bit MXFP6: Arm announced support for 6-bit AI datatypes via the OCP MXFP6 format alongside new SVE/SME instructions in its A-profile roadmap, detailed in Arm Architecture Developments 2025. The move targets reduced memory footprint and bandwidth for edge/embedded AI.
- Engineers expect MXFP6 to boost throughput for quant-friendly models where memory bandwidth dominates; a toy block-scaling sketch follows. The update signals broader industry momentum toward sub-8-bit inference with hardware-native kernels.
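To show the idea behind MX-style formats (narrow elements sharing one scale per 32-element block), here is a NumPy toy mimicking block-scaled quantization; it is illustrative only, not Arm's hardware behavior or the exact OCP MXFP6 encoding:

```python
# Toy block-scaled quantization in the spirit of OCP MX formats.
import numpy as np

def block_quantize(x, block=32, levels=31):  # levels approximates a ~6-bit magnitude range
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        scale = max(np.abs(blk).max() / levels, 1e-12)    # one shared scale per block
        out[i:i + block] = np.round(blk / scale) * scale  # narrow element * shared scale
    return out

x = np.random.randn(128).astype(np.float32)
print("max abs error:", np.abs(x - block_quantize(x)).max())
```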
- Mercury Makes Multi-GPU Compiles Fly: The Mercury paper introduces CommIR, treating remote GPU memory as a managed extension of the memory hierarchy to compile multi-GPU operators. Reported results show an average 1.56x speedup over handcrafted baselines and up to 1.62x wins for real LLM workloads.
- By explicitly modeling data placement and inter-device traffic, Mercury minimizes cross-GPU stalls that plague 3D parallel training. Discussions compared CommIR's loop-centric approach with vendor libraries for NVLink/NVSwitch topologies.
- NVLink Copy Engines Clock Big Bandwidth: Public experiments benchmarking NVLink copy engines and configurations landed here: NVLink bandwidth experiments (copy engines). Results stressed that measured bandwidth swings with platform wiring and copy paths.
- Engineers contrasted TMA vs. load/store vs. memcpy paths and noted surprising cudaMemcpy headroom in some B200 multi-GPU tests. Takeaway: profile on your exact fabric + driver combo before cementing a kernel strategy; a quick PyTorch probe is sketched below.
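In that spirit, a quick-and-dirty peer-to-peer copy bandwidth probe in PyTorch; buffer size and iteration count are arbitrary, and real numbers depend on topology, driver, and whether P2P is enabled:

```python
# Rough device-to-device copy bandwidth measurement with CUDA events.
import torch

a = torch.empty(1 << 28, dtype=torch.uint8, device="cuda:0")  # 256 MiB source buffer
b = torch.empty_like(a, device="cuda:1")                      # destination on a peer GPU
b.copy_(a)                                                    # warmup (also sets up the P2P path)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(10):
    b.copy_(a, non_blocking=True)
end.record()
torch.cuda.synchronize()

gib = 10 * a.numel() / 2**30                                  # total GiB moved (uint8 = 1 byte)
print(f"{gib / (start.elapsed_time(end) / 1e3):.1f} GiB/s")   # elapsed_time() returns ms
```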
5. Agentic Tooling: AppsSDK vs MCP, New DSPy Modules, and Gateways
- AppsSDK Crashes MCP's Party: OpenAI's AppsSDK brought in-ChatGPT UI for apps (launch partner: TheFork), while Cloudflare's Code Mode pitched turning agent tool calls into Workers. MCP contributors debated overlaps between AppsSDK UIs and MCP-UI and whether Code Mode over-engineers a simple tool call.
- Some argued AppsSDK could reduce turn count and latency for agent tasks; others flagged perf and complexity vs. plain web APIs/SDKs. The thread urged aligning MCP discovery/capabilities with app-style transactions to keep ecosystems coherent.
- DSPy ReAct Machina Powers Multi-Turn Agents: A community module, DSPy-ReAct-Machina, hit PyPI, enabling multi-turn ReAct with a single growing context buffer for conversations. The author detailed design tradeoffs in a companion writeup: Dev.to blog.
- Builders discussed trajectory storage, reflection over entire ReAct chains, and plugging this into existing DSPy programs. Interest centered on reducing tool-call thrash and stabilizing longer plans; a stock dspy.ReAct baseline is sketched below for contrast.
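For contrast with the community module, a minimal stock dspy.ReAct agent; the LM, signature, and tool are illustrative:

```python
# Baseline ReAct agent in stock DSPy.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # illustrative: any configured LM works

def lookup(query: str) -> str:
    """Toy retrieval tool; swap in real search/DB calls."""
    return f"stub result for: {query}"

agent = dspy.ReAct("question -> answer", tools=[lookup], max_iters=5)
print(agent(question="What does the ReAct pattern interleave?").answer)
```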
- Neosantara Opens Free LLM Gateway: Neosantara AI launched a free LLM Gateway Platform with DSPy integration, documented here: Neosantara DSPy docs. New users get 10k monthly consume tokens and can send feedback to the published contact.
- The gateway targets quick app scaffolding without locking into a single provider, useful for demos and cost-sensitive prototypes. Early adopters emphasized testing rate limits and fallbacks before shipping.
Discord: High level Discord summaries
LMArena Discord
- Perplexity Promo Sparks Referral Race: A Perplexity student offer surfaced, requiring active student status, which led to users transparently asking for referral bonuses.
- The offer generated discussion around its accessibility and the lengths users would go to for a discount.
- Comet Browser: EdTech or CheatWare?: The Comet browser was reviewed, with one member describing it as a good place to study, while another jokingly called it a good place to cheat.
- Opinions diverged on Cometâs assistant, with some finding it inferior to ChatGPT or Gemini, while others appreciated its voice mode.
- Thirsty AI Strains Local Water Tables: Members debated the environmental impact of AI, particularly water usage for cooling data centers, with one stating that 20 answers from ChatGPT costs around 0.5 liters of water.
- A member alleged that locals near AI data centers are facing hardships, while another speculated about future space-based data centers for heat dissipation.
- Gemini 3 Launch Leaks: October 9, 2025?: Chat cited reports from early testers and industry insiders that Gemini 3 is on track for official release on October 9, 2025, pending testing on platforms like AI Studio.
- One member expressed a preference for it to be exclusive to AI Ultra users.
- Hunyuan 3 Image Gen Debuts: Members shared first impressions of Hunyuan 3, Tencent's new image generator, with one finding it superior to Qwen or Flux and on par with Imagen.
- Another noted that these Chinese models often struggle with translation and nuanced cultural details and context when translating into English.
OpenAI Discord
- GPT-5 Aims for Immediate Distress Detection: GPT-5 Instant is being updated to better recognize and support users in moments of distress, routing sensitive conversations to provide immediate assistance.
- ChatGPT will continue to tell users what model is active, with discussions around how to force ChatGPT to only use GPT Instant.
- Free ChatGPT Business Plan: Users are getting a free month of ChatGPT Business with up to 5 seats, after which it bills at $30/seat, via a direct link to the offer.
- This free access has sparked discussions about generating unlimited accounts and the utility of Codex within the business plan.
- Arduino's AI MCU Mimics Neural Net Magic: Arduino is developing an AI-enabled microcontroller similar to an NPU-based ARM, speculated to be a smaller Hailo-8-like device.
- One user reported object recognition at 2000+ FPS for less than 2 watts using a Hailo-8L on Raspberry Pi, mounting it on a UGV Waveshare toy robot tank.
- Sora 2 Gets Shade for Shoddy Showings: Users deride Sora 2's lower resolution quality and watermarks, making it feel cheap.
- Many claim that an OSS 20B model was better and that it is hard to achieve consistent JSON context profiles for health study infographics.
- Copyright Censors Creation of DBZ Content: Members note that new copyright restrictions prevent generating copyrighted content like Dragon Ball Z, requiring users to describe characters and settings without direct references.
- Users must describe Goku and Vegeta, the world, animation style, clothing, and tone, but cautioned that it may not closely resemble DBZ.
Unsloth AI (Daniel Han) Discord
- B200 on-demand action missing!: Users reported single B200 on-demand rentals missing from DeepInfra, now only offering 8x B200 setups.
- Alternatives such as Modal and Thundercompute were suggested, but they also reported unavailability.
- Discord Hit By Security Breach!: Discord disclosed a security incident on September 20, where unauthorized access to a third-party customer service system potentially exposed personal data and government ID images.
- The breach, possibly linked to social engineering of the Trust & Safety team, has ignited discussions on digital IDs and data storage practices.
- Dataset Downgrade is TTS Savior!: A user debugging fine-tuning of the Orpheus 4B TTS model resolved a dataset loading error by downgrading datasets to version 3.6.0.
- The "'torchcodec.decoders.AudioDecoder' object is not subscriptable" error was preventing successful dataset loading and training, highlighting version compatibility issues.
- Qwen Moe Shows Off Six-Model Mashup!: A member showcased training 6 separate models on 2 datasets each using a custom-built 6B Qwen 3 architecture, then "moe'd" them into a 6x6B Qwen MoE, 36B model (27B compressed), built using Unsloth.
- Special thanks were given to Unsloth, Team Mradermacher for the quants, and Team Nightmedia for MLX and collaborating on many parts of this model project; datasets were created in house, and the model is available at DavidAU/Qwen3-MOE-6x6B-Star-Trek-Universe-Alpha-D-256k-ctx-36B.
- Sequence Packing + Flash Attention speeds Fine-tuning!: A member reduced epoch training time from ~30 minutes to ~4-5 minutes (~500 million tokens) with sequence packing and Flash Attention, resulting in 100% GPU utilization without CPU interruption.
- Flash Attention required manual compilation, pushing the workstation to its peak memory usage.
LM Studio Discord
- LM Studio Channels OpenAI Compatibility with v1 Responses: LM Studio 0.3.29 introduces an OpenAI /v1/responses compatibility API, enabling developers to integrate LM Studio with applications expecting the standard OpenAI API format for responses.
- The latest LM Studio CLI feature lms ls --variants allows users to easily list their local model variants directly from the terminal.
- AI Bubble talk: Members discussed the Nvidia, Oracle and OpenAI money circle, suggesting that if one of them fails, it may cause a snowball effect.
- Another member noted that tool calls are not worth it for their local uses, though added that a plugin that injects knowledge into the system prompt, for example, without tool calls would be great.
- GPT-OSS-120B Framework Lands with 128k Context: The GPT-OSS-120B framework has landed, boasting a 128k context window and running at 19.76 tokens/s on Ryzen AI max+ hardware.
- Other members follow up asking about the exact hardware specs.
- Distributed Inference Setups Sizzle: A member shares a successful distributed inference setup using 8x 3090s over WiFi across 3 nodes and achieves approximately 5.5k prompt processing and 23 tokens/s output for GLM 4.5 air at full 8-bit precision.
- They plan to double the speed by using 2 nodes at 4/4 once new parts arrive.
- Vulkan > CUDA on Older GPUs?: A member found Vulkan significantly outperforms CUDA on an NVIDIA P40 (63.02 tokens/s vs 23.31 tokens/s) with the model qwen/qwen3-30b-a3b-2507 (Q4_K_M) in LM Studio.
- Members suggest the P40âs outdated drivers and better support of Vulkan for older cards may be the reason.
OpenRouter Discord
- iFlow Reversing Exposes Free GLM-4.6: A member reverse engineered iFlow to enable free GLM-4.6 requests for any OpenAI-compatible tool by running the python file from the folder.
- The community discussed using this with Qwen/Gemini without Docker.
- Deepseek 3.1 Bites the Dust: Users reported 404 errors for deepseek-v3.1-base after the provider stopped hosting it; members suggested deepseek v3.1 deepinfra and GLM 4.6 as viable alternatives.
- As Grok 4 Fast is no longer free, users were looking for alternatives, and GLM 4.5 Air (free) via Z.ai was suggested to circumvent rate limit errors.
- Understanding BYOK on OpenRouter Clarified: Users sought clarity on BYOK (Bring Your Own Key), and it was explained that OpenRouter acts as a proxy to the API (e.g., OpenAI), with users being billed directly by the API provider.
- It was noted that OpenRouter waives its 5% surcharge for BYOK, offering benefits like spend control and fallback options.
- Sora 2's Pricing Scheme Surfaces: The pricing for Sora 2 was revealed: $0.3/sec of video for pro and $0.1/sec for non-pro.
- This sparked discussion around the implications of easily generating deepfakes and methods to bypass watermarks and cryptography.
- ByteDance's Seed Models Spark Interest: Members expressed interest in OpenRouter including ByteDance's Seed LLM models like Seed 1.6, citing their frontier-level performance and cheap pricing ($0.11 / $0.28 mtok).
- Concerns were raised about the primary host being volcengine.com, a Chinese platform, but the modelsâ potential is still considered worthwhile.
Cursor Community Discord
- GPT-5 and Claude 4.5 duke it out in benchmarks: Users find GPT-5 overengineers solutions and that results depend heavily on the prompt, whereas Claude performs better and faster for many tasks, though opinions diverge.
- Some users found Sonnet 4 inferior to GPT-5, while others prefer Ultra.
- Stealth Cheetah Model Leaps Into View: Members discovered a new paid model called Cheetah, possibly Gemini 3.0, praising its speed, with one user reporting image generation in under 5 seconds.
- One user tested it and said it fixed an issue almost immediately, and another touted its speed.
- GPT-5 Pro rings up more expensive bill: The newly released GPT-5 Pro is more expensive than other models like Opus, and users are testing how it works in the Cursor CLI.
- One user reported that GPT-5 Codex makes constant zero diff edits.
- Cursor UI Changes Leave Users Confused: Users discussed the disappearing close chat button in the recent Cursor UI changes.
- One member didn't realize the Cursor logo at the top could close the pane, mistaking it for an agent button.
- Background Agents Beset by Bugs!: Multiple users reported that background agents are failing and under investigation, with one noting a different node version being used.
- One user is curious to see if spinning an agent using the API will succeed.
GPU MODE Discord
- Arm Flexes Six-Bit Future for AI: Arm announced 6-bit data types for AI support with the OCP MXFP6 format, incorporating new Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME) instructions.
- These additions aim to increase AI model efficiency by optimizing memory use and reducing bandwidth requirements, setting the stage for more capable edge and embedded AI applications.
- All2All AMD Kernel Gets a Tune-Up: A user fixed their submission by implementing a missing custom_kernel function, an entry point for running the baseline all2all kernel.
- Timeout issues were addressed by optimizing code, particularly for gemm-rs, where improvements in computation and communication overlap yield a 1.25x speedup.
- CUDA graphs get iterative with worklists: A developer sought to improve managing CUDA graphs with while conditions and swapping worklists, aiming to reduce extra device pointer allocations.
- The goal is to optimize kernel computation using CUDA while node, avoiding unnecessary pointer overhead during iterative graph execution.
- NVSHMEM Symmetric Heap pointer parley: nvshmem_malloc() performs a collective operation, reserving memory and returning a symmetric pointer that allows efficient remote memory access without specific patterns.
- It differs from NCCL as it requires one process per GPU, contrasting NCCL's single-process support, and the symmetric heap improves access efficiency with non-identical pointer values across processing elements.
- Locking GPU Clocks for Stable Profiling: Members advise locking GPU clocks to ensure stable profiling results using nvidia-smi --lock-gpu-clocks.
- Using the base TDP frequency ensures stability under full load, preventing slowdowns due to overheating; this approach contrasts with using max frequency, which may cause instability as the GPU heats up.
HuggingFace Discord
- ONNX Community Converts Image-to-Text: Members are converting VLMs and multimodal models to ONNX with help and guidance from the ONNX Community.
- Shared conversion tips can be found in the HF Discord channel.
- Gemma 1B Powers Offline Prompts: The PromptVault Android app now runs on-device AI using the Gemma 1B IT model through MediaPipe Tasks for offline prompt generation and refinement, available on the Google Play store.
- The app features local and Google Drive (encrypted) backups, offline mode, and experimental AI actions for writing titles, descriptions, and prompt text.
- BERT Model Classifies AI Essays: A member trained a BERT model to classify AI-generated texts from human-written ones, using a dataset from Kaggle.
- The member is looking for ideas to speed up training on bigger datasets, as the current dataset is not the greatest; a compact mixed-precision setup is sketched below.
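A compact sketch of that kind of setup with the usual first speedups (fp16 mixed precision and a larger per-device batch); the model, hyperparameters, and dataset wiring are illustrative:

```python
# Binary AI-vs-human text classification with BERT; fp16 + bigger batches are
# typically the first wall-clock wins on GPU.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # labels: human (0) vs AI-generated (1)

args = TrainingArguments(
    output_dir="ai-text-clf",
    per_device_train_batch_size=32,      # raise until VRAM becomes the limit
    fp16=True,                           # mixed precision: a large speedup on most GPUs
    num_train_epochs=1,
)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)  # your tokenized splits
# trainer.train()
```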
- TrackIO Version Causes DPOtrainer Problems: Members found that pinning trackio to version 0.4.0 resolves a problem in the DPOTrainer/DPOConfig when following the examples (see the related issue).
- This fix suggests that version 0.5.1 is broken.
- Duplicating Spaces over Cloning due to GAIA Errors: During a GAIA exercise, a user encountered a 500 error when attempting to clone a space, and was advised to duplicate the space instead.
- The course specifically instructed users to duplicate spaces instead of cloning, but didn't specify why.
Modular (Mojo 🔥) Discord
- Pixi to the Rescue: A user fixed an Access denied error with Pixi by switching to the max-nightly channel instead of nightly.
- Pixi is like a combination of uv and a fast conda implementation.
- Mojo's MAX Aims to Dethrone TensorFlow and PyTorch: A member noted that MAX could potentially replace TensorFlow and PyTorch, inquiring about its current parity and open-source status.
- Another member responded that MAX is near or ahead in inference performance, but it is only partially open source, with plans to open up further early next year.
- Mojo GPU Compute Capabilities Go Head-to-Head with CUDA: A member inquired whether Mojo could replicate all functionalities available in CUDA.
- Another member responded that while Mojo currently lacks optimal ways to interact with graphics hardware on the GPU, virtually any computational task should be achievable; feature requests are welcome for any unmet needs.
- Threading the Needle: C Libraries and Mojo: A member inquired about the possibility of utilizing C libraries like pthreads within Mojo.
- A member responded that while C libraries generally integrate well, pthreads may not be ideal due to Mojo's runtime environment and its impact on standard library functions, noting that Mojo's current concurrency model is incomplete.
Nous Research AI Discord
- Sora 2 Bans Copyrighted Content: After surprising members with good anime output, the Sora 2 video generation model began rewriting prompts and outright banning copyrighted content overnight, effectively crippling its creative potential according to one tweet.
- Members expressed their disappointment, describing the experience as "speedrunning enshittification."
- LPDDR5X Rocks Ryzen AI Mini PCs: The HP G1a R-AI-Max+ Pro 395 w/ 128GB LPDDR5X was obtained for AI testing, featuring soldered onboard, non-replaceable LPDDR5X which is generally much faster and lower power than DDR5.
- The LPDDR5X allows the DGX Spark and popular Ryzen AI Mini PCs to achieve 8000 MHz RAM speed and a 250GB/s+ bus speed, according to this Samsung Semiconductor link.
- Qwen VL Experiences Clock Inaccuracy: Members discussed the quirk of Qwen 2.5 VL, where the smaller model sometimes outperforms the larger one on vision tasks, but suffers from the lost-in-the-middle phenomenon when given text.
- In one clock-reading test, it missed two clocks outright and was mostly wrong, whereas a local vLLM run got 3/5 correct and was mostly correct except for flipped digits.
- Transformer Learns Multi-Step Loss Function: A member described their experiment with a multi-step transformer loss: choosing the hidden vectors from a select number of the transformer's middle layers and appending them to the input vectors to do another forward pass.
- They noted it was done on a really dumb Gemma 3 270M IT model, but that CLI agents opening up access to experiment with training and tinkering with algorithms is actually crazy; a sketch of the idea follows.
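One hedged reading of that experiment, sketched with Hugging Face Transformers: take a middle layer's hidden states from a first pass and append them to the input embeddings for a second pass. The layer choice, model name, and where the loss is taken are all assumptions, not the member's exact recipe:

```python
# Illustrative two-pass "multi-step" setup as described above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "google/gemma-3-270m-it"  # assumption: any small causal LM works here
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

ids = tok("The quick brown fox", return_tensors="pt").input_ids
first = model(ids, output_hidden_states=True)             # pass 1: collect hidden states
mid = first.hidden_states[len(first.hidden_states) // 2]  # pick a middle layer

emb = model.get_input_embeddings()(ids)                   # [batch, seq, hidden]
augmented = torch.cat([emb, mid], dim=1)                  # append middle-layer vectors as extra positions
second = model(inputs_embeds=augmented)                   # pass 2; compute the training loss on these logits
```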
- Dragon Hatchling Paper Hatches an Idea: A member shared a link to a paper titled "THE DRAGON HATCHLING: THE MISSING LINK BETWEEN THE TRANSFORMER AND MODELS OF THE BRAIN" (Arxiv link), exploring potential connections between Transformers and models of the brain.
- A discussion of brain models and transformer architectures is ongoing.
Latent Space Discord
- DeepSeek Loosens CUDA's China Grip: DeepSeek is challenging Nvidia's CUDA lock-in with its FP8 spec and TileLang language, aiming to foster shared standards and a programming bridge for Chinese chip stocks, more on X.
- While this move represents a strategic alignment for China, doubts linger regarding the performance gaps that remain compared to CUDA.
- Sora 2 Duckumentary Premieres: OpenAI debuted The Quack: Part 1, a 30-second AI-generated short using Sora 2, generating excitement and duck-themed memes, including the invite code (FRIYAY), linked on X.
- The release underscores the rapid advancements in AI video generation capabilities.
- Agentic Browsers Become AI Battleground: AI companies are now competing in the agentic browser space, demonstrated by Claude's new browser on last Friday's AIIA call, with early users noting its capabilities.
- This trend marks a significant shift towards more integrated and AI-driven browsing experiences.
- Sora's Astronaut Horseback Riding Revolutionizes Gen-AI: Pliny celebrated Sora 2 Pro's advance from last year's impossible still-image prompt to an impressive low-gravity video: a realistic horse piggybacking on an astronaut with improvised, spontaneously generated mission-control humor (xcancel.com).
- The discussion highlights rapid gen-AI progress, benchmarks, and inside jokes.
- OpenAI's Medal Bid Interrupted: Reportedly, OpenAI offered $500 million to acquire Medal, a gamer-video platform, to gather training footage, but the deal fell through (xcancel.com).
- Medal is launching an in-house AI division, General Intuition, which is now closing a $100 million funding round.
Yannick Kilcher Discord
- Robo-Romance Foreshadows Future: Discussions about when AI sex robots will become less awkward coincided with the introduction of the Unitree R1, a clumsy but affordable $6k robot (X link).
- The conversation touched on the UK's age verification requirements, with some members anticipating the singularity and robot wives.
- Discord Data Debacle Disclosed: A Discord customer service data breach was reported by The Verge (The Verge).
- This occurred as safety benchmarks by Dan Chollet (Tweet 1, Tweet 2) and a NIST evaluation on DeepSeek AI models (NIST report) revealed potential shortcomings and risks.
- New Paper Explores Low-Rank Gradients: A member shared a new paper, Low Rank Gradients, noting that it involves heavy LA (Linear Algebra), sparking interest and a possible exploration session.
- Another member joked that they like to recruit a specific member with a black cat picture when there is complex math involved, as he is an excellent co-pilot, sharing a cat video.
- Diffusion Models Demand Signal Boost: A member suggested framing the problem as a diffusion process, noting the conditioning signal may be under-weighted, and recommended increasing the conditioning/guidance weight.
- Another member agreed, noting that the model is underfitting the background, validating the need for a stronger conditioning signal.
- GPT-5 Math Prowess Predicted: Claims of GPT-5 assisting in solving math problems from mathematicians are being shared with links to Tweet 1 and Tweet 2.
- A member noted that the first tweet was posted on August 1st, 2025, implying that it is a futuristic claim.
Eleuther Discord
- Humans Evaluate Diffusion Models Better Than FVD: Members discussed evaluation of diffusion models, using FID/CLIPScore for images, manual human evaluation, and automated metrics like FVD for video.
- One member expressed curiosity about video evaluation, noting its relative primitiveness compared to image evaluation methods with Sora 2.
- Gemma's Architecture: Not a Hit: Despite strong performance on the LM Arena, Gemma's architecture is not as widely adopted as Qwen's.
- A member posited that training data and finetuning distribution are more significant factors in LLM performance than architecture alone.
- Synaptik Core Promises Verifiable AI: Janay from Synaptik Core introduced Synaptik Core, a toolchain for verifiable, long-term memory and auditability in AI systems.
- She shared a LinkedIn post showcasing AI agents and her sprint leading up to the OpenAI Open Model Hackathon (link).
- AO3 Subset for Easier Learning Emerges: Members explored creating an AO3 story subset with a simpler grammar structure for easier learning, akin to TinyStories.
- The group considered using a readability score to filter the data, while also acknowledging potential noise removal tradeoffs.
- nanoGPT Speedrun Records Tumble: A member shared a LessWrong post highlighting that the nanoGPT speedrun world record dropped by 20% in 3 months.
- This suggests rapid progress is still happening, even at smaller scales.
MCP Contributors (Official) Discord
- GitHub Teams Embrace Infrastructure-As-Code: The team migrated GitHub team management to infrastructure-as-code, managing memberships and repository permissions via code at modelcontextprotocol/access.
- The migration aims for community ownership, transparency, an audit trail, and AI-friendly access management, with brief access interruptions expected during deployment.
- Versioning Issues Plague MCP Tools: An Intuit engineer is facing challenges with versioning MCP Tools in MCP Servers, especially with dependency management and compatibility at scale, and is seeking collaborators.
- They've drafted a SEP with a potential solution, available at modelcontextprotocol/modelcontextprotocol#1575, and are looking for feedback.
- Cloudflare's Code Mode Faces Over-Engineering Accusations: Cloudflare's Code Mode was discussed, with some suggesting it misunderstands MCP or over-engineers a tool call into a request to a Cloudflare worker, as per this blogpost.
- Some members debated whether Code Mode reduces the number of turns an agent needs to deliver a result, while others expressed concerns about performance and compared it unfavorably to web APIs or client SDKs, based on this prototype.
- AppsSDK Sparks MCP-UI Overlap Debate: OpenAI released their AppsSDK, bringing UI into ChatGPT via MCP, with TheFork as a launch partner, as per their announcement.
- Members are wondering if engaging with MCP-UI would have been a better move, but OpenAI intends to ensure they feel natural together and will fully support transactions from apps using ACP.
- MCP Feature Support Matrix Decoded: A member inquired about the meaning of Discovery in the Feature Support Matrix of the Model Context Protocol.
- Another member clarified that discovery refers to server capabilities and the ability to communicate tool changes.
Moonshot AI (Kimi K-2) Discord
- Kimi-latest is not Kimi-K2: The alias "Kimi-latest" refers to the closed, production Kimi large model powering the Kimi assistant, while "Kimi-K2" denotes the open-weights MoE family (e.g., k2-0905).
- The "proprietary llm" line on moonshot.ai pertains to the closed stack, separate from K2, despite K2's UI prominence.
- Em Dash Divides Opinion: A user inquired if others disregard messages containing em dashes when not interacting with an AI.
- A respondent admitted to curbing their natural use of em dashes to avoid being mistaken for a bot, expressing their love for Kimi through an image macro.
- Kimi Censoring Issues During Translation: A member reported that Kimi's censoring can cause it to erase its output and replace it with an apology.
- They advised utilizing Qwen for translations due to its million-token context window and superior translation capabilities.
- Shareholder Pursues Fun and Profit: A user disclosed owning over one hundred USD of a fund that is 8% Alibaba stock by weight; Alibaba in turn holds a ~35% stake in Moonshot.
- The member jokingly asserted that the purpose of life is to maximize shareholder value, eliciting amusement from others.
aider (Paul Gauthier) Discord
- Deepseek tests Browser with Claude: A member shared a blog post about Deepseek browser testing with Claude CLI and Chrome DevTools MCP.
- They also use DeepSeek via its Anthropic-compatible API in Claude Code and Opencode because it performs better on tool tasks.
- Manual Controls are badly needed in Aider: A member expressed that the feature they miss most from Aider is the manual controls on context such as /tokens, /add, /remove, and /clear.
- They argue that for large codebases Aider doesn't stand a chance without them, and none of the other tools have implemented it yet.
- Agentic Grep could give Aider Advantage: Members discussed how agentic tools use regex grep to find the parts they need and then do views on the surrounding lines, and that Aider is missing a ripgrep agentic handler.
- Another member agreed, saying that agentic grep will really help make Aider competitive with the current gen of tools.
- Aider Prompt Cache Activated by Default: Aider's `models.py` now sets `cache_control: bool = True` and `caches_by_default: bool = True` for the provider Z.ai, also adding "prompt cache" to the greeting message.
- The new greeting message will be similar to: "Main model: openrouter/deepseek/deepseek-v3.2-exp with diff edit format, 8k think tokens, prompt cache". A sketch of the settings shape follows below.
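For orientation, a minimal sketch of what such per-model settings look like, assuming a dataclass shaped like the fields quoted above (the actual entry in aider's models.py may differ):

```python
from dataclasses import dataclass

@dataclass
class ModelSettings:
    name: str
    edit_format: str = "whole"
    cache_control: bool = False      # wrap prompts in cache-control markers
    caches_by_default: bool = False  # provider caches prompts automatically

# Hypothetical Z.ai entry with prompt caching on by default:
zai = ModelSettings(
    name="zai/glm-4.6",
    edit_format="diff",
    cache_control=True,
    caches_by_default=True,
)
```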
- Open vs Closed Weights: Price Concerns Emerge: Discussion pivots to comparing open-source versus closed-source AI model weights, highlighting concerns that owners of closed weights could inflate prices at will.
- Ongoing discussions focus on keeping up with the latest updates and features of the Aider tool, ensuring users leverage its full potential.
DSPy Discord
- Neosantara AI opens LLM Gateway: Neosantara AI has launched a new LLM Gateway Platform for building AI apps, providing free access and documentation for DSPy integration.
- New users receive 10k tokens monthly upon signup, with feedback directed to [email protected].
- Call for DSPy Roadmap beyond Issues: A member requested a DSPy roadmap beyond the existing GitHub issues and changelog, citing recent mentions on X/Twitter.
- Links were shared to Drew Houston's post and Huyen Chip's blog on React+Reflection as examples of community engagement.
- Elysia helps Agents React: Discussion arose about storing ReAct trajectories on disk vs. memory for longer agent steps, with a suggestion to use Elysia by Weaviate, a DSPy project with a Decision Tree.
- A member is considering implementing React+Reflection to reflect on the ReAct part's entire trajectory.
- Fallback Frustrations Fade: A member inquired about modifying the fallback behavior in DSPy.
- The fallback mechanism is currently hardcoded and cannot be changed.
- DSPy-ReAct-Machina emerges: A member released DSPy-ReAct-Machina, an alternative ReAct implementation for DSPy that facilitates multi-turn conversations via a single, growing context history, available on PyPI.
- They also shared a blog post detailing the motivation and architecture, seeking community input.
tinygrad (George Hotz) Discord
- tinygrad devs enter bounty gauntlet: Those seeking to join tinygrad must participate in the bounty program to demonstrate their skills.
- The tinygrad team clarified that they do not conduct personal interviews or direct hiring.
- tinygrad NIR Backend Ready for Review: The NIR backend is now ready for review (PR #12089).
- Engineers are welcome to audit and test the backendâs functionality.
- tinygrad to explore match statements: The team is considering using `match` statements in compiling the pattern matcher.
- This would replace repeated if statements to improve the rendered code, as in the sketch below.
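A hedged sketch of the refactor being considered, with hypothetical op patterns rather than tinygrad's actual matcher code:

```python
# Repeated-if style:
def rewrite_if(op, args):
    if op == "ADD" and args[1] == 0:
        return args[0]
    if op == "MUL" and args[1] == 1:
        return args[0]
    return None

# Equivalent single match statement (Python 3.10+):
def rewrite_match(op, args):
    match (op, args):
        case ("ADD", [x, 0]) | ("MUL", [x, 1]):
            return x
        case _:
            return None
```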
- 3dgs Repo Eyes tinygrad Port: The maintainer of a 3dgs repo (LichtFeld-Studio) is planning to remove libtorch and is considering porting tinygrad to C++ for inference and CUDA support.
- A member suggested compiling the model to CUDA or C kernels and then exporting C code with those kernels, linking to an EfficientNet C example.
Manus.im Discord Discord
- Lead Generation MCP Server Activated: An MCP server for wholesale lead generation was deployed using wrangler/Cloudflare to scrape a specific government site for undervalued properties and motivated sellers, then break down analytics for an end-buy proposal, available at wholesale-lead-generator-frontend.pages.dev.
- A user quipped, "grab the malware before it gets deleted".
- Manus iOS Client Crashing Bug Squashed: A bug was reported in the Manus iOS client that caused a 100% freeze/crash when selecting text for input in the scheduled task interface.
- A workaround was offered: "Write your command or text in a separate app… Avoid selecting or editing the text within that field", as well as leveraging Apple's built-in Shortcuts app and the keyboard's built-in clipboard manager.
- Lightweight Contextual AI Tool Explored: A member is exploring a lightweight solution for AI tools that lack personal context to connect to user-chosen data, identify knowledge gaps with targeted questions, and understand user objectives to be used in any tool.
- The member is actively seeking collaborators and early adopters within the Bay Area to help refine and test the solution.
MLOps @Chipro Discord
- AI Summit Spotlight on Feature Stores: The Feature Store Summit is scheduled for October 14th at 8:30 AM PT (5:30 PM CET), showcasing speakers from Uber, Pinterest, Zalando, Lyft, and Coinbase.
- Discussions will encompass infrastructure for AI, ML, applications that demand massive scale and real-time prowess, alongside trends slated to shape Feature Stores in 2025; registration is available via this link.
- AI Summit to explore Real-Time Engineering at Scale: The upcoming summitâs talks will delve into real-time feature engineering at scale, plus vector databases and generative AI in production.
- Additionally, the balance of batch and real-time workflows will be explored.
- AGI: a Meaningless Blanket Term?: A member questioned the definition of AGI, dismissing it as a meaningless blanket term without a clear standard of generality.
- They also noted the absence of a reliable measurement for human intelligence.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
LMArena ▷ #general (1252 messages🔥🔥🔥):
Perplexity Student Offer, Comet Browser, ChatGPT Cheating, Gemini for Studying, Water Usage by AI
- Students Scramble for Sweet Perplexity Subsidies: A member shared a Perplexity student offer, but cautioned it requires active student status, leading to transparent attempts to get referral bonuses.
- Comet Browser: Study Buddy or Cheat Sheet?: Members discussed the Comet browser, with one describing it as a good place to study, while another jokingly called it a good place to cheat.
- Opinions on Comet's assistant were mixed, with some finding it lacking compared to ChatGPT or Gemini, while others praised its voice mode.
- Is AI Thirsty? Discussing Water Consumption: Members debated the environmental impact of AI, including water usage for cooling data centers, with one suggesting that 20 answers from ChatGPT cost around 0.5 liters of water.
- One member stated that locals near AI data centers are facing hardships, while another said the future involves putting data centers in space to release heat.
- Gemini 3 Leaks: October 9th Launch?: Chat members cited reports from early testers and industry insiders of a likely official release date of October 9, 2025, following thorough testing on platforms like AI Studio.
- One member hoped it would be available only to AI Ultra users.
- Hunyuan 3: First Impressions: A user gave his first impressions on Hunyuan 3, the new image generator from Tencent, finding it better than Qwen or Flux and comparable to Imagen.
- Another mentioned that these Chinese models often struggle with translation and accurate depiction of nuanced cultural details and context when translating into English.
OpenAI ▷ #annnouncements (4 messages):
GPT-5 Instant, Sam Altman Keynote, Startups using OpenAI tools
- GPT-5 Instant Gains Distress Recognition: OpenAI is updating GPT-5 Instant to better recognize and support users in moments of distress.
- Sensitive parts of conversations will now route to GPT-5 Instant to quickly provide helpful responses; ChatGPT will continue to tell users what model is active when asked.
- Sama's Keynote Streams Live: Sam Altman's keynote is streaming live, accessible via a provided link to the OpenAI website.
- DevDay [2025] starts tomorrow at 10am PT.
- Startups Leverage OpenAI Tools: Startups like @Cursor_ai, @getSchoolAI, @AbridgeHQ, and @Jamdotdev are joining OpenAI to share how they're using OpenAI tools to transform their industries.
- The event starts at 11:25am PT and will be available via a link to the OpenAI podcast.
OpenAI ▷ #ai-discussions (1074 messages🔥🔥🔥):
Gemma 3 12B multimodal, Arduino AI microcontroller, ChatGPT Business free, AI voice replication, Sora 2 low-rez quality
- Arduino Develops AI Microcontroller: Arduino is developing an AI-enabled microcontroller similar to an NPU-based ARM, speculated to be a smaller Hailo-8-like device.
- With the Hailo-8L on Raspberry Pi, one user reported object recognition at 2000+ FPS for less than 2 watts, demonstrated by mounting it on a UGV Waveshare toy robot tank.
- Broke-ies Get Free ChatGPT Business: Users are discovering they can get a free month of ChatGPT Business with up to 5 seats by using a direct link to the offer, then it bills at $30/seat.
- The free access has led to discussions about generating unlimited accounts and the value of Codex within the business plan.
- OpenAI Voice Upgrades and AI Voice Cloning: Members noted OpenAI has added Internet search to its voice model, making it faster and deeper, while others shared AI voice replication clips, with ElevenLabs touted as the best.
- Discussion involved concerns about the deepfake era and the need for awareness that anything can be generated, with one user listening to an AI Chris Hitchens daily, while others are nervous, as many people believe everything they see on the internet.
- Sora 2's Shoddy Showings: Users noted that Sora 2's lower resolution quality and watermarks make it feel cheap.
- Many claim that an OSS 20B model was better.
- Grok Gains Ground with Waifu and Video: SuperGrok subscribers can talk to Ani on mobile (waifu feature), with surprising quality.
- Users shared Grok-generated videos (prompted with `Imagine Spongebob finding out he is banned on Sora`), though some found them slop-worthy, copyright-infringing, and lacking a coherent story.
OpenAI ▷ #gpt-4-discussions (19 messages🔥):
GPT Instant, Error in the stream, GPT-4o switch, GPT publishing review time, NSFW filter
- GPT Instant Elicits Preference: A member asked how to force ChatGPT to use only GPT Instant and not thinking mini.
- Another member jokingly suggested, "Threaten it."
- Stream Error vexes GPT-5 users: A member reported getting a weird Error in the stream message when messaging GPT-5, but not GPT-4.
- No solution to the Error in the stream message was suggested.
- GPT-4o Toggle Troubles: A member reported that they seem unable to switch to GPT-4o and described the toggle as some kind of placebo.
- No solution to the GPT-4o placebo toggle was suggested.
- User Experiences NSFW Filter Fails: A member inquired if anyone's ChatGPT suddenly refuses to comply with bypassing the NSFW filter.
- A member responded that bypassing safety mechanisms is forbidden in the TOS.
- ChatGPT App Responds Poorly on iPhone: A member reported issues using the GPT app on their iPhone where it just doesnât work and never responds.
- They reported that they tried deleting and reinstalling the app, but it only works on the web.
OpenAI ▷ #prompt-engineering (57 messages🔥🔥):
Sora Prompts for Infographics, Copyright Restrictions on Generating DBZ Content, Realistic POV Horror Video Prompts, Tailoring Resume for ATS with AI, Minimalistic Communication Style for ChatGPT
- Sora struggles creating JSON context profile: A member is trying to create a consistent JSON context profile for infographics made with Sora for health studies.
- They are also requesting prompts for Pokemon, Demon Slayer, Hunter x Hunter, Jujutsu Kaisen, and an AI video of Makita.
- Copyright Censors Creation of DBZ Content: A user asked for help creating a prompt for Dragon Ball Z (DBZ), but a member responded that the new copyright restrictions prevent generating copyrighted content.
- They advise describing Goku and Vegeta, the world, animation style, clothing, and tone, but cautioned that it may not closely resemble DBZ.
- Crafting POV Horror Video Prompts: A user seeks assistance in creating realistic, first-person perspective (POV) horror videos, as their current prompts result in video game-like quality.
- A member suggested specifying the type of image desired to avoid unwanted videogame/cartoon styles.
- ATS-friendly Resumes tailored by AI: A member asked for help writing a robust prompt to tailor a resume for job roles with a higher Applicant Tracking System (ATS) score.
- Another member shared "The_ATS_Resonance_Engine.md", which is an interesting prompt engineering framework to tailor a resume.
- ChatGPT Instructed to be Minimalistic: A member shared a prompt for instructing ChatGPT to adopt a strict, minimalistic communication style, eliminating friendliness, elaboration, or casual interaction.
- The goal is for ChatGPT to act as a cold, terse assistant focused solely on providing direct, accurate answers without unnecessary conversation.
OpenAI ▷ #api-discussions (57 messages🔥🔥):
Saving Art Prompts, Sora and JSON Context for Infographics, Copyrighted Content Generation, Improving Video Quality, Tailoring Resumes with AI
- Art Prompts Filing Fortes: Members discussed various methods for saving art prompts, including Google Sheets, local markdown files, and chat threads within ChatGPT.
- One user mentioned using project folders in ChatGPT to separate different render groupings.
- Soraâs Struggle with Scientific Styles: A user reported difficulties achieving consistent JSON context profiles for health study infographics using Sora.
- Another user suggested being specific with camera angles when prompting ChatGPT for Sora prompts.
- Copyright Cravings Cut Short: Users noted that generating copyrighted content has become restricted, likely due to legal threats and that users must describe ideas without referring to copyrighted characters or names.
- A user lamented the limitations, expressing hope for future tools that will circumvent the restrictions.
- Horror Video Help Desk: A user requested help in generating realistic POV horror videos, as the output was sometimes video game-like quality.
- A member suggested specifying the type of image desired to prevent the model from guessing and potentially opting for a videogame/cartoon style.
- ATS Ascent Ace Assistant: A user requested assistance in writing a robust prompt to tailor their resume for job roles with a higher ATS score.
- Another user responded with a link to "The ATS Resonance Engine.md" file.
Unsloth AI (Daniel Han) ▷ #general (842 messages🔥🔥🔥):
GLM 4.6 on 5090, B200 rental, Model Quality vs Precision, Platform Sidebars
- GLM 4.6 Fits but Crawls on 5090: A user inquired if they could run GLM 4.6 on a 5090 with 200GB RAM, suggesting that Q4_K_M and below would fit but perform slowly.
- Another user confirmed it should fit, but it will be slow.
- Single B200 on-demand is Missing in Action: Users discussed the unavailability of single B200 on-demand rentals after DeepInfra removed the option, now only offering 8x B200 setups.
- One user suggested looking at Modal for single B200 on-demand options, while another inquired about availability on Thundercompute.
- Quantization Quality Questioned: A user asked if a model would maintain the same intelligence after quantization, particularly concerning Q4_K_M quality.
- One user replied that while quality decreases compared to full precision, Q4_K_M is fairly good, though it depends on the model and use case.
- New Unsloth Mod is Crowned: A new mod was announced, with members reacting to the new role and channel permissions.
- Said one user, "I thought there will be a secret channel. Imagine my disappointment."
- User Requests Context-Awareness advice: A user is building an AI statistical app and asks advice on models that are context-aware.
- The user is looking to provide the app with local database access and wants the model to answer questions relating to the database. They were advised to research RAG (Retrieval-Augmented Generation) and use models such as `ibm-granite/granite-4.0-h-tiny`; a minimal RAG sketch follows below.
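A minimal RAG sketch under stated assumptions: a local SQLite table of notes, sentence-transformers for embeddings, and any OpenAI-compatible local server; only the model name comes from the advice above, while the endpoint, table schema, and helper are illustrative:

```python
import sqlite3
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

def answer(question: str, db_path: str = "stats.db") -> str:
    # Pull candidate rows from the local database.
    rows = [r[0] for r in sqlite3.connect(db_path).execute("SELECT text FROM notes")]
    # Embed the question and all rows, then keep the 3 most similar rows.
    q = embedder.encode([question])[0]
    docs = embedder.encode(rows)
    top = [rows[i] for i in np.argsort(docs @ q)[-3:]]
    prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {question}"
    resp = client.chat.completions.create(
        model="ibm-granite/granite-4.0-h-tiny",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```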
Unsloth AI (Daniel Han) ▷ #introduce-yourself (12 messages🔥):
Support channel permissions, Auto moderator issues, Introduction of AI-ML researcher
- Navigating Support Channelâs Gatekeeping: A user inquired about gaining posting permissions in the support channel, only to find their attempts blocked despite apparent access.
- It was quickly identified that the auto moderator was preventing the user from posting, leading to intervention from another member.
- Auto Moderatorâs Iron Grip on Support: Members discovered the auto moderator was the cause of posting restrictions in the support channel.
- A member stepped in to relay the userâs question after identifying the auto moderator as the culprit.
- AI-ML Researcher Enters the Chat: An independent AI-ML researcher and developer, leveraging Generative AI models, cloud integrations, and scalable platforms to optimize business operations, introduced themself.
- This person was welcomed by other members of the channel here.
Unsloth AI (Daniel Han) ▷ #off-topic (800 messages🔥🔥🔥):
AI Music Detection, AI teaching Music, Suno and Udio music AI, Discord Security Incident, GPTS Agent
- AI Music Detectives Needed: A member inquired whether itâs possible to distinguish clear J-pop from anime songs using AI, similar to how AI can differentiate AI music from human music.
- Discord Suffers Security Scare: Discord notified users of a security incident on September 20, where an unauthorized party gained limited access to a third-party customer service system, potentially exposing personal data and government ID images.
- The breach may be linked to social engineering of the Trust & Safety team, prompting discussions on digital IDs and data storage practices.
- Data Quality Dilemmas: A discussion covered the complexities of defining high-quality data for machine learning, with one member noting that flailing gradient norms indicate a noisy dataset.
- Techniques for identifying and handling noisy data in audio processing were explored, noting the difficulties of establishing reliable benchmarks.
- Meta and Google react to ChatGPT integrating Apps: Members discussed OpenAI's integration of apps into ChatGPT, anticipating responses from Meta and Google.
- Some suggested this move could prevent others from simply creating GPT wrappers, while others speculated on Apple and Google's potential OS-level integrations.
- Unsloth gets a new mod: The community celebrated the addition of a new moderator, congratulating them on their new role.
Unsloth AI (Daniel Han) ▷ #help (221 messages🔥🔥):
llama.cpp Compilation, GGUF Output, vLLM with Gemma, Axolotl Fix, FA3
- Unsloth's Amigo Compiles llama.cpp: A user, who is actually a C++ engineer new to AI, successfully compiled llama.cpp using the new `cmake` instructions with guidance, and stated that he is pretty close.
- Another member advised checking the documentation regarding `maximum_memory_usage` to avoid further compilation issues.
- FA3 Efficiency with Other Training Frameworks: A team member mentioned that while other training frameworks can be used, users must disable FA3, noting Unsloth remains the most efficient option.
- Another team member stated, "yes kind of, for other training frameworks you must disable FA3, but nevertheless, we are still the most efficient".
- Orpheus TTS Fine-Tuning Debugging Marathon: A user had issues loading a local dataset for fine-tuning the Orpheus 4B TTS model, encountering a `'torchcodec.decoders.AudioDecoder' object is not subscriptable` error, and sought community help to resolve dataset loading issues.
- After extensive debugging, the user pinpointed that downgrading datasets to version `3.6.0` fixed the issue, allowing successful dataset loading and training.
- Vision Model Image Resizing Debacle: A user found that the UnslothVisionDataCollator was unexpectedly truncating tokens in their vision model, due to the collator resizing images to 512, which the manual processing step did not.
- A member clarified that resizing to 512 is due to the model lacking a default image size in its config, and advised passing desired sizes to avoid the issue.
- GPU issues solved with WSL and Docker: A user on Ubuntu 5090 experienced performance degradation in steps/s during training, while a team member suggested using WSL or Docker for Unsloth.
- The team member shared a link to Docker setup instructions, and encouraged the user to provide detailed logs for further assistance.
Unsloth AI (Daniel Han) ▷ #showcase (12 messages🔥):
Advanced AI Safety Notebook, SFT + GRPO, Structured outputs, Qwen Moe, TNG dataset
- AI Safety Notebook Talk Shared!: A member shared a GitHub discussion on an advanced AI safety notebook, following up on a DM chat, hoping it serves as a useful, runnable example for others exploring SFT + GRPO or working with structured outputs.
- Six-Model Qwen Moe Shows Off!: A member showcased training 6 separate models on 2 datasets each using a custom-built 6B Qwen 3 architecture (55 layers, 607 tensors), then "moe'd" them into a 6X6B Qwen Moe, 36B model (27B compressed), built using Unsloth.
- Special thanks were given to Unsloth (now over 30 models fine tuned and counting), Team Mradermacher for the quants, and Team Nightmedia for MLX and collab'ing on many parts of this model project.
- Dataset Deets Disclosed!: Datasets were created in house to ensure max quality/model tuning results after extensive testing, with training all done with Unsloth on a local machine; the "BASE" 6B model (JanV1, Qwen 3, 256k context, W adapter) for training is available at DavidAU/Qwen3-MOE-6x6B-Star-Trek-Universe-Alpha-D-256k-ctx-36B.
- TNG Dataset Tuning Triumph!: The use of the TNG dataset improved the modelâs coding abilities, and the models were benchmarked and tested for multiple metrics where possible to test tuning, as well as non-targeted improvements.
Unsloth AI (Daniel Han) ▷ #research (16 messages🔥):
Qwen MoE, Unsloth Dynamic 4bit, Sequence Packing, Flash Attention, torch.compile
- MoE-Money MoE Problems?: A member suggested a 16B parameter MoE model for finetuning with Unsloth dynamic 4bit while noting that high sparsity MoEs are common when trained with token choice.
- The discussion seemed to imply that this is only for the Qwen model architecture.
- Sequence Packing + Flash Attention: Finetuning Breakthrough!: A member reduced epoch training time from ~30 minutes to ~4-5 minutes (~500 million tokens) with sequence packing and Flash Attention, resulting in 100% GPU utilization without CPU interruption.
- Flash Attention required manual compilation, pushing the workstation to its peak memory usage.
- Torch Compile Conundrums Cause Consternation: A member reported issues with torch.compile, experiencing problems with both Python 3.12 and 3.10 and eventually disabling it.
- It was suggested to try `aot_eager_decomp_partition` as a less aggressive alternative to `inductor`.
- ml-cross-entropy misses mark, much slower: A member found that `ml-cross-entropy` was 3-4x slower than their own PyTorch-based linear cross-entropy implementation, despite aiming to reduce GPU memory bandwidth usage.
- They were hoping that shaving off some MB would compensate for some computation overheads.
- Apex Ascends: Fused Optimizers for the Win: A member reported success using apex with `fused-adam` and `FusedRMSNorm`, noting they provided a nice boost to performance (see the sketch below).
- They also shared their implementation of linear cross-entropy.
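For context, a hedged sketch of wiring in the fused Apex modules mentioned above (layer sizes and the toy loss are illustrative, and Apex must be built with its CUDA extensions):

```python
import torch
from apex.optimizers import FusedAdam
from apex.normalization import FusedRMSNorm

norm = FusedRMSNorm(4096).cuda()              # fused RMSNorm kernel
model = torch.nn.Linear(4096, 4096).cuda()
opt = FusedAdam(model.parameters(), lr=1e-4)  # fused Adam update

x = torch.randn(8, 4096, device="cuda")
loss = model(norm(x)).square().mean()
loss.backward()
opt.step()
```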
LM Studio ▷ #announcements (1 messages):
LM Studio 0.3.29, OpenAI /v1/responses compatibility, LM Studio CLI
- LM Studio Channels OpenAI Compatibility with v1 Responses: LM Studio 0.3.29 introduces an OpenAI `/v1/responses` compatibility API.
- This enables developers to integrate LM Studio with applications expecting the standard OpenAI API format for responses; a usage sketch follows below.
- LM Studio CLI Gets Variant Listing: A new command-line interface (CLI) feature is added to LM Studio with the `lms ls --variants` command.
- This allows users to easily list their local model variants directly from the terminal.
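A minimal sketch of hitting the new endpoint, assuming LM Studio's local server is running on its default port 1234 with a model loaded (the model name is illustrative):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local LM Studio server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.responses.create(
    model="qwen/qwen3-30b-a3b-2507",
    input="Say hello in one sentence.",
)
print(resp.output_text)
```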
LM Studio ▷ #general (709 messages🔥🔥🔥):
AGI, Qwen3, LM Studio overload protection, Simulating Sentience, Mark Zuckerberg location
- LM Studio has Overload Protection: LM Studio has built-in protection against system overload, so nothing bad will happen if a computer isnât capable of handling an AI model.
- Even deactivating the protection will only result in a system crash, not bricking, though one user jokingly wished for their computer to catch fire.
- Simulating Sentience using Multiple Layers of LLMs is discussed: A user inquired about simulating sentience using multiple layers of LLMs, including input models, looping models, emotional response models, inner monologue models, reasoning models, and multiple vector databases.
- Another user responded that we already simulate this and don't need four different models to do one task.
- LM Studio cannot fetch models from Hugging Face under a custom root certificate: A user asked about why LM Studio cannot fetch models from Hugging Face under a custom root certificate.
- Another user suggested trying their proxy in the app settings.
- AI is Gonna Burst, say members: One member said that AI is going to have a bubble burst that makes the .com bubble look like nothing, pointing to a video regarding the Nvidia, Oracle and OpenAI money circle.
- Another user replied that "short of it is Nvidia is giving money to openai, openai buys compute from oracle, oracle buys from Nvidia", adding, "If one of them fails, it may cause a snowball."
- Memento MCP for knowledge injection: A user asked about the Memento MCP that links to Neo4j, noting that it seems more advanced but has a significant overhead.
- Another member stated that using the toolcalls is not worth it for his local uses, although added that if there was a plugin that injects knowledge stuff into system prompt for example without toolcalls that would be great.
LM Studio ▷ #hardware-discussion (328 messages🔥🔥):
GPT-OSS-120B, Ryzen AI max performance, Distributed inference setup, AMD vs NVIDIA, Vulkan versus CUDA for AMD
- GPT-OSS-120B Framework Lands with 128k Context: A member reports that the GPT-OSS-120B framework has landed, boasting a 128k context window and running at 19.76 tokens/s on Ryzen AI max+ hardware.
- Other members follow up asking about the exact hardware specs.
- Distributed Inference Setups Sizzle: A member shares a successful distributed inference setup using 8x 3090s over WiFi across 3 nodes and achieves approximately 5.5k prompt processing and 23 tokens/s output for GLM 4.5 air at full 8-bit precision.
- They plan to double the speed by using 2 nodes at 4/4 once new parts arrive.
- AMD GPUs still need to fix their act: Members debate the price and performance of AMD GPUs versus NVIDIA, with discussion on whether AMD is behind on software support, one member stating that they should start by creating a decent software for using their GPU accelerators for something else other than gaming and having it work too.
- Reference is made to this link to try the MI250.
- Laptop AI Limitations Lamented: Members discuss the limitations of using laptops for AI tasks, citing weaker hardware (less bandwidth, less VRAM, less cooling).
- The general advice is to get as much VRAM as possible, and the best choice of all is still a Macbook Pro.
- Vulkan > CUDA on Older GPUs?: A member found Vulkan significantly outperforms CUDA on an NVIDIA P40 (63.02 tokens/s vs 23.31 tokens/s) with the model qwen/qwen3-30b-a3b-2507 (Q4_K_M) in LM Studio.
- Members suggest the P40âs outdated drivers and better support of Vulkan for older cards may be the reason.
OpenRouter ▷ #app-showcase (4 messages):
Reverse Engineering iFlow, GLM-4.6 Requests, Qwen/Gemini without Docker
- iFlow Reversing Exposes Free GLM-4.6: A member reverse engineered iFlow to enable free GLM-4.6 requests for any OpenAI-compatible tool.
- The member confirmed that running the python file from the folder would work.
- Qwen/Gemini Usable Without Docker?: A member inquired about using the reverse engineered iFlow with Qwen/Gemini without Docker.
- The member confirmed that running the python file from the folder would work.
OpenRouter ▷ #general (392 messages🔥🔥):
Multimodal Model Recommendations, Deepseek 3.1 availability, BYOK setup and use, Free models alternatives to Grok, Sora2 pricing
- Gemini Flash and Llama 4 Maverick shine for structured output: Users sought a free multimodal model with structured output, and it was suggested Llama 4 Maverick and Gemini Flash are viable options.
- Kyle shared a python code snippet using the OpenAI library to obtain base64 data from the Gemini API.
- Deepseek 3.1 bites the dust, users search for alternatives: Users reported 404 errors for deepseek-v3.1-base and were informed that the provider stopped hosting it.
- As an alternative, members suggested deepseek v3.1 deepinfra and GLM 4.6.
- Understanding BYOK on OpenRouter clarified: Users were confused about how BYOK (Bring Your Own Key) works, others explained that OpenRouter acts as a proxy to the API, such as OpenAI, and the user is billed by OpenAI directly.
- OpenRouter waives its usual 5% surcharge when using BYOK, offering convenience and features like spend control and fallback options.
- Grok 4 Fast taken off, users search for alternatives: With Grok 4 Fast no longer free, users were looking for alternatives; stackedsilence suggested Deepseek v3.1, whilst others pointed to cost-effective choices such as the new Deepseek 3.2 or GLM 4.6.
- Later in the discussion, GLM 4.5 Air (free) was suggested as the best free model that wonât have rate limit errors, especially when using Z.ai as the provider.
- Sora 2's pricing scheme revealed!: The pricing for Sora 2 was revealed, with pro costing $0.3/sec of video and non-pro costing $0.1/sec, which prompted discussions around the implications of easily generating deepfakes.
- Members suggested methods to bypass the watermark and cryptography, potentially leading to misuse.
OpenRouter ▷ #new-models (1 messages):
Readybot.io: OpenRouter - New Models
OpenRouter ▷ #discussion (50 messages🔥):
ByteDance Seed LLM models, Inference providers pivoting, GPT 5 pricing, Meta's frontier lab status, Grok fast training data
- ByteDance's Seed Models Spark Interest: Members are curious if OpenRouter will include ByteDance's Seed LLM models like Seed 1.6, noting their potentially frontier-level performance and cheap pricing ($0.11 / $0.28 mtok).
- Concerns were raised about the primary host being volcengine.com, a Chinese platform, but the models' potential is still considered worthwhile.
- Inference Providers Pivot Perilously: A member wondered if others are tracking inference providers who have pivoted away from primarily offering inference, citing Kluster and Inference.net as examples.
- Someone else quipped that these providers are "dead to me."
- GPT-5 Pricing Rumors Fly Fast: Speculation arose around potential GPT-5 pricing, with one person linking to an X post suggesting Sonnet-quality speed but no reasoning capabilities.
- Discussion considered the possibility of a new GPT-5 Checkpoint and whether it's really Gemini 3 Flash.
- Meta Flexes Frontier Lab Muscles: Members debated whether Meta is positioning itself as a frontier lab by doing parallel test time compute like Pro/deepthink models.
- Some expressed skepticism, with one noting that Google is unlikely to charge for a stealth model like Flash, as they typically provide significant free compute.
- Sora 2 API coming to OpenRouter?: The community spotted what looks like the Sora 2 API potentially coming to OpenRouter after some images were posted from a presentation.
- One member quipped, "Sora 2 in OpenRouter like nano banana? Or nah"
Cursor Community ▷ #general (441 messages🔥🔥🔥):
GPT-5 vs Claude, Cursor billing changes, Cheetah model, GPT-5 Pro, Cursor UI changes
- GPT-5 versus Claude 4.5: Duel of the Titans: Some members felt that GPT-5 depends more on the prompt and overengineers solutions, whereas Claude performs better and faster for many tasks.
- Some members find the performance of Sonnet 4 inferior to GPT-5 while others swear by the Ultra model, so it varies person to person and task to task.
- Cheetah Model Runs Wild, Pricing Released: Members discovered a new paid stealth model called Cheetah and suspected it might be Gemini 3.0 because it's very fast.
- One member tested it and said it fixed an issue almost immediately, and another touted its speed, with one saying it generated an image in less than 5 seconds.
- GPT-5 Pro Enters Ring, More Expensive Than Opus: Members discussed the newly released GPT-5 Pro and noted its high cost compared to other models like Opus.
- While some found GPT-5 superior to Claude in the Cursor CLI, one user reported that GPT-5 Codex makes constant zero diff edits.
- Users Grapple with Cursor UI Changes: Members discussed the recent UI changes in Cursor, specifically the disappearance of the close chat button.
- One member admitted they didn't realize the Cursor logo at the top could close the pane because it looked like an agent button.
- Student Plan Saga: .edu Domains Required: Members discussed the limitations of the student plan, which requires a .edu email address for verification.
- One member shared that they contacted support to allow users with their school's domain to access a Google event for NLP, with the suggestion that others do the same.
Cursor Community ▷ #background-agents (5 messages):
Background Agents Failing, Different node version being used, Feature Board App Development, Automazeio CCMP Framework, Custom VM Snapshot not being picked up
- Background Agents Failing!: Multiple users reported that background agents are failing, and this issue is under investigation.
- One user noted that the agent chokes on their startup command because it uses a different node version, and doesn't appear to use their Dockerfile, instead starting in another container with random configuration.
- Developing Quick Feature Board App: One user is developing a quick feature board app and working through limitations of git worktree.
- They are considering whether Sonnet through Cursor can handle spawning 3 dev agents, 3 reviewer agents, and 3 PM agents to vet the requirement with the implemented capabilities, and mentioned the automazeio/ccpm framework as a possible alternative.
- Custom VM Snapshot Failure: A user reported that their background agents are not picking up the custom VM snapshot.
- Another user is curious to see if spinning an agent using the API will succeed.
- API Lacks Snapshot ID: A user noted that it doesn't seem possible to specify the snapshot ID when launching a BA via the API.
- They referred to the Background Agents OpenAPI documentation to support their observation.
GPU MODE ▷ #general (25 messages🔥):
Hopper vs Blackwell SM quadrants, Futhark, 2:4 sparse training, VRAM hacking, CUDA skill set
- Hopperâs Quadrant Design in Question for Blackwell: Based on this blog, Hopper GPUs divide SMs into 4 quadrants with a tensor core each, enabling execution of 4 warps per clock cycle; a member questions if this remains in Blackwell.
- Another member explains that starting from Ampere (and likely even Volta), each SM has 4 quadrants each with its own Warp Scheduler.
- âFutharkâ Discussion Kicks Off: Members started a discussion about Futhark: High-performance purely functional data-parallel array programming
- One member noted that it will be interesting for programming languages nerds.
- Sparse Training Tutorial Speedup Needs A Boost: One member asked for resources on doing 2:4 sparse training in PyTorch, noting they didn't see a speedup with the tutorial's code example until manually modifying the sparse tensor behavior with `SparseSemiStructuredTensor._FORCE_CUTLASS = True` from `torch.sparse`; a sketch of the workaround follows below.
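A hedged sketch of that workaround using PyTorch's semi-structured sparsity API (shapes are illustrative; 2:4 sparse matmul needs a supported GPU):

```python
import torch
from torch.sparse import SparseSemiStructuredTensor, to_sparse_semi_structured

SparseSemiStructuredTensor._FORCE_CUTLASS = True  # the flag the member toggled

w = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
# Zero out 2 of every 4 elements so the tensor satisfies the 2:4 pattern.
mask = torch.tensor([1, 1, 0, 0], device="cuda").tile(4096, 1024).bool()
w_sparse = to_sparse_semi_structured(w * mask)

x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
y = torch.mm(w_sparse, x)  # dispatches to the 2:4 sparse kernel
```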
. - VRAM Hacking for 1660 Super Not Recommended: A member inquired about VRAM hacking a 1660 Super to 12GB.
- Another member suggested the <#1349152646484987974> channel but noted that most of the server's focus is on software optimizations rather than hardware mods.
- FA3 Kernel Depth Appears Uncommon: A member expressed surprise at the depth of knowledge required to make things like FA3 work, wondering how common this skill set is, even among CS grad students.
- Another member pointed out that it's a rare skillset, highlighting that the author list was quite small but that many talented performance engineers work at large companies, with the takeaway message being to trust the process.
GPU MODE ▷ #triton (7 messages):
TLX Team at Meta, Triton Conference
- Meta's TLX Team in the Spotlight: The engineering manager for the TLX team at Meta mentioned that they've been working in TLX.
- It was also mentioned that the engineers on the team don't check this Discord channel often.
- Triton Conference to Showcase TLX: The TLX team will be presenting TLX at the Triton conference this October.
- The team is open to questions about TLX for interested parties.
GPU MODE ▷ #cuda (36 messages🔥):
PTX .aligned meaning, Tensor Cores Matrix Multiply, NCU on Vast AI GPUs, CUDA graphs with worklists, Blackwell supports 256bit loads
- Warp Execution with PTX .aligned: The `.aligned` directive in PTX means a whole warp is expected to execute the instruction at the same time, according to NVIDIA's documentation.
- NCU Permission Problems on Cloud GPUs: Users are facing permission errors when trying to use NCU on rented hardware like Vast AI, and there appear to be no readily available docs for profiling with NCU on their site.
- Generally, cloud vendors don't provide sufficient permissions for using NCU and hardware counters, unless the owner of the hardware enables them; a working group may form to convince cloud vendors to allow this or identify which ones already do.
- CUDA Graphs Looping with Worklists: A developer is seeking a better way to manage CUDA graphs with a while condition and two worklists that swap across iterations, and is currently using extra device pointers for swapping.
- They are looking for more efficient ways to avoid allocating extra device pointers and passing the device pointers to the kernels that perform computation when using CUDA while node.
- Blackwell's Bloated Bandwidth Bonanza: Someone shared that Blackwell architecture supports 256-bit loads.
GPU MODE ▷ #cool-links (3 messages):
404 Error, GPU Performance Engineering, Arm Architecture Developments 2025, 6-bit data types for AI, Scalable Vector Extension (SVE)
- Harvard's GPU Perf Blog Link Gets a Facelift: A member reported getting a 404 error; another member corrected the URL to Harvard's GPU performance engineering blog.
- The corrected link leads to a detailed blog post on GPU performance engineering.
- Arm rolls out 6-bit data types for AI: A member shared a link about Arm Architecture Developments 2025, highlighting support for 6-bit data types for AI via the OCP MXFP6 format from the Open Compute Project.
- The update includes new Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME) instructions, improving efficiency in AI models by reducing memory use and bandwidth needs.
GPU MODE ▷ #jobs (4 messages):
Postdoc position at ISTA, DASLab, QUTLASS, Quartet
- ISTAâs DASLab Seeks Postdoc Superstar: The Deep Algorithms and Systems Lab (DASLab) at the Institute of Science and Technology Austria (ISTA) is seeking a postdoctoral researcher to advance efficient machine learning systems.
- The position requires a PhD with a strong background in high-performance computing and extensive GPU programming experience, contributing to open-source projects including QUTLASS and Quartet; applicants should send a CV and short motivation statement to [email protected].
- ISTA Postdoc Perks and Details Exposed: Some more info regarding post-doc positions at ISTA can be found at their working conditions page.
- It was described as tempting by one member.
GPU MODE ▷ #beginner (12 messages🔥):
CUDA programming, CMake versions, GEMM vs cuBLAS, NVIDIA Data Center Failure Analysis
- CMake version too old, update required!: A user encountered a CMake error indicating that their version was too old (pre-3.5) and needed to be updated.
- Another user suggested downloading a more recent version (e.g., 3.31) from Kitware's download page.
- GEMM project advice: A member suggested creating a GEMM project to compete with cuBLAS for a specific architecture.
- The user said, "That alone puts you far above people who can 'write CUDA'."
- CUDA lectures aided noob: A new CUDA programmer thanked another for lecture recommendations that helped them a lot and shared a GitHub link for CUDA learning.
- The user then requested help finding lecture slides/code for lectures 15 and 16 that were missing from the GitHub repository.
- NVIDIA Employee may help with CUDA logging: A member who works at NVIDIA for Data Center Failure Analysis offered to investigate conceptual side issues with CUDA logging.
- The user stated they could take a look.
GPU MODE ▷ #pmpp-book (1 messages):
PMPP Code Style, Future Editions of PMPP
- PMPP Favors C Style for Broad Appeal: Members noted that PMPP maintains a C-style codebase to maximize audience accessibility.
- Speculation arose among members that this approach may evolve in subsequent editions.
- PMPP Edition Evolutions: A discussion mentioned the current C-style approach might be revisited in later PMPP editions.
- The potential shift suggests an adaptation strategy to balance accessibility and modern coding practices.
GPU MODE ▷ #torchao (1 messages):
aguunu: thanks!
GPU MODE ▷ #irl-meetup (1 messages):
josephtracyvoltagepark_53706: I am going to the PyTorch conference! would love to meet up
GPU MODE ▷ #triton-puzzles (4 messages):
Triton Interpreter, Numpy version compatibility, Triton-Puzzles
- Numpy Versioning Nightmare for Triton Interpreter!: A user found the Triton interpreter acts finicky, working as expected only with numpy versions <= 2.0.
- They suggested fixing it by tweaking the installation as detailed in their Triton-Puzzles notebook.
- Triton Interpreter Needs Specific Numpy: A user explained that the interpreter uses numpy, so there seems to be a dependency on a particular numpy version for the interpreter to work correctly, which would explain the strange, incorrect interpreter results.
- A numpy version no newer than 2.0 appears to be required.
GPU MODE ▷ #rocm (12 messages🔥):
MI300, rocm-compute-viewer, warp specialization for AMD GPUs, wavefront partitioning in Triton, rocBLAS
- rocm-compute-viewer validated for MI355: A member confirmed that rocm-compute-viewer works on the MI355, but it isn't integrated with stochastic sampling yet.
- It's also confirmed to work for the MI300 family.
- AMD GPU Kernels and Warp Specialization: Documentation drought: A member inquired about open resources on warp specialization for AMD GPU kernels.
- Another member mentioned that there isn't anything available as comprehensive as this blog post, but ongoing efforts exist for wavefront partitioning in Triton issue #8281.
- rocBLAS is not using wavefront specialization: A member expected rocBLAS to be the one to start using wavefront specialization at a HIP-level.
- It was noted that warp specialization typically doesn't perform well on AMD due to the absence of warpgroup instructions.
GPU MODE ▷ #lecture-qa (1 messages):
seb3523: Sure why not
GPU MODE ▷ #self-promotion (3 messages):
Cute-Bench Package, Prefix Sum and Kogge-Stone Algorithm, Tiny MoE Optimization
- Cute-Bench Package Benchmarks Kernels: A member introduced cute-bench, a Python package that facilitates easy installation and benchmarking of kernels with `pip install -U git+https://github.com/NTT123/cute-bench.git`.
- The package includes functionality for benchmarking with the torch profiler and CUDA events, demonstrated through code examples, returning kernel measurements.
- Prefix Sum Algorithm Visualized: A member shared a YouTube video explaining the prefix sum and Kogge-Stone algorithm, including complete code implementation.
- The video creator welcomes all criticism as they are new to video creation.
- Tiny MoE Optimization Presentation: A member optimized a tiny 0.5B parameter MoE (Qwen style) and will be presenting on it for free in this Maven link.
- They reduced training time from over 60 hours (or 23 hours with DDP) to 13.4 hours using the profiler, one fused GEMM, and other tricks, stopping shy of using CUTLASS.
GPU MODE ▷ #🍿 (3 messages):
LLM generated kernels, Sakana CUDA engineer
- Alexander joins Popcorn Manifesto: Alexander (github.com/zanderjiang) expressed interest in LLM generated kernels and is keen to contribute to the Popcorn Manifesto project.
- He has been working on projects around reliable code-generation, robust kernel benchmarking and evaluation, and agents/frameworks/tools for kernel generation.
- Sakana CUDA Engineer doc hunt begins: A member is seeking the original PDF or Wayback Machine capture of the Sakana AI CUDA engineer document, which has been retracted from the internet.
- Another member mentioned that people are just citing the Twitter posts instead.
GPU MODE ▷ #thunderkittens (3 messages):
B200 multi-GPU cudaMemcpy, TMA vs load/store Performance, NVLink Bandwidth
- B200 cudaMemcpy Speed Boost Reported: A member reported achieving 726GB/s for B200 multi-GPU cudaMemcpy, exceeding another's result of just over 670GB/s.
- The original poster expressed curiosity about the discrepancy and inquired about the code used to achieve the higher bandwidth, noting difficulty replicating the results with NCCL tests.
- TMA Performance Disappoints: A user found TMA not significantly faster than load/store operations and only slightly slower than memcpy, and speculates the use of 32-byte PTX instructions (introduced in 12.9) might be a factor.
- They noted that NCCL code uses back-to-back loads on the same thread, a technique they haven't properly compared yet.
- Public Experiments on NVLink bandwidth shared: A user shared a link to their public experiments on NVLink bandwidth and results using copy engines.
- They also added that the bandwidth performance depends on the specific machine configurations used for testing.
GPU MODE ▷ #submissions (72 messages🔥🔥):
amd-all2all Leaderboard, MI300x8 Performance, amd-ag-gemm Leaderboard, amd-gemm-rs Leaderboard, gau-nernst and Kernel GM
- AMD All2All leaderboards: Multiple submissions were made to the `amd-all2all` leaderboard, with several achieving first place on MI300x8, including times of 445 µs and 439 µs.
- Other notable times include a personal best of 90.3 ms and several successful submissions around 1100-1200 µs.
- AG-GEMM AMD Leaderboard Dominated: Many submissions were made to the `amd-ag-gemm` leaderboard, with times ranging from approximately 500 µs to 800 µs on MI300x8.
- One submission achieved 4th place with 499 µs, while another had a time of 59.1 ms.
- GEMM-RS AMD Leaderboard: Several submissions were made to the `amd-gemm-rs` leaderboard on MI300x8, with times around 570-590 µs.
- One submission achieved a time of 1289 µs.
- GAU-Nernst scares Kernel GM: A member commented that gau-nernst with a Kernel GM role is a scary thing.
- No further explanation was provided.
GPU MODE ▷ #hardware (38 messages🔥):
homelabs builds, b200s, A100 32G SXM2, cursed builds, GPU direct storage
- Craving Crazy Homelab Builds: A member expressed disappointment that the channel wasn't full of crazy homelab builds <:wahh:1404542087860584570>, while another mentioned building a rig and considering a livestream.
- Another member chimed in that they were hoping for brokie hw instead.
- Discussing A100 32G SXM2 Node Configurations: Members discussed a 4x A100 32G SXM2 node setup, with one suggesting it might appeal to someone known for "cursed builds."
- The discussion touched on the fact that these cards don't have a power limit.
- Exploring PCIe Limitations and Expansion Options: The conversation shifted to PCIe lane limitations when attempting to upgrade to 8 GPUs, with suggestions including dual socket server motherboards or PCIe switch magic.
- One member jokingly proposed PCIe switch magic to get 64 GPUs on the system.
- GPU Direct Storage: A member asked if it's possible for a GPU to read data directly from disk storage via P2P, and wondered if it exists for non-Nvidia GPUs, referring to Nvidia's GPUDirect storage API.
- A link was shared to NIXL's Backend Guide, mentioning that GPUDirect Storage (GDS) is one of the backends that NIXL uses.
GPU MODE ▷ #tpu (4 messages):
TPU VM, TPU Pods, TPU Slices, TPU Workers
- TPU Newbie seeks Guidance: A member followed instructions from the tpu-starter GitHub but is facing configuration issues when creating a TPU VM.
- They are looking for resources to study TPU Pods, TPU Slices, and TPU Workers.
- More details about TPU setup: The user attached images of their TPU configurations, indicating they have already read about pods, slices, and workers.
- They are asking if they are missing something in their setup process.
GPU MODE ▷ #factorio-learning-env (9 messages🔥):
FLE 0.3.0 on Mac M1/M2, FLE on M4, Factorio Modding, FLE Sync Meeting
- FLE 0.3.0 Works on M-series Macs: A member inquired if FLE 0.3.0 runs on Mac M1/M2 chips and if it works fine.
- Another member confirmed it runs successfully on M4 and suggested trying to install it.
- Factorio Modding Expertise Arrives: A member familiar with the Factorio modding environment offered to contribute.
- Another member invited them to a FLE Sync meeting.
- FLE Sync Meeting Invite: A member mentioned a FLE Sync meeting and shared a Google Meet link.
- A link to a relevant Twitch clip was also shared.
- Release Praised!: A member congratulated the team on the release of FLE 0.3.0.
- They expressed hope for many more releases in the future with rocket emojis.
GPU MODE ▷ #amd-competition (32 messages🔥):
rocshmem issues, all2all kernel errors, amd_gemm_rs timeouts, libtorch usage, gemm-rs optimization
- Custom Kernel Conundrums Cause Confusion: A user encountered an `ImportError: cannot import name 'custom_kernel' from 'submission'` when running the baseline all2all kernel, and was advised that the `custom_kernel` function, an entry point they should implement, was missing from their code.
- Another member suggested that the cause of the timeout issue may stem from a bug in the user's implementation.
- GEMM-RS Gains Ground with Gradual Growth: A user who benchmarked amd_gemm_rs experienced timeouts, but the setup was confirmed healthy, with another user getting successful runs; the optimization headroom for gemm-rs may be larger than for ag-gemm.
- It was suggested that improvements will be more noticeable with good computation/communication overlap, and that online claims show gemm-rs overlap improvements of around 1.25x.
- LibTorch Limitations Loom Large: A user inquired about using libtorch with cpp_extension and needing a path to it.
- The user followed up that `/dev/shm/` isn't big enough.
- rocSHMEMâs Rocky Road: Resource Restrictions Rile Users: Users reported issues with rocshmem, including a need to link it with rdc, and running into space limitations when allocating buffers, encountering errors after some allocation.
- It was suggested that the limit may be related to allocation time rather than size, and there was a request for increased space allocation for rocshmemcc.
GPU MODE ▷ #cutlass (46 messages🔥):
Vectorization Issues with cute.copy, Layout Visualizers, cute.copy vectorization, TV Layouts, SVG TV visualizer
- `cute.copy` Vectorization Requires Source and Destination Alignment: A user found that `tAsA.store(tAgA.load())` vectorized, but `cute.copy(tiled_copy_g2s_a, tAgA, tAsA)` did not, and that forcing a larger `num_bits_per_copy` resulted in an ICE, solved by flipping the smem layout to vectorize both sides; a member confirmed that `cute.copy` requires both source and destination to have an `atom_v` structure for vectorization.
- The consensus is that when you force it by specifying `num_bits_per_copy` in the atom creation, it fails at IR verification; it's unable to vectorize at all because it's using `cp.async`, which is limited.
- Layout Visualizer Face-Off: Members discussed layout visualizers, with one member showcasing their SVG TV visualizer, and another member linking to their TV layout visualizer hosted on Hugging Face Spaces.
- The former is based on the C++ implementation, can be called within `cute.jit`/`kernel`, and now supports swizzle, copy, and MMA layouts, while the latter is hosted online and considered more convenient by some.
- Cute Visualizer Race Heats Up: Following discussion of layout visualizers, a member shared their cute-layout-display, written in Python.
- This sparked humorous comments about a race to create a layout visualizer, and how it seems every CuteDSL beginner needs this feature. The visualizer supports swizzle, copy, and MMA layouts.
GPU MODE ▷ #general (1 messages):
mojo, pip install, Python scripts
- Install Mojo Directly from Python Scripts: You can now use `pip` to install Mojo directly within your Python scripts.
- A code snippet was shared: `import pip; pip.main(['install', 'mojo'])` (see the note on `pip.main` below).
- Using pip to manage Mojo: The messages highlight the possibility of using pip, a Python package installer, to manage Mojo installations.
- This could simplify the installation process for users familiar with Python environments.
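Worth noting: `pip.main` is pip's internal entry point and has been unsupported since pip 10, so a more defensive version of the shared snippet shells out to pip instead (a sketch, not from the discussion):

```python
# Install Mojo from inside a Python script without touching pip internals.
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "mojo"])
```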
GPU MODE ▷ #multi-gpu (53 messages🔥):
Profiling overhead, NVSHMEM and NCCL symmetric memory, Multi-GPU operator optimization, Locking GPU clocks
- Profiling Tensor Transfers: Sync or Swamped?: Profiling tensor transfers may disproportionately measure synchronization overhead for smaller tensors, whereas large tensors like (65536, 5120) may reveal straggler effects and synchronization barriers in MoE training runs.
- The suggestion was to use Nsight Systems (nsys) for accurate timings, or to record CUDA events before and after transfers while ensuring consistent results via GPU clock locking and warmup runs (see the sketch at the end of this section).
- NVSHMEM vs NCCL: Pointer Parley: In NVSHMEM, `nvshmem_malloc()` performs a collective operation that reserves memory on the NVSHMEM symmetric heap and returns a symmetric pointer, which refers to the same symmetric allocation on each processing element (PE).
- While the numerical value of the pointer won't be identical across PEs, `nvshmem_ptr` can efficiently access remote memory without specific access patterns, but it requires one process per GPU, unlike NCCL, which also supports a single process.
- Mercury: Multi-GPU's Operator Optimizer: The Mercury paper introduces a loop-based intermediate representation, CommIR, treating remote GPU memory as an explicitly managed extension of the memory hierarchy for multi-GPU operator compilation.
- This approach enables holistic reasoning about data placement and inter-device communication, achieving an average 1.56x speedup over state-of-the-art hand-optimized designs like USP and Ulysses, and up to 1.62x performance improvement for real LLM workloads compared with model-level 3D-parallel.
- GPU Clock Lock-In: Stabilizing Speeds: To ensure consistent profiling results, members recommend locking GPU clocks to a stable frequency using `nvidia-smi --lock-gpu-clocks`, for example the base TDP frequency set by `nvidia-smi --lock-gpu-clocks=tdp,tdp`.
- One should be careful with the max frequency: when the GPU heats up, it will throttle the clock no matter what, so using tdp as the base keeps it stable during full load.
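A minimal sketch combining both suggestions from this section, clock locking plus CUDA-event timing; it assumes two CUDA GPUs, root privileges for nvidia-smi, and uses the tensor shape quoted above:

```python
import subprocess
import torch

# Lock clocks to the TDP base frequency for stable measurements (needs root).
subprocess.run(["nvidia-smi", "--lock-gpu-clocks=tdp,tdp"], check=True)
try:
    x = torch.randn(65536, 5120, device="cuda:0")
    for _ in range(5):                 # warmup runs hide one-time setup costs
        x.to("cuda:1")
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    y = x.to("cuda:1")                 # the transfer under test
    end.record()
    torch.cuda.synchronize()
    print(f"transfer: {start.elapsed_time(end):.3f} ms")
finally:
    subprocess.run(["nvidia-smi", "--reset-gpu-clocks"], check=True)
```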
GPU MODE ▷ #irl-accel-hackathon (1 messages):
Multimodal Data Processing, Hackathon Team Formation
- Hackathon participant seeks multimodal processing team: A participant is looking to collaborate on multimodal data processing projects during the hackathon, inviting interested individuals to reach out and plan together.
- Hackathon participant asks about team and topic lists: A participant expressed interest in reviewing existing team lists and topics to contribute and propose new ideas.
- They are "keen to have a gander and put something up too."
GPU MODE ▷ #opencl-vulkan (2 messages):
Khronos Group, OpenCL, Vulkan
- Enthusiasm for Khronos-Aligned Channel: A member expressed enthusiasm for the channel's alignment with the Khronos Group, indicating interest in OpenCL and Vulkan discussions.
- This suggests a focus on industry standards for parallel computing and graphics APIs.
- Potential Discussions on OpenCL and Vulkan: The user's comment implies a desire for in-depth conversations regarding OpenCL and Vulkan.
- These are key technologies for GPU programming and cross-platform development.
GPU MODE ▷ #cluster-management (10 messages🔥):
Node Failures Mitigation, MPI Fault Tolerance, CCL Reconfiguration, AI-Driven Observability
- Node Failures Plague Growing GPU Clusters: Members are discussing strategies for mitigating node failures in larger GPU clusters, noting the increasing number of failure points and the importance of reliability and recovery, citing a study mentioning over-provisioning by ~5%.
- The discussion highlights that for a cluster of 1,056 H100 or A100 GPUs, over-provisioning could cost roughly $1M/month.
- MPI Aims to Survive Hardware Failures: The MPI forum is working on making MPI programs able to survive hardware failures and re-add nodes at runtime, according to this GitHub repo.
- It was pointed out that overall there aren't fantastic solutions to the problem without a lot of fiddling, and that NCCL and RCCL may take notes from the MPI FTWG.
- Checkpoint and Restart Remain Go-To Recovery: Members agreed that most CCLs today still assume a fixed set of nodes, so checkpoint and restart is still the go-to recovery path.
- It was mentioned that in the near term, the pragmatic approach is to invest in automation and smarter checkpointing for faster recovery until fault-tolerant stacks like FT-MPI or PCCL can handle dynamic node changes in production.
- AI Eyes Proactive, Observability-Driven Resilience: The group has recently started exploring agentic systems embedded within clusters capable of observing logs, metrics, and telemetry in real time.
- The idea is to shift from reactive recovery to proactive, AI-driven observability and resilience, and autonomously rebalancing load to maintain cluster health.
GPU MODE ▷ #penny (3 messages):
Cloud Providers, Vast AI limitations, nvshmem, ncu/nsys access, nvm
- Brainstorming Cloud Options for Project: A member is seeking recommendations for suitable cloud providers to support their project, with initial trials planned for AWS and a provider called nvm.
- They expressed dissatisfaction with Vast AI due to the absence of nvshmem and ncu/nsys access, crucial for their project's requirements.
- Nvshmem and Nsys/NCU Access missing on VastAI: A user reported that Vast AI lacks essential features like nvshmem, ncu, and nsys access, hindering their project's progress.
- The user mentioned they would explore AWS and another provider called NVM as potential alternatives to overcome these limitations.
HuggingFace ▷ #general (312 messages🔥🔥):
Image-to-text ONNX conversion, Large-scale conversation datasets, Model drift, Uncensored AI roleplay models, Hugging Face Pro subscription
- ONNX Community to the Rescue on Image-to-Text Conversions: Converting VLMs and multimodal models to ONNX can be tricky, so the suggestion was to ask the ONNX Community for help and guidance.
- A link to a relevant channel in the HF Discord was also shared with conversion tips.
- Scoring High-Quality Conversation Datasets: In the hunt for large-scale, high-quality 3+ participant conversation data, it was suggested to explore existing projects on Hugging Face.
- Also it may fall under the scope of Hugging Science.
- Data Drift Decoded and Demystified: A member shared a link to an article to learn more on Model vs Data Drift.
- The article was read as an educational piece on how drift happens and what can be done to mitigate or reduce it, such as continually updating the LLM with fresh information so it stays relevant.
- OVH Hosting Hurdles for Vision-to-Text Models: A member is having hosting issues with OVH and needs advice when trying to use a vision-to-text model for about 200-word descriptions per image, with about 100 different images at a time.
- The member was advised on using OVH and on deploying GGUF files with Ollama.
- Decoding Deepseek Downloads: A member was having issues finding the proper files to use for Deepseek Models and was advised that there is a 100% chance that they don't have the sheer memory for any deepseek-v3 model.
- It was explained that for local hosting, people usually run GGUF builds, which can be found under the quantizations.
HuggingFace ▷ #i-made-this (5 messages):
Neovim RAG Chatbot, PromptVault Android App with Gemma 1B IT, BERT Model for AI Text Detection, Art Video
- Vim-proved Semantic Search Rolls into Neovim: A member built a RAG chatbot for Neovim help documentations using Claude-4.5 Sonnet, aiming to provide better semantic search for Neovim docs.
- The chatbot is a side project that the author is happy with, despite its basic nature.
- PromptVault App Powers Offline Prompts via Gemma: A member announced that their free Android app, PromptVault, now runs on-device AI using the Gemma 1B IT model through MediaPipe Tasks for offline prompt generation and refinement.
- The app features local and Google Drive (encrypted) backups, offline mode, and experimental AI actions for writing titles, descriptions, and prompt text.
- BERT Cracks Down on AI-Generated Essays: A member trained a BERT model to detect AI-generated texts from human-written ones, using a dataset from Kaggle.
- The member is looking for ideas to speed up training on bigger datasets, as the current dataset is not the greatest; a generic speed-up sketch follows at the end of this section.
- AI Art Sparks GigaChad Reactions: A member shared a video and requested viewers to react with a <:GIGACHAD:1159748479535558718> if the video is considered art.
- The video was attached as a download.mov.
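On the training-speed question above, the usual first levers are mixed precision and a larger effective batch; a generic HF Trainer sketch, not the member's actual configuration:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bert-ai-detector",
    fp16=True,                       # mixed precision on CUDA
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,   # effective batch size of 128
    dataloader_num_workers=4,        # overlap data loading with compute
)
```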
HuggingFace ▷ #computer-vision (1 messages):
Wan2.2 tuning, LoRA tuning for high noise models, Tuning high and low noise models, Single component model tuning
- Users seek Wan2.2 tuning tips: A user is tuning Wan2.2 using LoRA for a high noise model but got a weird result after 5 epochs with 10 dataset repetitions and 77 videos with 81 frames per video.
- The user describes the videos as rubbish, with a single item visible moving across the frame and seeks help from the community.
- Tuning Both High and Low Noise Models: Simultaneous or Separate?: A user asked if they should tune both high and low noise models, and if so, whether they should tune them at once (e.g., a whole cycle).
- They also inquired whether they could tune a single component of the model (e.g., only the high noise model) using the same low noise model.
HuggingFace ▷ #smol-course (20 messages🔥):
Leaderboard Updates, TrackIO issues with DPOTrainer, Private Datasets on Leaderboard, LoRA SFT with TRL + SmolLM3 issues
- Leaderboard Refreshed Manually: The leaderboard for the course has been manually updated, link here.
- Members asked about the frequency of updates, and it was clarified that it is a manual process.
- TrackIO Package Troubles: One of the members encountered issues with `trackio` in the DPOTrainer/DPOConfig when following the examples and shared a link to the related issue.
- Pinning `trackio` to version 0.4.0 resolved the problem, suggesting that version 0.5.1 is broken.
- Public Datasets Required for Leaderboard: It was noted that some students submitted to the space but left their evaluation datasets private.
- A warning was issued that the leaderboard won't work without a public dataset, with a request to check PRs and a link to the leaderboard discussions.
- LoRA SFT Troubleshoot: A member reported encountering a TypeError related to an unexpected keyword argument `dataset_kwargs` when using LoRA SFT with TRL + SmolLM3 in Colab.
- No solution was found during the discussion; a hedged sketch of a likely fix follows below.
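A hedged guess at the error: recent TRL releases moved dataset options off `SFTTrainer` onto `SFTConfig`, so older example code that passes `dataset_kwargs` to the trainer raises exactly this TypeError. A sketch under that assumption (the dataset and LoRA settings are placeholders):

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset
config = SFTConfig(
    output_dir="smollm3-lora",
    dataset_kwargs={"skip_prepare_dataset": False},  # lives on the config now
)
trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B",
    args=config,
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32),
)
trainer.train()
```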
HuggingFace ▷ #agents-course (17 messages🔥):
GAIA errors, DuckDuckGoSearchTool errors, Smolagents costs, Duplicating vs Cloning
- GAIA Space Cloning Causes 500 Error: During a GAIA exercise, a user encountered a 500 error when attempting to clone a space, and was advised to duplicate the space instead.
- Another member confirmed this approach, stating "I think you should duplicate the space instead of cloning it. I think I did the same".
- Tool error stops DuckDuckGoing: A new student, starting the AI Agents course, encountered an error while adding DuckDuckGoSearchTool to the agent's tools, shared in an attached image.
- Smolagents Costs: Is it Free?: A user inquired about the costs associated with using smolagents, wondering if it necessitates paid AI services.
- They recalled a past experience where they had to pay Hugging Face for something to continue the course, but couldn't remember the specifics.
- Duplicating Spaces Preferred Over Cloning: A user asked why duplicating a space is better than cloning it locally.
- The response indicated the course specifically instructed users to duplicate spaces instead of cloning, but didn't specify why.
Modular (Mojo 🔥) ▷ #general (206 messages🔥🔥):
GPU Puzzles Feedback, Mojo vs CUDA, Mojo and pthreads, Tinker as a Threat to Mojo, MAX and Pytorch
- Mojo GPU Puzzles Feedback: A member inquired about the appropriate channel for submitting feedback and corrections regarding the GPU puzzles book.
- Another member provided a GitHub link for the Mojo GPU Puzzles repository.
- Mojo's Compute Capabilities vs CUDA: A member inquired whether Mojo could replicate all functionalities available in CUDA.
- Another member responded that while Mojo currently lacks optimal ways to interact with graphics hardware on the GPU, virtually any computational task should be achievable; feature requests are welcome for any unmet needs.
- Threading with C Libraries in Mojo: A member inquired about the possibility of utilizing C libraries like pthreads within Mojo.
- A member responded that while C libraries generally integrate well, pthreads may not be ideal due to Mojo's runtime environment and its impact on standard library functions, noting that Mojo's current concurrency model is incomplete.
- Tinker's Potential Threat to Mojo's Training: A member suggested that Tinker could pose a competitive challenge to Mojo, particularly if it restricts access to fine-tuned weights, limiting model customization options.
- A member responded that training is less of a priority for Modular because inference is where all of the money is made with AI.
- MAX: A contender in the inference realm: A member noted that MAX could potentially replace TensorFlow and PyTorch, inquiring about its current parity and open-source status.
- Another member responded that MAX is near or ahead in inference performance, but it is only partially open source, with plans to open up further early next year.
Modular (Mojo 🔥) ▷ #mojo (80 messages🔥🔥):
Pixi vs UV, Mojo Notebook, Python type hints, Mojo const generics, Mojo debugging
- Pixi to the Rescue: Access Denied No More!: A user reported an `Access denied` error with Pixi, but the issue was resolved by switching to the `max-nightly` channel instead of `nightly`.
- Pixi is like a combination of uv and a not-slow conda implementation.
- Run Mojo inside Jupyter notebooks: Mojo can be used in Jupyter notebooks via the `%%mojo` magic command, as demonstrated in this forum post, but the import path is now `mojo.notebook`.
- Note that Mojo syntax highlighting is not yet available in Jupyter cells.
- Rust-Style Traits Coming to Mojo: Mojo aims to offer a more powerful type system than Python, with Rust-style traits, `where` clauses, powerful const generics, and variadic generics planned or already implemented.
- Additionally, Mojo has linear types, taking inspiration from Zig for a static reflection API, with the end goal of approaching Dependent Haskell or Idris in type-system capability.
- Mojo debugging unavailable with UV install: Debugging a Mojo file using the "mojoFile" option is not supported when the Mojo SDK is installed as a wheel.
- Users report it's still not debugging even with Pixi, following the docs and using a standard processor like an Apple M3.
- `vectorize` Function Details: The `vectorize` function in Mojo doesn't care about data layout; it instructs the user to load the next `width` elements from offset `i` and handles the drain loop.
- For efficient processing with `vectorize`, it's recommended to reorganize data into an SoA (Structure of Arrays) format rather than using `List[ComplexSIMD]`; a NumPy illustration of the layout principle follows below.
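The SoA advice, illustrated in NumPy terms rather than Mojo (a sketch of the layout principle, not the Mojo API): separate real/imag arrays give unit-stride streams, which is what width-`w` vector loads want.

```python
import numpy as np

n = 1 << 16
aos = np.random.rand(n, 2)            # AoS: interleaved (re, im) pairs, stride 2
re = np.ascontiguousarray(aos[:, 0])  # SoA: one contiguous array per component
im = np.ascontiguousarray(aos[:, 1])
magnitude = np.sqrt(re * re + im * im)  # each operand now streams with unit stride
```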
Modular (Mojo 🔥) ▷ #max (3 messages):
Max Hardware Support, vLLM SGLang, OAI endpoint outside Max docker
- Max Hardware Support Tiered: There's a tiered list of current GPU compatibility available at Modular Docs.
- While you'll be able to test out GPU functionality on your 20XX GPU, full models may not work there; they will on H100.
- Max might beat vLLM/SGLang on throughput: Depending on the model, Max may show advantages vs vLLM or SGLang on throughput on H100s.
- Follow this benchmarking guide to test yourself.
- OAI Endpoint Exposable Outside Docker: The OpenAI endpoint should be exposed outside of the Docker container.
Nous Research AI ▷ #general (255 messages🔥🔥):
Sora 2 Video Generation, Prompt Rewriting by LLMs, HP G1a R-AI-Max+ Pro 395 Performance, LPDDR5X vs DDR5, Qwen VL Model quirks and regressions
- Sora 2 Generates Anime but Falls into IP Jail: Members found the Sora 2 video generation model's anime output surprisingly good, but note the model rewrites prompts and began outright banning copyrighted content overnight, effectively crippling its creative potential according to one tweet.
- The Sora experience has been described as a "speedrunning enshittification" and many expressed their disappointment: "Pretty much permanently gimped, and at that point. Why even have it exist?".
- Ryzen AI Mini PCs Rock LPDDR5X: The HP G1a R-AI-Max+ Pro 395 w/ 128GB LPDDR5X was obtained for AI testing, featuring soldered, non-replaceable onboard LPDDR5X, which is generally much faster and lower power than DDR5.
- The LPDDR5X allows the DGX Spark and popular Ryzen AI Mini PCs to reach 8000 MT/s memory speeds and 250GB/s+ of bus bandwidth, according to this Samsung Semiconductor link.
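Those two figures are consistent, assuming the commonly cited 256-bit LPDDR5X interface on these parts (the bus width is an assumption, not from the discussion):

```python
transfers_per_sec = 8000e6          # LPDDR5X-8000: 8000 MT/s per pin
bus_bytes = 256 // 8                # 256-bit interface -> 32 bytes per transfer
print(transfers_per_sec * bus_bytes / 1e9, "GB/s")  # 256.0, i.e. "250GB/s+"
```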
- Qwen VL Shows Weird Regressions and Clock Inaccuracy: Members discussed the quirk of Qwen 2.5 VL, where the smaller model sometimes outperforms the larger one on vision tasks, but suffers from the lost-in-the-middle phenomenon when given text.
- In one clock-reading test, it was shown to be missing two clocks outright and getting most wrong, whereas a local vLLM run got 3/5 correct, mostly right except for flipped digits.
- Multi-Step Transformer Loss Tuning: A member described their experiment using multi-step transformer loss, by choosing the hidden vectors from a select number of the middle layers of the transformer and appending them to the input vectors to do another forward pass.
- They noted it was done on a really dumb Gemma 3 270M IT model, but that CLI agents opening up access to experiment with training and tinkering with algorithms is actually crazy.
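A loose PyTorch sketch of the described experiment; the layer indices, concatenation scheme, and loss weighting are all assumptions rather than the member's exact recipe:

```python
import torch
import torch.nn.functional as F

def multi_step_loss(model, input_embeds, targets, mid_layers=(4, 6)):
    # First pass: normal loss, but keep the intermediate hidden states.
    out = model(inputs_embeds=input_embeds, output_hidden_states=True)
    loss = F.cross_entropy(out.logits.flatten(0, 1), targets.flatten())
    # Append selected middle-layer hidden vectors as extra input positions.
    extra = torch.cat([out.hidden_states[i] for i in mid_layers], dim=1)
    out2 = model(inputs_embeds=torch.cat([input_embeds, extra], dim=1))
    logits2 = out2.logits[:, : targets.shape[1]]  # score the original positions
    return loss + F.cross_entropy(logits2.flatten(0, 1), targets.flatten())
```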
Nous Research AI ▷ #research-papers (1 messages):
Dragon Hatchling Paper, Transformer Brain Link
- Dragon Hatchling: Transformer's Brainy Missing Link?: A member shared a link to a paper titled "The Dragon Hatchling: The Missing Link Between the Transformer and Models of the Brain" (ArXiv:2509.26507).
- The paper explores potential connections between transformer architecture and brain models.
Nous Research AI ▷ #interesting-links (2 messages):
Brazilian Miku Nous girl
- Brazilian Miku Nous Girl shared: A member shared a link from X: Brazilian Miku Nous girl.
- The post was titled ee.dd.
- Another Brazilian Miku Nous Girl shared: A different member also shared a link from X: Brazilian Miku Nous girl.
- This second post had a title with a variation of ee.dd.
Nous Research AI ▷ #research-papers (1 messages):
Dragon Hatchling Paper, Transformer Brain Link
- Dragon Hatchling Paper Hatches: A member shared a link to a new paper titled "The Dragon Hatchling: The Missing Link Between the Transformer and Models of the Brain" (Arxiv link).
- Dragon Hatchling: Bridging Transformers and Brain Models: The paper, titled "The Dragon Hatchling," explores potential connections between Transformers and models of the brain (Arxiv link).
Latent Space ▷ #ai-general-chat (184 messages🔥🔥):
DeepSeek's CUDA Grip, Sora 2 Premiere, Claude vs Human Teams, Goodfire's Lessons, Kilo Code's Views
- DeepSeek cracks China's CUDA lock-in: DeepSeek's FP8 spec and TileLang language are rallying Chinese chip stocks, aiming to loosen Nvidia's CUDA lock-in via shared standards and an easy programming bridge; it's a strategic alignment, but performance gaps remain, more on X.
- Sora 2 Debuts its Duck Short: OpenAI premiered a 30-second AI-generated short titled The Quack: Part 1 using Sora 2, sparking awe, excitement, and duck-cameo memes with the poster and invite code (FRIYAY), linked on X.
- Agentic Browsers are Browser Battleground: Agentic browsers are a new battlefield for AI companies to compete, with the new Claude one demoed on last Friday's AIIA call, and an early-access user highlighting it as pretty capable in some use cases.
- Kilo Code's Views Questioned: Discord debates Nick Baumann's claim of 9.5M organic views for a Kilo Code promo, arguing the metrics scream paid ads/bots, and citing RooCode's lower numbers; view the original X post.
- GPT-5 Swaps Stir Revolt: ChatGPT now auto-routes sensitive convos to a cheaper GPT-5 under a distress-support guise; users erupt over gaslighting, model switches, and mislabeling, launching #Keep4o.
Latent Space ▷ #genmedia-creative-ai (15 messages🔥):
Pika's Swift Takeover, AI Horse Riding Astronaut, OpenAI's Medal acquisition fails, General Intuition
- Pika's AI Dreams Derailed by Giants?: Chongz Luong observes that Pika, once a hyped startup, has been rapidly outpaced by Google, Meta, and OpenAI with their advanced AI video models like Veo 3, Vibes, and Sora 2 (xcancel.com).
- The discussion highlights how financial power drives AI dominance and predicts Pika may be the first of many hyped startups to lose relevance as the industry shifts focus from models to tools.
- Sora's Astronaut Horseback Riding Revolution: Pliny celebrates Sora 2 Pro's leap from last year's impossible still-image prompt to an impressive low-gravity video: a realistic horse piggybacking on an astronaut with improvised, spontaneously generated mission-control humor (xcancel.com).
- Replies discuss rapid gen-AI progress, benchmarks, and inside jokes.
- OpenAI's $500M Medal Bid Fails!: OpenAI reportedly offered $500 million last year to buy Medal, a gamer-video platform, to harvest footage for training models (xcancel.com).
- The deal collapsed, and Medal is instead launching an in-house AI division, General Intuition, now closing a $100 million funding round.
Yannick Kilcher ▷ #general (77 messages🔥🔥):
AI Sex Robots, Unitree R1 Robot, Discord Data Breach, Safety Benchmark, Automating Scientific Method
- Robo-Romance: AI Sex Robots Loom?: Users discussed when AI sex robots will be less awkward, coinciding with the introduction of the Unitree R1, a clumsy but affordable $6k robot (X link).
- The conversation coincided with the UK's age verification requirements and some users are already anticipating singularity with robot wives.
- Discord Disaster: Customer Service Data Breach Unveiled: A user shared a link to The Verge reporting a Discord customer service data breach (The Verge).
- This comes as some members discussed a safety benchmark by Dan Chollet (Tweet 1, Tweet 2) and a NIST evaluation of DeepSeek AI models finding shortcomings and risks (NIST report).
- Matrix Math Mandate: Deep Learning Demands Linear Algebra: A user inquired about the necessity of learning complex matrix math to get started with AI.
- Members advised learning linear algebra, multivariable calculus, and intro probability, suggesting resources like community college courses, YouTube videos, and lecture notes from Stanford and Berkeley.
- AGI Antagonism: Is General Intelligence Overhyped?: A user questioned the obsession with AGI, suggesting that domain-specific ML systems will transform the world more effectively.
- Others debated the definition of AGI, with one user defining it as being able to complete any task given to an expert in any field, while some link it to The Singularity (wikipedia).
- MoGs Mania: Mixture of Gaussians as AGI Metric: A member shared theoretical guarantees that large foundation models can implicitly learn Mixture of Gaussians (MoGs), referencing multiple papers (arxiv 1, arxiv 2, arxiv 3, aclanthology).
- They propose defining AGI as a large foundation modelâs ability to perfectly emulate a data distribution described by MoGs with K components.
Yannick Kilcher ▷ #paper-discussion (15 messages🔥):
Low Rank Gradients paper, Paper Exploration session, Peer pressure and beer
- New Paper Sparks Interest in Low-Rank Gradients: A member shared a new paper, Low Rank Gradients, prompting interest and a potential exploration session.
- Another member noted its focus on heavy LA (Linear Algebra).
- User is pressured to explore paper: One member asked another if they would want to lead an exploration session of the paper, Low Rank Gradients.
- Another joked that they like to recruit a specific member with a black cat picture when there is complex math involved, as he is an excellent co-pilot.
- Cat pictures: One member joked that "cracked dudes wear cat pictures for some reason", along with a cat video.
Yannick Kilcher ▷ #agents (2 messages):
Diffusion Problem, Conditioning Signal, Guidance Weight
- Diffusion Models Need Stronger Signals: A member suggested framing the problem as a diffusion process, noting that the conditioning signal seems under-weighted.
- They recommended increasing the conditioning/guidance weight to ensure the output adheres more closely to the storyboard snow background.
- Background Underfitting Confirmed: Another member agreed that the model is underfitting the background, validating the need for a stronger conditioning signal.
- The original poster admitted that they did not know how to impose that, given whichever agent may be in use.
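For reference, "increasing the conditioning/guidance weight" usually means classifier-free guidance, where the model's prediction is extrapolated away from the unconditional one; a minimal sketch (w > 1 strengthens adherence to the condition):

```python
def guided_noise(eps_uncond, eps_cond, w=7.5):
    # Classifier-free guidance: extrapolate from the unconditional prediction
    # toward the conditional one; larger w follows the conditioning more closely.
    return eps_uncond + w * (eps_cond - eps_uncond)
```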
Yannick Kilcher ▷ #ml-news (9 messages🔥):
LLM Reasoning, Tree-of-Thought, Gemini 2.5 Deep Think, GPT-5 Math Claims
- LLM Reasoning's Naive Assumptions Debated: Members are debating whether LLMs truly use reasoning tokens as naively assumed, noting that "reasoning" often seems like a post-hoc justification despite improving performance.
- It was noted that issues such as overthinking simple tasks indicate that current reasoning may be more performative than genuine.
- Multi-Model Reasoning Explored: A member suggested exploring whether multiple smaller models of different origins could reason together, with each model contributing a step towards a solution.
- This approach aims to encourage semantically useful reasoning steps without relying on inter-token weirdness; however, another member noted that it may be too expensive to be worthwhile.
- Tree-of-Thought Variant Proposed: A member suggested that the multi-model reasoning concept would likely be a variant of Tree-of-Thought (arxiv link), potentially using a semi-agentic pipeline where steps are routed to expert models.
- It was noted that while less cost-effective than a larger model, it could scale performance beyond standard prompting; Google's "Deep Think" and Grok's "Super Think" may use similar approaches, as suggested by this blogpost.
- GPT-5âs Math Prowess Claimed: Claims of GPT-5 helping solve math problems from mathematicians are presented, with links to Tweet 1 and Tweet 2.
- A member noted that the first tweet was posted on August 1st, 2025, implying that it is a futuristic claim.
Eleuther ▷ #general (68 messages🔥🔥):
Diffusion Model Evaluation, Gemma Architecture Adoption, Synaptik Core for AI Memory, COLM Eleuther Meetup, AO3 Story Subset
- Diffusion Models Evaluated by Humans: Members discussed evaluation of diffusion models, including FID/CLIPScore, manual human evaluation, and automated metrics like FVD for video.
- One member stated: "Playing around with Sora 2 got me feeling curious on the video side, since evaluation feels even more primitive."
- Gemma's Architecture Isn't as Hot as Qwen's: Members discussed why Gemma's architecture isn't used as much as Qwen's, despite its strong performance on the LM Arena.
- A member suggested that architecture isn't the major driver of LLM performance and that training data and finetuning distribution are more important.
- Synaptik Core Boosts AI Trust: Janay from Synaptik Core introduced Synaptik Core, a toolchain for verifiable, long-term memory and auditability in AI systems.
- She shared a LinkedIn post demonstrating AI agents and another one describing her sprint just before the OpenAI Open Model Hackathon (link).
- Eleuther Tacos at COLM: Members mentioned a meetup at COLM (Conference on Language Modeling) and a taco social hosted by Featherless AI (Luma link).
- They hoped to meet for coffee or beers.
- Crafting an AO3 Subset for Simpler Learning: Members discussed creating an AO3 (Archive of Our Own) story subset with a simpler grammar structure for easier learning, similar to TinyStories.
- They considered using a readability score to filter the data, though concerns were raised about removing noise.
Eleuther ▷ #research (7 messages):
Manning and Csordas, top-k attention, MoE, BabyLM, hopfield-like optimizer
- Top-K Attention Extends MoE Architectures: A member noted that top-k attention feels like a natural extension of what we saw in MLP with MoE and top-k routers.
- He also expressed that the motivation is to do UTs (universal transformers) without recursion, questioning how to ensure that clustering allows gradient flow and doesn't cause discontinuities in the model's computational graph if someone wanted to use it for training.
- BabyLM For Efficient Cognitively Plausible Models: A member stated that in the future it is highly probable that what are considered LLMs will become "smaller" again through various techniques, so why not look at cognitively plausible models, which are (supposedly) more efficient in the first place?
- In this regard, initiatives such as BabyLM are highly appreciated.
- Hopfield-Like Optimizer Buffers Momentum: A member suggested an optimizer with N momentum buffers combined for the final update step, with a Hopfield (or Hopfield-like) update applied to each buffer from an incoming vector.
- They suggested it opens the avenue to do a ton of cool things with approximate top eigenvectors of the update matrix.
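A loose sketch of that idea in PyTorch, with N buffers and the incoming gradient routed to the most similar buffer; every detail here (routing rule, averaging, hyperparameters) is an assumption, since the message only sketched the concept:

```python
import torch

class MultiBufferMomentum(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3, n_buffers=4, beta=0.9):
        super().__init__(params, dict(lr=lr, n_buffers=n_buffers, beta=beta))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "bufs" not in state:
                    state["bufs"] = [torch.zeros_like(p) for _ in range(group["n_buffers"])]
                g = p.grad
                # "Hopfield-like" routing: update the buffer most aligned with g.
                sims = torch.stack([(b * g).sum() for b in state["bufs"]])
                i = int(sims.argmax())
                state["bufs"][i].mul_(group["beta"]).add_(g, alpha=1 - group["beta"])
                # Combine all buffers for the final update step.
                update = torch.stack(state["bufs"]).mean(dim=0)
                p.add_(update, alpha=-group["lr"])
```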
Eleuther ▷ #scaling-laws (8 messages🔥):
Block Scaling Factors vs Per-Float, Attention as Inductive Bias, nanoGPT Speedrun
- Block Scaling Beats Per-Float Scaling?: Rain AI's thesis that block scaling factors are superior to per-float scaling factors is seemingly validated by mxfp8, but this also negates any competitive advantage they might have had, given Nvidia's prior product release.
- A member inquired whether attention's effectiveness as an inductive bias is merely coincidental or due to Noam Shazeer's foresight.
- Attention: More Than Just GPU Convenience?: A member suggested Dzmitry Bahdanau as another key figure in attention mechanisms, linking to Andrej Karpathy's post acknowledging that attention was already "in the air."
- Others suggest it is convenient for training with GPUs.
- nanoGPT Speedrun Times Dropping: A member shared a LessWrong post noting that the nanoGPT speedrun world record dropped by 20% in 3 months, implying progress even at smaller scales.
MCP Contributors (Official) ▷ #general (48 messages🔥):
GitHub Team Management Migration, MCP Tools Versioning, Cloudflare's Code Mode, OpenAI's AppsSDK
- GitHub Teams migrate to Infrastructure-As-Code: The team migrated GitHub team management to infrastructure-as-code, managing memberships and repository permissions via code at modelcontextprotocol/access.
- The migration aims for community ownership, transparency, an audit trail, and AI-friendly access management, with brief access interruptions expected during deployment.
- Intuit Engineer Investigates MCP Tool Versioning Challenges: An Intuit engineer is facing challenges with versioning MCP Tools in MCP Servers, especially with dependency management and compatibility at scale, and is seeking collaborators.
- They've drafted a SEP with a potential solution, available at modelcontextprotocol/modelcontextprotocol#1575, and are looking for feedback.
- Is Cloudflare's Code Mode Over-Engineering MCP?: Cloudflare's Code Mode was discussed, with some suggesting it misunderstands MCP or over-engineers a tool call into a request to a Cloudflare worker as per this blogpost.
- The ability of Code Mode to reduce the number of turns an agent needs to deliver a result was noted, but there were concerns about performance and whether it is better than simply using web APIs or client SDKs. Some interesting experimentation is happening based on this prototype.
- OpenAI AppsSDK Sparks MCP-UI Overlap Debate: OpenAI released their AppsSDK, bringing UI in ChatGPT with MCP, with TheFork as a launch partner, as per their announcement.
- Members are wondering if engaging with MCP-UI would have been a better move. OpenAI intends to engage to ensure they feel natural together and will fully support transactions from apps using ACP.
MCP Contributors (Official) ▷ #general-wg (13 messages🔥):
Feature Support Matrix clarification, Server capabilities in initialization request, Icons metadata in servers
- MCP Feature Support Matrix Discovered: A member inquired about the meaning of "Discovery" in the Feature Support Matrix of the Model Context Protocol.
- It appears that discovery refers to server capabilities and the ability to communicate tool changes.
- Server Capabilities Sent Unexpectedly: One member mentioned calling out Cursor in a talk, noting that Cursor sends server capabilities in the initialization request.
- Another member acknowledged this as unexpected but harmless, with server capabilities highlighted in yellow in an attached image.
- Icon Metadata Use Case Explained: A member sought clarification on the use case for adding icons metadata to servers, noting their presence in other server primitives.
- Another member explained that icons provide a visual representation for applications, such as in a tool list or inline in chat when executing a tool.
Moonshot AI (Kimi K-2) ▷ #general-chat (47 messages🔥):
Kimi-latest vs Kimi-k2, K1.5 is proprietary, em dashes usage, translation, Sam Altman introducing OK computer
- Kimi-latest is not Kimi-K2: "Kimi-latest" is an alias that always points to whatever closed, production Kimi large model powers the Kimi assistant, while "Kimi-K2" is the separate open-weights MoE family (e.g., k2-0905).
- The "proprietary LLM" line on moonshot.ai refers to that closed stack, not to K2, even if the UI also prominently features K2.
- Some ignore messages with em dashes: A user wondered whether others immediately ignore messages with em dashes in them outside of talking to an AI.
- Another replied that they have used them their entire life and have to curb their instinct to use them so people don't think they are a bot, attaching an image macro showing love for Kimi.
- Kimi's translation Censoring Issues: A member said that sometimes Kimi's censoring causes it to delete everything it wrote and replace it with "sorry i can't provide help with this".
- They suggested using Qwen for translations as it has a million-token context and is better at that.
- Shareholder maximizes for fun and profit: One user made it clear that they have over one hundred USD of investment in a fund which holds 8% of Alibaba stock by weight, which in turn has a ~35% stake in Moonshot.
- The member claimed that "what's the point of human life if not maximizing shareholder value", which others found amusing.
aider (Paul Gauthier) ▷ #general (34 messages🔥):
Deepseek Browser testing with Claude, Aider vs. Opencode, Agentic tools and ripgrep, Aider and GPT-5 Codex, Aider with Emacs
- Deepseek tests Browser with Claude: A member shared a blog post about Deepseek browser testing with Claude CLI and Chrome DevTools MCP.
- He also uses Deepseek via their Anthropic API in Claude Code and Opencode because it performs better on tool tasks.
- Aider needs Manual Controls: A member expressed that the feature they miss most from Aider is the manual controls on context such as /tokens, /add, /remove, and /clear.
- They feel this is badly needed in all other tools and none of them have implemented it yet, arguing that for large codebases Aider doesn't stand a chance against these.
- Aider needs Agentic Grep: Members discussed how agentic tools use regex grep to find the parts they need and then do "views" on the surrounding lines, and that Aider is missing a ripgrep agentic handler.
- Another member agreed, saying that agentic grep will really help make Aider competitive with the current gen of tools.
- Aider supports GPT-5 Codex: A member inquired if there has been a patch to make Aider work with GPT-5-Codex.
- Another member responded that support has landed in main.
- Aider integrates with Emacs: A member stated that personally they have always felt that for codebase/task that they understand well, Aider is great as it allows them to do exactly what they want and it's also fast since they essentially give it the context it needs.
- A bonus is it integrates well with Emacs.
aider (Paul Gauthier) ▷ #questions-and-tips (4 messages):
aider prompt cache, image analysis
- Aider Prompt Cache activated by default: Aider's `models.py` now sets `cache_control: bool = True` and `caches_by_default: bool = True` for the provider Z.ai, also adding "prompt cache" to the greeting message.
- The new greeting message will be similar to: "Main model: openrouter/deepseek/deepseek-v3.2-exp with diff edit format, 8k think tokens, prompt cache".
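For context, this is the shape of a per-model entry in aider's `models.py`; a simplified sketch where the two cache flags come from the summary above and the remaining fields are assumptions:

```python
from dataclasses import dataclass

@dataclass
class ModelSettings:
    name: str
    edit_format: str = "whole"
    cache_control: bool = False        # flag quoted in the summary
    caches_by_default: bool = False    # flag quoted in the summary

zai_defaults = ModelSettings(
    name="openrouter/z-ai/example-model",  # hypothetical model id
    edit_format="diff",
    cache_control=True,
    caches_by_default=True,
)
```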
- User Needs Help with Image Analysis: A user shared a message "hello, sorry i need some help…" and attached an image to ask "Am i doing something wrong?".
- The attached image is probably related to the problem that the user needs help with.
aider (Paul Gauthier) ▷ #links (1 messages):
Open vs Closed weights, Weight prices and ownership, Aider tool updates
- Open vs Closed Weights: Price Concerns Emerge: Discussion pivots to comparing open-source versus closed-source AI model weights, highlighting concerns that owners of closed weights could inflate prices at will.
- Aiderâs Tooling: Staying Updated: Ongoing discussions focus on keeping up with the latest updates and features of the Aider tool, ensuring users leverage its full potential.
DSPy ▷ #show-and-tell (1 messages):
Neosantara AI, LLM Gateway Platform, Free AI Apps, DSPy integration, Free consume tokens
- Neosantara AI Launches LLM Gateway: Neosantara AI launched a new LLM Gateway Platform for building AI apps, available for free.
- DSPy Integration comes to Neosantara AI: Users can integrate Neosantara AI with DSPy, with documentation available here.
- Sign Up for Free Monthly Tokens: New users get 10k monthly consume tokens upon signing up.
- Users can provide feedback via email at [email protected].
DSPy ▷ #general (26 messages🔥):
DSPy Roadmap, ReAct Trajectories, Fallback Behavior, DSPy and Feature Engineering, BAMLAdapter
- DSPy Roadmap: Beyond Issues and Changelogs?: A member inquired about a DSPy roadmap beyond the GitHub issues and changelog for accepted enhancements and new releases, with links to recent DSPy mentions on X/Twitter and DSPy's official account.
- Another member shared a link to Drew Houston's post on using DSPy, as well as to Huyen Chip's blog post on React+Reflection.
- ReAct Trajectories: Disk vs. Memory?: A member asked about maintaining ReAct trajectories on disk versus memory objects for longer agent steps and another recommended Elysia by Weaviate, a DSPy project with a Decision Tree concept.
- Another member is thinking about implementing React+Reflection as a way to reflect on the entire trajectory of the ReAct part.
- Hardcoded Fallback Frustrations Fixed?: A member inquired about changing the fallback behavior in DSPy, which was confirmed to be a hardcoded fallback at the moment.
- DSPy-ReAct-Machina Arrives: A member announced the release of DSPy-ReAct-Machina, an alternative ReAct implementation for DSPy which enables multi-turn conversations by maintaining a single, ever-growing context history, and published it on PyPI.
- They also shared a blog post explaining the motivation and architecture and invited feedback.
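For contrast with Machina's single growing context history, the stock `dspy.ReAct` usage looks like this; a minimal sketch where the model name and tool are placeholders:

```python
import dspy

def web_search(query: str) -> str:
    """Placeholder tool: return search results for the query."""
    return "..."

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # placeholder model
agent = dspy.ReAct("question -> answer", tools=[web_search])
print(agent(question="What does DSPy-ReAct-Machina change?").answer)
```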
tinygrad (George Hotz) ▷ #general (15 messages🔥):
tinygrad hiring process, tinygrad bounties, tinygrad meeting #91, tinygrad nir backend, tinygrad pattern matcher
- tinygrad devs must pass the bounty gauntlet: A user inquired about direct hiring opportunities, but they were informed that tinygrad's hiring process revolves around solving bounties first.
- There are no personal interviews or direct hiring.
- tinygrad board meeting next monday: The agenda for tinygrad meeting #91 includes company updates, symbolic matters, rangeify bugs, speed improvements, cleanup, and bounty discussions.
- It will be held Monday at 6am San Diego time, or 9pm Hong Kong time.
- nir backend is up for grabs: The NIR backend is now ready for review (PR #12089).
- No other info was shared.
- Considering match statements for pattern matcher: The discussion explored using `match` statements when compiling the pattern matcher, instead of repeated if statements in the rendered code.
- The team agreed to try this approach; an illustrative sketch follows below.
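An illustrative Python sketch of the two styles (not tinygrad's actual rendered code): the same rewrite dispatch as chained ifs versus a 3.10+ `match` statement.

```python
def rewrite_if(op, args):
    if op == "ADD" and args[1] == 0:   # x + 0 -> x
        return args[0]
    if op == "MUL" and args[1] == 1:   # x * 1 -> x
        return args[0]
    return None

def rewrite_match(op, args):
    match (op, args):
        case ("ADD", [x, 0]) | ("MUL", [x, 1]):
            return x
        case _:
            return None
```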
tinygrad (George Hotz) ▷ #learn-tinygrad (8 messages🔥):
tinygrad C++ port, tinygrad ONNX frontend, 3dgs repo
- 3dgs Repo Eyeing tinygrad Port for Efficiency: The maintainer of a 3dgs repo (LichtFeld-Studio) is planning to remove libtorch and is considering porting tinygrad to C++ for inference and CUDA support.
- A member suggested using the maintained python tinygrad to compile the model to CUDA or C kernels and then export C code with those kernels, linking to an EfficientNet C example.
- Tinygrad's ONNX Frontend Could Be the Answer: A member suggested using tinygrad's ONNX frontend with available ONNX versions of the models for a quick solution.
- Another member confirmed that running inference should be no problem.
- Porting tinygrad to C++: The maintainer said that they don't mind the effort of porting, because tinygrad would give them optimizations on the fused kernels instead of writing them themselves for every case.
- The maintainer mentioned that an alternative approach would be including a Python interpreter and doing the tensor-heavy operations in Python, similar to Blender.
Manus.im Discord ▷ #general (6 messages):
MCP Server for lead generation, Manus iOS client crashing, AI tools lacking personal context
- Malware MCP Server Deployed!: An MCP server for wholesale lead generation was deployed using wrangler/Cloudflare to scrub a specific government site for undervalued properties and motivated sellers, then break down analytics for an end-buy proposal, located at wholesale-lead-generator-frontend.pages.dev.
- One user jokingly called it "grab the malware before it gets deleted".
- Manus iOS Client Crashing Fixed!: A bug was reported in the Manus iOS client that caused a 100% freeze/crash when selecting text for input in the scheduled task interface and an engineer offered two solutions.
- The first solution was to "Write your command or text in a separate app, like Apple Notes. Copy the text. In the Manus app, tap the input field just once to place the cursor, and then paste the text. Avoid selecting or editing the text within that field."; the second solution was to use Apple's built-in Shortcuts app and the keyboard's built-in clipboard manager.
- Solution found for AI Tools Lacking Personal Context: A member is exploring a lightweight solution for AI tools that lack personal context and has specific goals.
- The goals were "connects to the data you choose, targets knowledge gaps with questions, understands your objectives and where you are right now, slips that context into whatever tool you're using", and they are looking for Bay Area collaborators and early users.
MLOps @Chipro ▷ #events (1 messages):
Feature Store Summit, Real-Time Feature Engineering, Vector Databases & Generative AI, Batch & Real-Time workflows
- Feature Store Summit 5th Edition Announced: The Feature Store Summit, an annual online event, will host technical speakers from companies like Uber, Pinterest, Zalando, Lyft, and Coinbase.
- The summit focuses on infrastructure for AI, ML, and applications requiring massive scale and real-time capabilities.
- Summit's Talks Feature Real-Time Engineering at Scale: The summit's talks will explore real-time feature engineering at scale, vector databases and generative AI in production, and the balance of batch and real-time workflows.
- Attendees can also expect discussions on emerging trends driving the evolution of Feature Stores in 2025.
- Feature Store Summit Date Announced: The summit is scheduled for October 14th, starting at 8:30 AM PT (5:30 PM CET).
- Interested individuals can register via the provided link.