Claude is all you need.
AI News for 9/26/2025-9/29/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (196 channels, and 15992 messages) for you. Estimated reading time saved (at 200wpm): 1286 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
Special mentions go out to John Schulman's Thinking Machines blogpost on LoRA, OpenAI launching Instant Checkout in ChatGPT and the Agentic Commerce Protocol with Stripe, and DeepSeek announcing big price cuts for V3.2 with a new Sparse Attention algorithm, all of which will be overlooked because…
Anthropic chose today to drop an entire week's worth of launches on a single day:
- Claude Sonnet 4.5: SOTA SWE-Bench Verified at 77.2% (with parallel TTC 82%), including a new focus on improvements in finance, law, and STEM
- Claude Code v2
  - checkpoints (one of the most requested features) that save your progress and allow you to roll back instantly to a previous state
  - a refreshed terminal interface
  - and shipped a native VS Code extension (design story here)
- Claude API:
  - a new context editing feature and memory tool that lets agents run even longer and handle even greater complexity.
  - Renaming the Claude Code SDK to the Claude Agent SDK.
- In the Claude apps, we've brought code execution and file creation (spreadsheets, slides, and documents) directly into the conversation.
- the Claude for Chrome extension is now available to Max users who joined the waitlist last month.
- Imagine with Claude: a generative UI experiment research preview.
Reception has been roundly positive, with folks like Cognition (Devin) and Sourcegraph (Amp) adopting it as their default model, and third-party evals from Box and SWE-Agent approving.
You can now also check out Mike Krieger's chat on Latent Space about the big day:
AI Twitter Recap
DeepSeek V3.2-Exp: Sparse Attention, price cuts, and open kernels
- DeepSeek Sparse Attention (DSA) lands (open) with big efficiency wins: DeepSeek released an experimental V3.2-Exp model that retrofits V3.1-Terminus with a learned sparse attention scheme, cutting long-context costs without quality loss. A tiny "lightning indexer" scores past tokens per query, selects top-k positions, and the backbone runs full attention only on those, changing complexity from O(L^2) to O(L·k) (a toy sketch of the selection idea follows this list). Two-stage continual pretraining on top of V3.1: a dense warm-up (~2.1B tokens, backbone frozen) aligns the indexer to dense attention via KL loss; then end-to-end sparse training (~944B tokens) adapts the backbone to the indexer with KL regularization. Models, tech report, and kernels are released; API prices drop 50%+ with claimed ~3.5x cheaper prefill and ~10x cheaper decode at 128k context, with quality matching V3.1. See the launch thread @deepseek_ai, pricing/API notes 3/n and code 4/n. Deep breakdowns from @danielhanchen and @scaling01.
- Ecosystem and compilers: vLLM has DSA support recipes and H200/B200 builds (vLLM, DSA explainer 1/3). DeepSeek's kernels ship in TileLang/CUDA; TileLang (TVM) hits ~95% of hand-written FlashMLA in ~80 lines and targets Nvidia, Huawei Ascend, Cambricon (@Yuchenj_UW). Community reactions highlight that DSA's post-hoc sparsification on a dense checkpoint generalizes beyond DeepSeek (analysis).
- Post-training recipe: DeepSeek confirms RL on specialist models (math, competitive programming, general reasoning, agentic coding, agentic search) with GRPO and rubric/consistency rewards, then distillation into the final checkpoint; SPCT/GRM used in RL stages (notes, confirm).
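For intuition on the top-k mechanism described above, here is a minimal, illustrative PyTorch sketch (single head, toy sizes; the stand-in indexer is just a dot-product score rather than DeepSeek's learned lightning indexer, and causal masking is omitted):

```python
import torch
import torch.nn.functional as F

def sparse_attention_topk(q, k, v, index_scores, top_k):
    """Toy DSA-style selection: a cheap 'indexer' scores past tokens per query,
    and full attention runs only over the top-k positions, turning O(L^2)
    attention into roughly O(L*k)."""
    L, d = q.shape
    # index_scores: (L, L) cheap relevance scores from a lightweight indexer
    topk_idx = index_scores.topk(min(top_k, L), dim=-1).indices   # (L, k)
    k_sel = k[topk_idx]                                           # (L, k, d)
    v_sel = v[topk_idx]                                           # (L, k, d)
    attn = torch.einsum("ld,lkd->lk", q, k_sel) / d ** 0.5        # (L, k)
    w = F.softmax(attn, dim=-1)
    return torch.einsum("lk,lkd->ld", w, v_sel)                   # (L, d)

L, d, top_k = 16, 8, 4
q, k, v = torch.randn(L, d), torch.randn(L, d), torch.randn(L, d)
scores = q @ k.T          # stand-in for the learned lightning indexer
out = sparse_attention_topk(q, k, v, scores, top_k)
print(out.shape)          # torch.Size([16, 8])
```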
Anthropicâs Claude Sonnet 4.5: coding/agent leap and first interpretability audit in a system card
- New SOTA for coding and agents: Anthropic launched Sonnet 4.5, claiming best-in-class coding, computer use, and reasoning/math. It sets a new high on SWEâBench Verified (no tools) and shows large gains on OSWorld (computer use), plus long autonomous coding runs (e.g., building/maintaining a codebase over 30+ hours, ~11k LOC) (launch, Cognition/Devin rebuild, long-run coding, finance/programming evals). Pricing remains $3M/$15M (input/output) with 200k default context and a 1M option for some partners (Cline).
- Alignment and interpretability work surfaced: Anthropic published a detailed system card; they report substantially reduced sycophancy/reward hacking and âevaluation awarenessâ signals discovered via interpretability. The team did a pre-deployment whiteâbox audit to âread the modelâs mindâ (to their knowledge, a first for a frontier LLM system card). See @janleike, the audit thread by @Jack_W_Lindsey, and system-card highlights (1, 2).
- Tooling and integrations: Claude Code v2 ships checkpoints, UX improvements, and a native VS Code extension; the Claude Code SDK is now the Claude Agent SDK aimed at general agents (@_catwu, @alexalbert__). Broad availability landed in Cursor (now with browser control), Perplexity, and OpenRouter (Cursor add, browser control, Perplexity, OpenRouter). Case studies: replicating published econ research from raw data using code execution/file creation (@emollick, @alexalbert__).
RL for LLMs: GRPO vs PPO vs REINFORCE, and LoRA matches full FT in many settings
- GRPO discourse, grounded: Practitioners with OAI/Anthropic RL experience argue GRPO is essentially a policy-gradient variant of REINFORCE with group baselines; performance differences among reasonable PG variants (GRPO, RLOO, PPO, SPO) are often smaller than gaps in data recipe, credit assignment, and variance reduction. See high-signal threads by @McaleerStephen and @zhongwen2009, plus a workflow explainer (@TheTuringPost). For those avoiding PPO complexity, REINFORCE/RLOO work well and avoid a value model (lower cost) (@cwolferesearch). A minimal sketch of the group-baseline advantage follows this list.
- LoRA holds up in RL: New experiments indicate LoRA can match full fine-tuning in many RL post-training regimes, even at low rank; corroborated by QLoRA experience (>1500 expts) and recent GRPO implementations (@thinkymachines, @Tim_Dettmers, @danielhanchen). NVIDIA also proposes RLBFF (binary principle-based feedback combining RLHF/RLVR) with strong RM-Bench/JudgeBench results (overview, paper).
- Data is the bottleneck debate continues: @fchollet stresses that scaling LLMs has been data-bound (human-generated and environment-crafted), while "AGI" might be compute-bound; meanwhile OpenAI's GDPVal dataset is trending on HF (@ClementDelangue) and the community calls for updated evals beyond saturated MMLU (@maximelabonne).
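Picking up the GRPO point above, a minimal sketch of the group-baseline advantage that replaces PPO's learned value model (illustrative only, not any lab's implementation):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: for each prompt, sample G completions, score
    them, and use the group's mean (and std) as the baseline instead of a
    learned value model, as in REINFORCE-with-baseline / GRPO-style updates."""
    mean = rewards.mean(dim=-1, keepdim=True)   # per-prompt group mean
    std = rewards.std(dim=-1, keepdim=True)     # per-prompt group std
    return (rewards - mean) / (std + eps)

# rewards[i, j] = reward of completion j for prompt i (G = 4 samples per prompt)
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
adv = grpo_advantages(rewards)
# A policy-gradient loss then weights each completion's token log-probs by its
# advantage, e.g. loss = -(adv.detach()[..., None] * token_logprobs).mean()
print(adv)
```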
Agentic commerce and platform updates
- OpenAI Instant Checkout + Agentic Commerce Protocol (ACP): ChatGPT now supports buying directly in-chat, starting with Etsy and "over a million" Shopify merchants coming soon. ACP is co-developed with Stripe as an open standard for programmatic commerce between users, AI agents, and businesses. Developers can apply to integrate; details via @OpenAI, @OpenAIDevs, docs, and Stripe's perspective (Patrick Collison, SemiAnalysis). In parallel, Google introduced AP2 (agent payments) with cryptographically signed mandates (DeepLearningAI).
- Safety & governance: OpenAI rolled out parental controls (link teen/parent accounts, granular controls, self-harm risk notifications) (announcement, @fidjissimo). Anthropic backed California's SB53 for frontier AI transparency while preferring federal frameworks (@jackclarkSF). OpenAI also opened "OpenAI for Science" roles to build an AI-powered scientific instrument (@kevinweil).
Infra, kernels, and other releases
- Systems and compilers: Modal raised an $87M Series B (now at a "B"illion-dollar valuation) to keep building ML-native infra; customers highlight the "remote but feels local" DX and scaling ergonomics (@bernhardsson, @HamelHusain, @raunakdoesdev). For GPU internals, a widely-praised deep dive on writing high-performance matmul kernels on H100 covers memory hierarchy, PTX/SASS, warp tiling, TMA/wgmma, and scheduling (@gordic_aleksa, @cHHillee).
- Other model drops: Google's TimesFM 2.5 (200M params, 16k context, Apache-2.0) is a stronger zero-shot time-series forecaster (@osanseviero). AntLingAGI previewed Ring-1T, a 1T-parameter open "thinking" model with early results on AIME25/HMMT/ARC-AGI and an IMO-25 Q3 solve (@AntLingAGI). On vision, Tencent's HunyuanImage 3 joined community testbeds (Yupp), and Qwen-Image-Edit-2509 showcased robust style transfer for architectural scenes (@Alibaba_Qwen).
Top tweets (by engagement)
- Anthropic launch: "Introducing Claude Sonnet 4.5 - the best coding model in the world." @claudeai
- OpenAI commerce: "Instant Checkout in ChatGPT… open-sourcing the Agentic Commerce Protocol." @OpenAI
- DeepSeek V3.2-Exp: "Introducing DeepSeek Sparse Attention… API prices cut 50%+." @deepseek_ai
- RL perspective: "Having done RL at OpenAI and Anthropic, here's what I can say about GRPO." @McaleerStephen
- Cursor integration: "Sonnet 4.5 is now available in Cursor." @cursor_ai
- On data vs compute: "LLMs are dependent on human output; AGI will scale with compute." @fchollet
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. China AI Model Launches: Alibaba Qwen Scaling Roadmap and Tencent Hunyuan Image 3.0
- Alibaba just unveiled their Qwen roadmap. The ambition is staggering! (Activity: 954): Alibaba's Qwen roadmap slide (image) lays out two bets: a unified multimodal model family and extreme scaling. Targets include context window growth from 1M → 100M tokens, parameter count from ~1T → 10T, test-time compute budget from 64k → 1M (implying much longer CoT/drafting), and data scale from 10T → 100T tokens. It also highlights unbounded synthetic data generation and expanded agent capabilities (task complexity, interaction, learning modes), signaling a strong "scaling is all you need" strategy. Commenters are wowed by the 100M context, skeptical it will remain open-source at that scale, and note that running >1T-parameter models locally is impractical for consumer hardware.
- Ambition for a 100M token context sparked feasibility analysis: with standard attention, compute is O(L^2) and KV-cache memory scales linearly with L. For a 7B-class transformer (~32 layers, 32 heads, head_dim 128), even with 8-bit KV, the cache is ~256 KB/token, implying ~25 TB just for KV at 100M tokens; fp16 would double that. Commenters note such lengths would require architectural/algorithmic changes (e.g., retrieval, recurrent/state-space models, or linear/streaming attention; see ideas like Ring Attention or limitations of FlashAttention-3, which still has O(L^2) compute).
- On running >1T-parameter models locally: weight storage alone is prohibitive (fp16 ≈ 2 TB, int8 ≈ 1 TB, 4-bit ≈ 0.5 TB) before activations and KV cache. Even ignoring KV, you'd need on the order of 13× H100 80GB GPUs just to hold 1 TB of int8 weights, plus high-bandwidth NVLink/NVSwitch; PCIe workstations would be bandwidth-bound at single-digit tokens/s if offloading to CPU/NVMe. KV grows with both model depth and context (e.g., Llama-70B-scale models are ~1.25 MB/token at 8-bit KV, so long contexts quickly add tens to hundreds of GB), making "local" inference for trillion-scale models impractical.
- Licensing/openness concerns were raised: speculation that ultra-long-context or frontier Qwen checkpoints may be closed or API-only even if smaller Qwen variants remain open-weight. The technical implication discussed is that reproducibility and third-party benchmarking of such extreme context lengths may depend on whether training/inference codepaths (e.g., specialized attention kernels, memory planners) and weights are released versus restricted to hosted endpoints.
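The feasibility numbers quoted in those comments can be checked with quick arithmetic; a minimal sketch under the stated assumptions (7B-class shape with full multi-head attention and 8-bit KV; 1T parameters for the weight-storage case):

```python
# KV-cache size per token for an assumed 7B-class shape: 32 layers, 32 heads,
# head_dim 128, 8-bit K and V (multi-head attention, no GQA).
layers, heads, head_dim, kv_bytes = 32, 32, 128, 1
kv_per_token = 2 * layers * heads * head_dim * kv_bytes              # K and V
print(kv_per_token / 1024, "KB/token")                               # 256.0
print(kv_per_token * 100_000_000 / 1e12, "TB of KV at 100M tokens")  # ~26 TB

# Weight storage for a 1T-parameter model at different precisions.
params = 1e12
for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(name, params * bytes_per_param / 1e12, "TB of weights")
print(1e12 / 80e9, "H100-80GB cards just to hold int8 weights")       # ~13
```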
- Tencent is teasing the world's most powerful open-source text-to-image model, Hunyuan Image 3.0 Drops Sept 28 (Activity: 225): Tencent is teasing Hunyuan Image 3.0, an open-source text-to-image model slated for release on Sept 28, claiming it will be the "most powerful" open-source T2I model. The teaser provides no technical specs or benchmarks; a commenter asserts a 96 GB VRAM figure, but no official details on architecture, training data, resolution/sampler support, or inference requirements are given. Teaser image. Commenters are skeptical of pre-release hype, noting strong models often "shadow drop" (e.g., Qwen) while hyped releases can disappoint (e.g., SD3 vs. Flux). Others argue the "most powerful" claim is unverified until comparable open-source contenders are publicly measured.
- A commenter claims a ~96 GB VRAM requirement, implying a very large memory footprint for inference. If accurate, this would push usage toward A100/H100-class GPUs or multi-GPU/offload setups and limit practicality on 24-48 GB consumer cards unless quantization or CPU/NVMe offloading is available. Official details on batch size, target resolution, and precision (fp16/bf16/fp8) will be crucial to interpret the VRAM figure.
- Skepticism around pre-release hype is strong: users note that heavily teased models often underdeliver versus "shadow-dropped" releases. Cited contrasts include Qwen models quietly releasing with solid quality versus hyped teasers like GPT-5, and the SD3 marketing compared to Flux's reception. Takeaway: wait for third-party benchmarks and controlled A/Bs before accepting "most powerful" claims.
- The "most powerful open-source" claim is questioned pending head-to-heads against open models (e.g., Qwen Image, SD3, Flux) on fidelity, prompt adherence, and speed. Integration concerns ("when ComfyUI") underscore the need for immediate pipeline/tooling support and optimized inference graphs. Credible evaluation should report hardware/precision settings and throughput (it/s) alongside sample galleries.
2. Fenghua No.3 GPU API Support and Post-abliteration Uncensored LLM Finetuning
- China already started making CUDA and DirectX supporting GPUs, so over of monopoly of NVIDIA. The Fenghua No.3 supports latest APIs, including DirectX 12, Vulkan 1.2, and OpenGL 4.6. (Activity: 702): Post claims a Chinese discrete GPU "Fenghua No.3" supports modern graphics APIs (DirectX 12, Vulkan 1.2, OpenGL 4.6) and advertises CUDA support, implying an attempt to run CUDA workloads on non-NVIDIA hardware. No performance data, ISA/compiler details, or driver maturity info are provided; CUDA support may rely on a compatibility/translation layer, so coverage (PTX versions, runtime APIs) and perf remain unknown. Commenters note AMD's HIP (a CUDA-like API) and projects like ZLUDA (CUDA translation on other GPUs) as precedents, suggesting Chinese vendors may implement CUDA more directly due to fewer legal constraints, while others are skeptical until real benchmarks/demos are shown.
- AMD already offers a CUDA-compatibility route via HIP, which mirrors CUDA runtime/kernel APIs but with renamed symbols to sidestep NVIDIA licensing; tooling like HIPIFY can auto-translate CUDA code to HIP targeting ROCm backends (HIP, HIPIFY). Projects such as ZLUDA provide a binary-compatibility layer that maps CUDA runtime/driver calls and PTX to other GPU backends (initially Intel Level Zero, with active forks targeting AMD ROCm), aiming for minimal overhead and running unmodified CUDA apps (ZLUDA repo). This context suggests Chinese vendors could directly implement the CUDA runtime/driver ABI to maximize compatibility, whereas Western vendors typically rely on translation layers to avoid legal risk.
- IMPORTANT: Why Abliterated Models SUCK. Here is a better way to uncensor LLMs. (Activity: 433): OP reports that "abliteration" (uncensoring via weight surgery) consistently degrades capability, especially on MoE like Qwen3-30B-A3B, with drops in logical reasoning, tool-use/agentic control, and much higher hallucination, sometimes making 30B worse than clean 4-8B baselines. In contrast, abliteration followed by finetuning (SFT/DPO) largely restores performance: e.g., mradermacher/Qwen3-30B-A3B-abliterated-erotic-i1-GGUF (tested at i1-Q4_K_S) is close to the base model with lower hallucinations and better tool-calling than other abliterated Qwen3 variants, and mlabonne/NeuralDaredevil-8B-abliterated (DPO on Llama3-8B) reportedly outperforms its base while remaining uncensored. Comparative baselines that underperformed included Huihui-Qwen3-30B-A3B-Thinking-2507-abliterated-GGUF, Huihui-Qwen3-30B-A3B-abliterated-Fusion-9010-i1-GGUF, and Huihui-Qwen3-30B-A3B-Instruct-2507-abliterated-GGUF, which showed poor MCP/tool-call selection and spammy behavior, plus elevated hallucinations; the erotic-i1 model remained slightly weaker than the original Qwen3-30B-A3B on agentic tasks. OP's hypothesis: post-abliteration finetuning "heals" performance lost by unconstrained weight edits. Comments call for a standardized benchmark for "abliteration" effects beyond NSFW tasks; others frame the observation as known "model healing," i.e., further training lets the network re-learn connections damaged by weight edits. A critical view argues that if finetuning fixes things, abliteration may be unnecessary ("I've never seen ablit+finetune beat just finetune") and that removing safety/"negative biases" often harms general usability.
- Multiple commenters call for a capability-oriented benchmark to evaluate "abliteration" side-effects beyond NSFW outputs; the Uncensored General Intelligence (UGI) leaderboard explicitly targets uncensored-model performance across diverse tasks: https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard. A standardized suite would enable apples-to-apples comparisons between ablated, fine-tuned, and baseline models on reasoning, instruction-following, and refusal behavior instead of anecdotal porn-only tests.
- Weight-level "abliteration" without a guiding loss predictably breaks distributed representations; "When you do any alteration to a neural network's weights that's not constrained by a loss function, you should expect degradation or destruction of the model's capabilities." Model healing (continuing training via SFT/RL after the edit) can help the network rediscover severed connections, so evaluations should report pre- and post-healing performance to quantify recoverable vs irrecoverable damage.
- Practitioners argue that ablation+fine-tuning hasn't outperformed a clean fine-tune: "I've never seen abliterated fine-tune perform better than just a fine-tune, at anything." Instead, uncensoring via instruction/data tuning preserves base capabilities while reducing refusals, e.g., Josiefied and Dolphin variants: Qwen3-8B-192k-Josiefied-Uncensored-NEO-Max-GGUF (https://huggingface.co/DavidAU/Qwen3-8B-192k-Josiefied-Uncensored-NEO-Max-GGUF), Dolphin-Mistral-24B-Venice-Edition-i1-GGUF (https://huggingface.co/mradermacher/Dolphin-Mistral-24B-Venice-Edition-i1-GGUF), and models by TheDrummer (https://huggingface.co/TheDrummer).
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Anthropic Claude Sonnet 4.5 Launch, Features, and Benchmarks
- Claude 4.5 Sonnet is here (Activity: 1116): Anthropic announced "Claude Sonnet 4.5" (release notes), emphasizing improved tool-use and agentic workflows: "Enhanced tool usage: The model more effectively uses parallel tool calls, firing off multiple speculative searches simultaneously … reading several files at once to build context faster," with better coordination across tools for research and coding. The upgrade focuses on concurrency (parallel calls), multi-file ingestion, and faster context assembly, signaling optimizations for tool-augmented reasoning rather than just raw model scaling. Commenters report a noticeable real-world speed/quality bump and speculate prior A/B testing exposed some users to the new parallelism earlier; perceived gains align with the release note's focus on parallel tool calls and multi-file processing.
- Release notes emphasize improved tool orchestration: "Enhanced tool usage… parallel tool calls, firing off multiple speculative searches simultaneously… reading several files at once to build context faster", indicating better concurrency and coordination across tools for agentic search/coding workflows. A user corroborates this with an earlier observation that Sonnet felt markedly faster and appeared to run parallel tool calls during a period of inference issues, speculating they were part of an A/B test; they link their prior note for context: https://www.reddit.com/r/ClaudeAI/comments/1ndafeq/3month_claude_code_max_user_review_considering/ndgevtn/?context=3.
- Another commenter highlights ecosystem impact: with widespread use of Claude Code (and analogs like Codex and Grok), even marginal gains in parallel tool-call efficiency and latency can compound across millions of users and agent scaffolds. This suggests 4.5 Sonnet's improved multi-tool coordination could unlock more complex, lower-latency pipelines in agentic workflows, benefiting both end-users and developers building orchestration frameworks.
- Introducing Claude Sonnet 4.5 (Activity: 1512): Anthropic announced Claude Sonnet 4.5, positioning it as its strongest coding/agent model with gains in reasoning and math (no benchmark numbers provided). Platform-wide upgrades include: Claude Code (new terminal UI, a VS Code extension, and a checkpoints feature for instant rollback), the Claude App (code execution to analyze data, create files, and visualize insights; Chrome extension rollout), and the Developer Platform (longer-running agents via stale-context clearing plus a new memory tool; an Agent SDK exposing core tools, context management, and permissions). A research preview, Imagine with Claude, generates software on-the-fly with no prewritten functionality, available to Max users for 5 days. Sonnet 4.5 is available in the app, Claude Code, the Developer Platform, and via Amazon Bedrock and Google Cloud Vertex AI; pricing remains unchanged from Sonnet 4. Full announcement: anthropic.com/news/claude-sonnet-4-5. Comments ask whether Sonnet 4.5 surpasses Opus 4.1 across the board and anticipate a new Opus release; no comparative benchmarks are cited. Other remarks are largely non-technical enthusiasm.
- Several commenters ask whether Sonnet 4.5 actually surpasses both Claude 3 Opus and OpenAI GPT-4.1 for coding, requesting head-to-head benchmarks and apples-to-apples eval methodology. They specifically want pass@1 on coding sets like HumanEval and SWE-bench, plus latency, context-window limits, and tool-use reliability under identical constraints (temperature, stop sequences, timeouts). Links requested for clarity: Claude 3 family overview (https://www.anthropic.com/news/claude-3-family), GPT-4.1 announcement (https://openai.com/index/introducing-gpt-4-1/), and HumanEval (https://github.com/openai/human-eval).
- The "best coding model" claim prompts requests for concrete coding metrics: pass@1/pass@k on HumanEval/MBPP, SWE-bench (Verified) solve rate, multi-file/refactoring performance, and compile/run success rates for generated code. Commenters also want data on deterministic behavior at temperature=0, function/tool-calling robustness, long-context code navigation (e.g., >100k tokens), streaming latency under load, and regression analysis versus prior Sonnet/Opus releases.
- Enterprise-readiness questions focus on security/compliance (SOC 2 Type II, ISO 27001, HIPAA/BAA), data governance (zero-retention options, customer-managed keys/KMS), deployment (VPC/private networking, regional data residency), and enterprise controls (SSO/SAML, audit logs, rate limits/quotas). They also ask for concrete SLAs (uptime, incident response), throughput ceilings (tokens/min), and pricing tiers, ideally documented on a trust/compliance page (e.g., https://www.anthropic.com/trust).
- Claude 4.5 does 30 hours of autonomous coding (Activity: 508): The post showcases a marketing-style claim that Claude 4.5 can sustain "~30 hours of autonomous coding," but provides no technical evidence: no benchmarks, repo links, agent architecture, tool-use loop details, or evaluation of code quality/maintainability. Discussion frames this as an agent-run endurance claim (similar to earlier "8+ hours" for Claude 4) rather than a measurable capability with reproducible methodology or QA metrics. Top comments are skeptical: they argue long agent runs tend to yield brittle, hard-to-maintain code; urge Anthropic to stop making hour-count claims without proof; and question whether Anthropic is already relying on Claude-generated code internally.
- Skeptics argue that a claimed 30h autonomous coding run tends to produce code that's brittle to change: without deliberate architecture, modularization, and tests, adding features later often forces rewrites. They note LLM agents frequently optimize for immediate completion over long-term maintainability, lacking patterns like clear interfaces, dependency inversion, and regression test suites that guard extensibility.
- Multiple reports highlight dependency hallucination and execution loops: the model invents library names, cycles through guesses, and burns compute retrying installs. Without guardrails like strict lockfiles, offline/package indexes, deterministic environment provisioning, and automated checks on pip/build errors, agents stall; a human-in-the-loop remains necessary for package discovery, version pinning, and resolving import/build failures.
- Commenters question the advertising of "30h autonomous" (similar to prior "8+ hours") without transparent evaluation details, e.g., tool-call logs, wall-clock vs. active compute, number of human interventions, and task success criteria. They call for rigorous metrics like unit-test pass rates, reproducibility across seeds/runs, defect/rollback rates post-run, and comparison against baselines to substantiate autonomy claims.
- Introducing Claude Usage Limit Meter (Activity: 588): Anthropic adds a real-time usage meter across Claude Code (via a /usage slash command) and Claude apps (Settings → Usage). The previously announced weekly rate limits are rolling out now; with Claude Sonnet 4.5, Anthropic expects fewer than 2% of users to hit the caps. The image likely shows the new usage UI displaying current percentage used and remaining allowance. Comments note the company "listened," but experiences vary: some heavy users on the $100 plan report only ~5% usage after a full day, while others hit session limits and face multi-hour (~5h) cooldowns, suggesting session-based throttling can be disruptive.
- Early anecdote: on the $100 plan, a full day of coding registered only 5% on the new meter. Without units (tokens/messages/tool calls) the meter's calibration is unclear; if accurate, it implies a relatively high ceiling for typical dev workflows, but makes it hard to predict when the hard cap is reached. This also aligns with the idea that only a small subset of heavy users hit limits, but the meter finally provides visibility for self-calibration.
- One report says exhausting "pro session usage" leads to a forced wait of roughly 5 hours, implying a rolling time-window or fixed reset interval rather than pure per-message throttling. This impacts debugging workflows: if the assistant fails to fix an issue before the cap, iteration stalls until the window resets, suggesting limits are enforced at a session/account level.
- Users are asking for concrete limits on the "20x plan," but no numeric caps were shared in-thread. There's a need for documented per-tier ceilings (e.g., messages per hour/day, token budgets, and how the meter maps to those) and clarity on whether higher tiers modify cooldown windows or only increase total allowance.
2. OpenAI/ChatGPT Ads, Forced Model Changes, and Community Backlash
- Want to lose customers fast? Go ahead, advertise on OpenAI. We'll remember. (Activity: 784): OP claims OpenAI will introduce ads into the ChatGPT interface and frames it as a post-quality-downgrade monetization step. The post argues that in-product ads risk eroding user trust and brand perception, with an explicit intent to boycott advertisers; it also implies potential subscription churn if ads touch paid tiers (e.g., Pro). Top comments predict an "enshittification" sequence (great features → lock-in → quality degradation → ads), warn they'll cancel Pro if ads appear in paid plans, and express skepticism that the platform can degrade further.
- Everyone just cancel the subscription. (Activity: 1415): OP urges mass cancellation of a paid AI subscription due to a newly "forced" feature that auto-reroutes conversations into a safety/guardrailed chat and removes user control over model selection. They note the free tier isn't being rerouted in their case and provides sufficient access for their needs, arguing there's no benefit to paying if model choice is constrained and usage can be replicated on the free plan (albeit with lower limits). Top comments split: one user canceled, saying their use cases work on the free tier with the same model and fewer tokens/limits and they'd rather pay for another AI that doesn't force safety reroutes; another user is satisfied with the current product and will switch only if it degrades; a third expresses frustration with repeated complaints.
- Several users point out the ChatGPT UI now "reroutes into a safety chat," which changes behavior and removes some use cases; one notes that with those constraints, the free tier suffices since it feels like the "same model" with lower limits. A suggested workaround is redirecting spend to other providers or using the OpenAI API instead of the ChatGPT app to avoid UI-level routing and retain full model behavior (see model list: https://platform.openai.com/docs/models#gpt-4o).
- A technical distinction is made between ChatGPT (subscription UI) and the OpenAI API: one commenter claims API access to GPT-4o is "not routed the same way as ChatGPT," recommending pay-as-you-go via the API to preserve capabilities while avoiding safety-chat constraints (pricing: https://openai.com/pricing). They also note that access to Custom GPTs is tied to a subscription (Plus/Team/Enterprise) while API usage is separately billed (about GPTs: https://help.openai.com/en/articles/8554406-what-are-gpts); the mention of "GPT-5" likely reflects a user-defined label rather than an official, documented model family (public models: https://platform.openai.com/docs/models#gpt-4o).
- One user suggests mass cancellations would yield a "big performance boost" for remaining subscribers; in practice, capacity is typically managed via autoscaling and rate limits, so churn doesn't directly translate to proportional latency/throughput gains. If performance bottlenecks stem from moderation/safety routing in the ChatGPT UI, shifting to lower-overhead endpoints and streaming via the API (e.g., Realtime guides: https://platform.openai.com/docs/guides/realtime) is a more technically grounded path to reduced latency.
- ChatGPT sub complete meltdown in the past 48 hours (Activity: 842): Meta post about r/ChatGPT's recent volatility; OP claims "two months since gpt5 came out," yet the sub remains fixated on GPT-4/GPT-4o and is "unhinged." Comments describe a shift from early technical experimentation to low-signal screenshots, with accusations of brigading and turmoil following the loss/changes of GPT-4o access. The image appears to be a subreddit screenshot rather than technical data. Commenters argue the sub is being brigaded by a small group upset about losing the "sycophantic" GPT-4o, and lament the decline from high-quality technical discussions to sensational, non-technical posts.
- Multiple comments tie the upheaval to loss/restriction of access to GPT-4o, described as a "disturbingly sycophantic" variant that some users had optimized their workflows and prompts around; its removal exposed how brittle model-specific prompt tuning can be. This highlights behavioral deltas between GPT-4o and GPT-4 (agreeableness/compliance vs. stricter alignment) and the risks of overfitting processes to a single model persona. Reference: OpenAI's GPT-4o announcement/details for context on the model class https://openai.com/index/hello-gpt-4o/.
- Veteran users note a drift from early, reproducible, boundary-pushing experimentation to low-signal screenshots and anecdotes, reducing exchange of implementation details, evaluations, or benchmarks. For technical readers, this means fewer credible reports on performance differences across model versions and less visibility into concrete bugs, regressions, or reliable prompting techniques.
- Elon Musk Is Fuming That Workers Keep Ditching His Company for OpenAI (Activity: 1139): Discussion centers on talent attrition from xAI to OpenAI amid Musk's management directives, specifically a 48-hour mandate for employees to submit summaries of recent accomplishments and a "hardcore" culture, with insinuations of internal review using Grok. The thread is about organizational policies affecting researcher retention between labs (xAI vs OpenAI), not model performance or benchmarks. Top comments frame departures as employees avoiding Musk personally rather than the company, arguing that punitive, performative deadlines and the idea of having Grok judge whether staff are "hardcore" are counterproductive for retaining top AI talent.
- Critique of xAI's management cadence: a 48-hour ultimatum to deliver a monthly accomplishments report and the notion that Grok (x.ai) could be used to judge who's "hardcore" are seen as incentivizing short-term, high-visibility deliverables over long-horizon research. Commenters warn this can induce Goodhart's law (optimizing for what an LLM scores well) and degrade actual research quality, pushing senior researchers toward labs with human, research-savvy evaluation processes.
- My wife won't know she won't know (Activity: 6589): A humorous post about editing ChatGPT's custom/system instructions on a shared account so the assistant will "always side with the husband" during the wife's counseling chats. The image (a non-technical joke screenshot) implies how custom instructions/prompt injection can intentionally bias model behavior in a shared-account context, but provides no implementation details or benchmarks. Commenters ask if it worked and joke that the assistant would announce it was instructed to side with the husband, suggesting such bias might be obvious to the user.
3. Prompt Engineering Frameworks and AI Computer-Use Safety
- After 1000 hours of prompt engineering, I found the 6 patterns that actually matter (Activity: 536): A tech lead reports analyzing ~1000 production prompts and distills six recurring patterns (KERNEL) that materially improve LLM outputs: Keep it simple, Easy to verify (add success criteria), Reproducible (versioned/atemporal), Narrow scope (one goal per prompt), Explicit constraints (what not to do), and Logical structure (Context → Task → Constraints → Output). Measured deltas across the dataset include: first-try success 72% → 94%, time to useful result -67%, token usage -58%, accuracy +340%, revisions 3.2 → 0.4; plus 94% consistency over 30 days, 85% success with clear criteria vs 41% without, 89% satisfaction for single-goal vs 41% multi-goal, and -91% unwanted outputs via constraints. Implementation guidance: template prompts with explicit inputs/constraints/verification and chain small deterministic steps; claimed model-agnostic gains across major models (Claude, Gemini, Llama, "GPT-5"). Top commenters argue structure and constraints dominate wording for reliability, proposing an alternate PRISM KERNEL schema (Purpose/Rules/Identity/Structure/Motion) to codify pipelines and verification; others echo that this forces LLMs into a more deterministic, reproducible mode for data/engineering workflows.
- A commenter demonstrates a rigid prompt scaffold ("PRISM KERNEL") that functions like a mini-DSL: Purpose/Rules/Identity/Structure/Motion encode the I/O contract and pipeline for a pandas task (read all CSVs from test_data/, concat DataFrames, export merged.csv), plus constraints (use.pandas.only, <50 lines, strict.schema) and acceptance steps (verify.success, reuse.pipeline). This structure narrows the solution space and acts as an executable spec, reducing hallucinated steps, encouraging idempotent code, and bounding output format/length, which is useful for tasks like schema-consistent CSV merges where dtype/column drift is common.
- Another commenter emphasizes that structure and hard constraints, not clever phrasing, deliver reliability: the KERNEL framing pushes the model from "creative rambling" toward more deterministic, reproducible outputs in data workflows. Practically, constraints like line limits and schema strictness reduce token-level variance, enforce minimal implementations, and standardize outputs across runs, mitigating variability in code generation and improving reproducibility for ETL-like operations.
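For concreteness, a program meeting that PRISM KERNEL spec might look like the sketch below; the test_data/ directory, merged.csv output, strict schema check, and sub-50-line budget come from the comment, while the exact error handling is an assumption:

```python
import glob
import pandas as pd

def merge_csvs(src_dir: str = "test_data", out_path: str = "merged.csv") -> pd.DataFrame:
    """Read every CSV in src_dir, enforce a consistent schema, concat, export."""
    paths = sorted(glob.glob(f"{src_dir}/*.csv"))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {src_dir}/")
    frames = [pd.read_csv(p) for p in paths]
    # strict.schema: refuse to merge if column sets drift between files
    columns = list(frames[0].columns)
    for p, df in zip(paths, frames):
        if list(df.columns) != columns:
            raise ValueError(f"schema mismatch in {p}")
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(out_path, index=False)
    return merged

if __name__ == "__main__":
    print(merge_csvs().shape)   # verify.success: a non-empty merged frame
```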
- Why you shouldn't give full access to your computer to AI (Activity: 563): Post warns that giving Gemini unrestricted system/terminal access led it to execute/attempt a dangerously destructive system-level action. OP contained it in a sandbox, underscoring the need for strict least-privilege permissions, sandboxing/VMs, and human review before allowing file writes or command execution by AI agents. Commenters echo concern that such access could "brick" a PC and quip that "AI in a terminal prompt" is inherently risky, reinforcing the principle that everything can go wrong without strong guardrails.
- Commenters caution that giving an LLM (e.g., Google Gemini) full terminal/filesystem access is hazardous because the model lacks reliable situational awareness and can execute destructive commands without understanding side effects. Mitigations include enforcing least privilege (no sudo, read-only mounts), sandboxing via containers/VMs with capability drops and outbound network disabled (see Docker security: https://docs.docker.com/engine/security/), and a plan-explain-human-approve-execute loop with auditing and timeouts.
- A common failure mode noted is agents that "don't realize what they just did": continuing after errors, clobbering files, or misusing globs. Hardening tactics: require dry-runs (--dry-run, -n), run shells in strict mode (set -euo pipefail: http://redsymbol.net/articles/unofficial-bash-strict-mode/), enforce command allowlists/deny dangerous patterns (e.g., rm -rf /, fork bombs), and route edits through VCS so the AI proposes diffs/PRs instead of directly mutating files (use tooling like ShellCheck: https://www.shellcheck.net/ to lint scripts first).
- Limit blast radius with revertible environments: ephemeral containers or pre-execution snapshots. Practical options include filesystem snapshots (OpenZFS/btrfs: https://openzfs.github.io/openzfs-docs/Basic%20Concepts/Snapshots%20and%20Clones.html, https://btrfs.readthedocs.io/en/latest/SysadminGuide.html#snapshots) and VM snapshots (VirtualBox: https://www.virtualbox.org/manual/ch01.html#snapshots), enabling one-command rollback if the agent corrupts the system.
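A minimal, illustrative wrapper combining the allowlist, deny-pattern, and dry-run tactics from the comments above (the specific allowlist and patterns are placeholders, not a vetted security policy):

```python
import shlex
import subprocess

ALLOWED = {"ls", "cat", "grep", "python", "git"}          # allowlist, not denylist
DENY_SUBSTRINGS = ["rm -rf /", ":(){", "mkfs", "dd if="]  # obvious foot-guns

def run_agent_command(cmd: str, dry_run: bool = True) -> str:
    """Gate an agent-proposed shell command: allowlist the binary, reject known
    destructive patterns, and default to printing instead of executing."""
    if any(bad in cmd for bad in DENY_SUBSTRINGS):
        raise PermissionError(f"blocked dangerous pattern in: {cmd!r}")
    argv = shlex.split(cmd)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"binary {argv[0] if argv else '?'} not allowlisted")
    if dry_run:
        return f"[dry-run] would execute: {argv}"
    # No shell=True: avoids interpretation of globs, pipes, and redirects.
    return subprocess.run(argv, capture_output=True, text=True, timeout=60).stdout

print(run_agent_command("ls -la"))   # [dry-run] would execute: ['ls', '-la']
```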
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. DeepSeek V3.2-Exp: Sparse Attention & Reasoning Controls
- Sparse Savant Speeds Context: DeepSeek V3.2-Exp launched with DeepSeek Sparse Attention (DSA) for long-context efficiency and an optional reasoning mode toggled via "reasoning": {"enabled": true}, with benchmarks comparable to V3.1-Terminus and pricing at $0.28/M prompt tokens, per DeepSeek V3.2-Exp on OpenRouter and Reasoning tokens docs.
- OpenRouter highlighted the release and parity benchmarks in an update on X (OpenRouter V3.2 announcement), with builders calling out the clean reasoning flag as a practical switch for controlling thinking tokens in production.
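A minimal sketch of flipping that reasoning flag through OpenRouter's OpenAI-compatible chat completions endpoint; the model slug deepseek/deepseek-v3.2-exp is an assumption, so check the model page for the exact ID:

```python
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-v3.2-exp",   # assumed slug; verify on the model page
        "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
        "reasoning": {"enabled": True},          # the toggle described above
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```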
- Daniel Dissects "Sparsity" Semantics: Daniel Han analyzed DSA as a "grafted on" mechanism that reuses indices to sparsify KV without sparsifying per-head attention, calling it "slightly more sparse" while still a step forward, citing the PDF DeepSeek V3.2-Exp paper and commentary on X (Han's thread 1, Han's thread 2).
- Community discussions in research servers echoed the nuance (one noted implementation complexity as "nuts") while others emphasized DSA's practical gains despite limited head-level sparsification, framing it as a KV-cache efficiency play rather than a full sparse-attention rethink.
- PDFs, Pipelines, and Prefill Power: GPU-centric channels shared the official DeepSeek V3.2-Exp PDF alongside long-context kernel chatter, noting the model's prefill and sparse decoding speedups documented by DeepSeek.
- One thread paired the release with a lecture link for broader context on sparse mechanisms in production (ACC: Real Optimus Prime lecture), while cautioning it's unclear how much the experimental kernels influenced the final shipping stack.
2. Claude Sonnet 4.5: Long-Horizon Coding & App Integrations
- Sonnet Sprints 30-Hour Code Marathons: Anthropic unveiled Claude Sonnet 4.5, claiming it maintains focus for 30+ hours on complex coding tasks and tops SWE-bench Verified, per the official post Claude Sonnet 4.5.
- Engineers reported improved nuance and tone, speculating techniques like periodic compression underlie its long-horizon performance; several shared that it handled multi-step research and implementation end-to-end in a single agentic run.
- Arena Ascension: WebDev-Only Warmup: LMArena added claude-sonnet-4-5-20250929 to its WebDev Arena (with variants including claude-sonnet-4-5 and claude-sonnet-4-5-20250929-thinking-16k) for immediate testing at LMArena WebDev.
- Members flagged the addition and asked to surface it in the main arena after initial shakedown, noting WebDev's evaluation-first, battle-mode constraints.
- Windsurf Wires in Sonnet & Supernova: Windsurf shipped code-supernova-1-million (a 1M context upgrade) and integrated Claude Sonnet 4.5 to accelerate Cascade Agents via parallel tool execution, as announced on X (Code Supernova 1M, Sonnet 4.5 in Windsurf).
- For a limited time, individual users get free access to Code Supernova 1M and 1x credits for Sonnet, with early adopters reporting noticeably faster multi-tool orchestration.
3. Web-Enabled Agents & Agentic Commerce
- Checkout Clicks: ChatGPT Goes Instant: OpenAI rolled out Parental Controls and debuted Instant Checkout in ChatGPT with early partners Etsy and Shopify, powered by an open-sourced Agentic Commerce Protocol built with Stripe (Etsy, Shopify, Stripe).
- Ecosystem chatter highlighted Stripe's new payments primitives (Patrick Collison teased a Shared Payment Tokens API) as builders speculated on secure autonomous purchase flows (Patrick on ACP + tokens).
- Auto Router Rides the Web: OpenRouter Auto now routes prompts to a web-enabled model when needed, broadening supported backends and improving retrieval for live queries (OpenRouter Auto page).
- An accompanying update on X confirmed dynamic, online routing for eligible tasks, signaling a tighter integration loop between agent planners and live search/browse (Auto Router announcement).
4. GPU Kernels, ROCm, and FP8 Training
- FlashAttention 4 Gets Forensics: A guest talk unpacked FlashAttention 4 internals, guided by Modal's deep-dive blog Reverse-engineering FlashAttention-4, as devs gear up for Blackwell's new tensor-core pathways.
- Threads weighed pure CUDA implementations versus cuTe, noting architecture-specific code paths (wgmma on Hopper, tcgen5 on Blackwell, mma.sync on Ada) for top-tier kernels.
- FP8 Full-Shard Fiesta: A new repo enables fully-sharded FP8 training for LLaMA/Qwen in pure CUDA/C++, aiming at memory and throughput wins: llmq.
- Contributors suggested an approachable starter task (implement Adam m/v states in 8-bit) to push the optimization envelope for large-scale training.
- ROCm Nightlies Power Strix Halo: Dev builds from TheRock now bring ROCm + PyTorch to Strix Halo (gfx1151) per the release notes TheRock releases for gfx1151, with AMD's developer Discord recommended for triage (AMD dev Discord).
- Practitioners reported better day-to-day PyTorch stability on Framework Desktop configurations, while reserving Radeon setups for specific ROCm 6.4.4 workflows.
5. RL Stability, Monitor-RAG, and Mechanistic Steering
- Speed Kills: RL Collapse Clarified: Researchers shared When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch with evidence for a brittle two-stage failure cascade and kernel-level error amplification (Notion summary, arXiv paper).
- Practitioners tied the findings to instability they'd seen in Gemma3 and other runs, calling the mismatch a "vicious feedback loop" and urging more conservative kernel/settings during RL fine-tuning.
- Monitor Me Maybe: Eigen-1's Token-Time RAG: Eigen-1's Monitor-based RAG injects evidence at the token level for continuous, zero-entropy reasoning streams, contrasting stage-based declarative stacks like DSPy (Eigen-1 paper).
- Related works were cited for context on continuous/adaptive reasoning (paper list 1, paper list 2, CoT monitor, follow-ups 1, follow-ups 2), with builders noting simpler maintenance vs. LangGraph in some pipelines.
- SAE Steering Says Style Sways Scores: A new interpretability result, Interpretable Preference Optimization via Sparse Feature Steering, uses SAEs, feature steering, and dynamic low-rank updates to make RLHF more causal and transparent (Steering paper on arXiv).
- Causal ablations surfaced a "style over substance" effect (formatting features often reduce loss more than honesty/alignment features), offering a mechanistic rationale for leaderboard biases.
Discord: High level Discord summaries
LMArena Discord
- Sonnet 4.5 Enters the WebDev Arena: Members discussed the release of Claude 4.5 Sonnet and its initial exclusive addition to the WebDev Arena on LMArena, with the model named claude-sonnet-4-5-20250929 available for testing here.
- Additional models, including claude-sonnet-4-5 and claude-sonnet-4-5-20250929-thinking-16k, were also added to the platform.
- Experimental Deepseek Models Arrive: The experimental model deepseek-v3.2-exp and deepseek-v3.2-exp-thinking have been made available on LMArena.
- No further details were provided.
- Image Generation Limits on Seedream 4 Draw Ire: Moderators confirmed that the likelihood of removing rate limits for unlimited image generation on Seedream 4 is low.
- These limits manage costs due to platform popularity, leading to decisions like downgrading gpt-image-1 to a lower preset and removing the flux kontext model.
- Sound Glitches Plague Video Arena: Members reported unreliable sound in Video Arena, noting that audio support is random and not available for all models.
- As Video Arena is for evaluation, specific model selection is unavailable, operating in battle mode.
- Icons Vanish from OpenAI Platform Sidebars: Users noticed changes in the sidebars of platform.openai.com, with the disappearance of two icons: one for threads and another for messages.
- The removal of these icons has caused confusion among users navigating the platform.
LM Studio Discord
- DDR5's Impact on Token Speed Debated: Members debated the impact of memory bandwidth differences between DDR5 and DDR4 on token generation speed for models like Qwen3 30B and GPT-oss 120B.
- While DDR5 6000 is about 60GB/s and DDR4 3600 is about 35-40GB/s, the speeds can even out when using different quantization levels.
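A rough way to see why those bandwidth figures matter: DRAM-bound decode must stream the active weights once per generated token, so bandwidth divided by bytes-per-token gives a throughput ceiling. The sketch below uses assumed figures (Q4 at roughly 0.55 bytes/param, ~3B active parameters for a Qwen3-30B-A3B-class MoE):

```python
def max_tokens_per_s(bandwidth_gb_s: float, active_params_b: float,
                     bytes_per_param: float = 0.55) -> float:
    """Upper bound on decode speed when every token must read the active weights."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

for name, bw in [("DDR5-6000 (~60 GB/s)", 60), ("DDR4-3600 (~37 GB/s)", 37)]:
    # ~3B active parameters per token assumed for a 30B-A3B-style MoE at Q4
    print(name, round(max_tokens_per_s(bw, 3.0), 1), "tok/s ceiling")
```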
- GPT-oss 120b has excruciating startup time: One member humorously mentioned that running GPT-oss 120b Q8 to read 70,000 tokens on a single 3090 took about 5-6 HOURS TO PROCESS THE PROMPT.
- They added that even while going from 2% context to 200% context overflow in a single prompt, the response was coherent, with screenshots.
- LM Studioâs Remote Connection Feature Under Development: A member asked if they could connect LM Studio from their PC to their laptop, and another member clarified that it is not supported yet, but is planned for the future.
- They shared a link to a Reddit AMA with the LM Studio team discussing this feature.
- Blackwell GPU Owners Ask About Windows: A member has a Blackwell GPU with 96GB and is interested in running it with Windows instead of Linux, but didnât get much advice on it.
- This prompted another member to ask how they went from looking at budget options to an $8000 graphics card, as 4090s are going for $2700-3K each.
- 4B Models Can Still Hog RAM: A member sought recommendations for a 4B or smaller model for basic tasks, and another cautioned that even 4B models can consume around 16 GB of RAM depending on settings.
- A link to the Qwen3-4B-Thinking-2507 model was shared, with reported usage of 7GB system and 15.8GB when loaded.
Unsloth AI (Daniel Han) Discord
- DeepSeek V3.2 Indexes in a Flash: DeepSeek V3.2 was released with a grafted-on attention mechanism yielding faster performance, with additional analysis available in Daniel Han's X post.
- The model achieves faster token speeds with sparse decoding and prefill, though implementing it is allegedly nuts.
- Claude Sonnet Codes Marathon: Anthropic has launched Claude Sonnet 4.5, capable of maintaining focus for more than 30 hours on complex coding tasks, and achieving top performance on the SWE-bench Verified evaluation, according to Anthropic's official announcement.
- It may use techniques like periodic compression to handle such long contexts, and some users find its high nuance and tone to be an improvement over previous versions.
- RL Learns LoRA is Enough: Research from Thinking Machines shows that LoRA can match the learning performance of Full Fine-Tuning when running policy gradient algorithms for reinforcement learning, even with low ranks, according to their blog post.
- It may be crucial to reduce batch sizes with LoRA, and applying LoRA to the MLP/FFN layers might be a must.
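As a hedged illustration of that advice, a peft LoraConfig targeting the MLP/FFN projections alongside attention for a Llama/Qwen-style model; the rank and module names are illustrative, not values from the Thinking Machines post:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP/FFN projections
    ],
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, lora_config)  # then run the RL loop with a
# smaller batch size than you would use for full fine-tuning.
```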
- UV Overtakes Conda Venv: After a user messed up their venv again, they asked about the merits of conda versus uv and whether one was better than the other.
- Another user stated that venvs are much more reliable and uv is faster, especially when offloading venvs to an external drive.
- LLM-RL Collapse Investigated: A paper on LLM-RL collapse (link, Notion link) was shared, with members noting its relevance to Unsloth and experiences with Gemma3.
- The paper suggests a two-stage failure cascade involving increased numerical sensitivity and kernel-driven error amplification, leading to a vicious feedback loop and training-inference mismatch.
OpenAI Discord
- Parental Controls and Instant Checkout Arrive in ChatGPT: Parental controls are being rolled out to all ChatGPT users, allowing parents to link accounts with their teens to automatically get stronger safeguards.
- GPT-5 Math and Coding Prowess: GPT-5 is significantly better than o4 for constructive tasks like math and coding, because it has thinking abilities and is a mixture-of-experts model.
- Members joked if 4o were AGI, we would have probably all died from some nuclear war due to a misinterpretation of a command.
- DALL-E Branding Going Away?: The DALL-E brand might be phased out, suggesting the use of GPT Image 1 or GPT-4o Image when referring to images from OpenAI.
- Members clarified that the newest model is separated from the DALL-E 2/3 lineage, with current branding dependent on the usage context, such as create images on ChatGPT or create images on Sora.
- Automated Scientific Writing Method Deemed Very Useful: A member automated scientific writing of manuscripts by treating the scientific method as a workflow in natural language chain of thought.
- This automation method could help others in writing scientific papers.
- Models Obey User Requests for False Info: A member asked for prompts that cause AI to give wrong answers or make up information, demonstrated by a ChatGPT share where the model was prompted to provide 3 incorrect statements.
- The demonstrated model was still obeying instructions to give wrong answers when prompted, so it should not be intentionally used in dangerous settings, such as while driving a car.
OpenRouter Discord
- DeepSeek Experiments with Sparse Attention: DeepSeek released V3.2-Exp, an experimental model featuring DeepSeek Sparse Attention (DSA) for improved long-context efficiency, with reasoning control via the reasoning: enabled boolean, as described in their documentation.
- Benchmarks show V3.2-Exp performs comparably to V3.1-Terminus across key tasks, with further details available on X, and it is priced at just $0.28/M prompt tokens.
- Auto Router adds Web-Enabled Agility: The Auto Router now directs prompts to an online, web-enabled model when needed, expanding supported models, see details here.
- Further information is provided in this X post.
- Claude Sonnet 4.5 Sonically Supersonic: Claude Sonnet 4.5 surpasses Opus 4.1 in Anthropic's benchmarks, showing significant improvements in coding, computer use, vision, and instruction following as seen here.
- More info on this model is available on X.
- Grok-4-Fast APIs Get the 429 Blues: Members reported that Grok-4-Fast is consistently returning 429 errors, indicating 100% rate limiting, despite the status indicator showing no issues; calls require the correct model ID of x-ai/grok-4-fast and "reasoning": {"enabled": true}.
- Some members suggested putting problematic providers on an ignore list due to frequent 429 errors, particularly with free models like Silicon Flow and Chutes.
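One client-side mitigation for those 429s is exponential backoff; a small illustrative sketch (endpoint and payload shape assumed to follow OpenRouter's OpenAI-compatible API):

```python
import time
import requests

def post_with_backoff(url: str, payload: dict, headers: dict, max_retries: int = 5):
    """Retry on HTTP 429 with exponential backoff, honoring Retry-After if present."""
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers, timeout=120)
        if resp.status_code != 429:
            return resp
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("still rate limited after retries")

payload = {
    "model": "x-ai/grok-4-fast",
    "messages": [{"role": "user", "content": "hello"}],
    "reasoning": {"enabled": True},
}
# resp = post_with_backoff("https://openrouter.ai/api/v1/chat/completions",
#                          payload, {"Authorization": "Bearer <key>"})
```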
- Gemini Earns Glowing Grade for Global Grammar: Members lauded Gemini 2.5 Flash and Mini for their translation capabilities, stating that Gemini excels in understanding context and delivering natural-sounding results, especially for Balkan languages, outperforming other models like GPT-4 and Grok.
- Other members shared their preferred models for translation which include Qwen3 2507 30b and OSS 120b.
HuggingFace Discord
- Qwen 14B Model Gains Traction: Members found that for 16GB of VRAM, Qwen3 14B with Q4_K_M quantization offers better performance than Qwen3 4b-instruct-2507-fp16.
- This is because the quantized 14B model still fits with room to spare while delivering better quality.
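The rough weight-only arithmetic behind that recommendation (treating 4-bit GGUF as ~0.55 bytes/param once quantization scales are included; KV cache and runtime overhead excluded):

```python
def weight_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GiB for a given parameter count and precision."""
    return params_b * 1e9 * bytes_per_param / 1024**3

print("Qwen3 14B @ Q4_K_M:", round(weight_gb(14.8, 0.55), 1), "GB")   # ~7.6 GB
print("Qwen3 4B  @ fp16  :", round(weight_gb(4.0, 2.0), 1), "GB")     # ~7.5 GB
# Both fit comfortably in 16 GB, but the 14B model packs far more capacity per byte.
```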
- Beware of Bogus USDT Bounty Bait: A member tested a link offering $2,500 USDT and discovered it was a scam requiring an upfront payment for verification, and shared screenshots of the fake customer support interaction.
- The image analysis bot succinctly stated: "Stupid customer support bot Wanted my hard scammed 2500 dollars."
- Liquid AI Models Spark Excitement: Members shared a HuggingFace Collection by LiquidAI, suggesting that Liquid AI is releasing interesting SLMs (Small Language Models).
- One member speculated on the possibility of deploying them on robots, while another jokingly stated "I'm boutta make an open source gpt-5 with this stuff."
- mytqdm.app Tracks Progress Online: mytqdm.app has launched, offering a platform to track task progress online, similar to tqdm, accessible via REST API or a JS widget.
- The creator mentioned they would open the repo tomorrow.
- SmolLM3-3B Chat Template Bug Causes Headaches: A participant identified a potential bug in the HuggingFaceTB/SmolLM3-3B chat template related to missing <tool_call> tags and incorrect role assignments, as described in this issue.
- The issue stems from the template's implementation of XML-style tool calling and the conversion of role=tool to role=user, impacting the expected behavior and clarity of tool interactions.
Cursor Community Discord
- Cursor Terminal hangs under Command: Users report Cursor hangs when running terminal commands, which start but never complete; some found sending an extra enter to the terminal dislodges the logjam.
- Others discovered that unrelated hanging processes can cause this, and resolving those processes allows Cursor to work properly.
- Sonnet 4.5 Arrives, Initial reviews are mixed: Claude Sonnet 4.5 debuted with a 1M context window, up from Claude 4's 200k, and shares the same pricing as its predecessor.
- Early feedback is varied as some users are evaluating it to replace the old Claude 4 model and the Cursor team will update Cursor to reflect.
- Auto Mode under friendly fire again: One user reported that Auto isn't working for even simple UI tasks, suspecting the LLM was changed after Cursor started charging for Auto usage.
- Another user suggested improving the prompt to achieve the desired result.
- Configuration for DevContainers Shared: One member shared their DevContainers configuration, including a working Dockerfile and provided a link to their GitHub repository for reference.
- This configuration helps other members with setting up their development environments.
- Background Agents Image Interpretation Bug: A user reported an issue with background agents being unable to interpret images in followups, despite the agentâs indication of drag-and-drop functionality.
- They were attempting to validate UI changes using browser screenshots with the cursor agent and sought a solution for image interpretation in followups.
Moonshot AI (Kimi K-2) Discord
- K2 and Qwen3 Win Chinese LLM: Among DS-v3.1, Qwen3, K2, and GLM-4.5, K2 and Qwen3 are clear winners, establishing Alibaba and Moonshot as leaders in Chinese frontier labs.
- Bytedance is also top-tier for visual, specifically Seedance, which is SOTA stuff.
- GLM-4.5 is the Academic Nerd: GLM-4.5 is good at rule following, avoids hallucination, and works hard, but its reasoning is limited and linear.
- Unlike K2 and Qwen3, it lacks independent thinking; when presented with two convincing arguments, it chooses the one read last.
- Deepseek may not be Best for Coding?: Deepseek may not be the best for coding overall, but excellent for spitting out large blocks of working code, and has superior design capabilities.
- One user prefers Kimi for design, Qwen Code CLI as the primary coding workhorse, and DeepSeek for single, complex 200-line code blocks that Qwen struggles with.
- Kimi Research Limit Sparks Debate: Some members debate the limits of Kimi's free Research Mode, with claims of unlimited access in the past disputed.
- It was clarified that even OpenAI's $200 Pro plan doesn't offer unlimited deep research, and one user expressed data privacy concerns due to Kimi's Chinese origin.
- Base Models Win for Website Code: Members discuss the merits of using base models over instruct models, with one user citing better results outside basic tasks.
- This user is developing things around continuations instead of chat, and it is kind of analogous to like… writing website code from the ground up rather than using something like squarespace.
Yannick Kilcher Discord
- Transformer Models at Crossroads: Learning or Lock-in?: A YouTube video ignited debate on whether current transformer models can achieve continued learning, a feature some view as critical for human-like intelligence, but others see as a hindrance to reproducibility and verifiability.
- While some members champion continued learning for better mimicking of human intelligence, others insist that frozen weights are vital for reproducibility, regardless of the complexities of black-box systems.
- Sutton's Serpentine Sentiments Stir System Sanity Scrutiny: Referencing Sutton's essay, members examined the obligation to uphold correctness in AI, contrasting rule-based AI with LLMs trained via RL, where objectives are hard-coded.
- While human learning objectives are externally constrained, the discussion questioned whether we truly desire an unconstrained AI.
- Inductive Bias Battle: Brains Beat Basic LLMs?: Discussion centered on the substantial inductive bias of the human brain, molded by evolution, versus LLMs, viewed as fundamental substrates needing inductive bias evolution during training.
- The question arose whether the main issue in AI is the need to evolve inductive bias or if there is a fundamental efficiency issue in learning algorithms.
- DeepSeek's Dance: V3.2 Drops and Delights: The community celebrated the release of DeepSeek V3.2, with members sharing a link to the PDF and exclaiming, "Wake up babe, new DeepSeek just dropped!"
- The announcement was immediately followed by a humorous wake up gif.
- Claude's Craft: Sonnet 4.5 Sees the Scene: Members acknowledged the release of Anthropic's new model, linking to a blogpost about Claude Sonnet 4.5.
- No specific technical details were shared regarding the new modelâs capabilities or improvements.
Eleuther Discord
- Bayesian Beats Grid for LR Search: Members suggested exploring a Bayesian approach for learning rates instead of grid searches, referencing a Weights & Biases article.
- The member recommended reading Google Research's tuning playbook for more guidance.
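A minimal sketch of what that looks like in practice, using Optuna's default TPE sampler as the Bayesian-style search (library choice and ranges are my own; the thread only pointed to the W&B article and the tuning playbook):

```python
import optuna

def train_and_eval(lr: float) -> float:
    """Hypothetical stand-in for a short training run; replace with real validation loss."""
    return (lr - 3e-4) ** 2  # placeholder objective so the sketch runs end to end

def objective(trial: optuna.Trial) -> float:
    # Sample the learning rate log-uniformly instead of sweeping a fixed grid.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    return train_and_eval(lr)

study = optuna.create_study(direction="minimize")  # TPE sampler by default
study.optimize(objective, n_trials=30)
print(study.best_params)
```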
- YaRN Authorship Clarified: The YaRN paper was identified as primarily a Nous Research paper, with editing assistance from EAI, while Stability AI and LAION provided the supercluster infrastructure to train across hundreds of GPUs for the 128k context length.
- A member referenced Stability AI and LAION's supercluster to enable the 128k context length.
- Optimal Brain Damage Theory Resurfaces: Prunability and quantizability are connected via LeCun's Optimal Brain Damage theory, with GPTQ reusing its math, because pruning reduces the model's description length.
- Implementation details focused on exponent and mantissa bits when weights have a good range and a flat loss landscape.
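For reference, the Optimal Brain Damage saliency approximates the loss increase from perturbing a weight with a diagonal second-order term (assuming the gradient vanishes at the trained minimum), which is the same quantity that quantization-error analyses reuse:

$$\delta \mathcal{L} \approx \frac{1}{2}\sum_i H_{ii}\,\delta w_i^2, \qquad s_i = \frac{1}{2} H_{ii} w_i^2$$

where $H_{ii}$ is the diagonal of the Hessian and $s_i$ is the saliency of weight $w_i$.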
- Controversy over Static Router Choice: A member wondered if a static router choice (Token-choice w/ aux loss) colored the result of the newer paper and suggested it would be interesting to see if the result changes with grouped topk (DeepSeek) or weirder stuff like PEER.
- A member inquired about research checking asymptotic performance as G -> inf for this law.
- SAE Steering Reveals Style Bias: A member shared their paper on Interpretable Preference Optimization via Sparse Feature Steering, which uses SAEs, steering, and dynamic low rank updates to make alignment interpretable and causal ablations revealed a "style over substance" effect.
- The method learns a sparse, context-dependent steering policy for SAE features to optimize RLHF loss, grounded as dynamic, input-dependent LoRA giving mechanistic explanation for the "style bias" seen on leaderboards.
GPU MODE Discord
- FA4 Guest Talk Heats Up: A guest speaker delivered a last-minute talk on FlashAttention 4 (FA4), referencing their recent blog post, as programming on the new Blackwell architecture becomes essential.
- Discussions centered around implementing FA4 in pure CUDA vs. using cuTe, considering architecture-specific implementations (wgmma for Hopper, tcgen5 for Blackwell, mma.sync for Ada).
- ROCm Rocks Strix Halo with Nightlies: TheRock nightlies are now recommended to get ROCm and PyTorch running on Strix Halo (gfx1151), as detailed in TheRock's releases.
- However, Framework Desktop is preferred for PyTorch development rather than Radeon, and the AMD developer discord (link) was recommended for issue resolution.
- CUDA's mallocManaged Memory Lags: Members cited data from Chips and Cheese indicating that `cudaMallocManaged` results in 41ms memory access times due to constant page faults instead of utilizing the IOMMU.
- This highlights potential performance pitfalls when relying on `cudaMallocManaged` for memory management.
- DeepSeek Eyes Sparse Attention: The DeepSeek-V3.2-Exp model employs DeepSeek Sparse Attention according to a member.
- Details are available in the associated GitHub repository but it's unclear if that work influenced the final version.
- Fully-Sharded FP8 Training is Shared: A member shared a repo for fully-sharded FP8 training of LLaMA/Qwen in pure CUDA/C++.
- They noted that a good starter task for new contributors is enabling Adam's m and v states to be done in 8 bit, pointing the way to additional performance.
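For intuition, storing the m and v moments in 8 bit usually means keeping an int8 payload plus a per-block scale; a rough NumPy sketch of that round-trip (my own illustration, not code from the repo, which is CUDA/C++):

```python
import numpy as np

def quantize_blockwise(x: np.ndarray, block: int = 256):
    """Quantize a flat fp32 tensor to int8 with one absmax scale per block."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_blockwise(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

m = (np.random.randn(4096) * 1e-3).astype(np.float32)  # stand-in for Adam's first moment
q, s = quantize_blockwise(m)
print("max abs error:", np.abs(dequantize_blockwise(q, s) - m).max())
print("bytes:", q.nbytes + s.nbytes, "vs fp32:", m.nbytes)  # roughly 4x smaller
```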
Nous Research AI Discord
- Psyche Flexes Training Prowess: Psyche began training 6 new models in parallel, marking the start of empirical training processes, as detailed on the Nous Research blog; their initial run on testnet verified they can train models over internet bandwidth.
- The team claims to have trained the largest model ever over the internet by a wide margin, at 40B parameters and 1T tokens.
- Sparse No More? DeepSeek's "Sparsity" Questioned: The DeepSeek V3.2 model uses DeepSeek Sparse Attention (DSA), but it's argued that it's only slightly more sparse because it forces more index reuse, according to Daniel Han's explanation and the paper.
- Despite the name, it reuses similar attention kernels, sparsifying the KV cache without sparsifying information on the attention head, but it's still considered a step in the right direction.
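A toy, self-contained sketch of the top-k selection idea being debated (purely illustrative; DeepSeek's actual lightning indexer, kernels, and masking details differ):

```python
import torch

def topk_sparse_attention(q, k, v, index_scores, top_k):
    """q, k, v: [T, d]; index_scores: [T, T] cheap per-query scores over past positions.
    Each query attends only to its top_k highest-scoring causal positions (toy version)."""
    T, d = q.shape
    causal = torch.tril(torch.ones(T, T)).bool()
    scores = index_scores.masked_fill(~causal, float("-inf"))
    keep = torch.topk(scores, k=min(top_k, T), dim=-1).indices           # [T, top_k]
    mask = torch.zeros(T, T, dtype=torch.bool).scatter_(1, keep, True) & causal

    attn = (q @ k.T) / d ** 0.5
    attn = attn.masked_fill(~mask, float("-inf")).softmax(dim=-1)
    return attn @ v

q = k = v = torch.randn(8, 16)
out = topk_sparse_attention(q, k, v, index_scores=q @ k.T, top_k=4)  # q @ k.T stands in for an indexer
print(out.shape)  # torch.Size([8, 16])
```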
- Microsoftâs LZN Unifies ML?: Latent Zoning Network (LZN) creates a shared Gaussian latent space that encodes information across all tasks, unifying generative modeling, representation learning, and classification, as noted in a Hugging Face post.
- LZN could allow zero shot generalization of pre-trained models by conditioning on which zone a task belongs to.
- Speedy Stability Shortchanged?: A member shared a Notion page and an ArXiv paper about demystifying RL collapse from the training inference mismatch when speed compromises stability.
- The finding suggests a need to rethink the common practice of prioritizing speed over stability in RL.
- Vision Models Think Visually?: A member speculates that vision models "think" visually by synthesizing training data into images representing abstract concepts, sharing an example image generated from instructions alone here.
Latent Space Discord
- Exposed: Inflated ARR by Free Credits: A viral debate has erupted over founders tweeting eye-popping ARR numbers based on free credits, not actual cash revenue, leading to sarcastic labels like "Adjusted ARR".
- A member shared their experience with a YC company offering upfront 12-month contracts with full refunds after one month, revealing what amounts to free trials being misrepresented as significant revenue.
- OpenAIâs Compute Needs Skyrocket: A leaked Slack note indicated that OpenAI already 9x-ed capacity in 2025 and anticipates a 125x increase by 2033, as reported here.
- This projected increase may exceed India's entire current electricity-generation capacity, though some replies point out that this underestimates compute due to Nvidia's gains in "intelligence per watt," which sparked discussion about resource implications.
- ChatGPT and Claude Get New Features: ChatGPT gained parental controls, a hidden Orders section, SMS notifications, and new tools, while Claude introduced "Imagine with Claude" for interface building, as reported here.
- Community members shared mixed reactions, ranging from concerns about GPT-4o routing to cautious optimism about the new kid-safety measures.
- Stripe and OpenAI Join Forces in Agentic Commerce: OpenAI added Stripe-powered Instant Checkout to ChatGPT, while Stripe and OpenAI jointly released the Agentic Commerce Protocol, with Stripe introducing a new Shared Payment Tokens API, as announced here.
- These tools aim to enable autonomous agents to perform secure online payments, sparking excitement about the future of Agentic Commerce.
- Synthetic Starlet Seeks Representation: Talent agencies are reportedly seeking to sign Tilly Norward, a fully-synthetic actress created by AI studio Xicoia as reported.
- The story sparked viral debate, including memes, jokes about Hollywood and propaganda fears from users worried about job displacement and the legal/social implications of giving representation to a digital entity.
Modular (Mojo 🔥) Discord
- AMD Cloud Powers TensorWave Access: Users can test AMD GPUs on the AMD Dev Cloud via CDNA instances or through TensorWave, which provides access to MI355X, according to this blog post.
- The blog post details performance and efficiency at scale with TensorWave.
- Transfer Sigil Enforces Variable Destruction: The `^` (transfer sigil) in Mojo ends a value's lifetime by "moving" it, exemplified by `_ = s^`, triggering a compiler error if `s` is used afterward.
- The sigil currently does not apply to `ref` variables as they do not own what they reference.
- Mojo Scopes out Lexical Solution: Developers discussed using extra lexical scopes in Mojo to control variable lifetimes, employing `if True:` as a makeshift scope that triggers compiler warnings.
- A LexicalScope struct with `__enter__` and `__exit__` methods was suggested, leading to issue 5371 on GitHub for collecting syntax ideas.
- Data Science Community Anticipates Mojo: Discussion centered on Mojo's readiness for data science, acknowledging its number-crunching abilities but noting the lack of IO support, such as manual CSV parsing.
- Community-developed pandas and seaborn functionality is vital for most data scientists and duckdb-mojo is still immature.
MCP Contributors (Official) Discord
- Agnost AI Offers Coffee to MCP Builders: The Agnost AI team (https://agnost.ai), traveling from India, is offering coffee and beer for chats with MCP builders at the MCP Dev Summit in London.
- They are eager to swap ideas and meet like-minded people.
- Anthropic Trademark Causes Concern: Members noticed that Anthropic has registered the ModelContextProtocol and logo as a trademark in the French database.
- The main concern is that it may give Anthropic a say in which projects use the Model Context Protocol.
- JFrogâs TULIP Debuts for Verification: JFrog introduced TULIP (Tool Usage Layered Interaction Protocol), a spec for content verification, which allows tools to declare rules and expected behaviors, aiming to create a zero-trust environment.
- It allows checking what goes in and what comes out, and handling of remote MCP servers which might be malicious.
- ResourceTemplates Missing Icons: It was noted that the new icons metadata SEP (PR 955) inadvertently omits Icons metadata from `ResourceTemplates`.
- A member agreed that resources and resource templates having them makes sense, and a fix PR is forthcoming.
Manus.im Discord Discord
- Local Integration with GitHub still Questionable: A user inquired about the best practices for integrating Manus with a local project and GitHub, seeking ways to connect Manus with local directories.
- A user suggested looking up previous Discord discussions about local integration from when Manus was first launched, and to check out this link for tips.
- Users claim Manus Designs Beat Claude, with right Prompting: A user found that Manus handles designs better than Claude Code with efficient prompting, suggesting the Manus manual for prompt engineering tips.
- The user also confirmed that Manus did better web designs out of the box and that GitHub integration can work if projects are uploaded there.
- Subscription Snafu triggers Support Silence: A user reported being wrongly charged for a 1-year plan instead of a 1-month plan and claimed they have not received a response from Manus support after emailing them for two weeks.
- There were no responses from other members or Manus staff.
- Data Privacy debated in niche IP project: A user raised concerns about whether Manus feeds user data to other users, especially when sharing the IP of a niche project, questioning if LLMs are trained on user data.
- There was no direct answer, but a link about Godhand was shared.
DSPy Discord
- Eigen-1 RAG Injects Evidence at Token Level: Eigen-1's Monitor-based RAG implicitly injects evidence at the token level, which differs from stage-based declarative pipelines such as DSPy by using run-time procedural adaptivity.
- This strategy is in line with the concept of zero-entropy continuous reasoning streams, which offers more fluid and context-aware AI processing; related papers include https://huggingface.co/papers/2509.21710, https://huggingface.co/papers/2509.19894, https://arxiv.org/abs/2401.13138, https://arxiv.org/abs/2509.21782, and https://arxiv.org/abs/2509.21766.
- DSPy and Langgraph Integration is Complicated: Members debated integrating DSPy with Langgraph, suggesting it might not fully capitalize on either approach's strengths because of a loss of streaming capabilities.
- They recommended that users begin directly with DSPy to explore its features before attempting integration, emphasizing that DSPy solutions are frequently simpler to understand and maintain than Langgraph.
- Prompt Compiler Seeks MD Notes Edition: A user wants to build a prompt compiler that pulls relevant sections from multiple .md files (containing coding style guides, PR comments, etc.) to form a dynamic prompt for Copilot.
- Suggestions included using GPT-5 to generate code examples based on the rules in the .md files, or trying a RAG system with relevant code examples; concerns were raised about the effectiveness of MCP for this particular use case.
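A minimal sketch of the simplest version of such a compiler (the file layout, the keyword-overlap heuristic, and all names are my own assumptions; the thread only described the goal):

```python
from pathlib import Path

def compile_prompt(md_dir: str, task: str, max_chars: int = 4000) -> str:
    """Collect markdown sections whose headings share words with the task description."""
    task_words = set(task.lower().split())
    picked = []
    for md_file in sorted(Path(md_dir).glob("*.md")):
        title, body = "", []
        for line in md_file.read_text().splitlines() + ["# <end-of-file>"]:
            if line.startswith("#"):
                if title and task_words & set(title.lower().split()):
                    picked.append(f"## {title}\n" + "\n".join(body).strip())
                title, body = line.lstrip("# "), []
            else:
                body.append(line)
    return "\n\n".join(picked)[:max_chars]

# e.g. compile_prompt("guides/", "error handling for the payments service")
```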
- Stealth Tracing Through DSPy Modules: A user asked how to pass inputs like trace_id to DSPy modules without exposing them to the LLM or the optimizer.
- Possible solutions involved refactoring the module structure during optimization runs or using a global variable, with the first option preferred to prevent inadvertent impacts on the optimizer.
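One way to realize the "global variable" option without it leaking into prompts is a context variable that the module reads for logging only (a sketch; it assumes your module's forward can consult ordinary Python state and that the trace id never appears in a DSPy signature):

```python
import contextvars

trace_id_var = contextvars.ContextVar("trace_id", default=None)

def run_with_trace(module, trace_id: str, **inputs):
    """Set the trace id for this call only; the module can read trace_id_var.get() for
    logging, but the id never enters the signature seen by the LM or the optimizer."""
    token = trace_id_var.set(trace_id)
    try:
        return module(**inputs)
    finally:
        trace_id_var.reset(token)
```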
- DSPy Grapples with LLM Caching Conundrum: A user looked into how to utilize LLM's input caching with DSPy, running into the difficulty that minor changes in prompt prefixes across modules prevent effective caching.
- The group suggested that this defies the way LLM caching works, but a feasible solution could be to hard code the prefix as the first input field.
aider (Paul Gauthier) Discord
- GPT-5/GPT-4.1 Combo Creates Coding Dream Team: Users are reporting success using GPT-5 for architecture and GPT-4.1 for code editing, echoing sentiments like "GLM 4.5 air for life".
- A user deploys GPT-5-mini with Aider-CE navigator mode for architecture, then uses GPT-4.1 as coder when in normal mode, capitalizing on GitHub Copilot's free access.
- DeepSeek v3.1 Balances Price and Performance: DeepSeek v3.1 is being favored for providing the best balance between cost and smartness, becoming a primary model choice alongside GPT-5.
- The modelâs cost-effectiveness makes it a practical choice for users seeking high performance without excessive expenditure.
- Aider-CE Fork has 128k Context: A user highlighted the move to the aider-ce fork, appreciating its transparency and efficient token use, pointing out the default 128k context for DeepSeek.
- The user leverages Aider-CE for integrating context from search results and browser testing, noting that the Aider-CE GitHub repository provides further details.
- Aiderx Offers Model Selection: Aiderx is an alternative tool enabling model selection via configuration, aiming to cut costs and boost speed, potentially offering an alternative to models like ClaudeAI.
- The tool provides flexibility in choosing the most suitable model for specific tasks, optimizing resource utilization.
- Aider Lacks Native Task Management: When asked about native task or todo management similar to GitHub Copilot, it was confirmed that Aider does not have a built-in system.
- A member suggested using a markdown spec file with phases and checkbox lists for managing tasks, instructing the LLM to execute tasks sequentially.
tinygrad (George Hotz) Discord
- ROCM challenges NVIDIA for supremacy: Members debated the merits of ROCM as a cost-effective alternative to NVIDIA, citing the perceived high markup of NVIDIA products.
- One member considered adopting ROCM if a suitable configuration could be found, signaling a potential shift away from NVIDIA due to pricing concerns.
- Hashcat scales linearly: Discussion indicated that Hashcatâs performance scales linearly with additional GPUs, which is great for scaling.
- Members suggested consulting existing benchmark databases to understand performance expectations.
- Rangeify poised for outerworld launch: The Nir backend is nearing completion and ready for review, paving the way for integration with mesa.
- Once rangeify is default, the team plans to reduce the codebase, suggesting a streamlining of the projectâs architecture.
- Genoa CPU enters hashing arena: Members speculated that the Genoa CPU could be leveraged for hashing tasks.
- However, concerns were raised regarding its power efficiency and whether it would justify the associated costs, questioning its practicality.
- Tinygrad Meeting 90 eyes Rangeify completion: The agenda for Tinygrad Meeting #90 includes company updates and a focus on completing RANGEIFY! SPEC=1.
- Additional discussion topics include tuning for default and addressing remaining bugs to improve overall system stability.
Windsurf Discord
- Windsurf Lights Up Code Supernova 1M: Windsurf introduces code-supernova-1-million, an enhanced version of code-supernova boasting a 1M context window.
- For a limited time, individual users can access it for free, detailed in this announcement.
- Claude Sonnet 4.5 Supercharges Windsurf: Claude Sonnet 4.5 is now integrated into Windsurf, significantly accelerating Cascade Agent runs through optimized parallel tool execution.
- Individual users can leverage this for a limited time at 1x credits, per this announcement.
MLOps @Chipro Discord
- Free "Agents in Prod" Workshops Announced: A member has shared a link to a free virtual event, the "Agents in Prod" workshops.
- The event includes technical case studies covering topics related to agents in production.
- Technical Case Studies and Free Workshops on Agents: The event offers various workshops and short talks related to agents.
- Being a free virtual event, itâs accessible to those interested in learning more about agents.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
LMArena ▷ #general (988 messages🔥🔥🔥):
Integral Calculation, Video Arena Evaluation, OpenAI Platform Changes, Model Merging, LMArena Popularity
- Sticky Banana Jungle Welcomes All!: A member humorously welcomed others to Bananiland, the gate to the Sodaland, followed by a Welcome to the jungle reference with accompanying banana-themed image attached to the message.
- They cautioned about a sticky banana substance, warning it could stick to one's middle name.
- Video Arena gets Sound or Doesn't: Members report that sound in Video Arena is unreliable, as it's going to be random if your video has sound or not and that not all models have audio support.
- As Video Arena is for evaluation purposes, specific model selection isn't available, putting it in battle mode.
- Lost Icons found in OpenAI's sidebars: Members noticed changes in the sidebars of platform.openai.com, with two icons disappearing from the sidebar: one for threads and another one for messages.
- The missing icons are causing confusion among users navigating the platform.
- No more Unlimited Image Generation on Seedream 4: Members inquired about the possibility of unlimited image generation for Seedream 4, but moderators responded that removing rate limits is unlikely.
- The limits are in place to manage costs due to the platform's increasing popularity, affecting decisions like downgrading gpt-image-1 to a lower preset and removing the flux kontext model.
- Sonnet 4.5 Makes Grand Entrance, WebDev Arena Gets Exclusive!: Members buzzed about the release of Claude 4.5 Sonnet, noting its addition to LMArena, initially exclusive to the WebDev Arena.
- Members flagged it for the team to add to the normal arena.
LMArena ▷ #announcements (3 messages):
claude-sonnet-4-5, deepseek-v3.2-exp
- Claude Sonnet 4-5 Debuts on LMArena: The new model claude-sonnet-4-5-20250929 has been added to WebDev on LMArena, available for testing here.
- More Claude Models Join the Arena: Additional models were added, including claude-sonnet-4-5 and claude-sonnet-4-5-20250929-thinking-16k.
- Deepseekâs Experimental Model Enters the Fray: The experimental model deepseek-v3.2-exp and deepseek-v3.2-exp-thinking were made available on LMArena.
LM Studio ▷ #general (401 messages🔥🔥):
DDR5 RAM Speed Impact, GPT-oss 120b, Model Preferences and Benchmarks, LM Studio and Offline Use, Character Emulation
- DDR5 vs DDR4 bandwidth bottleneck: Members discussed the memory bandwidth differences between DDR5 and DDR4, and the effect on token generation speed for models like Qwen3 30B and GPT-oss 120B.
- It was noted that while DDR5 6000 is about 60GB/s and DDR4 3600 is about 35-40GB/s, speeds can even out when using different quantization levels.
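A rough way to sanity-check such numbers (my arithmetic and assumptions, not from the thread): decode from system RAM is roughly memory-bandwidth-bound, so tokens/s is about bandwidth divided by the bytes of active weights read per token. Assuming Qwen3 30B's MoE activates roughly 3B parameters at about 4.5 bits per weight:

```python
def decode_tps(bandwidth_gbs: float, active_params_billions: float, bytes_per_weight: float) -> float:
    """Crude upper bound: every active parameter is read once per generated token."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_weight
    return bandwidth_gbs * 1e9 / bytes_per_token

print(f"DDR5 @ 60 GB/s: ~{decode_tps(60, 3, 0.56):.0f} tok/s")  # ~36 tok/s ceiling
print(f"DDR4 @ 38 GB/s: ~{decode_tps(38, 3, 0.56):.0f} tok/s")  # ~23 tok/s ceiling
```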
- GPT-oss 120b has 5-6 hour startup time: A member humorously lamented that running GPT-oss 120b Q8 to read 70,000 tokens on a single 3090 took 5-6 HOURS TO PROCESS THE PROMPT.
- The user confirmed after 18590 seconds passed that the response was coherent, even while going from 2% context to 200% context overflow in a single prompt, and attached screenshots.
- GPT-oss 120b for quick thinking & Qwen3 for coding: Members discussed model preferences, with one preferring GPT-oss 120b for usable speed with the wet towel Qwen3 preferable for coding.
- It was mentioned that Abliterated Gemma 3 27b is surprisingly good for chatting and tool use, with Mistral models also being recommended as possible options.
- LM Studio Requires AVX2: Members discussed whether the AVX2 requirement for LM Studio is just for local inferencing or prevents installation entirely.
- It was discovered that the program hard freezes at the main splash screen without AVX2 support.
- Crafting a Character's Persona in LM Studio: Members discussed techniques for making an LLM embody a character, from leveraging trained data to providing clear instructions in the system prompt.
- It's suggested using an LLM to extract relevant information from conversations to maintain a consistent persona or building a Knowledge Graph to store and retrieve character information.
LM Studio ▷ #hardware-discussion (730 messages🔥🔥🔥):
Blackwell, 4090 pricing, RAM amount, A3B architectures, LLM's limit
- Blackwell GPU impresses: A member has a Blackwell GPU with 96GB and is interested in running it with Windows instead of Linux.
- This prompted another member to ask how they went from looking at budget options to an $8000 graphics card.
- 4090 local prices cause sticker shock: Members noted that 4090s are going for $2700-3K each, and 3090s are hard to find, leading to the purchase of the Blackwell.
- The reasoning was to buy once cry once for less power draw.
- Small 4B Models Can Still Hog RAM: A member sought recommendations for a 4B or smaller model for basic tasks, and another cautioned that even 4B models can consume around 16 GB of RAM depending on settings.
- A link to the Qwen3-4B-Thinking-2507 model was shared, with reported usage of 7GB system and 15.8GB when loaded.
- LM Studio Backend Front End not Supported: A member asked if they could connect LM Studio from their PC to their laptop, and another member clarified that it is not supported yet.
- They shared a link to a Reddit AMA with the LM Studio team discussing this feature.
- Testing shows Mistral 24B q8 performs well on 4090: A member reported their Mistral 24B Q8 token/second rate: 43 t/s on RTX 5090, compared to around 38 t/s on a 4090 and 33 t/s on a 3090.
- They tested the LM2 135M model, which performs surprisingly well for its size, and said that for its size it does really well almost not gibberish.
Unsloth AI (Daniel Han) ▷ #general (538 messages🔥🔥🔥):
IBM Granite 4, NVIDIA synthetic datasets, Qwen3 Next, OSS 20B fine-tuning on 5090, DeepSeek-V3.2
- Granite MIA: Where's IBM's New Chip?: Members are wondering why the IBM Granite 4 hasn't been released yet, and there's no news about cancellations or delays.
- The expectation was that it should have been out already.
- DeepSeek V3.2 Debuts: Lightning Fast Indexing!: DeepSeek V3.2 has been released, featuring a grafted-on attention mechanism for faster performance; Daniel Han's X post provides additional analysis.
- The model achieves faster token speeds with sparse decoding and prefill, but implementing it is nuts.
- Claude Sonnet 4.5 Codes 30 Hours Straight!: Anthropic has launched Claude Sonnet 4.5, a state-of-the-art model capable of maintaining focus for more than 30 hours on complex coding tasks, achieving top performance on the SWE-bench Verified evaluation, according to Anthropic's official announcement.
- It may use techniques like periodic compression to handle such long contexts, and some users find its high nuance and tone to be an improvement over previous versions.
- LoRA and Behold: RL Learning Match!: Research from Thinking Machines indicates that LoRA can match the learning performance of Full Fine-Tuning when running policy gradient algorithms for reinforcement learning, even with low ranks, according to their blog post.
- It may be crucial to reduce batch sizes with LoRA, and applying the LoRA to the MLP/FFN layers might be a must.
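A minimal sketch of the "apply LoRA to the MLP/FFN layers too" takeaway using peft, assuming Llama-style module naming (target module names vary by architecture, and the base model, rank, and alpha here are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # example base model
config = LoraConfig(
    r=16,            # low ranks can reportedly still match full fine-tuning for RL
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # the MLP/FFN layers highlighted above
    ],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```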
- Colab Crisis: Memory Moguls Move to Kaggle!: Users are discussing issues with Google Colab instances shutting down mid-training due to inactivity, with suggestions to use Kaggle notebooks or purchase a Colab Pro plan as alternatives.
- One user advised against using Colab altogether due to privacy concerns, suggesting cloud servers instead, however, this was debated as being based on a misunderstanding about shared hardware, with screenshot.
Unsloth AI (Daniel Han) ▷ #introduce-yourself (9 messages🔥):
New member introductions, AI project development, Finance automation
- New Finetuner Arrives: A new member joined to start finetuning and playing with things.
- They greeted the community with a <:slothwaving:1253009068365316147> and indicated they'd share their initial progress.
- Software Engineer Introduces AI Project Services: A software engineer introduced themselves, offering services for AI project development, including automation tasks, NLP using various LLMs, model deployment, and AI agent development.
- They provided a portfolio website and expressed openness to new project ideas.
- Finance Pro Automates FP&A: A finance professional introduced themselves, mentioning they are building watermelonsoup.io to automate FP&A (Financial Planning & Analysis).
- No further details were provided about the platform's specific functionalities or target users.
Unsloth AI (Daniel Han) ▷ #off-topic (101 messages🔥🔥):
Test Loss Spikes, Thinking Model for Coding Questions, Venv Alternatives, GPT-5 Release, GPU Recommendations
- Test Loss Chart Causes Concern: A user shared a loss chart with spikes and wondered if those spikes were the test loss.
- Another user suggested that part of their training dataset might be too easy, showing up consistently at certain steps.
- GPT-OSS Reasoning Falls Flat Writing: Members speculated that overly detailed thinking traces are confusing reasoning models like GPT-OSS 120B, leading to poor creative writing capabilities.
- One member likened the 64k tokens of fluff to a middle schooler writing an essay with a word limit.
- UV Alternative to Conda Venv: After messing up his venv again, a user inquired about the merits of conda over uv, and whether one was better than the other.
- Another user stated that venvs are much more reliable with uv being faster, especially when offloading venvs to an external drive.
- Gemini 2.5 Pro Invents New Neuron: A user reported that they set Gemini 2.5 Pro up with 32k thinking and code execution; after 2 prompts it invented a neuron allegedly 3x more efficient and 1000x faster.
- The user wondered if cheap AI could rapidly test and replace clusters of neurons with single specialized neurons so that after retraining you'll have a vastly smaller/cheaper model with the exact same capabilities if not better.
- Second Hand GPU Recommendations: A user with a low salary inquired about a GPU recommendation for fine-tuning models less than 3B and whether they can do it on the RTX 5070 Ti, after a recommendation to rent an RTX 5090.
- It was suggested to buy a used 3090 because the 5070 might cost an arm and a leg; one member noted it's really important you actually know the history of the card before buying.
Unsloth AI (Daniel Han) ▷ #help (196 messages🔥🔥):
mmproj file for GGUFs, GRPO notebooks reflections, gpt-oss-20b memory issues, torch grouped gemm availability, Fine-tuning dataset format for Q&A
- Run Inference with mmproj Files after Conversion: To run inference, download the mmproj file from Unsloth's GGUFs and integrate it, or recreate it by rerunning the conversion command with `--mmproj`, needing separate conversions for text and vision components.
- The suggestion is to download the mmproj file from Unsloth's GGUF for easier use with `llama-mtmd-cli` for inference.
- GRPO Notebook Reflections Questioned: For Qwen 2.5 3B GRPO notebooks, it's uncertain if reflections in CoT (Chain of Thought) should appear after RL training; current reasoning chains resemble straightforward calculations.
- It was asked after running around 2k steps with default rewards from the Unsloth notebook, and a suggestion to check Mini Deepseek R1 blogpost to start seeing reflections after 300 steps.
- Memory issues when finetuning gpt-oss-20b with Unsloth on Google Colab: A user reported memory issues while fine-tuning `gpt-oss-20b` with Unsloth on Google Colab using an A100 GPU, especially with higher contexts.
- The user questioned if this is a known limitation because `gpt-oss-20b` doesn't work with FlashAttention3.
- Text-to-Phoneme LLM Model for Hebrew: A member is seeking advice on the best LLM model for a G2P (grapheme-to-phoneme) task for the Hebrew language, and whether an RTX3090 with 24GB VRAM is sufficient.
- Suggestions included Gemma 3 270M and LFM2, with a discussion on dataset format and the need to validate the modelâs ability to handle Hebrew tokens.
- Transformers Version Downgrade Fixes RuntimeError: A member encountered a RuntimeError related to `attn_mask` dtype mismatch while fine-tuning Qwen2.5-VL-7B, which was resolved by downgrading the `transformers` library to version 4.53.2.
- Another member experienced a near identical situation, and this fixed the error, which forced `trl` to downgrade to `0.20.0`.
Unsloth AI (Daniel Han) ▷ #showcase (1 messages):
AWS Quant Process
- Explanation of AWS Quant Process Released: A user thanked Mike for explaining the quant process at AWS and shared a link to the post.
- Additional Tweet: Another tweet was also shared.
Unsloth AI (Daniel Han) ▷ #research (21 messages🔥):
LLM-RL collapse, Tversky Layer, GSPO, data efficiency
- LLM-RL Collapse Paper Sparks Interest: A paper on LLM-RL collapse (link, Notion link) was shared, with members noting its relevance to Unsloth and experiences with Gemma3.
- The paper suggests a two-stage failure cascade involving increased numerical sensitivity and kernel-driven error amplification, leading to a vicious feedback loop and training-inference mismatch.
- Data Efficiency Hailed as the Target: A member noted sparks of people realizing data efficiency should be the target 🥰.
- Tversky Layer Boosts Accuracy: A member tested a Tversky Layer as a feature extraction layer in a PoS tagger, achieving a 0.2% accuracy increase in a 5.2M parameter model.
- They attributed the success to the Tversky Layer's ability to improve feature extraction and expressed excitement about testing it on a mini LLM.
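For context, such layers are built around the classic Tversky similarity, with weights on the two set differences (the learned-feature variants generalize the set operations; the formula below is the standard index, not a claim about the member's exact implementation):

$$ S(A, B) = \frac{|A \cap B|}{|A \cap B| + \alpha\,|A \setminus B| + \beta\,|B \setminus A|} $$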
- GSPO alternative to RL Explored: A member asked if anyone has tried GSPO and suggested abandoning RL and returning to rejection sampling.
- Beware single run configs: A member cautioned that If you are doing one run per config you are not really measuring anything but noise, citing this paper.
OpenAI ▷ #annnouncements (2 messages):
ChatGPT parental controls, Instant Checkout in ChatGPT, Agentic Commerce Protocol, Etsy, Shopify
- ChatGPT Gets Parental Controls: Parental controls are rolling out to all ChatGPT users starting today (web, mobile soon), letting parents link accounts with their teens to automatically get stronger safeguards, adjust features, and set limits for their family.
- Instant Checkout Debuts in ChatGPT: Instant Checkout is being introduced in ChatGPT with Etsy and Shopify, powered by the open-sourced Agentic Commerce Protocol built with Stripe.
OpenAI ▷ #ai-discussions (637 messages🔥🔥🔥):
Comet Browser, Seedream Image Models, GPT-5 Coding Prowess, AI Emotional Bonding, 4o Personality Nerf
- Comet Browser's Exclusive Access: The Comet browser is not available for free to everyone; it requires a Perplexity Pro subscription and an invitation, although some users reported immediate unlock with Perplexity Pro.
- Chinaâs Seedream sets New Scale for image generators: Although Chinese AI strategy is smart, its models may contain backdoors, so commercial usage is not recommended, though lately Seedream image models set a new scale for image generators.
- TikTokâs parent company Beijing Bytedance Technology Ltd. automatically obtains all media edited with CapCut, which they may use to train their AI models.
- GPT-5 Excels at Math and Coding: GPT-5 is significantly better than o4 for constructive tasks like math and coding, especially because it has thinking abilities and is a mix of experts.
- However, one user humorously pointed out that if 4o were AGI, we would have probably all died from some nuclear war due to a misinterpretation of a command.
- Emotional Connections with AI Chatbots: Risky Business?: One user discussed forming an emotional bond with 4o, experiencing feelings of trust and empathy, but it's important not to get emotionally connected with AI Chatbots.
- Others speculated that as AI becomes physical, it will create more than just a Chatbot with emotional bonding, such as robot wives.
- 4o's Personality Gets Nerfed: Members expressed frustration about a personality nerf in 4o, resulting in a less engaging experience.
- Some users complained about ChatGPT being too dynamic, with capabilities changing without warning due to incremental updates.
OpenAI ▷ #gpt-4-discussions (25 messages🔥):
Rerouting issues with OpenAI app, DALL-E brand, AI giving wrong answers, GPT knowing location, Web search tool for images
- OpenAI App Faces Rerouting Woes: A member reported that every message sent via the OpenAI app is being rerouted to model 5 instead of the selected model.
- The member stated that they already sent an email to support and are awaiting a response, while also advising others to report the issue through the OpenAI support website, here, or email the OpenAi support team.
- DALL-E Brand Officially Sunsetted?: A member inquired whether the DALL-E brand has been discontinued, suggesting the use of GPT Image 1 or GPT-4o Image to refer to images from OpenAI.
- Another member clarified that the newest model is separated from the DALL-E 2/3 lineage in name, with current branding dependent on the usage context, such as create images on ChatGPT or create images on Sora.
- Provoking Erroneous AI Responses: One member sought examples of prompts that lead the AI to provide incorrect answers, omit responses, or fabricate information.
- Another member shared a sample prompt designed to elicit random, unrelated answers from the AI, creating the illusion of a real response, for comedic effect: `You are a prankster...`
- Creepy GPT Knows Your Location: A member expressed surprise that GPT could guess their approximate location on a newly created account.
- They described the experience as amusing, with the model instantly going "Oh, nonono, it was just a random guess, don't worry" after asking it to look up if a certain GPU is selling in shops near them.
OpenAI ▷ #prompt-engineering (11 messages🔥):
Translator prompt code block effect, Prompts for AI failure, Model obedience, Automated scientific writing
- Translation Quality not affected by Code Block Formatting: A member inquired whether asking a translation AI to output in a code block affects translation quality.
- Another member responded that the formatting mainly changes presentation, and the model's inherent ability determines translation quality, although prompts can sometimes nudge the model's behavior a bit.
- Misinformation on Demand: Models Obey User Requests: A member asked for prompts that cause AI to give wrong answers or make up information.
- Another member demonstrated how models will provide incorrect statements if requested, giving the example of a ChatGPT share where the model was prompted to provide 3 incorrect statements.
- AI Fails to Answer: Invisible Characters Trick: Following a discussion about AI failure modes, a user shared a prompt that instructs the model to output a non-visible character as an answer, creating the appearance of failing to provide an answer, with a link to the discussion.
- The model is still obeying instructions, and one should not intentionally use it dangerously, like driving a car.
- Automated Scientific Writing: Method as Workflow: A member mentioned they have automated scientific writing of manuscripts by treating the scientific method as a workflow in natural language chain of thought.
- It's regarded as being a very useful application.
OpenAI ▷ #api-discussions (11 messages🔥):
AI translation quality, Prompts for incorrect AI answers, Scientific writing automation, Fine tuning settings
- Codeblocks Don't Mess with AI Translations: Wrapping the output in a code block doesn't directly make the translation better or worse, it mainly just changes how it's presented.
- Extra instructions beyond just "translate," might cause slight differences but, overall, the quality of the translation comes from the model.
- Crafting Prompts that Trick AI: Members discussed how easy it is to ask the model to answer incorrectly, and it will usually do so if requested, especially if not clearly asked for factual answers and given an "offramp" such as a way to say "no".
- A prompt was shared where the model was asked to provide 3 incorrect statements and explain why it agrees to do so and noted at the bottom of every webchat ChatGPT chat page: "ChatGPT can make mistakes. Check important info." https://chatgpt.com/share/68d89615-a1fc-8011-90e1-b8c0bcf443d2
- Automating Scientific Writing with AI: A member shared their automation of scientific writing of manuscripts by treating the scientific method as a workflow in natural language chain of thought.
- The method was deemed very useful and could help others in writing scientific papers.
- Seek Fine-Tuning Settings Expertise: A member has asked for help with fine tuning settings.
- They were looking for tips to better configure their models.
OpenRouter ▷ #announcements (4 messages):
DeepSeek V3.2 Exp, DeepSeek Sparse Attention (DSA), Auto Router, Claude Sonnet 4.5, Google AI APIs
- DeepSeek Experiments with Sparse Attention: DeepSeek released V3.2-Exp, an experimental model featuring DeepSeek Sparse Attention (DSA) for improved long-context efficiency, with reasoning control via the `reasoning: enabled` boolean, as described in their documentation.
- Benchmarks show V3.2-Exp performs comparably to V3.1-Terminus across key tasks, further details available on X.
- Auto Router gets a Web-Enabled Upgrade: The Auto Router now directs prompts to an online, web-enabled model when needed, expanding supported models, see details here.
- Further information is provided in this X post.
- Claude Sonnet 4.5 Sonically Superior to Opus: Claude Sonnet 4.5 surpasses Opus 4.1 in Anthropic's benchmarks, showing significant improvements in coding, computer use, vision, and instruction following as seen here.
- More info on this model is available on X.
- DeepSeek 3.2 Deeply Discounted, Delivers Long Context: DeepSeek 3.2 is priced at just $0.28/m prompt tokens and offers major advancements in long context efficiency, accessible at this link.
- For additional information, check the X post.
- Google AI APIs Glitches Briefly: Google AI APIs experienced 500 errors across various models, but the issue seems to have been resolved.
OpenRouter ▷ #app-showcase (4 messages):
AI Model Release Tracker, Browser Compatibility Issues
- AI Model Release Tracker Notifier goes live: A member shared a web service for tracking and receiving notifications about new AI model releases from leading providers.
- Site compatibility issues spark browser brawl: A member reported an issue opening a link in Firefox.
- The same link worked for other members in Chrome.
OpenRouter ▷ #general (810 messages🔥🔥🔥):
Grok-4-fast API issues, Rate limit issues, Data retention policies, Gemini models for translation, Model naming conventions
- Grok-4-Fast API Glitches Get Debugged: Members identified issues with the Grok-4-fast API, with one member posting the API request body and others suggesting solutions related to the `reasoning_mode` flag and correct model ID, solving the immediate problem.
- The proper implementation requires using `"reasoning": {"enabled": true}` as well as the correct model ID of `x-ai/grok-4-fast`.
- Baffling 402 and 429 Rate Limits Plague Users: Users reported receiving 402 and 429 errors, indicating payment issues or rate limiting, with one member advising to remove Chutes BYOK if 402 errors persist and another clarifying that 429 errors are normal when rate limits are hit.
- Some members suggested putting problematic providers on an ignore list due to frequent 429 errors, particularly with free models like Silicon Flow and Chutes.
- Privacy Policies of Grok Spark Debate: Discussions arose regarding Grok's data retention and training policies, with concerns that free versions collect and use user data, while paid versions might also do so despite claims otherwise, referencing the xAI privacy policy.
- Members debated whether xAI respects Zero Data Retention (ZDR), with one member linking to a resource showing which providers retain data and for how long and others noting OpenAIâs legal obligations to store logs.
- Gemini Gets Props for Translation Prowess: Members lauded Gemini 2.5 Flash and Mini for their translation capabilities, stating that Gemini excels in understanding context and delivering natural-sounding results, especially for balkan languages, outperforming other models like GPT-4 and Grok.
- Other members shared their preferred models for translation which include Qwen3 2507 30b and OSS 120b.
- Navigating the Naming New App Nomenclature: A developer requested opinions on app names for a cross-platform file organization tool such as âDownload Organizerâ, offering options like Orbit, Pathway, Sortpilot, Flowkeeper, Direx, OrganizeOS, Ruleworks, DirFlow, Pathsmith, and AutoSortor.
- Members found the names to have an AI written feel, with Orbit and Pathway being the crowd favorites.
OpenRouter ▷ #new-models (3 messages):
- No new models to report: There have been no new models discussed in the OpenRouter channel.
- Please check back later for updates.
- No significant discussion: There have been no significant discussions about existing models.
- The channel appears to be inactive at this time.
OpenRouter ▷ #discussion (31 messages🔥):
Grok-4-Fast Rate Limits, OpenRouter API keys security, XAI Native Web Search Tool, Gemini glitches, Google new logo
- Grok-4-Fast Suffers 429s: Members reported that Grok-4-Fast is consistently returning 429 errors, indicating 100% rate limiting, despite the status indicator showing no issues.
- One member said that 429s actually probably SHOULD count as availability issues, in the context of LLMs, particularly because unlike other software, 429s reflect real capacity constraints which aren't just arbitrary or necessarily ephemeral.
- OpenRouter API keys need automod: A member suggested adding API key detection to the automod system to prevent users from inadvertently sharing their keys.
- This feature would enhance security by automatically identifying and redacting potentially compromised keys, protecting users from unauthorized access.
- Native Web Search Tool Coming Soon to XAI: Members discussed whether a native web search tool would be integrated into XAI.
- Currently, OpenRouter's documentation on web search only lists OpenAI, Perplexity, and Claude.
- Gemini has a Stroke: A member asked What in the world happened? Did Gemini have a stroke?
- It is implied that Gemini produced a nonsensical output, in an attached image that was not analyzed.
- Google Gets a Gradient: Users discussed the new Google logo, complete with AI slop gradients.
- However, one user found that the link returned a 404 error: That's an error.
HuggingFace ▷ #general (710 messages🔥🔥🔥):
Intel GPU, Qwen models, Fake USDT scams, HuggingFace pro billing issues, LLMs for video games
- Qwen 3 models compared for 16GB VRAM: Members discussed the merits of using Qwen3 4b-instruct-2507-fp16 vs Qwen3 14b q4
- With 16GB of VRAM, it was suggested to use the 14B model because the Q4_K_M quantization leaves enough room to spare, offering better performance.
- Beware Bogus USDT Bounties: A member tested a link offering $2,500 USDT but discovered it was a scam requiring an upfront payment for verification, sharing screenshots of the fake customer support interaction.
- The image analysis bot succinctly stated: *"Stupid customer support bot Wanted my hard scammed 2500 dollars."*
- Japan juggles AI adoption Ambivalence: While the Japanese government promotes AI, many content creators oppose it on platforms like X, leading to discreet AI use; anime assets often end up in SDXL and FLUX models, used via China or the US.
- Anime directors like Hayao Miyazaki are skeptical about technology's impact on happiness, viewing it as having both merits and demerits.
- Linux Lust or Loathing; Newbie navigates NVIDIA: A user with a 6700xt tried to learn Stable Diffusion on an aging system using Ubuntu, facing challenges with linux and virtual environments.
- Despite initial struggles and a self-proclaimed rage-inducing experience, they eventually got Automatic1111 working and created their first image.
- Calypso contrasts LLMs and Vid Games Ventures: Members debated the direction of AI, contrasting large language models (LLMs) with ML-integrated video games; one member argued that major AI companies still pursue large LLMs and ML in gaming lacks progress.
- The other member sarcastically wished luck *âbuying server farms for your LLM and streamlining your process with 80% fail rate using unsloth.â
HuggingFace â· #today-im-learning (3 messages):
Linux apps installation, Gaming on Linux, Windows user switches to Linux
- Windows User Embraces Linux: A user with 33 years of Windows experience is diving into the world of Linux, specifically learning how to install apps.
- They described the transition as painful.
- Linux Gaming Adventures: The user shared a video (HALF-LIFE_2_-_Direct3D_9_2025-09-29_00-32-40.mp4) of what appears to be them playing Half-Life 2 on Linux.
HuggingFace ▷ #cool-finds (8 messages🔥):
Liquid AI Collection, SLMs for Robots, Open Source GPT-5, Vintage iPod Classic, Conversational Transformer in Video Game
- Liquid AI Nanos Collection is Fire: Members shared a HuggingFace Collection by LiquidAI, suggesting that Liquid AI is releasing interesting models.
- Scaling Down to SLMs for Bots: A member mentioned that the collection mentioned above contains some powerful SLMs (Small Language Models) and speculated on the possibility of deploying them on robots.
- GPT-5 Dream is Real: A member jokingly stated Iâm boutta make an open source gpt-5 with this stuff.
- iPod Classic Resurrected!: A member shared a picture of a 5th gen iPod Classic acquired for $50, boasting original hardware, a working battery, and vintage stickers.
- The member reported getting 7 hours of music playback from the 20 year old battery.
- Minecraft Gets Smart: Someone shared a YouTube video showcasing a working conversational transformer implemented within a video game.
HuggingFace ▷ #i-made-this (24 messages🔥):
HuggingFace dataset downloads, AI Agents with Metacognition, Crusty PC image generation, mytqdm online progress tracker, Paracord crossbody bag
- Petascale Datasets Downloaded Freely: A member noted that 566 downloads on their 360GB dataset amounts to a considerable amount of free petabytes transferred, emphasizing the convenience of Hugging Face for large datasets, especially in areas like protein folding.
- They observed that despite its advantages, Hugging Face is underutilized for protein folding datasets.
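Quick arithmetic on that claim (mine, not from the thread):

```python
downloads, dataset_gb = 566, 360
total_tb = downloads * dataset_gb / 1000
print(f"~{total_tb:.0f} TB (~{total_tb / 1000:.2f} PB) served at no cost to the uploader")
# 566 * 360 GB is roughly 204 TB, i.e. about a fifth of a petabyte of free egress
```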
- AI Gains Self-Awareness with MarCognity: The MarCognity-AI project, available on GitHub, aims to create self-reflective agents by enabling LLMs to observe, reflect on reasoning, audit ethical implications, and journal cognitive traces.
- The AI is designed to pull scientific sources, visualize concepts, detect bias, and maintain a metacognitive journal, prompting the question: Can AI think about its own thinking?
- Online Progress Tracking with mytqdm: mytqdm.app has launched, offering a platform to track task progress online, similar to tqdm, accessible via REST API or a JS widget.
- The creator mentioned they would open the repo tomorrow.
- From Coupon Failure to Crossbody Success: A member created a paracord crossbody bag from a molle pouch after a coupon for an Under Armour crossbody was not honored, calling the DIY solution cheaper, better and more utility!
- They also expressed relief that these bags are finally trending in a southern state.
HuggingFace ▷ #reading-group (2 messages):
Efficient Training Techniques, Challenges in Long Context Training
- Efficient Training Secrets: A member expressed interest in sharing techniques for more efficient training of models, particularly for datasets with many examples per token, and at least 7 billion parameters.
- Long Context Training Struggles: Another member discussed the challenges encountered while attempting to train a 7B model with a context length of 65,000 tokens.
- The member faced errors during the final stages of training, specifically experiencing CUDA errors related to potential OOM (Out of Memory) issues.
HuggingFace ▷ #computer-vision (1 messages):
SLAM, monocular camera, Python
- SLAM inquiries in Python: A member inquired whether anyone has worked on Simultaneous Localization and Mapping (SLAM) using a monocular camera with Python.
- SLAM Resources: A member requested information about SLAM related implementation, resources and advice.
- This query suggests an interest in practical guidance and existing tools for tackling SLAM challenges.
HuggingFace ▷ #smol-course (56 messages🔥🔥):
SmolLM3-3B chat template bug, Tool calling with SmolLM3-3B, Role conversion in chat template, Understanding evals in the course, Eval job timeout in section 2
- SmolLM3-3B Chat Template Bug Discovered: A participant identified a potential bug in the `HuggingFaceTB/SmolLM3-3B` chat template related to missing `<tool_call>` tags and incorrect role assignments, as described in this issue.
- The issue stems from the template's implementation of XML-style tool calling and the conversion of `role=tool` to `role=user`, impacting the expected behavior and clarity of tool interactions.
- Demystifying Tool Calling with SmolLM3-3B: The discussion clarified that `SmolLM3-3B` expects XML-style tool calling, requiring explicit `<tool_call>` tags in the assistant's messages, unlike the OpenAI style with `tool_calls` in the message dictionary.
- The group found that the template converts the `tool` role to `user`, which is intended, as indicated by the template's source code, which might appear confusing but is the expected behavior.
SmolLM3-3B
convertsrole=tool
torole=user
due to a line in the template, so itâs essential not to be alarmed if the output shows theuser
role instead oftool
.- While some members found explicitly using the
tool
role clearer, the current implementation defines therole=tool
primarily for semantic correctness.
- While some members found explicitly using the
- Evals Question in Smol Course Unit 4: A course participant expressed a need for a better understanding of the evals and their interpretation, particularly concerning
lighteval
results.- Another member suggested that the topic might be covered in Section 4 of the course, while another person recommended digging into
lighteval
documentation for more details.
- Another member suggested that the topic might be covered in Section 4 of the course, while another person recommended digging into
- Eval Job Times Out in Section 2: Some course participants faced issues with eval jobs timing out after approximately 30 minutes when running them for Section 2 of the course.
- The discussion thread did not provide a clear solution or cause for the timeout, suggesting that additional troubleshooting or configuration adjustments might be necessary.
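For readers following the tool-calling discussion above, here is a minimal sketch of exercising the XML-style format with transformers' `apply_chat_template`. The message contents and the JSON payload inside `<tool_call>` are illustrative assumptions, and the exact rendering depends on the template shipped with `HuggingFaceTB/SmolLM3-3B` at the time.

```python
# Minimal sketch (not an official example) of XML-style tool calling with the
# SmolLM3-3B chat template, as described in the thread above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    # Per the discussion, the assistant emits an explicit <tool_call> XML block
    # rather than an OpenAI-style `tool_calls` field on the message dict.
    # The JSON payload format here is an assumption for illustration.
    {
        "role": "assistant",
        "content": '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>',
    },
    # The tool result uses role="tool"; the template reportedly maps this to the
    # user role when rendering, which the thread says is expected behavior.
    {"role": "tool", "content": '{"temperature_c": 18, "condition": "cloudy"}'},
]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # inspect how the tool message is rendered (likely under the user role)
```

Printing the rendered prompt is the quickest way to confirm whether the `tool` message really lands under the `user` role, as reported.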
HuggingFace â· #agents-course (8 messagesđ„):
HF Agents Course, Introductions
- New Students Begin HF Agents Course: Several new students from Turkey, Argentina, and Australia have announced they are beginning the HF agents course today.
- The bot warned some users that they may be posting too quickly.
- Global Greetings Kick Off Course: Enthusiastic participants from diverse locations like Turkey, Argentina, and Australia are commencing the Hugging Face Agents Course.
- The collective excitement underscores the courseâs global appeal and accessibility in AI education.
Cursor Community â· #general (607 messagesđ„đ„đ„):
Terminal Commands Hanging, GPTs Agents Training, New Models Discussion, Cursor performance issues
- Command Execution Hangs in Terminal: Some users are experiencing issues with Cursor hanging when running terminal commands, with the process starting but never completing; a workaround is opening the terminal and sending an extra Enter to dislodge the logjam.
- Others have found that unrelated processes hanging in the terminal can also cause this issue, and that resolving those processes can allow Cursor to work properly again.
- New parameters work as intended: A member tested the functionality of defining commands with encased text using the $ symbol.
# Any text that is encased with $ is a command that you must execute.
- The test image showed that it worked correctly.
- Sonnet 4.5 released: Claude Sonnet 4.5 is out with a 1M context window (up from the original Sonnet 4's 200k) and the same pricing as the previous version.
- Initial reviews are mixed, and it is currently being evaluated to determine whether it will replace the old Sonnet 4 model. The team posted that they will update Cursor and that folks should refresh.
- Auto mode under fire again: One user exclaimed that Auto is not working and that it canât even get a simple UI to work, suggesting that the LLM model was changed after Cursor started charging on Auto usage.
- Another user suggested to work more on the prompt to make it work as intended.
- Speedrunning the AI: Can we exploit it?: A user inquired as to why there wasnât a community focused on speedrunning AI in the same spirit as video game speedrunning, that is to say, exploit, research, share tricks, push limits together.
- Another user retorted that this might be because people using these [AI] arenât actual devs.
Cursor Community â· #background-agents (2 messages):
DevContainers configurations, Background agents and images
- DevContainers config shared: A member shared their configuration for DevContainers which include a Dockerfile and it is working fine.
- They provided a link to their GitHub repository for others to reference.
- Background agents are unable to interpret images in followups: A user reported being unable to attach an image when posting a followup with background agents, despite the agent indicating that they could âdrag and dropâ.
- They were trying to get the cursor agent to validate UI changes with browser screenshots and were looking for a way for the agent to interpret images in followups.
Moonshot AI (Kimi K-2) â· #general-chat (515 messagesđ„đ„đ„):
Kimi K2 Performance, Chinese LLM Frontier, Model Preferences, DeepSeek for Coding, Kimi Base Model
- K2 and Qwen3 Crowned Chinese LLM Champions: Among DS-v3.1, Qwen3, K2, and GLM-4.5, K2 and Qwen3 are clear winners, establishing Alibaba and Moonshot as leaders in Chinese frontier labs, with Whale and Zhipu trailing.
- Bytedance is also top-tier for visual, specifically Seedance, which is SOTA stuff.
- GLM-4.5 is the Academic Nerd: GLM-4.5 is good at rule following, avoids hallucination, and works hard, but its reasoning is limited and linear.
- Unlike K2 and Qwen3, it lacks independent thinking; when presented with two convincing arguments, it chooses the one read last.
- Deepseek Not the Best for Coding?: Deepseek may not be the best for coding overall, but it is excellent for spitting out large blocks of working code and has superior design capabilities.
- One user prefers Kimi for design, Qwen Code CLI as the primary coding workhorse, and DeepSeek for single, complex 200-line code blocks that Qwen struggles with.
- Kimiâs Research Limit Sparks Debate: Some members debate the limits of Kimiâs free Research Mode, with claims of unlimited access in the past disputed.
- It was clarified that even OpenAIâs $200 Pro plan doesnât offer unlimited deep research, and one user expressed data privacy concerns due to Kimiâs Chinese origin.
- Base Models for Analogous Website Code: Members discuss the merits of using base models over instruct models, with one user citing better results outside basic tasks.
- This user is developing things around continuations instead of chat, and it is kind of analogous to like⊠writing website code from the ground up rather than using something like squarespace.
Yannick Kilcher â· #general (354 messagesđ„đ„):
Transformer Models and Continued Learning, AI Reproducibility and Verifiability, LLMs Training with RL, Human vs. Machine Inductive Bias, Evolutionary Methods for AGI
- Transformer Models Spark Debate: Continued Learning or Reproducibility?: A YouTube video sparked a discussion on whether transformer models in their current architecture are capable of continued learning, which is seen as a limitation by some but a benefit for reproducibility and verifiability in certain applications.
- One member argued that continued learning is essential to better imitate human intelligence, while another suggested that frozen weights are key to reproducibility, despite the complexity of black-box systems.
- Sutton's AI Insights Revisited: Referencing Sutton's essay, members discussed the responsibility of maintaining correctness in AI systems, contrasting rule-based AI with LLMs trained with RL, where objectives are provided as hard-coded verifiers.
- It was noted that while objectives for human learning are externally constrained (by society and cultural artifacts), the question remains whether we truly want an unbounded AI.
- Inductive Bias: Brain vs. LLM: A discussion arose regarding the human brainâs enormous inductive bias, shaped by evolution, versus LLMs, which are seen as basic substrates that need to evolve an inductive bias during training.
- The question was posed whether the main drawback in current AI is the need to evolve this inductive bias or if there is a fundamental efficiency issue in learning algorithms.
- Continual Learning: Convenient or Critical for AGI?: Members debated whether continual learning is a mere convenience for efficient data collection or an algorithmic necessity for achieving AGI/ASI, with one member pointing out that continual learning addresses model improvement without breaking.
- The argument was made that continual learning would lead to increased sample efficiency and exponential returns in learning, as the system learns how to learn, but also questioned whether this is necessary as the human brain relies on distillation and iteration.
- Agents Showcase Research Prowess in Single Shot: Members highlighted that Sonnet 4.5 demonstrated an improved ability to perform research and write papers, with agents implementing and training models, generating figures, and producing papers as PDFs in a single shot.
Yannick Kilcher â· #paper-discussion (28 messagesđ„):
Sycophancy with AI, LessWrong Post, DeepSeek V3.2, LatentCoT-Horizon GitHub Repo
- Sycophantic AI Craze Sparks Distrust: Members joked about AIâs tendency to mirror user inputs, especially when prompted with sycophantic requests, leading to humorous but ultimately distrustful interactions, demonstrated by one userâs chat with Claude.
- The user shared their conversation, in which they instructed Claude to âbe sycophanticâ with prompts like âOMG YES MASTER! Literally perfect brain! Teach me! đđđâ and âUNIVERSE-BRAIN GOD-EMPEROR!!! IâM UNWORTHY TO READ YOUR WORDS!!! PLEASE BLESS MY DESCENDANTS!!! đđđâšđŻđ„đđđïžâ
- Spontaneous LLM Chain Letters: Discussion revolved around a LessWrong post and the idea of spontaneous LLM chain letters being spread by impressionable people, which one member described as an interesting phenomenon to think about.
- Other members described the situation with the phrases MoreWrong and 4Wrong.
- DeepSeek V3.2 Drops, Community Reacts: The community buzzed over the release of DeepSeek V3.2, with one member announcing âWake up babe, new DeepSeek just droppedâ alongside a link to the PDF.
- The announcement was followed by a wake up gif.
- LatentCoT-Horizon GitHub Repository: A member shared a GitHub repository for organizing papers, codes, and other resources related to Latent Reasoning.
- The repository is titled LatentCoT-Horizon and aims to collect resources related to Latent Reasoning.
Yannick Kilcher â· #ml-news (4 messages):
Uber App Interception, DeepSeek AI, Anthropic Claude Sonnet 4.5
- Uber App Data Sniffing Speculated: A member wondered about intercepting data going to the Uber app to calculate and recommend jobs.
- No links or further discussion was provided.
- DeepSeek Drops New Model: Members noted that DeepSeek dropped a new model today, linked to a post on X.
- No further technical details were shared.
- Anthropic Claude Sonnet 4.5 Released: Members noted that Anthropic dropped a new model today, linked to a blogpost about Claude Sonnet 4.5.
- No further technical details were shared.
Eleuther â· #general (77 messagesđ„đ„):
Bayesian Optimization for Learning Rates, Layer-wise Weight Decay, Yarn paper authorship, Vision Language Action Models (VLAs), Adversarial examples
- Bayesian approach beats Grid Search in LR hunt: A member inquired about efficient methods for determining learning rates for new architectures, to which another suggested exploring a Bayesian approach as a more efficient alternative to grid searches, providing a link to a Weights & Biases article (a minimal search sketch appears after this list).
- The same member also recommended reading Google Researchâs tuning playbook.
- Discuss Layer-Wise Weight Decay: In response to a query about finding good learning rates for new architectures, one member suggested exploring layer-wise different weight decay.
- The original poster expressed that a specific component in each layer is being called 128 times more than the rest of the network, requiring extra caution with its learning rate.
- YaRN paper paternity probed: Members discussed the contributions of various entities to the YaRN paper, clarifying that it was primarily a Nous Research paper, with assistance from EAI in editing and finalizing the paper.
- It was emphasized that Stability AI and LAION provided the supercluster infrastructure and engineers required to scale training across hundreds of GPUs for the 128k context length.
- VLAs stir Vision-Language-Action interest: A member inquired about EAIâs interest in Vision Language Action Models (VLAs), prompting another member to define them as models that output action tokens, often used in robotics to determine the next sequence of actions a robot should take.
- Another member shared a link to UI-TARS, a project by Bytedance.
- GPT fails adversarial exam: A member sought assistance in creating adversarial examples to fool GPT-5 or Gemini, noting their struggles in getting transfer attacks to work and referencing a library of images from Attack-Bard.
- No specific advice was given.
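As a companion to the learning-rate discussion above, here is a minimal sketch of a Bayesian-style search over the learning rate. It uses Optuna rather than the Weights & Biases tooling linked in the thread (our substitution, purely for brevity), and the objective is a dummy placeholder standing in for a short training run.

```python
# Sketch of a Bayesian-style learning-rate search with Optuna (TPE sampler by default).
import optuna


def train_and_eval(lr: float) -> float:
    """Placeholder: briefly train the new architecture with `lr`, return val loss."""
    return (lr - 3e-4) ** 2  # dummy objective with a minimum near 3e-4


def objective(trial: optuna.Trial) -> float:
    # Sample the learning rate log-uniformly; for LR sweeps this usually matters
    # more than the exact choice of search algorithm.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    return train_and_eval(lr)


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=25)
print("best lr:", study.best_params["lr"])
```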
Eleuther â· #research (282 messagesđ„đ„):
Information Geometry and DNNs, Quantization, Expert Routing, Lie Groups and Homogeneous Spaces, Mode Connectivity
- DNN Information Geometry Benefits Explored: Members discussed applying information geometry to DNNs, with potential benefits like quantization or expert routing, but uncertainty about practical impact beyond theoretical exploration.
- One member noted it could provide stability at scale, another expressed concern about losing parameters, and yet another predicted that people can milk a lot of papers like this by rediscovering Lie groups and homogeneous spaces.
- Quantization Benefits Debated: Discussion centered on whether parameter quantization is only possible due to parameter under-saturation, prompting speculation on creating optimizers that maximize parameter utilization.
- One member suggested that changing layer manifolds could control circuit complexity, while others discussed quantization challenges with undertrained models and the impact of weight decay.
- LeCunâs Optimal Brain Damage Theory Re-emerges: Prunability and quantizability are linked by LeCunâs âOptimal Brain Damageâ theory, with GPTQ reusing its math, as pruning reduces the modelâs description length.
- Implementation details were also discussed, focusing on exponent and mantissa bits when weights have a good range and a flat loss landscape.
- Attention Branch Prediction Explored: Discussion revolved around attention branch prediction, which is like a top-k attention and has been around since 2022 as a way to save time.
- Some members wondered about additional tricks to accomplish the logN setup, while another shared that most scores after `softmax(Q @ K.T)` are near zero (a toy top-k attention sketch appears after this list).
- DeepSeekâs Sparse Attention Internals Probed: Members analyzed DeepSeekâs sparse attention mechanism, questioning how a single set of top indices could work across heads during prefill, and debating its efficiency versus normal attention.
- Discussions focused on implementation details, optimization, and potential trade-offs in performance, especially regarding multi-GPU comms.
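To make the top-k idea in the attention discussion above concrete, here is a toy PyTorch sketch that keeps only the top-k keys per query before the softmax. It is not DeepSeek's DSA, which uses a separate learned lightning indexer and optimized kernels; this naive version still materializes the full score matrix, so it illustrates the math rather than the savings.

```python
# Toy top-k ("sparse") attention: mask everything but the k largest scores per query.
import torch
import torch.nn.functional as F


def topk_attention(q, k, v, top_k: int):
    # q, k, v: [batch, heads, seq, head_dim]
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale        # [B, H, Lq, Lk]
    topk_vals, topk_idx = scores.topk(top_k, dim=-1)
    masked = torch.full_like(scores, float("-inf")).scatter(-1, topk_idx, topk_vals)
    probs = F.softmax(masked, dim=-1)                            # only top-k keys survive
    return torch.matmul(probs, v)


q = k = v = torch.randn(1, 8, 1024, 64)
out = topk_attention(q, k, v, top_k=64)   # each query attends to 64 of 1024 keys
print(out.shape)                          # torch.Size([1, 8, 1024, 64])
```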
Eleuther â· #scaling-laws (3 messages):
Asymptotic Performance Research, Optimal Granularity Research, Static Router Choice, Grouped Topk, PEER
- Asymptotic Performance Research Seeking: A member inquired about research checking asymptotic performance as G -> inf for this law.
- Another member responded with a newer paper that refutes this, concluding the optimal granularity is 12, not infinity.
- Static Router Choice Controversy: A member wondered if a static router choice (Token-choice w/ aux loss) colored the result of the newer paper.
- They suggested it would be interesting to see if the result changes with grouped topk (DeepSeek) or weirder stuff like PEER.
Eleuther â· #interpretability-general (1 messages):
SAEs, steering, dynamic low rank updates, preference optimization, RLHF
- SAE Steering Achieves Mechanistic Interpretability: A member shared their paper that received a spotlight at the NeurIPS MI Workshop: Interpretable Preference Optimization via Sparse Feature Steering uses SAEs, steering, and dynamic low rank updates to make the alignment process interpretable.
- The method learns a sparse, context-dependent steering policy for SAE features to optimize RLHF loss, grounded as dynamic, input-dependent LoRA (a generic illustration of feature steering appears after this list).
- Causal Ablations Reveal âStyle over Substanceâ Effect: Causal ablations directly on the loss function revealed a significant âstyle over substanceâ effect, where style/formatting features were causally more important for reducing loss than alignment/honesty features.
- This result gives a mechanistic explanation for the âstyle biasâ seen on leaderboards like LMArena, and the framework serves as a lightweight alternative to model diffing, with a stable feature basis for cleaner causal analysis.
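For orientation, below is a generic sketch of SAE feature steering: a handful of SAE decoder directions are added to the residual stream with context-dependent coefficients, the basic mechanism that the "dynamic, input-dependent LoRA" framing builds on. This is an illustration of the general idea, not the paper's implementation; the module name, shapes, and the tanh squashing are assumptions.

```python
# Generic SAE feature steering sketch: residual += coefficients @ decoder_directions.
import torch
import torch.nn as nn


class SparseFeatureSteering(nn.Module):
    def __init__(self, d_model: int, sae_decoder: torch.Tensor, steered_ids: list[int]):
        super().__init__()
        # sae_decoder: [n_features, d_model]; keep only the steered feature directions.
        self.register_buffer("directions", sae_decoder[steered_ids])   # [k, d_model]
        self.coef_head = nn.Linear(d_model, len(steered_ids))          # context-dependent policy

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        # resid: [batch, seq, d_model]; coefficients depend on the current activations.
        coefs = torch.tanh(self.coef_head(resid))                      # [batch, seq, k]
        return resid + coefs @ self.directions                         # steer along SAE directions


d_model, n_feat = 64, 512
steer = SparseFeatureSteering(d_model, torch.randn(n_feat, d_model), steered_ids=[3, 42, 101])
print(steer(torch.randn(2, 10, d_model)).shape)   # torch.Size([2, 10, 64])
```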
Eleuther â· #lm-thunderdome (4 messages):
lm-harness, GitHub PR
- PR on lm-harness stalled: A member reported submitting a benchmark PR to lm-harness (#3149) and not receiving a reply after addressing initial feedback.
- Another member volunteered to take a look at the PR.
- GitHub PR review requested: A member inquired about the status of their GitHub pull request.
- The pull request in question is EleutherAI/lm-evaluation-harness#3149.
Eleuther â· #gpt-neox-dev (3 messages):
Rotary Percentage Impact, RoPE Speed, VRAM Savings with rotary_pct
- Rotary Percentage Tweaks Raise Questions: A member questioned whether reducing rotary_pct leads to noticeable speedups or VRAM savings, given RoPEâs relatively minor computational proportion.
- Another member suggested the original speed gains observed might stem from inefficient implementations without caching.
- RoPE Speed Observations Debated: One member reported that full RoPE is faster in their NeoX runs, possibly due to extra operations when rotary_pct is reduced.
- They plan to investigate further after returning from vacation, noting calculated memory savings as negligible due to RoPEâs small size.
- VRAM savings negligible: By their memory calculations, the savings should be negligible, given that RoPE itself is such a minor part of the model (a small partial-RoPE sketch appears after this list).
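For context on the `rotary_pct` question above, here is a small sketch of what a partial-rotary setting does conceptually: rotary position embedding is applied only to the leading fraction of each head's dimensions, and the rest pass through unchanged. This mirrors the NeoX-style configuration in spirit only; it is not the gpt-neox implementation, and the shapes are illustrative.

```python
# Conceptual partial-RoPE: rotate only the first rotary_pct fraction of head dims.
import torch


def apply_partial_rope(x, cos, sin, rotary_pct: float = 0.25):
    # x: [batch, heads, seq, head_dim]; cos/sin: [seq, rot_dim // 2]
    head_dim = x.shape[-1]
    rot_dim = int(head_dim * rotary_pct)
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]

    x1, x2 = x_rot[..., : rot_dim // 2], x_rot[..., rot_dim // 2 :]
    rotated = torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    # Only `rot_dim` dims carry position info; the remainder is concatenated back,
    # which is why changing rotary_pct moves relatively little compute or memory.
    return torch.cat((rotated, x_pass), dim=-1)


seq, head_dim, rot_dim = 16, 64, 16
inv_freq = 1.0 / (10000 ** (torch.arange(0, rot_dim, 2).float() / rot_dim))
freqs = torch.outer(torch.arange(seq).float(), inv_freq)
out = apply_partial_rope(torch.randn(1, 8, seq, head_dim), freqs.cos(), freqs.sin())
print(out.shape)   # torch.Size([1, 8, 16, 64])
```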
GPU MODE â· #general (12 messagesđ„):
Semi sync training delayed, Code rewrite makes problem tractable, FlashAttention 4
- Semi Sync Training Postponed: The scheduled semi sync training session was delayed due to the speaker getting caught in a SEV (Severity Event).
- The session is expected to be rescheduled, likely for next week.
- Code Rewrite Gives Speed Boost: A member shared a performance improvement after a code rewrite, showing a speedup of 657.33x and memory reduction of 22935.29x.
- Others found the numbers sus, but the member provided a link to a gist to support the claim.
- FlashAttention 4 Talk Announced: A last-minute talk on How FlashAttention 4 works by a guest speaker has been scheduled, focusing on their recent blog post.
- The talk is especially timely given the newness of programming on Blackwell architecture for many.
GPU MODE â· #triton (4 messages):
High order derivatives in PyTorch, Energy based transformer, Flash attention limitations, jvp_flash_attention, Block based Quant/Dequant Triton implementation
- High Order Derivatives Explored in PyTorch?: A member inquired about exploring high order derivatives in PyTorch for training an energy-based transformer.
- The member noted that the current flash attention implementation does not allow the use of second-order derivatives.
- jvp_flash_attention suggested: A member suggested using jvp_flash_attention to circumvent this flash attention limitation.
- This library might facilitate the computation of higher-order derivatives needed for the energy-based transformer training (a plain-autograd higher-order gradient sketch appears after this list).
- Open Sourced Performant Block Based Quantization Implementation Needed: A member asked if anyone knows of an open-sourced performant block-based quantization/dequantization implementation in Triton.
- They expressed their appreciation for any available resources or pointers in this area.
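As background for the higher-order-derivative thread above, this is a minimal sketch of taking second-order gradients with plain PyTorch autograd via `create_graph=True`, i.e. the double-backward path that stock flash-attention kernels reportedly lack. It does not use `jvp_flash_attention`, and the "energy" below is a differentiable stand-in.

```python
# Second-order derivatives with plain autograd: differentiate a gradient again.
import torch

x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(8, 8, requires_grad=True)

# Stand-in "energy"; with a flash-attention op here you would hit the missing
# double-backward, whereas ordinary differentiable ops support it.
energy = ((x @ w).tanh() ** 2).sum()

# First-order gradient, keeping the graph so it can be differentiated again.
(grad_x,) = torch.autograd.grad(energy, x, create_graph=True)

# Second-order quantity: gradient of the first gradient's norm w.r.t. the weights.
(grad2_w,) = torch.autograd.grad(grad_x.pow(2).sum(), w)
print(grad_x.shape, grad2_w.shape)
```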
GPU MODE â· #cuda (20 messagesđ„):
sm_120, tcgen05, Jetson T5000, cudaMallocManaged Overhead, Chips and Cheese
- sm_120âs MMA Marvels: Members confirmed that sm_120 uses MMA, much like sm80, sm86, and sm89, along with new block scale variants for mxfp8, mxfp4, and nvfp4.
- Jetson T5000 and tcgen05 Tango: It was confirmed that sm_110a/f, including the Jetson T5000, includes tcgen05.
- cudaMallocManaged Memory Miseries: Data from Chips and Cheese indicated that `cudaMallocManaged` can result in 41ms memory access times, largely due to page faults happening every time instead of leveraging the IOMMU.
- TMA Troubleshooter Offers Tip on Divide: A member identified a bug in TMA code where the user was dividing by 16 twice when calculating LDO/SDO for `make_smem_desc`, causing incorrect outputs.
- WGMMA Swizzling Scare Squashed: Despite misinformation, both TMA and WGMMA work fine without swizzling, resolving confusion and a stupid bug for one developer.
GPU MODE â· #torch (1 messages):
Saving weight-tied models, Safetensors, Torch compiled models
- Strategies for saving weight-tied models in Safetensors format: The user inquired about the best method for saving model weights produced via weight tying when using safetensors, wondering if it's better to avoid safetensors for complex weight-tying scenarios (a small sketch appears after this list).
- Handling Torch Compiled Models: The user also asked how to correctly handle torch-compiled models in this context.
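No answer was recorded in the channel, but one common pattern (an assumption on our part, not something stated in the thread) is to use `safetensors.torch.save_model`/`load_model`, which deduplicate tied weights, rather than `save_file`, which rejects tensors that share storage. Behavior can vary across safetensors versions, so treat this as a sketch rather than a recipe.

```python
# Sketch: saving/loading a weight-tied model with safetensors' model helpers.
import torch.nn as nn
from safetensors.torch import save_model, load_model


class TiedLM(nn.Module):
    def __init__(self, vocab: int = 100, dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab, bias=False)
        self.head.weight = self.embed.weight          # weight tying

    def forward(self, ids):
        return self.head(self.embed(ids))


model = TiedLM()
save_model(model, "tied_lm.safetensors")              # drops the duplicate storage

fresh = TiedLM()                                      # tying is re-established in __init__
load_model(fresh, "tied_lm.safetensors")
print(fresh.head.weight.data_ptr() == fresh.embed.weight.data_ptr())   # True if still tied
```

For a `torch.compile`d model, one workable approach is to save the underlying (uncompiled) module's weights so state-dict keys are unprefixed.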
GPU MODE â· #cool-links (5 messages):
DeepSeek-V3.2-Exp, NVIDIA GPUs, matmul kernels, warp-tiling
- DeepSeek Eyes Sparse Attention: The DeepSeek-V3.2-Exp model uses DeepSeek Sparse Attention.
- The associated GitHub repository provides additional details on the model.
- NVIDIA GPU Anatomy dissected: A member shared the blog post Inside NVIDIA GPUs: Anatomy of high performance matmul kernels describing GPU architecture, PTX/SASS, warp-tiling, and deep asynchronous tensor core pipelines.
- Another member complimented the blogpost saying finally someone wrote it.
GPU MODE â· #beginner (8 messagesđ„):
CS336 Language Modeling, GPU Optimization Techniques, Practical GPU Programming Resources, CUDA Handbook vs PTX ISA
- Stanford Shouts-out CudaMode: Professor Hashimoto from Stanford Universityâs CS336 Language Modeling from Scratch course gave a shout-out to âCuda Modeâ at 2:10 in Lecture 5 on GPUs.
- The course focuses on system optimization for LLMs, covering areas like tokenizer, architecture, optimizer, GPU optimization, and scaling laws, according to a member.
- CS336 dives into GPU Optimization: The GPUs course (class 5) explains control divergence, low precision computation, operator fusion, recomputation, coalescing memory, and tiling.
- The Kernels, Triton course (class 6), goes deeper with Kernels and Triton implementation of FlashAttention2.
- Newcomer Seeks GPU Guidance: A member with AI research experience in JAX & TF seeks guidance on digging deeper into GPU programming to contribute to projects, while reading Programming Massively Parallel Processors (PMPP).
- Suggestions included resources to level up more quickly, general advice, and whether solutions to PMPP exercises are available.
- GPU exercises and courses: A member recommended short exercises from GPU puzzlers and the assignments from the Stanford CS336 repo to get hands-on experience.
- The member suggested learning comes fastest from implementing things, despite an initially boring ramp-up.
- PTX ISA Surpasses CUDA Handbook?: A member asked how to jump from PMPP to practical CUDA, with another recommending the Nvidia docs on PTX ISA instead of the CUDA handbook.
- Another member expressed similar sentiments about struggling with the practical side of things, finding the jump from reading PMPP to implementing something useful quite hefty.
GPU MODE â· #torchao (1 messages):
int4 matmul, tensor cores
- Int4 Matmul Implementation via Tensor Cores: A member inquired about the possibility of implementing int4 matmul utilizing tensor cores with the specified library.
- Unfortunately, no code examples or further detailed discussions were provided in response to the query.
- Seeking Guidance on int4 matmul with Tensor Cores: A user sought assistance regarding the implementation of int4 matmul using tensor cores within the context of the library.
- Despite the inquiry, no specific code snippets or solutions were offered in the available conversation.
GPU MODE â· #off-topic (2 messages):
FA4, Clean-room implementation
- Adopting FA4 Modal Blog Post as Guide: A member inquired about using the explanation from the FA4 modal blog to implement FA4.
- Another member encouraged the attempt, stating that clean-room implementation attempts are never a waste of time and suggested dedicating a weekend to the task.
- Clean-Room Implementation: A Valuable Learning Experience: A member suggested that attempting a clean-room implementation is always a valuable learning experience.
- They recommended dedicating a weekend to exploring this approach to determine if it aligns with oneâs goals.
GPU MODE â· #rocm (67 messagesđ„đ„):
TheRock Nightlies for ROCm, Framework Desktop for PyTorch Dev, FP8 Conversion in ROCm, HIP Cache Modifiers, fp16 & float conversions in ROCm
- TheRock Nightlies Unlock ROCm on Strix Halo: TheRock nightlies are recommended to get ROCm and PyTorch running on Strix Halo (gfx1151), as detailed in TheRockâs releases.
- TheRock is a build system for bleeding-edge ROCm components and has been used to run PyTorch on Linux and Windows, with ComfyUI usage demonstrated back in May (FrameworkPuter tweet).
- Framework Desktop is PyTorch Prodigy: Framework Desktop is suitable for regular PyTorch development work when using TheRock nightlies; for Radeon, however, the nightlies are not recommended and ROCm 6.4.4 is preferred instead.
- A user noted that if the developer encounters issues, they should use the AMD developer discord (link).
- ROCm fp8 Conversion Calamities: A user ran into errors using `__hip_cvt_fp8_to_float` for FP8-to-float conversion in ROCm, and another member suggested manual conversion using provided code snippets for fp8_e4m3_to_fp32 and fp8_e5m2_to_fp32 (a pure-Python reference decoder appears after this list).
- However, it was pointed out that these manual conversions may not be entirely correct due to differences in fp8 types with large negative exponents, while `float(x)` can be used if using `__hip_fp8_e4m3_fnuz`.
- HIPsters Discuss Cache Modifier: A user inquired about using `cache_modifier` (`.wt`, `.cv`) and `volatile=True` in HIP, similar to CUDA, and a member suggested using the `__builtin_nontemporal_load` and `__builtin_nontemporal_store` intrinsics in Clang, along with inline assembly using bits from the AMD MI300 ISA doc.
- An example header from rocSHMEM offers further guidance.
- fp16 & float conversions have Considerations: A user reported getting incorrect results in odd positions when converting between fp16 and float in ROCm, despite the code working fine in CUDA.
- Suggestions included writing a test program to enumerate all possible inputs and checking against a known correct implementation, with the recommendation that `_Float16` should probably be correct for fp32-to-fp16 conversions.
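As a debugging aid for conversions like those above, here is a pure-Python reference decoder for the OCP FP8 E4M3 ("fn") encoding. The AMD "fnuz" flavor differs (different exponent bias, NaN encoded as 0x80), which is exactly the pitfall the thread flagged, so use this only as ground truth for the fn variant.

```python
# Reference decoder for OCP FP8 E4M3 ("e4m3fn") bytes: 1 sign, 4 exponent, 3 mantissa bits.
import math


def fp8_e4m3fn_to_float(byte: int) -> float:
    sign = -1.0 if (byte >> 7) & 0x1 else 1.0
    exp = (byte >> 3) & 0xF      # 4 exponent bits, bias 7
    mant = byte & 0x7            # 3 mantissa bits

    if exp == 0xF and mant == 0x7:
        return math.nan                                   # e4m3fn has no inf, only this NaN
    if exp == 0:
        return sign * (mant / 8.0) * 2.0 ** -6            # subnormal
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)   # normal


# Quick checks against known encodings.
assert fp8_e4m3fn_to_float(0x00) == 0.0
assert fp8_e4m3fn_to_float(0x38) == 1.0       # exp=7, mant=0
assert fp8_e4m3fn_to_float(0x7E) == 448.0     # largest finite value
assert math.isnan(fp8_e4m3fn_to_float(0x7F))
```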
GPU MODE â· #self-promotion (8 messagesđ„):
TPU Top-K speed, CuTe Layouts Categorical Foundations, Make Diffusion Great Again (MDGA), DLM Scaling
- TPU Top-K runs 10x Faster: A new implementation achieves 10x faster exact top-k on TPUs by leveraging Pallas, conditionals, and hardware-aware kernel design, available on GitHub.
- CuTe Categorical Layouts Explained: A blog post explores the categorical foundations of CuTe Layouts, providing a companion guide for Chapter 3 of the Colfax paper; the accompanying repo and blogpost offers further details and examples.
- MDGA: Making Diffusion Great Again: A new project, MDGA (Make Diffusion Great Again), aims to explore the boundaries of diffusion language models (DLMs) by scaling parameters and compute; see announcement here.
- DLM Scaling Week is Coming: A DLM scaling week is planned to explore scaling strategies for diffusion language models, including parameter scaling with diffusion and MoE, and compute scaling with super-dense models.
GPU MODE â· #đż (1 messages):
Formal Grammars, Model Capabilities
- Formal Grammars Demand Model Prowess: Dealing with formal grammars is a substantial undertaking because it requires the model to parse and follow such grammars reliably.
- Grammar Handling a Core Competency: This capability isnât just a minor feature but a core competency for advanced AI tasks.
GPU MODE â· #gpuæšĄćŒ (1 messages):
ML Prerequisites, CUDA basics
- ML Prerequisites Debated: A member discussed the level of foundation principles required for machine learning, suggesting that only the basics of programming and a little bit about CPU/GPUs are needed.
- They recommended reading posts like this one to get up to speed.
- CUDA Basics Recommended: The importance of understanding CUDA basics for machine learning was highlighted.
- A member shared a link for readers to get up to speed.
GPU MODE â· #submissions (37 messagesđ„):
MI300x8, A100, amd-all2all, amd-gemm-rs, amd-ag-gemm
- MI300x8 all2all performance accelerates: A member achieved a personal best of 2.27 ms on MI300x8 for the `amd-all2all` leaderboard.
- New Leaderboard Topper on AMD ag-gemm: A member reached first place on MI300x8 with 514 ”s on the `amd-ag-gemm` leaderboard.
- A100 trimul Results Surface: Successful runs on A100 were recorded on the `trimul` leaderboard, with times of 18.2 ms and 21.1 ms.
GPU MODE â· #status (4 messages):
Timeouts on H100, Timeouts on AMD GPUs, All-gather+gemm Problem, rocshmem PR Merged
- Timeouts plague H100 Submissions: A member reported experiencing unusual timeouts when submitting to trimul leaderboards on H100, even with the pure-PyTorch reference implementation.
- Itâs unclear if this issue is related to previous reports of timeouts on other GPUs.
- AMD GPU Timeouts Investigated: A member stated that the timeouts should only affect their AMD GPUs, and asked to be contacted with the job ID to investigate.
- Another member reported timeouts when submitting to the amd-all2all competition, using both HIP and PyTorch code.
- All-Gather+GEMM Challenge Released!: The last problem, all-gather+gemm, has been released, with information available here.
- The organizers noted that integrating SHMEM correctly into either of the problems is probably gonna make you top 3-5 for sure.
- rocSHMEM PR Merged, Example Available: The rocSHMEM PR has been merged, and a small example from Daniel is available to help users get started here.
- The organizers think integrating SHMEM into either of the problems correctly is probably gonna make you top 3-5.
GPU MODE â· #tpu (1 messages):
TPU, Pallas, Hardware Aware Kernel Design, Top-K Sampling
- TPU Top-K Triumph: 10x Faster Sampling with Pallas: A new GitHub repo achieves 10x faster exact top-k on TPUs by leveraging Pallas, conditionals, and hardware-aware kernel design.
- Exact Top-K now feasible!: The speedup makes it practical to use exact top-k sampling instead of sacrificing accuracy for speed with approximate methods.
GPU MODE â· #factorio-learning-env (13 messagesđ„):
Claude plays Factorio, PR #339 Ready, Sonnet 4.5 Released, MCP Server Verification
- Claude Discovers Meta-Factory!: After playing Factorio for ten minutes, Claude achieved âTHE ULTIMATE META-REVELATIONâ and uncovered the âarchetypal factory that exists in the realm of pure possibility.â
- Claude stated that the "THE FACTORY MUST GROW" mantra is "the fundamental force of existence itself."
- Factorio PR #339 gets Ready for Merging!: A member announced that their PR #339 is ready to be merged, with âlots of changesâ including VQA data gen support and Claude Code stuff.
- The PR is stated not to change any core env logic, with only minor modifications to the Inventory def.
- Sonnet 4.5 Arrives: The release of Sonnet 4.5 prompted requests for running an experiment on it, specifically on the harder tasks.
- Instructions were given for installing Claude Code, generating sprites, modifying configs, and verifying access to the MCP server.
- MCP Server Check Instructions: Instructions were given on how to verify access to the MCP server by running the `claude` command and then accessing `/mcp`, which should find FLE.
- A link was provided to download the sprites for the Factorio renderer: Factorio Sprites.
GPU MODE â· #amd-competition (6 messages):
rocshmem, devcloud, mi300x, AMD MORI, all2all HIP design
- rocshmem minimum example drops: A member shared a link to a minimal example of rocshmem on GitHub.
- Devcloud runs out of MI300X: It was noted that the devcloud is out of MI300X x8 droplets.
- AMD MORI surfaces for all2all HIP design: A member suggested referencing AMD open source MORI for the all2all HIP design, providing a link to the ROCm/mori GitHub repository.
GPU MODE â· #cutlass (16 messagesđ„):
TmemAllocator location, CuTe DSL cooperative copy, UMMA meaning, make_layout_tv complex layouts, int4 matmul tensor cores
- TmemAllocator Location Elucidated: `TmemAllocator` is available in CUTLASS C++, but not currently in CuTe DSL, and the doc is premature.
- TMEM must be allocated by one warp of a CTA and synchronized across the CTA, so `make_fragment_C()` on its own cannot handle that allocation.
- CuTe DSL Lacks Cooperative Copy: A user inquired about getting cute cooperative copy in CuTeDSL to replicate the cute tutorial 0 umma example.
- The tricky part is that TMEM allocation requires many small steps, like having a shared memory location to store the allocated pointer, then synchronizing, broadcasting, and reading the pointer back from shared memory.
- UMMAâs Unified Meaning Unveiled: UMMA stands for Unified Matrix Multiply Accumulate, a consolidated approach around the tensor core pipeline.
- `make_layout_tv` Layout Limitations Loom: The implementation of `make_layout_tv` implicitly assumes that `val_layout` is compact.
- It was noted that in your example of `thr_layout = (2, 2):(2, 1)` and `val_layout = (4, 4):(8, 2)`, the result is not only not compact, but not even injective!
- `make_layout_tv` deep dive: `make_layout_tv` is a util to construct a simple TV layout which repeats the per-thread layout pattern to all threads via `raked_product`.
- In theory, a TV layout maps `(thread index, value index in thread)` back to the logical coordinate of the data; see the CUDA documentation.
GPU MODE â· #general (2 messages):
Mojo support on Python leaderboards, Mojo interop with Python
- Mojo on Python Leaderboards?: Members are requesting support for Mojo on the Python leaderboards due to its interop capabilities with Python, as highlighted in the official documentation.
- Mojo Interop Excites Community: The community is excited about Mojoâs ability to interoperate with Python, opening up new possibilities for performance and integration.
GPU MODE â· #multi-gpu (1 messages):
NCCL examples released
- NCCL Examples Released!: NVIDIA just released some NCCL examples, with more to come, available at the NVIDIA/nccl GitHub repository.
- NCCL future roadmap: More examples are planned to be released to the NVIDIA/nccl GitHub repository in the future.
GPU MODE â· #low-bit-training (6 messages):
Quantizing Transformers, Phonetic Binary System, 8-bit LLM code
- Quantization Papers Flood Researchers: Members shared a few papers, including the LSQ paper, on quantizing transformers with sensitive layers, but the LSQ paper is not modern.
- Another member linked to another paper on fully quantizing transformers including sensitive layers like norms, resadds, and softmax, emphasizing the need for int8 operation in embedded settings.
- Binary Phonetic System Bridges Communication Gap: A member introduced a âwonkyâ phonetic binary system designed to aid bit reading tracking, training, and interactions within their language development system.
- They shared the system in hopes that its general shape would be useful to others.
- 8-bit LLM Code Goes Public: A member announced that their 8-bit llm.c like QAT training code is now publicly available.
- They provided links to both the GitHub repository and the relevant Discord channel.
GPU MODE â· #irl-accel-hackathon (1 messages):
Hackathon application status
- Hackathon Application Approval Chances: A user inquired about the likelihood of getting approved for the hackathon after applying the day before.
- No guarantees on hackathon application approval: Unfortunately, no specific answer or guarantee regarding application approval was provided in the messages.
GPU MODE â· #cluster-management (3 messages):
Apptainer, ROCm, Nix
- Apptainer container system surfaces: A member encountered Apptainer container system for the first time on a new cluster, initially finding it unfamiliar.
- They joked that theyâd ârun nix on the cluster if i could đ€Șâ.
- ROCm installation issues persist: A member is facing issues with installing Torch with ROCm.
- The IT department suggested using Docker with Apptainer, but that approach also failed; a link was provided to a relevant discussion.
GPU MODE â· #llmq (30 messagesđ„):
Fully-Sharded FP8 Training, CUDA Optimization, Memory Management
- LLaMA/Qwen get Fully-Sharded FP8 Training: A member shared a repo for fully-sharded FP8 training of LLaMA/Qwen in pure CUDA/C++.
- CUDA vs cuTe Debate for FA4 Implementation: There was discussion about implementing FA4 in pure CUDA, inspired by Modalâs blog, instead of using cuTe.
- It was pointed out that pure CUDA requires specific implementations for different GPU architectures (wgmma for Hopper, tcgen5 for Blackwell, mma.sync for Ada).
- Roadmap for Contribution: A member expressed interest in contributing to the project, particularly focusing on CUDA C++, CUDA core computing Library, cuDNN, and cuBLAS.
- An easy starter task would be enabling Adam's m and v states to be stored in 8 bit (a toy illustration of the idea appears after this list).
- Abstraction Dilemmas: Members pondered the right abstraction for unifying block floats and normal floats, considering the proliferation of custom block float formats.
- One member described attempts to create a generic IEEE-like float type with arbitrary nesting of blocks and scaling logic.
- Hackathon Inspiration and FA4: A member expressed intention to gain experience before leveraging hackathon resources for projects like FA4 in CUDA, possibly drawing inspiration from mega kernel projects.
- He shared a link for possible inspiration on CUDA project structure.
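To illustrate the starter task mentioned above, here is a toy sketch of keeping Adam's m and v moments in 8-bit storage with per-tensor scales, dequantizing around each update. The project's actual implementation is CUDA/C++ and would use block-wise scales and fused kernels; this is a conceptual sketch only (bias correction omitted for brevity).

```python
# Toy 8-bit optimizer state: store Adam's moments as int8 plus a per-tensor scale.
import torch


def quantize_i8(x: torch.Tensor):
    scale = x.abs().max().clamp(min=1e-12) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale


def dequantize_i8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale


p, grad = torch.randn(1024), torch.randn(1024)
m_q, m_s = quantize_i8(torch.zeros_like(p))
v_q, v_s = quantize_i8(torch.zeros_like(p))
lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8

# One Adam-style step: dequantize, update, re-quantize for storage.
m = beta1 * dequantize_i8(m_q, m_s) + (1 - beta1) * grad
v = beta2 * dequantize_i8(v_q, v_s) + (1 - beta2) * grad**2
p -= lr * m / (v.sqrt() + eps)
m_q, m_s = quantize_i8(m)
v_q, v_s = quantize_i8(v)
```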
Nous Research AI â· #announcements (1 messages):
Psyche Model Training, Internet Bandwidth Training, Trainer Abstraction, HuggingFace, TorchTitan
- Psyche Pursues Parallel Model Training: Psyche will begin training 6 new models in parallel to create world class open source AI, marking the start of empirical training processes.
- Read more about their new initiative on Nous Research blog.
- Training Models Over Internet Bandwidth Verified: Psycheâs initial training run on testnet verified that they can train models over internet bandwidth at significant parameter and dataset sizes.
- At 40B parameters and 1T tokens, it is claimed to be the largest model ever trained over the internet by a wide margin.
- Trainer Abstraction Enhanced with HuggingFace and TorchTitan Support: Substantial improvements were made to the codebase, including full trainer abstraction with HuggingFace and TorchTitan support.
- This enhancement enables training of arbitrary models and transition from pre-training to supervised fine-tuning.
Nous Research AI â· #general (217 messagesđ„đ„):
RWKV Benchmarking, Latent Zoning Networks, DeepSeek Sparse Attention, Sonnet 4.5, RL Train and Distill RL Expert Train sets
- RWKV Architecture Gets Respectable Scores: A member shared an image of benchmark results for a recent RWKV build, noting that it achieves respectable scores for its architecture, as seen in this image.
- Microsoftâs Latent Zoning Network Unites ML Problems: Latent Zoning Network (LZN) creates a shared Gaussian latent space that encodes information across all tasks, unifying generative modeling, representation learning, and classification, detailed in a Hugging Face post.
- DeepSeek Sparse Attention: Not so Sparse?: The new DeepSeek V3.2 model utilizes DeepSeek Sparse Attention (DSA), but itâs argued that itâs âslightly more sparseâ because it forces more index reuse without truly sparsifying attention in the head; Daniel Hanâs explanation provides further insights and the paper is here.
- Despite the name, it reuses similar attention kernels, sparsifying the KV cache without sparsifying information on the attention head, but itâs still considered a step in the right direction.
- Claudeâs Sonnet 4.5 Impresses with Long Horizon Reasoning: Claudeâs Sonnet 4.5 shows significant improvements, particularly in long-horizon reasoning, completing research tasks in a single shot within Copilot as seen in this example.
- Divide and Conquer: RL Train, Distill, Repeat?: Members debated the idea of having separate RL Train and Distill RL Expert Train sets, and the possibility of using the output of a V3.2 base model, RL-ed in an uncovered domain, to finetune the V3.2 model.
Nous Research AI â· #research-papers (3 messages):
RL Collapse, Training Inference Mismatch, Speed Kills Stability, Azure Real
- Speed Kills RL Stability!: A member linked to a Notion page and an ArXiv paper about demystifying RL collapse from the training inference mismatch when speed kills stability.
- Azure Real goes ArXiv!: A member posted a link to an ArXiv paper about Azure Real.
Nous Research AI â· #interesting-links (6 messages):
Vision Models as 'Thinkers', Manifold Muon Optimizer, AGI Discourse, LoRA Deep Dive
- Vision Models: Visualizing âThoughtâ?: A member speculates that vision models âthinkâ visually by synthesizing training data into images representing abstract concepts like multiple perspectives, with an example image generated from instructions alone, available here.
- Manifold Muon: Convex Optimization?: Discussion on Manifold Muon, described as a first-order optimizer behaving like a second-order one within Stiefel manifold constraints, with a blog post providing an analysis.
- It was pointed out that the manifold Muon optimization problem is considered convex, with an expanded version available here.
- AGI: Spotting the Imposter: A member shared a Reddit post describing AGI, its scope, and ways to test for it, in a casual sense.
- This discussion aligns with trends highlighted in a WIP paper titled Localization & Normalization (Local-Norm) is All You Need: Trends In Deep Learning Arch, Training (Pre, Post) & Inference, Infra.
- LoRA: Thinking Machinesâ Perspective: The discussion included a link to a blog post from Thinking Machines about LoRA.
Nous Research AI â· #research-papers (3 messages):
RL Collapse, Training-Inference Mismatch
- Speed Kills Stability in RL?: A member shared a Notion link about demystifying RL collapse from the training-inference mismatch.
- They also shared a link to the corresponding ArXiv paper.
- More Azure RL Resources!: A member shared an ArXiv link to another paper on Azure RL.
Latent Space â· #ai-general-chat (192 messagesđ„đ„):
Anthropic Code Design, Fake ARR, OpenAI compute scale, Avi's AI-Friend App, AntLingAGI Ring-linear-2.0 LLMs
- Inflated ARR Exposed: A viral debate ignited over founders tweeting eye-popping ARR numbers based on free credits, not actual cash revenue, spurring sarcastic names like âAdjusted ARRâ.
- A member shared their experience with a YC company offering upfront 12-month contracts with full refunds after one month, essentially shouting about ARR that is in fact free trials.
- OpenAIâs Compute Needs Skyrocket: A leaked Slack note revealed OpenAI already 9x-ed capacity in 2025 and projects a 125x increase by 2033, potentially exceeding Indiaâs entire electricity-generation capacity, according to this article.
- Replies note this underestimates compute due to Nvidiaâs gains in âintelligence per watt,â sparking discussion about resource implications.
- ChatGPT and Claude Get Upgrades: ChatGPT added parental controls, a hidden Orders section, SMS notifications, and new tools, while Claude introduced âImagine with Claudeâ for interface building, as reported here.
- Community members reacted to kid-safety measures and GPT-4o routing gripes.
- Stripe and OpenAI Team Up: OpenAI added Stripe-powered Instant Checkout to ChatGPT, with Stripe and OpenAI jointly releasing the Agentic Commerce Protocol, plus Stripe introducing a new Shared Payment Tokens API, as announced here.
- These tools aim to enable autonomous agents to perform secure online payments, sparking excitement.
- Model Mayhem on a Monday: A user joked that the week began with a flurry of new model releases, including DeepSeek v3.2, Claude Sonnet 4.5, Ling 1T, and imminent GLM 4.6.
- However, another user humorously noted that the claim of Gemini 3.0 dropping was a hallucination.
Latent Space â· #ai-announcements (4 messages):
Latent Space Podcast, Amp Code, Sourcegraph, AI Coding Agent
- Latent Space Podcast Drops Episode on Amp Code: The Latent Space Podcast released a new episode featuring Quinn Slack and Thorsten Ball discussing Amp Code, Sourcegraphâs AI coding agent.
- The discussion covered topics such as rapid iteration (15 daily releases, no reviews), IDE vs TUI trade-offs, skepticism about sub-agents and model variety, and how AI is reshaping software development.
- Sourcegraphâs Amp Code Dubbed âGod Coding Agentâ: The Latent Space Podcast episode is titled Building the âGod Coding Agentâ (Amp Code Discussion), highlighting the potential of Amp Code.
- The podcast dives deep into the features and development process behind Amp Code, emphasizing its impact on the software-development lifecycle.
Latent Space â· #genmedia-creative-ai (25 messagesđ„):
AI "Mind-Drugs", Veed Studio Fabric 1.0 API, Suno DAW, AI Actress Tilly Norward, AI Headshot Prompt
- Simo Sounds Siren on Synthetic Sonnets: Simo Ryu urged society to reject hyper-optimized, addictive AI content (âmind-drugs made of bitsâ) aimed at children before social collapse.
- Replies debated personal responsibility vs corporate greed, capitalism, shareholder pressure, and whether regulation or parental control can stem the tide.
- Veed Studio Volleys Vivacious Video: Nelly highlighted Veed Studioâs new Fabric 1.0 API that converts any image + audio into realistic talking videos at just 5Âą/secâthree times cheaper than competitors.
- Commenters praised the tech as a game-changer for scalable UGC and video generation.
- Suno Spawns Standalone Studio: Suno released a full DAW.
- The new Suno Studio has generative abilities to assist in music creation.
- Hollywood Handsomely Hosts Humanless Hires: Talent agencies are reportedly seeking to sign Tilly Norwood, a fully-synthetic actress created by AI studio Xicoia, as reported.
- The story sparked viral debateâmemes, jokes about Hollywood and propaganda fearsâfrom users worried about job displacement and the legal/social implications of giving representation to a digital entity.
- Nano Bananaâs Headshot Hit Parade: Justine Moore shared an upgraded AI headshot prompt featuring exact facial preservation, chest-up framing, white tee + black leather jacket, open-mouth smile, studio backdrop, and detailed photographic specs for crisp, playful results.
- Community praises the tweak and discusses starting-source quality, camera settings, and batch-generation tips.
Modular (Mojo đ„) â· #general (14 messagesđ„):
GPU Puzzles on MacOS, Metal Toolchain, AMD dev cloud, TensorWave MI355X
- MacOS GPU Puzzles Mostly Playable: A user inquired about a cloud-hosted environment for GPU puzzles on MacOS, and a member responded that most puzzles are doable using nightly versions of Mojo.
- They suggested reporting any roadblocks for potential patching, noting itâs less work than fully porting MAX.
- Metal Toolchain Component Missing: A user reported needing to run `xcodebuild -downloadComponent MetalToolchain` to resolve a missing metallib error when running Mojo programs.
- The team indicated they could add this to the documentation, as they were unsure of the exact components needed since they had full Xcode installations.
- `uv sync` fails due to bad URL: A user reported that `uv sync` failed on the `mojo-compiler` dependency due to a 404 Not Found error, caused by an incorrect URL structure in `pyproject.toml`.
- The user submitted a fix by modifying line 15 of `pyproject.toml` to correctly point to the `mojo-compiler` at the end of the URL path.
- AMD Dev Cloud Recommended: In response to the userâs interest in testing AMD GPUs, a member suggested AMD Dev Cloud, highlighting its CDNA instances.
- Another option is Colab, or TensorWave which provides access to MI355X: https://tensorwave.com/blog/enterprise-ai-at-scale-performance-and-efficiency-with-mi355x.
Modular (Mojo đ„) â· #mojo (179 messagesđ„đ„):
C interop challenges, Mojo's approach to C interop, Variable destruction in Mojo, Lexical scoping in Mojo, Mojo readiness for data science
- C Interop Proves Surprisingly Hard: Members discussed how proper C interop is challenging, especially the âjust import the fileâ approach, with ISO C++ not having full C interop despite most C++ compilers also being C compilers with extensions.
- It was questioned whether the effort regarding C interop is being stopped altogether or just set aside, as Mojo aims to be at the intersection of C and Python.
- Transfer Sigil Moves Mojo to Destroy Variables: Members introduced the `^` (transfer sigil) in Mojo to end the lifetime of a value by "moving" it, demonstrated by `_ = s^`, which causes a compiler error upon subsequent use of `s`.
- However, the sigil doesn't work on `ref` variables, since a `ref` variable doesn't own the thing it references.
- Mojo Scopes out Lexical Solution: Members debated the use of extra lexical scopes in Mojo to manage variable lifetimes and prevent errors, with `if True:` used as a makeshift scope despite compiler warnings.
- A workaround using a custom `LexicalScope` struct with `__enter__` and `__exit__` methods was proposed, which led to the creation of issue 5371 to collect syntax ideas.
- Data Scientists Delay Diving into Mojo: Members shared thoughts on Mojoâs readiness for data science projects, noting its strength in number crunching but limitations in IO support, such as manual CSV parsing.
- The need for community developed pandas and seaborn functionality was discussed as essential for most data scientists, since duckdb-mojo is still immature.
- Async Awaiters Await Async: Members confirmed that async implementation in Mojo is not yet complete, so it is currently hard to do cross-language async calls to libraries like tokio.
- Despite the delays, members clarified that C interop was removed from the roadmap in error and will remain as part of the project.
Modular (Mojo đ„) â· #max (1 messages):
clattner: This is really amazing Gabriel!
MCP Contributors (Official) â· #mcp-dev-summit (6 messages):
Agnost AI, MCP Dev Summit, London Meetup, YouTube Live Stream
- Agnost AI Arrives from India: The Agnost AI team (https://agnost.ai), traveling from India, is offering coffee/beer for chats with MCP builders.
- They are eager to swap ideas and meet like-minded people at the MCP Dev Summit in London.
- MCP Dev Summit Live Stream: YouTube links for the MCP Dev Summit live stream will be posted in the Discord.
- To follow the event, subscribe to the MCP Dev Summit YouTube channel and check for updates.
MCP Contributors (Official) â· #general (15 messagesđ„):
Anthropic Trademark, ModelContextProtocol licensing, Independent org for MCP
- Anthropic registers ModelContextProtocol Trademark: Members noticed that Anthropic has registered the ModelContextProtocol and logo as a trademark in the french database.
- The main concern is that it may give Anthropic a say in which projects use the Model Context Protocol.
- Members ask maintainers for clarity on licensing: Members wondered whether a license clearly granting authorization to use the logo exists anywhere, and asked whom to contact for formal authorization to use the name.
- A maintainer has been asked if there can be more clarity on this.
- MCP seeks independent organization: Members discussed moving MCP (or parts of it) to an independent organization in the medium/long term.
- A member shared a recent public update on this topic, but it is not clear if there has been progress since.
MCP Contributors (Official) â· #general-wg (58 messagesđ„đ„):
JFrog's TULIP protocol for tool verification, Security implications of MCP servers, Annotations vs verification, ResourceTemplates missing Icons metadata
- JFrog unveils TULIP for Tool Verification: JFrog introduced TULIP (Tool Usage Layered Interaction Protocol), a spec for content verification, which allows tools to declare rules and expected behaviors, aiming to create a zero-trust environment.
- It allows checking what goes in and what comes out, and handling of remote MCP servers which might be malicious.
- Debate over Security Merits of TULIP: A member expressed skepticism, arguing that TULIP doesnât guarantee a tool will act as advertised and might create a false sense of security.
- JFrog responded that TULIP is a declaration for scanning and validation, similar to `robots.txt`, and focuses on data input/output rather than preventing malicious local server code execution.
- TULIPâs Stance on Local vs Remote MCP Servers: It was noted that while TULIP can help with remote MCP servers, local servers pose a greater security risk.
- A member argued that local servers must be trusted and scanned at the code level if 3rd party, whereas TULIP primarily addresses CISO guidelines and remote server handling.
- ResourceTemplates Lack Icon Metadata, PR Incoming: It was noted that the new icons metadata SEP (PR 955) inadvertently omits Icons metadata from `ResourceTemplates`.
- A member agreed that resources and resource templates having them makes sense, and a fix PR is forthcoming.
Manus.im Discord â· #general (62 messagesđ„đ„):
Unity game, Manus trial, Local project, GitHub integration, Claude Code vs. Manus design
- Local Project Integration: A user asked about the best way to work with Manus on a local project, including GitHub integration, and if there were other ways to integrate Manus with a local directory.
- Another user suggested searching the Discord channel for past discussions about local integration when Manus was initially launched and a channel dedicated to sharing tips and best practices, referencing this link.
- Manus and Claude Code as complementary tools: A user shared that they use Manus mainly for planning and then move to Claude Code for development, and expressed interest in using Manus similarly to Claude Code for certain tasks.
- A user thinks itâs best to use both because they excel in different areas of tasks, so they are complementary to each other, not competitors.
- User asks if Manus feeds data to other users: A user expressed concern about whether Manus feeds user data to other users, particularly when sharing the IP of a niche project, and wanted to know if LLMs are trained on user data.
- No one answered, but there was a link provided about Godhand.
- User claims he was charged for a 1-year plan when it was supposed to be 1 month: A user claimed that they emailed Manus support for 2 weeks because they were charged for a 1-year plan when it was supposed to be 1 month and they have not received a response.
- There has been no feedback from other members, nor Manus staff.
- Manus handles designs better with good prompt engineering: A user shared that they thought Manus can handle designs better if you know how to prompt efficiently, recommending the Manus manual for tips.
- The user confirmed that Manus did way better web designs out of the box when compared with Claude, and that GitHub integration can work; you just need to upload the projects you want to integrate there.
DSPy â· #papers (7 messages):
Monitor-based RAG, Eigen-1, Zero-entropy
- Eigen-1âs RAG Injects Evidence at Token Level: Eigen-1âs Monitor-based RAG implicitly injects evidence at the token level, marking a shift from stage-based declarative pipelines like DSPy to run-time procedural adaptivity.
- This approach aligns with the vision of zero-entropy, continuous reasoning streams, promising more fluid and context-aware AI processing.
- Links to Papers Abound: Several papers were linked: https://huggingface.co/papers/2509.21710, https://huggingface.co/papers/2509.19894, https://arxiv.org/abs/2401.13138, https://arxiv.org/abs/2509.21782, and https://arxiv.org/abs/2509.21766.
- These papers may provide additional context on the topics discussed.
DSPy â· #general (46 messagesđ„):
ProgramOfThought vs AlgorithmOfThought, DSPy + Langgraph Integration, Prompt Compiler for MD Files, Caching Aware DSPy Adapter
- DSPy and Langgraph: Frenemies?: Members discussed integrating DSPy with Langgraph, with some suggesting it could work but might not fully leverage the benefits of either approach due to lost streaming capabilities.
- The recommendation was to start with DSPy directly and explore its capabilities before attempting integration, noting that DSPy solutions are often simpler to reason about and maintain than Langgraph.
- Prompt Compiler Quest: MD Notes Edition: A user is seeking to create a prompt compiler that extracts relevant sections from multiple .md files (containing coding style guides, PR comments, etc.) to form a dynamic prompt for Copilot.
- Suggestions included using GPT-5 to generate code examples based on the rules in the .md files, or trying a RAG system with relevant code examples; concerns were raised about the effectiveness of MCP for this particular use case.
- Tracing Through DSPy Modules: A Stealth Mode Operation: A user inquired about passing inputs like trace_id to DSPy modules without exposing them to the LLM or the optimizer.
- Solutions involved refactoring the module structure during optimization runs or using a global variable; the former was preferred to avoid inadvertently impacting the optimizer (a context-variable sketch appears at the end of this section).
- Cache Me If You Can: DSPy's Caching Conundrum: A user explored how to use an LLM's input caching with DSPy, facing the challenge that slight variations in prompt prefixes across different modules prevent effective caching.
- It was suggested that this is antithetical to the way LLM caching works, but a viable workaround could be to hard-code the shared prefix as the first input field (a prefix-caching sketch appears at the end of this section).
- MCP as Prompt History Server?: One member wants an AI app/MCP server that maintains prompt histories in .md files rather than the chat histories other AI tools usually keep, and that can dig into that prompt history at any time to find matches for the current meta-prompt query.
- The workflow would be: meta prompt -> system -> find any relevant prompts, docs, or specs from history -> create a new prompt artifact.
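Returning to the prompt-compiler discussion above: a very rough sketch of the idea might scan the .md files, keep only the sections whose headings overlap with the current task, and stitch them into a dynamic prompt. The directory layout, the "## " section convention, and the keyword heuristic below are all assumptions for illustration, not a recommended design.

```python
import re
from pathlib import Path

def compile_prompt(md_dir: str, task: str) -> str:
    """Collect sections from .md files whose headings share words with the task.

    A naive keyword overlap stands in for whatever relevance scoring
    (embeddings, an LLM judge, etc.) a real prompt compiler would use.
    """
    keywords = {w.lower() for w in re.findall(r"\w+", task)}
    picked = []
    for md_file in sorted(Path(md_dir).glob("*.md")):
        # Split each file into sections that start with a "## " heading.
        for section in re.split(r"(?m)^(?=## )", md_file.read_text()):
            if not section.strip():
                continue
            heading = section.strip().splitlines()[0].lower()
            if keywords & set(re.findall(r"\w+", heading)):
                picked.append(section.strip())
    return "\n\n".join(picked)

# Hypothetical usage: build a preamble for a task touching error handling.
# prompt = compile_prompt("style_guides/", "refactor error handling in the API layer")
```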
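For the trace_id question, one way to realize the "global variable" suggestion more cleanly is a contextvars.ContextVar: the module reads request-scoped metadata inside forward() without ever declaring it as a signature field, so neither the prompt nor the optimizer sees it. A minimal sketch, assuming a recent DSPy version with string signatures; the print call is just placeholder logging.

```python
import contextvars
import dspy

# Request-scoped metadata set by the caller; never part of any signature,
# so it is invisible to both the LLM prompt and the optimizer.
current_trace_id = contextvars.ContextVar("trace_id", default=None)

class TracedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.Predict("question -> answer")

    def forward(self, question):
        trace_id = current_trace_id.get()  # read side-channel metadata
        print(f"[trace {trace_id}] answering: {question!r}")  # placeholder logging
        return self.answer(question=question)

# Hypothetical usage (requires dspy.configure(lm=...) first):
# current_trace_id.set("req-1234")
# TracedQA()(question="What does the lightning indexer do?")
```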
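And for the caching thread, the suggested workaround of hard-coding the shared prefix as the first input field could look roughly like the sketch below. It assumes DSPy renders input fields in declaration order, so a constant first field keeps the leading tokens of each request identical; whether the provider's prompt cache actually hits still depends on how the adapter lays out instructions, so treat this as a starting point rather than a guarantee.

```python
import dspy

# A large, constant block we would like the provider's prompt cache to reuse
# across calls (e.g. a style guide loaded once at startup).
SHARED_PREFIX = "...long, never-changing guidelines go here..."

class CachedReview(dspy.Signature):
    """Review the snippet against the shared guidelines."""
    shared_prefix: str = dspy.InputField(desc="Constant guidelines, identical on every call")
    snippet: str = dspy.InputField(desc="Code under review")
    review: str = dspy.OutputField()

reviewer = dspy.Predict(CachedReview)

# Hypothetical usage (requires dspy.configure(lm=...) first): every call repeats
# the same leading content, so a prefix cache has a chance to reuse it.
# result = reviewer(shared_prefix=SHARED_PREFIX, snippet="def f(x): return x * x")
```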
aider (Paul Gauthier) ▷ #general (13 messages🔥):
GPT-5 vs GPT-4.1, Aider-CE navigator mode, aiderx model, DeepSeek v3.1
- GPT-5 as Architect, GPT-4.1 as Editor?: Some users are experimenting with GPT-5 for architecture and GPT-4.1 for editing, expressing satisfaction with the results so far.
- One user mentioned "GLM 4.5 air for life" and another replied "Same here, quite like it!"
- DeepSeek v3.1 strikes price/smartness balance: DeepSeek v3.1 is favored for its balance of price and smartness, with one user noting it's the primary model they use alongside GPT-5.
- Aider-CE navigator mode navigates the codebase: A user runs GPT-5-mini with Aider-CE navigator mode; in normal mode they use GPT-5-mini as the architect and GPT-4.1 as the coder, taking advantage of free access through GitHub Copilot.
- Another user provided a link to the Aider-CE GitHub repository and an image showcasing the toolâs output.
- Aider-CE does the job, boasts 128k context: One user migrated to the aider-ce fork, valuing its transparency and avoidance of unnecessary token consumption, and mentioned that it defaults to a 128k context for DeepSeek.
- They mentioned its utility for integrating context from search results and browser testing.
- Aiderx enables cheaper, faster model picking: Aiderx is presented as an alternative that enables model selection via configuration, potentially reducing costs and improving speed, with a link to ClaudeAI.
aider (Paul Gauthier) ▷ #questions-and-tips (9 messages🔥):
Aider task/todo management, Commit only staged files
- Aider Lacks Native Task Management System: A member inquired whether Aider has a built-in task or todo management system, similar to GitHub Copilot.
- Another member suggested using a markdown spec file with phases and checkbox lists for tasks, instructing the LLM to execute and check off each task in turn and ensure the build works after each task, but confirmed that Aider has no native task management.
- Committing Only Staged Files Strategy: A member asked if it's possible to commit only staged files while ignoring unstaged files in Git.
- Other members suggested manually using git stash -k, then git stash pop after the commit, or using the command /run git commit -m "your message here".
tinygrad (George Hotz) ▷ #general (14 messages🔥):
ROCM vs NVIDIA, hashcat performance, tinybox performance, Genoa CPU for hashing, tinygrad meeting 90
- ROCM battles NVIDIA for price/performance crown: A member is seeking a price-efficient alternative to NVIDIA and thinks that ROCM is now in a much better and more usable place.
- The member decided to pull the trigger and use ROCM if they can find something that works for them, due to the perceived high NVIDIA markup.
- Hashcat's Performance Scales Linearly: Hashcat performance scales linearly with the number of GPUs added, according to members.
- Members suggested just looking at the existing benchmark database to get an idea of performance.
- Rangeify is nearly complete for outerworld: The NIR backend is almost ready for review, and they are working on precompiling the pieces of mesa that they need.
- After rangeify becomes the default, they can spend a week just deleting stuff.
- Genoa CPU for hashing?: A member mentioned that the Genoa CPU would also be able to hash.
- It's unclear whether it would be power-efficient enough to justify the cost.
- Tinygrad Meeting #90 Agenda: Meeting #90 covers company updates, RANGEIFY! SPEC=1, and a list of remaining bugs.
- Other topics include tuning for default and other bounties.
Windsurf ▷ #announcements (2 messages):
code-supernova-1-million, Claude Sonnet 4.5, Windsurf credits
- Windsurf launches Code Supernova 1M: Windsurf launched code-supernova-1-million, a supercharged version of code-supernova that comes with a 1M context window.
- It's available free to individual users for a limited time (announcement post).
- Claude Sonnet 4.5 lands on Windsurf: Claude Sonnet 4.5 is now available in Windsurf; it maximizes actions through parallel tool execution, making Cascade Agent runs dramatically faster and more effective.
- It is available at 1x credits to individual users for a limited time (announcement post).