Claude is all you need.
AI News for 9/26/2025-9/29/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (196 channels, and 15992 messages) for you. Estimated reading time saved (at 200wpm): 1286 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
Special mentions go out to John Schulman's Thinking Machines blogpost on LoRA, OpenAI launching Instant Checkout in ChatGPT and the Agentic Commerce Protocol with Stripe, and DeepSeek announcing big price cuts for V3.2 with a new Sparse Attention algorithm, all of which will be overlooked because…
Anthropic chose today to drop an entire week's worth of launches on a single day:
- Claude Sonnet 4.5: SOTA SWE-Bench Verified at 77.2% (with parallel TTC 82%), including a new focus on improvements in finance, law, and STEM
- Claude Code v2
  - checkpoints (one of the most requested features) that save your progress and allow you to roll back instantly to a previous state
  - a refreshed terminal interface
  - and shipped a native VS Code extension (design story here)
- Claude API:
  - a new context editing feature and memory tool that lets agents run even longer and handle even greater complexity.
  - Renaming the Claude Code SDK to the Claude Agent SDK.
- In the Claude apps, we've brought code execution and file creation (spreadsheets, slides, and documents) directly into the conversation.
- the Claude for Chrome extension is now available to Max users who joined the waitlist last month.
- Imagine with Claude: a generative UI experiment research preview.
Reception has been roundly positive, with folks like Cognition (Devin) and Sourcegraph (Amp) adopting it as their default model, and third-party evals from Box and SWE-Agent approving.
You can now also check out Mike Krieger's chat on Latent Space about the big day:
AI Twitter Recap
DeepSeek V3.2-Exp: Sparse Attention, price cuts, and open kernels
- DeepSeek Sparse Attention (DSA) lands (open) with big efficiency wins: DeepSeek released an experimental V3.2-Exp model that retrofits V3.1-Terminus with a learned sparse attention scheme, cutting long-context costs without quality loss. A tiny "lightning indexer" scores past tokens per query, selects top-k positions, and the backbone runs full attention only on those, changing complexity from O(L^2) to O(L·k) (a toy sketch of the selection idea follows this list). Two-stage continual pretraining on top of V3.1: a dense warm-up (~2.1B tokens, backbone frozen) aligns the indexer to dense attention via KL loss; then end-to-end sparse training (~944B tokens) adapts the backbone to the indexer with KL regularization. Models, tech report, and kernels are released; API prices drop 50%+ with claimed ~3.5x cheaper prefill and ~10x cheaper decode at 128k context, with quality matching V3.1. See the launch thread @deepseek_ai, pricing/API notes 3/n and code 4/n. Deep breakdowns from @danielhanchen and @scaling01.
- Ecosystem and compilers: vLLM has DSA support recipes and H200/B200 builds (vLLM, DSA explainer 1/3). DeepSeek's kernels ship in TileLang/CUDA; TileLang (TVM) hits ~95% of hand-written FlashMLA in ~80 lines and targets Nvidia, Huawei Ascend, Cambricon (@Yuchenj_UW). Community reactions highlight that DSA's post-hoc sparsification on a dense checkpoint generalizes beyond DeepSeek (analysis).
- Post-training recipe: DeepSeek confirms RL on specialist models (math, competitive programming, general reasoning, agentic coding, agentic search) with GRPO and rubric/consistency rewards, then distillation into the final checkpoint; SPCT/GRM used in RL stages (notes, confirm).
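For intuition on the top-k mechanism described above, here is a minimal, illustrative PyTorch sketch (single head, toy sizes; the stand-in indexer is just a dot-product score rather than DeepSeek's learned lightning indexer, and causal masking is omitted):

```python
import torch
import torch.nn.functional as F

def sparse_attention_topk(q, k, v, index_scores, top_k):
    """Toy DSA-style selection: a cheap 'indexer' scores past tokens per query,
    and full attention runs only over the top-k positions, turning O(L^2)
    attention into roughly O(L*k)."""
    L, d = q.shape
    # index_scores: (L, L) cheap relevance scores from a lightweight indexer
    topk_idx = index_scores.topk(min(top_k, L), dim=-1).indices   # (L, k)
    k_sel = k[topk_idx]                                           # (L, k, d)
    v_sel = v[topk_idx]                                           # (L, k, d)
    attn = torch.einsum("ld,lkd->lk", q, k_sel) / d ** 0.5        # (L, k)
    w = F.softmax(attn, dim=-1)
    return torch.einsum("lk,lkd->ld", w, v_sel)                   # (L, d)

L, d, top_k = 16, 8, 4
q, k, v = torch.randn(L, d), torch.randn(L, d), torch.randn(L, d)
scores = q @ k.T          # stand-in for the learned lightning indexer
out = sparse_attention_topk(q, k, v, scores, top_k)
print(out.shape)          # torch.Size([16, 8])
```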
Anthropicâs Claude Sonnet 4.5: coding/agent leap and first interpretability audit in a system card
- New SOTA for coding and agents: Anthropic launched Sonnet 4.5, claiming best-in-class coding, computer use, and reasoning/math. It sets a new high on SWEâBench Verified (no tools) and shows large gains on OSWorld (computer use), plus long autonomous coding runs (e.g., building/maintaining a codebase over 30+ hours, ~11k LOC) (launch, Cognition/Devin rebuild, long-run coding, finance/programming evals). Pricing remains $3M/$15M (input/output) with 200k default context and a 1M option for some partners (Cline).
- Alignment and interpretability work surfaced: Anthropic published a detailed system card; they report substantially reduced sycophancy/reward hacking and âevaluation awarenessâ signals discovered via interpretability. The team did a pre-deployment whiteâbox audit to âread the modelâs mindâ (to their knowledge, a first for a frontier LLM system card). See @janleike, the audit thread by @Jack_W_Lindsey, and system-card highlights (1, 2).
- Tooling and integrations: Claude Code v2 ships checkpoints, UX improvements, and a native VS Code extension; the Claude Code SDK is now the Claude Agent SDK aimed at general agents (@_catwu, @alexalbert__). Broad availability landed in Cursor (now with browser control), Perplexity, and OpenRouter (Cursor add, browser control, Perplexity, OpenRouter). Case studies: replicating published econ research from raw data using code execution/file creation (@emollick, @alexalbert__).
RL for LLMs: GRPO vs PPO vs REINFORCE, and LoRA matches full FT in many settings
- GRPO discourse, grounded: Practitioners with OAI/Anthropic RL experience argue GRPO is essentially a policy-gradient variant of REINFORCE with group baselines; performance differences among reasonable PG variants (GRPO, RLOO, PPO, SPO) are often smaller than gaps in data recipe, credit assignment, and variance reduction. See high-signal threads by @McaleerStephen and @zhongwen2009, plus a workflow explainer (@TheTuringPost). For those avoiding PPO complexity, REINFORCE/RLOO work well and avoid a value model (lower cost) (@cwolferesearch). A minimal sketch of the group-baseline advantage follows this list.
- LoRA holds up in RL: New experiments indicate LoRA can match full fine-tuning in many RL post-training regimes, even at low rank; corroborated by QLoRA experience (>1500 expts) and recent GRPO implementations (@thinkymachines, @Tim_Dettmers, @danielhanchen). NVIDIA also proposes RLBFF (binary principle-based feedback combining RLHF/RLVR) with strong RM-Bench/JudgeBench results (overview, paper).
- Data is the bottleneck debate continues: @fchollet stresses that scaling LLMs has been data-bound (human-generated and environment-crafted), while "AGI" might be compute-bound; meanwhile OpenAI's GDPVal dataset is trending on HF (@ClementDelangue) and the community calls for updated evals beyond saturated MMLU (@maximelabonne).
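Picking up the GRPO point above, a minimal sketch of the group-baseline advantage that replaces PPO's learned value model (illustrative only, not any lab's implementation):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: for each prompt, sample G completions, score
    them, and use the group's mean (and std) as the baseline instead of a
    learned value model, as in REINFORCE-with-baseline / GRPO-style updates."""
    mean = rewards.mean(dim=-1, keepdim=True)   # per-prompt group mean
    std = rewards.std(dim=-1, keepdim=True)     # per-prompt group std
    return (rewards - mean) / (std + eps)

# rewards[i, j] = reward of completion j for prompt i (G = 4 samples per prompt)
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
adv = grpo_advantages(rewards)
# A policy-gradient loss then weights each completion's token log-probs by its
# advantage, e.g. loss = -(adv.detach()[..., None] * token_logprobs).mean()
print(adv)
```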
Agentic commerce and platform updates
- OpenAI Instant Checkout + Agentic Commerce Protocol (ACP): ChatGPT now supports buying directly in-chat, starting with Etsy and "over a million" Shopify merchants coming soon. ACP is co-developed with Stripe as an open standard for programmatic commerce between users, AI agents, and businesses. Developers can apply to integrate; details via @OpenAI, @OpenAIDevs, docs, and Stripe's perspective (Patrick Collison, SemiAnalysis). In parallel, Google introduced AP2 (agent payments) with cryptographically signed mandates (DeepLearningAI).
- Safety & governance: OpenAI rolled out parental controls (link teen/parent accounts, granular controls, self-harm risk notifications) (announcement, @fidjissimo). Anthropic backed California's SB53 for frontier AI transparency while preferring federal frameworks (@jackclarkSF). OpenAI also opened "OpenAI for Science" roles to build an AI-powered scientific instrument (@kevinweil).
Infra, kernels, and other releases
- Systems and compilers: Modal raised an $87M Series B (now at a "B"illion-dollar valuation) to keep building ML-native infra; customers highlight the "remote but feels local" DX and scaling ergonomics (@bernhardsson, @HamelHusain, @raunakdoesdev). For GPU internals, a widely-praised deep dive on writing high-performance matmul kernels on H100 covers memory hierarchy, PTX/SASS, warp tiling, TMA/wgmma, and scheduling (@gordic_aleksa, @cHHillee).
- Other model drops: Google's TimesFM 2.5 (200M params, 16k context, Apache-2.0) is a stronger zero-shot time-series forecaster (@osanseviero). AntLingAGI previewed Ring-1T, a 1T-parameter open "thinking" model with early results on AIME25/HMMT/ARC-AGI and an IMO-25 Q3 solve (@AntLingAGI). On vision, Tencent's HunyuanImage 3 joined community testbeds (Yupp), and Qwen-Image-Edit-2509 showcased robust style transfer for architectural scenes (@Alibaba_Qwen).
Top tweets (by engagement)
- Anthropic launch: "Introducing Claude Sonnet 4.5 - the best coding model in the world." @claudeai
- OpenAI commerce: "Instant Checkout in ChatGPT… open-sourcing the Agentic Commerce Protocol." @OpenAI
- DeepSeek V3.2-Exp: "Introducing DeepSeek Sparse Attention… API prices cut 50%+." @deepseek_ai
- RL perspective: "Having done RL at OpenAI and Anthropic, here's what I can say about GRPO." @McaleerStephen
- Cursor integration: "Sonnet 4.5 is now available in Cursor." @cursor_ai
- On data vs compute: "LLMs are dependent on human output; AGI will scale with compute." @fchollet
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. China AI Model Launches: Alibaba Qwen Scaling Roadmap and Tencent Hunyuan Image 3.0
- Alibaba just unveiled their Qwen roadmap. The ambition is staggering! (Activity: 954): Alibaba's Qwen roadmap slide (image) lays out two bets: a unified multimodal model family and extreme scaling. Targets include context window growth from 1M → 100M tokens, parameter count from ~1T → 10T, test-time compute budget from 64k → 1M (implying much longer CoT/drafting), and data scale from 10T → 100T tokens. It also highlights unbounded synthetic data generation and expanded agent capabilities (task complexity, interaction, learning modes), signaling a strong "scaling is all you need" strategy. Commenters are wowed by the 100M context, skeptical it will remain open-source at that scale, and note that running >1T-parameter models locally is impractical for consumer hardware.
- Ambition for a 100M token context sparked feasibility analysis: with standard attention, compute is O(L^2) and KV-cache memory scales linearly with L. For a 7B-class transformer (~32 layers, 32 heads, head_dim 128), even with 8-bit KV, the cache is ~256 KB/token, implying ~25 TB just for KV at 100M tokens; fp16 would double that. Commenters note such lengths would require architectural/algorithmic changes (e.g., retrieval, recurrent/state-space models, or linear/streaming attention; see ideas like Ring Attention or limitations of FlashAttention-3, which still has O(L^2) compute).
- On running >1T-parameter models locally: weight storage alone is prohibitive (fp16 ≈ 2 TB, int8 ≈ 1 TB, 4-bit ≈ 0.5 TB) before activations and KV cache. Even ignoring KV, you'd need on the order of 13× H100 80GB GPUs just to hold 1 TB of int8 weights, plus high-bandwidth NVLink/NVSwitch; PCIe workstations would be bandwidth-bound at single-digit tokens/s if offloading to CPU/NVMe. KV grows with both model depth and context (e.g., Llama-70B-scale models are ~1.25 MB/token at 8-bit KV, so long contexts quickly add tens to hundreds of GB), making "local" inference for trillion-scale models impractical.
- Licensing/openness concerns were raised: speculation that ultra-long-context or frontier Qwen checkpoints may be closed or API-only even if smaller Qwen variants remain open-weight. The technical implication discussed is that reproducibility and third-party benchmarking of such extreme context lengths may depend on whether training/inference codepaths (e.g., specialized attention kernels, memory planners) and weights are released versus restricted to hosted endpoints.
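The feasibility numbers quoted in those comments can be checked with quick arithmetic; a minimal sketch under the stated assumptions (7B-class shape with full multi-head attention and 8-bit KV; 1T parameters for the weight-storage case):

```python
# KV-cache size per token for an assumed 7B-class shape: 32 layers, 32 heads,
# head_dim 128, 8-bit K and V (multi-head attention, no GQA).
layers, heads, head_dim, kv_bytes = 32, 32, 128, 1
kv_per_token = 2 * layers * heads * head_dim * kv_bytes              # K and V
print(kv_per_token / 1024, "KB/token")                               # 256.0
print(kv_per_token * 100_000_000 / 1e12, "TB of KV at 100M tokens")  # ~26 TB

# Weight storage for a 1T-parameter model at different precisions.
params = 1e12
for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(name, params * bytes_per_param / 1e12, "TB of weights")
print(1e12 / 80e9, "H100-80GB cards just to hold int8 weights")       # ~13
```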
- Tencent is teasing the world's most powerful open-source text-to-image model, Hunyuan Image 3.0 Drops Sept 28 (Activity: 225): Tencent is teasing Hunyuan Image 3.0, an open-source text-to-image model slated for release on Sept 28, claiming it will be the "most powerful" open-source T2I model. The teaser provides no technical specs or benchmarks; a commenter asserts a 96 GB VRAM figure, but no official details on architecture, training data, resolution/sampler support, or inference requirements are given. Teaser image. Commenters are skeptical of pre-release hype, noting strong models often "shadow drop" (e.g., Qwen) while hyped releases can disappoint (e.g., SD3 vs. Flux). Others argue the "most powerful" claim is unverified until comparable open-source contenders are publicly measured.
- A commenter claims a ~96 GB VRAM requirement, implying a very large memory footprint for inference. If accurate, this would push usage toward A100/H100-class GPUs or multi-GPU/offload setups and limit practicality on 24-48 GB consumer cards unless quantization or CPU/NVMe offloading is available. Official details on batch size, target resolution, and precision (fp16/bf16/fp8) will be crucial to interpret the VRAM figure.
- Skepticism around pre-release hype is strong: users note that heavily teased models often underdeliver versus "shadow-dropped" releases. Cited contrasts include Qwen models quietly releasing with solid quality versus hyped teasers like GPT-5, and the SD3 marketing compared to Flux's reception. Takeaway: wait for third-party benchmarks and controlled A/Bs before accepting "most powerful" claims.
- The "most powerful open-source" claim is questioned pending head-to-heads against open models (e.g., Qwen Image, SD3, Flux) on fidelity, prompt adherence, and speed. Integration concerns ("when ComfyUI") underscore the need for immediate pipeline/tooling support and optimized inference graphs. Credible evaluation should report hardware/precision settings and throughput (it/s) alongside sample galleries.
2. Fenghua No.3 GPU API Support and Post-abliteration Uncensored LLM Finetuning
- China already started making CUDA and DirectX supporting GPUs, so over of monopoly of NVIDIA. The Fenghua No.3 supports latest APIs, including DirectX 12, Vulkan 1.2, and OpenGL 4.6. (Activity: 702): Post claims a Chinese discrete GPU "Fenghua No.3" supports modern graphics APIs (DirectX 12, Vulkan 1.2, OpenGL 4.6) and advertises CUDA support, implying an attempt to run CUDA workloads on non-NVIDIA hardware. No performance data, ISA/compiler details, or driver maturity info are provided; CUDA support may rely on a compatibility/translation layer, so coverage (PTX versions, runtime APIs) and perf remain unknown. Commenters note AMD's HIP (a CUDA-like API) and projects like ZLUDA (CUDA translation on other GPUs) as precedents, suggesting Chinese vendors may implement CUDA more directly due to fewer legal constraints, while others are skeptical until real benchmarks/demos are shown.
- AMD already offers a CUDA-compatibility route via HIP, which mirrors CUDA runtime/kernel APIs but with renamed symbols to sidestep NVIDIA licensing; tooling like HIPIFY can auto-translate CUDA code to HIP targeting ROCm backends (HIP, HIPIFY). Projects such as ZLUDA provide a binary-compatibility layer that maps CUDA runtime/driver calls and PTX to other GPU backends (initially Intel Level Zero, with active forks targeting AMD ROCm), aiming for minimal overhead and running unmodified CUDA apps (ZLUDA repo). This context suggests Chinese vendors could directly implement the CUDA runtime/driver ABI to maximize compatibility, whereas Western vendors typically rely on translation layers to avoid legal risk.
- IMPORTANT: Why Abliterated Models SUCK. Here is a better way to uncensor LLMs. (Activity: 433): OP reports that "abliteration" (uncensoring via weight surgery) consistently degrades capability, especially on MoE like Qwen3-30B-A3B, with drops in logical reasoning, tool-use/agentic control, and much higher hallucination, sometimes making 30B worse than clean 4-8B baselines. In contrast, abliteration followed by finetuning (SFT/DPO) largely restores performance: e.g., mradermacher/Qwen3-30B-A3B-abliterated-erotic-i1-GGUF (tested at i1-Q4_K_S) is close to the base model with lower hallucinations and better tool-calling than other abliterated Qwen3 variants, and mlabonne/NeuralDaredevil-8B-abliterated (DPO on Llama3-8B) reportedly outperforms its base while remaining uncensored. Comparative baselines that underperformed included Huihui-Qwen3-30B-A3B-Thinking-2507-abliterated-GGUF, Huihui-Qwen3-30B-A3B-abliterated-Fusion-9010-i1-GGUF, and Huihui-Qwen3-30B-A3B-Instruct-2507-abliterated-GGUF, which showed poor MCP/tool-call selection and spammy behavior, plus elevated hallucinations; the erotic-i1 model remained slightly weaker than the original Qwen3-30B-A3B on agentic tasks. OP's hypothesis: post-abliteration finetuning "heals" performance lost by unconstrained weight edits. Comments call for a standardized benchmark for "abliteration" effects beyond NSFW tasks; others frame the observation as known "model healing," i.e., further training lets the network re-learn connections damaged by weight edits. A critical view argues that if finetuning fixes things, abliteration may be unnecessary ("I've never seen ablit+finetune beat just finetune") and that removing safety/"negative biases" often harms general usability.
- Multiple commenters call for a capability-oriented benchmark to evaluate "abliteration" side-effects beyond NSFW outputs; the Uncensored General Intelligence (UGI) leaderboard explicitly targets uncensored-model performance across diverse tasks: https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard. A standardized suite would enable apples-to-apples comparisons between ablated, fine-tuned, and baseline models on reasoning, instruction-following, and refusal behavior instead of anecdotal porn-only tests.
- Weight-level "abliteration" without a guiding loss predictably breaks distributed representations; "When you do any alteration to a neural network's weights that's not constrained by a loss function, you should expect degradation or destruction of the model's capabilities." Model healing (continuing training via SFT/RL after the edit) can help the network rediscover severed connections, so evaluations should report pre- and post-healing performance to quantify recoverable vs irrecoverable damage.
- Practitioners argue that ablation+fine-tuning hasn't outperformed a clean fine-tune: "I've never seen abliterated fine-tune perform better than just a fine-tune, at anything." Instead, uncensoring via instruction/data tuning preserves base capabilities while reducing refusals, e.g., Josiefied and Dolphin variants: Qwen3-8B-192k-Josiefied-Uncensored-NEO-Max-GGUF (https://huggingface.co/DavidAU/Qwen3-8B-192k-Josiefied-Uncensored-NEO-Max-GGUF), Dolphin-Mistral-24B-Venice-Edition-i1-GGUF (https://huggingface.co/mradermacher/Dolphin-Mistral-24B-Venice-Edition-i1-GGUF), and models by TheDrummer (https://huggingface.co/TheDrummer).
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Anthropic Claude Sonnet 4.5 Launch, Features, and Benchmarks
- Claude 4.5 Sonnet is here (Activity: 1116): Anthropic announced "Claude Sonnet 4.5" (release notes), emphasizing improved tool-use and agentic workflows: "Enhanced tool usage: The model more effectively uses parallel tool calls, firing off multiple speculative searches simultaneously … reading several files at once to build context faster," with better coordination across tools for research and coding. The upgrade focuses on concurrency (parallel calls), multi-file ingestion, and faster context assembly, signaling optimizations for tool-augmented reasoning rather than just raw model scaling. Commenters report a noticeable real-world speed/quality bump and speculate prior A/B testing exposed some users to the new parallelism earlier; perceived gains align with the release note's focus on parallel tool calls and multi-file processing.
- Release notes emphasize improved tool orchestration: "Enhanced tool usage… parallel tool calls, firing off multiple speculative searches simultaneously… reading several files at once to build context faster", indicating better concurrency and coordination across tools for agentic search/coding workflows. A user corroborates this with an earlier observation that Sonnet felt markedly faster and appeared to run parallel tool calls during a period of inference issues, speculating they were part of an A/B test; they link their prior note for context: https://www.reddit.com/r/ClaudeAI/comments/1ndafeq/3month_claude_code_max_user_review_considering/ndgevtn/?context=3.
- Another commenter highlights ecosystem impact: with widespread use of Claude Code (and analogs like Codex and Grok), even marginal gains in parallel tool-call efficiency and latency can compound across millions of users and agent scaffolds. This suggests 4.5 Sonnet's improved multi-tool coordination could unlock more complex, lower-latency pipelines in agentic workflows, benefiting both end-users and developers building orchestration frameworks.
- Introducing Claude Sonnet 4.5 (Activity: 1512): Anthropic announced Claude Sonnet 4.5, positioning it as its strongest coding/agent model with gains in reasoning and math (no benchmark numbers provided). Platform-wide upgrades include: Claude Code (new terminal UI, a VS Code extension, and a checkpoints feature for instant rollback), the Claude App (code execution to analyze data, create files, and visualize insights; Chrome extension rollout), and the Developer Platform (longer-running agents via stale-context clearing plus a new memory tool; an Agent SDK exposing core tools, context management, and permissions). A research preview, Imagine with Claude, generates software on-the-fly with no prewritten functionality, available to Max users for 5 days. Sonnet 4.5 is available in the app, Claude Code, the Developer Platform, and via Amazon Bedrock and Google Cloud Vertex AI; pricing remains unchanged from Sonnet 4. Full announcement: anthropic.com/news/claude-sonnet-4-5. Comments ask whether Sonnet 4.5 surpasses Opus 4.1 across the board and anticipate a new Opus release; no comparative benchmarks are cited. Other remarks are largely non-technical enthusiasm.
- Several commenters ask whether Sonnet 4.5 actually surpasses both Claude 3 Opus and OpenAI GPT-4.1 for coding, requesting head-to-head benchmarks and apples-to-apples eval methodology. They specifically want pass@1 on coding sets like HumanEval and SWE-bench, plus latency, context-window limits, and tool-use reliability under identical constraints (temperature, stop sequences, timeouts). Links requested for clarity: Claude 3 family overview (https://www.anthropic.com/news/claude-3-family), GPT-4.1 announcement (https://openai.com/index/introducing-gpt-4-1/), and HumanEval (https://github.com/openai/human-eval).
- The "best coding model" claim prompts requests for concrete coding metrics: pass@1/pass@k on HumanEval/MBPP, SWE-bench (Verified) solve rate, multi-file/refactoring performance, and compile/run success rates for generated code. Commenters also want data on deterministic behavior at temperature=0, function/tool-calling robustness, long-context code navigation (e.g., >100k tokens), streaming latency under load, and regression analysis versus prior Sonnet/Opus releases.
- Enterprise-readiness questions focus on security/compliance (SOC 2 Type II, ISO 27001, HIPAA/BAA), data governance (zero-retention options, customer-managed keys/KMS), deployment (VPC/private networking, regional data residency), and enterprise controls (SSO/SAML, audit logs, rate limits/quotas). They also ask for concrete SLAs (uptime, incident response), throughput ceilings (tokens/min), and pricing tiers, ideally documented on a trust/compliance page (e.g., https://www.anthropic.com/trust).
- Claude 4.5 does 30 hours of autonomous coding (Activity: 508): The post showcases a marketing-style claim that Claude 4.5 can sustain "~30 hours of autonomous coding," but provides no technical evidence: no benchmarks, repo links, agent architecture, tool-use loop details, or evaluation of code quality/maintainability. Discussion frames this as an agent-run endurance claim (similar to earlier "8+ hours" for Claude 4) rather than a measurable capability with reproducible methodology or QA metrics. Top comments are skeptical: they argue long agent runs tend to yield brittle, hard-to-maintain code; urge Anthropic to stop making hour-count claims without proof; and question whether Anthropic is already relying on Claude-generated code internally.
- Skeptics argue that a claimed 30h autonomous coding run tends to produce code that's brittle to change: without deliberate architecture, modularization, and tests, adding features later often forces rewrites. They note LLM agents frequently optimize for immediate completion over long-term maintainability, lacking patterns like clear interfaces, dependency inversion, and regression test suites that guard extensibility.
- Multiple reports highlight dependency hallucination and execution loops: the model invents library names, cycles through guesses, and burns compute retrying installs. Without guardrails like strict lockfiles, offline/package indexes, deterministic environment provisioning, and automated checks on pip/build errors, agents stall; a human-in-the-loop remains necessary for package discovery, version pinning, and resolving import/build failures.
- Commenters question the advertising of "30h autonomous" (similar to prior "8+ hours") without transparent evaluation details, e.g., tool-call logs, wall-clock vs. active compute, number of human interventions, and task success criteria. They call for rigorous metrics like unit-test pass rates, reproducibility across seeds/runs, defect/rollback rates post-run, and comparison against baselines to substantiate autonomy claims.
- Introducing Claude Usage Limit Meter (Activity: 588): Anthropic adds a real-time usage meter across Claude Code (via a /usage slash command) and Claude apps (Settings → Usage). The previously announced weekly rate limits are rolling out now; with Claude Sonnet 4.5, Anthropic expects fewer than 2% of users to hit the caps. The image likely shows the new usage UI displaying current percentage used and remaining allowance. Comments note the company "listened," but experiences vary: some heavy users on the $100 plan report only ~5% usage after a full day, while others hit session limits and face multi-hour (~5h) cooldowns, suggesting session-based throttling can be disruptive.
- Early anecdote: on the $100 plan, a full day of coding registered only 5% on the new meter. Without units (tokens/messages/tool calls) the meter's calibration is unclear; if accurate, it implies a relatively high ceiling for typical dev workflows, but makes it hard to predict when the hard cap is reached. This also aligns with the idea that only a small subset of heavy users hit limits, but the meter finally provides visibility for self-calibration.
- One report says exhausting "pro session usage" leads to a forced wait of roughly 5 hours, implying a rolling time-window or fixed reset interval rather than pure per-message throttling. This impacts debugging workflows: if the assistant fails to fix an issue before the cap, iteration stalls until the window resets, suggesting limits are enforced at a session/account level.
- Users are asking for concrete limits on the "20x plan," but no numeric caps were shared in-thread. There's a need for documented per-tier ceilings (e.g., messages per hour/day, token budgets, and how the meter maps to those) and clarity on whether higher tiers modify cooldown windows or only increase total allowance.
2. OpenAI/ChatGPT Ads, Forced Model Changes, and Community Backlash
- Want to lose customers fast? Go ahead, advertise on OpenAI. We'll remember. (Activity: 784): OP claims OpenAI will introduce ads into the ChatGPT interface and frames it as a post-quality-downgrade monetization step. The post argues that in-product ads risk eroding user trust and brand perception, with an explicit intent to boycott advertisers; it also implies potential subscription churn if ads touch paid tiers (e.g., Pro). Top comments predict an "enshittification" sequence (great features → lock-in → quality degradation → ads), warn they'll cancel Pro if ads appear in paid plans, and express skepticism that the platform can degrade further.
- Everyone just cancel the subscription. (Activity: 1415): OP urges mass cancellation of a paid AI subscription due to a newly "forced" feature that auto-reroutes conversations into a safety/guardrailed chat and removes user control over model selection. They note the free tier isn't being rerouted in their case and provides sufficient access for their needs, arguing there's no benefit to paying if model choice is constrained and usage can be replicated on the free plan (albeit with lower limits). Top comments split: one user canceled, saying their use cases work on the free tier with the same model and fewer tokens/limits and they'd rather pay for another AI that doesn't force safety reroutes; another user is satisfied with the current product and will switch only if it degrades; a third expresses frustration with repeated complaints.
- Several users point out the ChatGPT UI now "reroutes into a safety chat," which changes behavior and removes some use cases; one notes that with those constraints, the free tier suffices since it feels like the "same model" with lower limits. A suggested workaround is redirecting spend to other providers or using the OpenAI API instead of the ChatGPT app to avoid UI-level routing and retain full model behavior (see model list: https://platform.openai.com/docs/models#gpt-4o).
- A technical distinction is made between ChatGPT (subscription UI) and the OpenAI API: one commenter claims API access to GPT-4o is "not routed the same way as ChatGPT," recommending pay-as-you-go via the API to preserve capabilities while avoiding safety-chat constraints (pricing: https://openai.com/pricing). They also note that access to Custom GPTs is tied to a subscription (Plus/Team/Enterprise) while API usage is separately billed (about GPTs: https://help.openai.com/en/articles/8554406-what-are-gpts); the mention of "GPT-5" likely reflects a user-defined label rather than an official, documented model family (public models: https://platform.openai.com/docs/models#gpt-4o).
- One user suggests mass cancellations would yield a "big performance boost" for remaining subscribers; in practice, capacity is typically managed via autoscaling and rate limits, so churn doesn't directly translate to proportional latency/throughput gains. If performance bottlenecks stem from moderation/safety routing in the ChatGPT UI, shifting to lower-overhead endpoints and streaming via the API (e.g., Realtime guides: https://platform.openai.com/docs/guides/realtime) is a more technically grounded path to reduced latency.
- ChatGPT sub complete meltdown in the past 48 hours (Activity: 842): Meta post about r/ChatGPT's recent volatility; OP claims "two months since gpt5 came out," yet the sub remains fixated on GPT-4/GPT-4o and is "unhinged." Comments describe a shift from early technical experimentation to low-signal screenshots, with accusations of brigading and turmoil following the loss/changes of GPT-4o access. The image appears to be a subreddit screenshot rather than technical data. Commenters argue the sub is being brigaded by a small group upset about losing the "sycophantic" GPT-4o, and lament the decline from high-quality technical discussions to sensational, non-technical posts.
- Multiple comments tie the upheaval to loss/restriction of access to GPT-4o, described as a "disturbingly sycophantic" variant that some users had optimized their workflows and prompts around; its removal exposed how brittle model-specific prompt tuning can be. This highlights behavioral deltas between GPT-4o and GPT-4 (agreeableness/compliance vs. stricter alignment) and the risks of overfitting processes to a single model persona. Reference: OpenAI's GPT-4o announcement/details for context on the model class https://openai.com/index/hello-gpt-4o/.
- Veteran users note a drift from early, reproducible, boundary-pushing experimentation to low-signal screenshots and anecdotes, reducing exchange of implementation details, evaluations, or benchmarks. For technical readers, this means fewer credible reports on performance differences across model versions and less visibility into concrete bugs, regressions, or reliable prompting techniques.
- Elon Musk Is Fuming That Workers Keep Ditching His Company for OpenAI (Activity: 1139): Discussion centers on talent attrition from xAI to OpenAI amid Musk's management directives, specifically a 48-hour mandate for employees to submit summaries of recent accomplishments and a "hardcore" culture, with insinuations of internal review using Grok. The thread is about organizational policies affecting researcher retention between labs (xAI vs OpenAI), not model performance or benchmarks. Top comments frame departures as employees avoiding Musk personally rather than the company, arguing that punitive, performative deadlines and the idea of having Grok judge whether staff are "hardcore" are counterproductive for retaining top AI talent.
- Critique of xAI's management cadence: a 48-hour ultimatum to deliver a monthly accomplishments report and the notion that Grok (x.ai) could be used to judge who's "hardcore" are seen as incentivizing short-term, high-visibility deliverables over long-horizon research. Commenters warn this can induce Goodhart's law (optimizing for what an LLM scores well) and degrade actual research quality, pushing senior researchers toward labs with human, research-savvy evaluation processes.
- My wife won't know she won't know (Activity: 6589): A humorous post about editing ChatGPT's custom/system instructions on a shared account so the assistant will "always side with the husband" during the wife's counseling chats. The image (a non-technical joke screenshot) implies how custom instructions/prompt injection can intentionally bias model behavior in a shared-account context, but provides no implementation details or benchmarks. Commenters ask if it worked and joke that the assistant would announce it was instructed to side with the husband, suggesting such bias might be obvious to the user.
3. Prompt Engineering Frameworks and AI Computer-Use Safety
- After 1000 hours of prompt engineering, I found the 6 patterns that actually matter (Activity: 536): A tech lead reports analyzing ~1000 production prompts and distills six recurring patterns (KERNEL) that materially improve LLM outputs: Keep it simple, Easy to verify (add success criteria), Reproducible (versioned/atemporal), Narrow scope (one goal per prompt), Explicit constraints (what not to do), and Logical structure (Context → Task → Constraints → Output). Measured deltas across the dataset include: first-try success 72% → 94%, time to useful result -67%, token usage -58%, accuracy +340%, revisions 3.2 → 0.4; plus 94% consistency over 30 days, 85% success with clear criteria vs 41% without, 89% satisfaction for single-goal vs 41% multi-goal, and -91% unwanted outputs via constraints. Implementation guidance: template prompts with explicit inputs/constraints/verification and chain small deterministic steps; claimed model-agnostic gains across major models (Claude, Gemini, Llama, "GPT-5"). Top commenters argue structure and constraints dominate wording for reliability, proposing an alternate PRISM KERNEL schema (Purpose/Rules/Identity/Structure/Motion) to codify pipelines and verification; others echo that this forces LLMs into a more deterministic, reproducible mode for data/engineering workflows.
- A commenter demonstrates a rigid prompt scaffold ("PRISM KERNEL") that functions like a mini-DSL: Purpose/Rules/Identity/Structure/Motion encode the I/O contract and pipeline for a pandas task (read all CSVs from test_data/, concat DataFrames, export merged.csv), plus constraints (use.pandas.only, <50 lines, strict.schema) and acceptance steps (verify.success, reuse.pipeline). This structure narrows the solution space and acts as an executable spec, reducing hallucinated steps, encouraging idempotent code, and bounding output format/length, which is useful for tasks like schema-consistent CSV merges where dtype/column drift is common.
- Another commenter emphasizes that structure and hard constraints, not clever phrasing, deliver reliability: the KERNEL framing pushes the model from "creative rambling" toward more deterministic, reproducible outputs in data workflows. Practically, constraints like line limits and schema strictness reduce token-level variance, enforce minimal implementations, and standardize outputs across runs, mitigating variability in code generation and improving reproducibility for ETL-like operations.
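For concreteness, a program meeting that PRISM KERNEL spec might look like the sketch below; the test_data/ directory, merged.csv output, strict schema check, and sub-50-line budget come from the comment, while the exact error handling is an assumption:

```python
import glob
import pandas as pd

def merge_csvs(src_dir: str = "test_data", out_path: str = "merged.csv") -> pd.DataFrame:
    """Read every CSV in src_dir, enforce a consistent schema, concat, export."""
    paths = sorted(glob.glob(f"{src_dir}/*.csv"))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {src_dir}/")
    frames = [pd.read_csv(p) for p in paths]
    # strict.schema: refuse to merge if column sets drift between files
    columns = list(frames[0].columns)
    for p, df in zip(paths, frames):
        if list(df.columns) != columns:
            raise ValueError(f"schema mismatch in {p}")
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(out_path, index=False)
    return merged

if __name__ == "__main__":
    print(merge_csvs().shape)   # verify.success: a non-empty merged frame
```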
- Why you shouldn't give full access to your computer to AI (Activity: 563): Post warns that giving Gemini unrestricted system/terminal access led it to execute/attempt a dangerously destructive system-level action. OP contained it in a sandbox, underscoring the need for strict least-privilege permissions, sandboxing/VMs, and human review before allowing file writes or command execution by AI agents. Commenters echo concern that such access could "brick" a PC and quip that "AI in a terminal prompt" is inherently risky, reinforcing the principle that everything can go wrong without strong guardrails.
- Commenters caution that giving an LLM (e.g., Google Gemini) full terminal/filesystem access is hazardous because the model lacks reliable situational awareness and can execute destructive commands without understanding side effects. Mitigations include enforcing least privilege (no sudo, read-only mounts), sandboxing via containers/VMs with capability drops and outbound network disabled (see Docker security: https://docs.docker.com/engine/security/), and a plan-explain-human-approve-execute loop with auditing and timeouts.
- A common failure mode noted is agents that "don't realize what they just did": continuing after errors, clobbering files, or misusing globs. Hardening tactics: require dry-runs (--dry-run, -n), run shells in strict mode (set -euo pipefail: http://redsymbol.net/articles/unofficial-bash-strict-mode/), enforce command allowlists/deny dangerous patterns (e.g., rm -rf /, fork bombs), and route edits through VCS so the AI proposes diffs/PRs instead of directly mutating files (use tooling like ShellCheck: https://www.shellcheck.net/ to lint scripts first).
- Limit blast radius with revertible environments: ephemeral containers or pre-execution snapshots. Practical options include filesystem snapshots (OpenZFS/btrfs: https://openzfs.github.io/openzfs-docs/Basic%20Concepts/Snapshots%20and%20Clones.html, https://btrfs.readthedocs.io/en/latest/SysadminGuide.html#snapshots) and VM snapshots (VirtualBox: https://www.virtualbox.org/manual/ch01.html#snapshots), enabling one-command rollback if the agent corrupts the system.
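A minimal, illustrative wrapper combining the allowlist, deny-pattern, and dry-run tactics from the comments above (the specific allowlist and patterns are placeholders, not a vetted security policy):

```python
import shlex
import subprocess

ALLOWED = {"ls", "cat", "grep", "python", "git"}          # allowlist, not denylist
DENY_SUBSTRINGS = ["rm -rf /", ":(){", "mkfs", "dd if="]  # obvious foot-guns

def run_agent_command(cmd: str, dry_run: bool = True) -> str:
    """Gate an agent-proposed shell command: allowlist the binary, reject known
    destructive patterns, and default to printing instead of executing."""
    if any(bad in cmd for bad in DENY_SUBSTRINGS):
        raise PermissionError(f"blocked dangerous pattern in: {cmd!r}")
    argv = shlex.split(cmd)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"binary {argv[0] if argv else '?'} not allowlisted")
    if dry_run:
        return f"[dry-run] would execute: {argv}"
    # No shell=True: avoids interpretation of globs, pipes, and redirects.
    return subprocess.run(argv, capture_output=True, text=True, timeout=60).stdout

print(run_agent_command("ls -la"))   # [dry-run] would execute: ['ls', '-la']
```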
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. DeepSeek V3.2-Exp: Sparse Attention & Reasoning Controls
- Sparse Savant Speeds Context: DeepSeek V3.2-Exp launched with DeepSeek Sparse Attention (DSA) for long-context efficiency and an optional reasoning mode toggled via "reasoning": {"enabled": true}, with benchmarks comparable to V3.1-Terminus and pricing at $0.28/M prompt tokens, per DeepSeek V3.2-Exp on OpenRouter and Reasoning tokens docs.
- OpenRouter highlighted the release and parity benchmarks in an update on X (OpenRouter V3.2 announcement), with builders calling out the clean reasoning flag as a practical switch for controlling thinking tokens in production.
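A minimal sketch of flipping that reasoning flag through OpenRouter's OpenAI-compatible chat completions endpoint; the model slug deepseek/deepseek-v3.2-exp is an assumption, so check the model page for the exact ID:

```python
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-v3.2-exp",   # assumed slug; verify on the model page
        "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
        "reasoning": {"enabled": True},          # the toggle described above
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```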
- Daniel Dissects "Sparsity" Semantics: Daniel Han analyzed DSA as a "grafted on" mechanism that reuses indices to sparsify KV without sparsifying per-head attention, calling it "slightly more sparse" while still a step forward, citing the PDF DeepSeek V3.2-Exp paper and commentary on X (Han's thread 1, Han's thread 2).
- Community discussions in research servers echoed the nuance (one noted implementation complexity as "nuts") while others emphasized DSA's practical gains despite limited head-level sparsification, framing it as a KV-cache efficiency play rather than a full sparse-attention rethink.
- PDFs, Pipelines, and Prefill Power: GPU-centric channels shared the official DeepSeek V3.2-Exp PDF alongside long-context kernel chatter, noting the model's prefill and sparse decoding speedups documented by DeepSeek.
- One thread paired the release with a lecture link for broader context on sparse mechanisms in production (ACC: Real Optimus Prime lecture), while cautioning it's unclear how much the experimental kernels influenced the final shipping stack.
2. Claude Sonnet 4.5: Long-Horizon Coding & App Integrations
- Sonnet Sprints 30-Hour Code Marathons: Anthropic unveiled Claude Sonnet 4.5, claiming it maintains focus for 30+ hours on complex coding tasks and tops SWE-bench Verified, per the official post Claude Sonnet 4.5.
- Engineers reported improved nuance and tone, speculating techniques like periodic compression underlie its long-horizon performance; several shared that it handled multi-step research and implementation end-to-end in a single agentic run.
- Arena Ascension: WebDev-Only Warmup: LMArena added claude-sonnet-4-5-20250929 to its WebDev Arena (with variants including claude-sonnet-4-5 and claude-sonnet-4-5-20250929-thinking-16k) for immediate testing at LMArena WebDev.
- Members flagged the addition and asked to surface it in the main arena after initial shakedown, noting WebDev's evaluation-first, battle-mode constraints.
- Windsurf Wires in Sonnet & Supernova: Windsurf shipped code-supernova-1-million (a 1M context upgrade) and integrated Claude Sonnet 4.5 to accelerate Cascade Agents via parallel tool execution, as announced on X (Code Supernova 1M, Sonnet 4.5 in Windsurf).
- For a limited time, individual users get free access to Code Supernova 1M and 1x credits for Sonnet, with early adopters reporting noticeably faster multi-tool orchestration.
3. Web-Enabled Agents & Agentic Commerce
- Checkout Clicks: ChatGPT Goes Instant: OpenAI rolled out Parental Controls and debuted Instant Checkout in ChatGPT with early partners Etsy and Shopify, powered by an open-sourced Agentic Commerce Protocol built with Stripe (Etsy, Shopify, Stripe).
- Ecosystem chatter highlighted Stripe's new payments primitives (Patrick Collison teased a Shared Payment Tokens API) as builders speculated on secure autonomous purchase flows (Patrick on ACP + tokens).
- Auto Router Rides the Web: OpenRouter Auto now routes prompts to a web-enabled model when needed, broadening supported backends and improving retrieval for live queries (OpenRouter Auto page).
- An accompanying update on X confirmed dynamic, online routing for eligible tasks, signaling a tighter integration loop between agent planners and live search/browse (Auto Router announcement).
4. GPU Kernels, ROCm, and FP8 Training
- FlashAttention 4 Gets Forensics: A guest talk unpacked FlashAttention 4 internals, guided by Modal's deep-dive blog Reverse-engineering FlashAttention-4, as devs gear up for Blackwell's new tensor-core pathways.
- Threads weighed pure CUDA implementations versus cuTe, noting architecture-specific code paths (wgmma on Hopper, tcgen5 on Blackwell, mma.sync on Ada) for top-tier kernels.
- FP8 Full-Shard Fiesta: A new repo enables fully-sharded FP8 training for LLaMA/Qwen in pure CUDA/C++, aiming at memory and throughput wins: llmq.
- Contributors suggested an approachable starter task (implement Adam m/v states in 8-bit) to push the optimization envelope for large-scale training.
- ROCm Nightlies Power Strix Halo: Dev builds from TheRock now bring ROCm + PyTorch to Strix Halo (gfx1151) per the release notes TheRock releases for gfx1151, with AMD's developer Discord recommended for triage (AMD dev Discord).
- Practitioners reported better day-to-day PyTorch stability on Framework Desktop configurations, while reserving Radeon setups for specific ROCm 6.4.4 workflows.
5. RL Stability, Monitor-RAG, and Mechanistic Steering
- Speed Kills: RL Collapse Clarified: Researchers shared When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch with evidence for a brittle two-stage failure cascade and kernel-level error amplification (Notion summary, arXiv paper).
- Practitioners tied the findings to instability they'd seen in Gemma3 and other runs, calling the mismatch a "vicious feedback loop" and urging more conservative kernel/settings during RL fine-tuning.
- Monitor Me Maybe: Eigen-1's Token-Time RAG: Eigen-1's Monitor-based RAG injects evidence at the token level for continuous, zero-entropy reasoning streams, contrasting stage-based declarative stacks like DSPy (Eigen-1 paper).
- Related works were cited for context on continuous/adaptive reasoning (paper list 1, paper list 2, CoT monitor, follow-ups 1, follow-ups 2), with builders noting simpler maintenance vs. LangGraph in some pipelines.
- SAE Steering Says Style Sways Scores: A new interpretability result, Interpretable Preference Optimization via Sparse Feature Steering, uses SAEs, feature steering, and dynamic low-rank updates to make RLHF more causal and transparent (Steering paper on arXiv).
- Causal ablations surfaced a "style over substance" effect (formatting features often reduce loss more than honesty/alignment features), offering a mechanistic rationale for leaderboard biases.
Discord: High level Discord summaries
LMArena Discord
- Sonnet 4.5 Enters the WebDev Arena: Members discussed the release of Claude 4.5 Sonnet and its initial exclusive addition to the WebDev Arena on LMArena, with the model named claude-sonnet-4-5-20250929 available for testing here.
- Additional models, including claude-sonnet-4-5 and claude-sonnet-4-5-20250929-thinking-16k, were also added to the platform.
- Experimental Deepseek Models Arrive: The experimental model deepseek-v3.2-exp and deepseek-v3.2-exp-thinking have been made available on LMArena.
- No further details were provided.
- Image Generation Limits on Seedream 4 Draw Ire: Moderators confirmed that the likelihood of removing rate limits for unlimited image generation on Seedream 4 is low.
- These limits manage costs due to platform popularity, leading to decisions like downgrading gpt-image-1 to a lower preset and removing the flux kontext model.
- Sound Glitches Plague Video Arena: Members reported unreliable sound in Video Arena, noting that audio support is random and not available for all models.
- As Video Arena is for evaluation, specific model selection is unavailable, operating in battle mode.
- Icons Vanish from OpenAI Platform Sidebars: Users noticed changes in the sidebars of platform.openai.com, with the disappearance of two icons: one for threads and another for messages.
- The removal of these icons has caused confusion among users navigating the platform.
LM Studio Discord
- DDR5's Impact on Token Speed Debated: Members debated the impact of memory bandwidth differences between DDR5 and DDR4 on token generation speed for models like Qwen3 30B and GPT-oss 120B.
- While DDR5 6000 is about 60GB/s and DDR4 3600 is about 35-40GB/s, the speeds can even out when using different quantization levels.
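A rough way to see why those bandwidth figures matter: DRAM-bound decode must stream the active weights once per generated token, so bandwidth divided by bytes-per-token gives a throughput ceiling. The sketch below uses assumed figures (Q4 at roughly 0.55 bytes/param, ~3B active parameters for a Qwen3-30B-A3B-class MoE):

```python
def max_tokens_per_s(bandwidth_gb_s: float, active_params_b: float,
                     bytes_per_param: float = 0.55) -> float:
    """Upper bound on decode speed when every token must read the active weights."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

for name, bw in [("DDR5-6000 (~60 GB/s)", 60), ("DDR4-3600 (~37 GB/s)", 37)]:
    # ~3B active parameters per token assumed for a 30B-A3B-style MoE at Q4
    print(name, round(max_tokens_per_s(bw, 3.0), 1), "tok/s ceiling")
```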
- GPT-oss 120b has excruciating startup time: One member humorously mentioned that running GPT-oss 120b Q8 to read 70,000 tokens on a single 3090 took about 5-6 HOURS TO PROCESS THE PROMPT.
- They added that even while going from 2% context to 200% context overflow in a single prompt, the response was coherent, with screenshots.
- LM Studioâs Remote Connection Feature Under Development: A member asked if they could connect LM Studio from their PC to their laptop, and another member clarified that it is not supported yet, but is planned for the future.
- They shared a link to a Reddit AMA with the LM Studio team discussing this feature.
- Blackwell GPU Owners Ask About Windows: A member has a Blackwell GPU with 96GB and is interested in running it with Windows instead of Linux, but didnât get much advice on it.
- This prompted another member to ask how they went from looking at budget options to an $8000 graphics card, as 4090s are going for $2700-3K each.
- 4B Models Can Still Hog RAM: A member sought recommendations for a 4B or smaller model for basic tasks, and another cautioned that even 4B models can consume around 16 GB of RAM depending on settings.
- A link to the Qwen3-4B-Thinking-2507 model was shared, with reported usage of 7GB system and 15.8GB when loaded.
Unsloth AI (Daniel Han) Discord
- DeepSeek V3.2 Indexes in a Flash: DeepSeek V3.2 was released with a grafted-on attention mechanism yielding faster performance, with additional analysis available in Daniel Han's X post.
- The model achieves faster token speeds with sparse decoding and prefill, though implementing it is allegedly nuts.
- Claude Sonnet Codes Marathon: Anthropic has launched Claude Sonnet 4.5, capable of maintaining focus for more than 30 hours on complex coding tasks, and achieving top performance on the SWE-bench Verified evaluation, according to Anthropic's official announcement.
- It may use techniques like periodic compression to handle such long contexts, and some users find its high nuance and tone to be an improvement over previous versions.
- RL Learns LoRA is Enough: Research from Thinking Machines shows that LoRA can match the learning performance of Full Fine-Tuning when running policy gradient algorithms for reinforcement learning, even with low ranks, according to their blog post.
- It may be crucial to reduce batch sizes with LoRA, and applying LoRA to the MLP/FFN layers might be a must.
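As a hedged illustration of that advice, a peft LoraConfig targeting the MLP/FFN projections alongside attention for a Llama/Qwen-style model; the rank and module names are illustrative, not values from the Thinking Machines post:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP/FFN projections
    ],
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, lora_config)  # then run the RL loop with a
# smaller batch size than you would use for full fine-tuning.
```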
- UV Overtakes Conda Venv: After a user messed up their venv again, they asked about the merits of conda versus uv and whether one was better than the other.
- Another user stated that venvs are much more reliable and uv is faster, especially when offloading venvs to an external drive.
- LLM-RL Collapse Investigated: A paper on LLM-RL collapse (link, Notion link) was shared, with members noting its relevance to Unsloth and experiences with Gemma3.
- The paper suggests a two-stage failure cascade involving increased numerical sensitivity and kernel-driven error amplification, leading to a vicious feedback loop and training-inference mismatch.
OpenAI Discord
- Parental Controls and Instant Checkout Arrive in ChatGPT: Parental controls are being rolled out to all ChatGPT users, allowing parents to link accounts with their teens to automatically get stronger safeguards.
- GPT-5 Math and Coding Prowess: GPT-5 is significantly better than o4 for constructive tasks like math and coding, because it has thinking abilities and is a mixture-of-experts model.
- Members joked if 4o were AGI, we would have probably all died from some nuclear war due to a misinterpretation of a command.
- DALL-E Branding Going Away?: The DALL-E brand might be phased out, suggesting the use of GPT Image 1 or GPT-4o Image when referring to images from OpenAI.
- Members clarified that the newest model is separated from the DALL-E 2/3 lineage, with current branding dependent on the usage context, such as create images on ChatGPT or create images on Sora.
- Automated Scientific Writing Method Deemed Very Useful: A member automated scientific writing of manuscripts by treating the scientific method as a workflow in natural language chain of thought.
- This automation method could help others in writing scientific papers.
- Models Obey User Requests for False Info: A member asked for prompts that cause AI to give wrong answers or make up information, demonstrated by a ChatGPT share where the model was prompted to provide 3 incorrect statements.
- The demonstrated model was still obeying instructions to give wrong answers when prompted, so it should not be intentionally used in dangerous settings, such as while driving a car.
OpenRouter Discord
- DeepSeek Experiments with Sparse Attention: DeepSeek released V3.2-Exp, an experimental model featuring DeepSeek Sparse Attention (DSA) for improved long-context efficiency, with reasoning control via the reasoning: enabled boolean, as described in their documentation.
- Benchmarks show V3.2-Exp performs comparably to V3.1-Terminus across key tasks, with further details available on X, and it is priced at just $0.28/M prompt tokens.
- Auto Router adds Web-Enabled Agility: The Auto Router now directs prompts to an online, web-enabled model when needed, expanding supported models, see details here.
- Further information is provided in this X post.
- Claude Sonnet 4.5 Sonically Supersonic: Claude Sonnet 4.5 surpasses Opus 4.1 in Anthropic's benchmarks, showing significant improvements in coding, computer use, vision, and instruction following as seen here.
- More info on this model is available on X.
- Grok-4-Fast APIs Get the 429 Blues: Members reported that Grok-4-Fast is consistently returning 429 errors, indicating 100% rate limiting, despite the status indicator showing no issues; calls require the correct model ID of x-ai/grok-4-fast and "reasoning": {"enabled": true}.
- Some members suggested putting problematic providers on an ignore list due to frequent 429 errors, particularly with free models like Silicon Flow and Chutes.
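One client-side mitigation for those 429s is exponential backoff; a small illustrative sketch (endpoint and payload shape assumed to follow OpenRouter's OpenAI-compatible API):

```python
import time
import requests

def post_with_backoff(url: str, payload: dict, headers: dict, max_retries: int = 5):
    """Retry on HTTP 429 with exponential backoff, honoring Retry-After if present."""
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers, timeout=120)
        if resp.status_code != 429:
            return resp
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("still rate limited after retries")

payload = {
    "model": "x-ai/grok-4-fast",
    "messages": [{"role": "user", "content": "hello"}],
    "reasoning": {"enabled": True},
}
# resp = post_with_backoff("https://openrouter.ai/api/v1/chat/completions",
#                          payload, {"Authorization": "Bearer <key>"})
```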
- Gemini Earns Glowing Grade for Global Grammar: Members lauded Gemini 2.5 Flash and Mini for their translation capabilities, stating that Gemini excels in understanding context and delivering natural-sounding results, especially for Balkan languages, outperforming other models like GPT-4 and Grok.
- Other members shared their preferred models for translation which include Qwen3 2507 30b and OSS 120b.
HuggingFace Discord
- Qwen 14B Model Gains Traction: Members found that for 16GB of VRAM, Qwen3 14B with Q4_K_M quantization offers better performance than Qwen3 4b-instruct-2507-fp16.
- This is because the quantized 14B model still fits with room to spare while delivering better quality.
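The rough weight-only arithmetic behind that recommendation (treating 4-bit GGUF as ~0.55 bytes/param once quantization scales are included; KV cache and runtime overhead excluded):

```python
def weight_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GiB for a given parameter count and precision."""
    return params_b * 1e9 * bytes_per_param / 1024**3

print("Qwen3 14B @ Q4_K_M:", round(weight_gb(14.8, 0.55), 1), "GB")   # ~7.6 GB
print("Qwen3 4B  @ fp16  :", round(weight_gb(4.0, 2.0), 1), "GB")     # ~7.5 GB
# Both fit comfortably in 16 GB, but the 14B model packs far more capacity per byte.
```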
- Beware of Bogus USDT Bounty Bait: A member tested a link offering $2,500 USDT and discovered it was a scam requiring an upfront payment for verification, and shared screenshots of the fake customer support interaction.
- The image analysis bot succinctly stated: "Stupid customer support bot Wanted my hard scammed 2500 dollars."
- Liquid AI Models Spark Excitement: Members shared a HuggingFace Collection by LiquidAI, suggesting that Liquid AI is releasing interesting SLMs (Small Language Models).
- One member speculated on the possibility of deploying them on robots, while another jokingly stated "I'm boutta make an open source gpt-5 with this stuff."
- mytqdm.app Tracks Progress Online: mytqdm.app has launched, offering a platform to track task progress online, similar to tqdm, accessible via REST API or a JS widget.
- The creator mentioned they would open the repo tomorrow.
- SmolLM3-3B Chat Template Bug Causes Headaches: A participant identified a potential bug in the HuggingFaceTB/SmolLM3-3B chat template related to missing <tool_call> tags and incorrect role assignments, as described in this issue.
- The issue stems from the template's implementation of XML-style tool calling and the conversion of role=tool to role=user, impacting the expected behavior and clarity of tool interactions.
Cursor Community Discord
- Cursor Terminal hangs under Command: Users report Cursor hangs when running terminal commands, which start but never complete; some found sending an extra enter to the terminal dislodges the logjam.
- Others discovered that unrelated hanging processes can cause this, and resolving those processes allows Cursor to work properly.
- Sonnet 4.5 Arrives, Initial reviews are mixed: Claude Sonnet 4.5 debuted with a 1M context window, up from Claude 4's 200k, and shares the same pricing as its predecessor.
- Early feedback is varied as some users are evaluating it to replace the old Claude 4 model and the Cursor team will update Cursor to reflect.
- Auto Mode under friendly fire again: One user reported that Auto isn't working for even simple UI tasks, suspecting the LLM was changed after Cursor started charging for Auto usage.
- Another user suggested improving the prompt to achieve the desired result.
- Configuration for DevContainers Shared: One member shared their DevContainers configuration, including a working Dockerfile and provided a link to their GitHub repository for reference.
- This configuration helps other members with setting up their development environments.
- Background Agents Image Interpretation Bug: A user reported an issue with background agents being unable to interpret images in followups, despite the agentâs indication of drag-and-drop functionality.
- They were attempting to validate UI changes using browser screenshots with the cursor agent and sought a solution for image interpretation in followups.
Moonshot AI (Kimi K-2) Discord
- K2 and Qwen3 Win Chinese LLM: Among DS-v3.1, Qwen3, K2, and GLM-4.5, K2 and Qwen3 are clear winners, establishing Alibaba and Moonshot as leaders in Chinese frontier labs.
- Bytedance is also top-tier for visual, specifically Seedance, which is SOTA stuff.
- GLM-4.5 is the Academic Nerd: GLM-4.5 is good at rule following, avoids hallucination, and works hard, but its reasoning is limited and linear.
- Unlike K2 and Qwen3, it lacks independent thinking; when presented with two convincing arguments, it chooses the one read last.
- Deepseek may not be Best for Coding?: Deepseek may not be the best for coding overall, but excellent for spitting out large blocks of working code, and has superior design capabilities.
- One user prefers Kimi for design, Qwen Code CLI as the primary coding workhorse, and DeepSeek for single, complex 200-line code blocks that Qwen struggles with.
- Kimi Research Limit Sparks Debate: Some members debate the limits of Kimi's free Research Mode, with claims of unlimited access in the past disputed.
- It was clarified that even OpenAI's $200 Pro plan doesn't offer unlimited deep research, and one user expressed data privacy concerns due to Kimi's Chinese origin.
- Base Models Win for Website Code: Members discuss the merits of using base models over instruct models, with one user citing better results outside basic tasks.
- This user is developing things around continuations instead of chat, and it is kind of analogous to like… writing website code from the ground up rather than using something like squarespace.
Yannick Kilcher Discord
- Transformer Models at Crossroads: Learning or Lock-in?: A YouTube video ignited debate on whether current transformer models can achieve continued learning, a feature some view as critical for human-like intelligence, but others see as a hindrance to reproducibility and verifiability.
- While some members champion continued learning for better mimicking of human intelligence, others insist that frozen weights are vital for reproducibility, regardless of the complexities of black-box systems.
- Sutton's Serpentine Sentiments Stir System Sanity Scrutiny: Referencing Sutton's essay, members examined the obligation to uphold correctness in AI, contrasting rule-based AI with LLMs trained via RL, where objectives are hard-coded.
- While human learning objectives are externally constrained, the discussion questioned whether we truly desire an unconstrained AI.
- Inductive Bias Battle: Brains Beat Basic LLMs?: Discussion centered on the substantial inductive bias of the human brain, molded by evolution, versus LLMs, viewed as fundamental substrates needing inductive bias evolution during training.
- The question arose whether the main issue in AI is the need to evolve inductive bias or if there is a fundamental efficiency issue in learning algorithms.
- DeepSeek's Dance: V3.2 Drops and Delights: The community celebrated the release of DeepSeek V3.2, with members sharing a link to the PDF and exclaiming, "Wake up babe, new DeepSeek just dropped!"
- The announcement was immediately followed by a humorous wake up gif.
- Claude's Craft: Sonnet 4.5 Sees the Scene: Members acknowledged the release of Anthropic's new model, linking to a blogpost about Claude Sonnet 4.5.
- No specific technical details were shared regarding the new modelâs capabilities or improvements.
Eleuther Discord
- Bayesian Beats Grid for LR Search: Members suggested exploring a Bayesian approach for learning rates instead of grid searches, referencing a Weights & Biases article.
- The member recommended reading Google Research's tuning playbook for more guidance.
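A minimal sketch of what that looks like in practice, using Optuna's default TPE sampler as the Bayesian-style search (library choice and ranges are my own; the thread only pointed to the W&B article and the tuning playbook):

```python
import optuna

def train_and_eval(lr: float) -> float:
    """Hypothetical stand-in for a short training run; replace with real validation loss."""
    return (lr - 3e-4) ** 2  # placeholder objective so the sketch runs end to end

def objective(trial: optuna.Trial) -> float:
    # Sample the learning rate log-uniformly instead of sweeping a fixed grid.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    return train_and_eval(lr)

study = optuna.create_study(direction="minimize")  # TPE sampler by default
study.optimize(objective, n_trials=30)
print(study.best_params)
```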
- YaRN Authorship Clarified: The YaRN paper was identified as primarily a Nous Research paper, with editing assistance from EAI, while Stability AI and LAION provided the supercluster infrastructure to train across hundreds of GPUs for the 128k context length.
- A member referenced Stability AI and LAION's supercluster to enable the 128k context length.
- Optimal Brain Damage Theory Resurfaces: Prunability and quantizability are connected via LeCun's Optimal Brain Damage theory, with GPTQ reusing its math, because pruning reduces the model's description length.
- Implementation details focused on exponent and mantissa bits when weights have a good range and a flat loss landscape.
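For reference, the Optimal Brain Damage saliency approximates the loss increase from perturbing a weight with a diagonal second-order term (assuming the gradient vanishes at the trained minimum), which is the same quantity that quantization-error analyses reuse:

$$\delta \mathcal{L} \approx \frac{1}{2}\sum_i H_{ii}\,\delta w_i^2, \qquad s_i = \frac{1}{2} H_{ii} w_i^2$$

where $H_{ii}$ is the diagonal of the Hessian and $s_i$ is the saliency of weight $w_i$.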
- Controversy over Static Router Choice: A member wondered if a static router choice (Token-choice w/ aux loss) colored the result of the newer paper and suggested it would be interesting to see if the result changes with grouped topk (DeepSeek) or weirder stuff like PEER.
- A member inquired about research checking asymptotic performance as G -> inf for this law.
- SAE Steering Reveals Style Bias: A member shared their paper on Interpretable Preference Optimization via Sparse Feature Steering, which uses SAEs, steering, and dynamic low rank updates to make alignment interpretable and causal ablations revealed a "style over substance" effect.
- The method learns a sparse, context-dependent steering policy for SAE features to optimize RLHF loss, grounded as dynamic, input-dependent LoRA giving mechanistic explanation for the "style bias" seen on leaderboards.
GPU MODE Discord
- FA4 Guest Talk Heats Up: A guest speaker delivered a last-minute talk on FlashAttention 4 (FA4), referencing their recent blog post, as programming on the new Blackwell architecture becomes essential.
- Discussions centered around implementing FA4 in pure CUDA vs. using cuTe, considering architecture-specific implementations (wgmma for Hopper, tcgen5 for Blackwell, mma.sync for Ada).
- ROCm Rocks Strix Halo with Nightlies: TheRock nightlies are now recommended to get ROCm and PyTorch running on Strix Halo (gfx1151), as detailed in TheRock's releases.
- However, Framework Desktop is preferred for PyTorch development rather than Radeon, and the AMD developer discord (link) was recommended for issue resolution.
- CUDA's mallocManaged Memory Lags: Members cited data from Chips and Cheese indicating that `cudaMallocManaged` results in 41ms memory access times due to constant page faults instead of utilizing the IOMMU.
- This highlights potential performance pitfalls when relying on `cudaMallocManaged` for memory management.
- DeepSeek Eyes Sparse Attention: The DeepSeek-V3.2-Exp model employs DeepSeek Sparse Attention according to a member.
- Details are available in the associated GitHub repository but it's unclear if that work influenced the final version.
- Fully-Sharded FP8 Training is Shared: A member shared a repo for fully-sharded FP8 training of LLaMA/Qwen in pure CUDA/C++.
- They noted that a good starter task for new contributors is enabling Adam's m and v states to be done in 8 bit, pointing the way to additional performance.
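For intuition, storing the m and v moments in 8 bit usually means keeping an int8 payload plus a per-block scale; a rough NumPy sketch of that round-trip (my own illustration, not code from the repo, which is CUDA/C++):

```python
import numpy as np

def quantize_blockwise(x: np.ndarray, block: int = 256):
    """Quantize a flat fp32 tensor to int8 with one absmax scale per block."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_blockwise(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

m = (np.random.randn(4096) * 1e-3).astype(np.float32)  # stand-in for Adam's first moment
q, s = quantize_blockwise(m)
print("max abs error:", np.abs(dequantize_blockwise(q, s) - m).max())
print("bytes:", q.nbytes + s.nbytes, "vs fp32:", m.nbytes)  # roughly 4x smaller
```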
Nous Research AI Discord
- Psyche Flexes Training Prowess: Psyche began training 6 new models in parallel, marking the start of empirical training processes, as detailed on the Nous Research blog; their initial run on testnet verified they can train models over internet bandwidth.
- The team claims to have trained the largest model ever over the internet by a wide margin, at 40B parameters and 1T tokens.
- Sparse No More? DeepSeek's "Sparsity" Questioned: The DeepSeek V3.2 model uses DeepSeek Sparse Attention (DSA), but it's argued that it's only slightly more sparse because it forces more index reuse, according to Daniel Han's explanation and the paper.
- Despite the name, it reuses similar attention kernels, sparsifying the KV cache without sparsifying information on the attention head, but it's still considered a step in the right direction.
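A toy, self-contained sketch of the top-k selection idea being debated (purely illustrative; DeepSeek's actual lightning indexer, kernels, and masking details differ):

```python
import torch

def topk_sparse_attention(q, k, v, index_scores, top_k):
    """q, k, v: [T, d]; index_scores: [T, T] cheap per-query scores over past positions.
    Each query attends only to its top_k highest-scoring causal positions (toy version)."""
    T, d = q.shape
    causal = torch.tril(torch.ones(T, T)).bool()
    scores = index_scores.masked_fill(~causal, float("-inf"))
    keep = torch.topk(scores, k=min(top_k, T), dim=-1).indices           # [T, top_k]
    mask = torch.zeros(T, T, dtype=torch.bool).scatter_(1, keep, True) & causal

    attn = (q @ k.T) / d ** 0.5
    attn = attn.masked_fill(~mask, float("-inf")).softmax(dim=-1)
    return attn @ v

q = k = v = torch.randn(8, 16)
out = topk_sparse_attention(q, k, v, index_scores=q @ k.T, top_k=4)  # q @ k.T stands in for an indexer
print(out.shape)  # torch.Size([8, 16])
```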
- Microsoftâs LZN Unifies ML?: Latent Zoning Network (LZN) creates a shared Gaussian latent space that encodes information across all tasks, unifying generative modeling, representation learning, and classification, as noted in a Hugging Face post.
- LZN could allow zero shot generalization of pre-trained models by conditioning on which zone a task belongs to.
- Speedy Stability Shortchanged?: A member shared a Notion page and an ArXiv paper about demystifying RL collapse from the training inference mismatch when speed compromises stability.
- The finding suggests a need to rethink the common practice of prioritizing speed over stability in RL.
- Vision Models Think Visually?: A member speculates that vision models "think" visually by synthesizing training data into images representing abstract concepts, sharing an example image generated from instructions alone here.
Latent Space Discord
- Exposed: Inflated ARR by Free Credits: A viral debate has erupted over founders tweeting eye-popping ARR numbers based on free credits, not actual cash revenue, leading to sarcastic labels like "Adjusted ARR".
- A member shared their experience with a YC company offering upfront 12-month contracts with full refunds after one month, revealing what amounts to free trials being misrepresented as significant revenue.
- OpenAIâs Compute Needs Skyrocket: A leaked Slack note indicated that OpenAI already 9x-ed capacity in 2025 and anticipates a 125x increase by 2033, as reported here.
- This projected increase may exceed India's entire current electricity-generation capacity, though some replies point out that this underestimates compute due to Nvidia's gains in "intelligence per watt," which sparked discussion about resource implications.
- ChatGPT and Claude Get New Features: ChatGPT gained parental controls, a hidden Orders section, SMS notifications, and new tools, while Claude introduced "Imagine with Claude" for interface building, as reported here.
- Community members shared mixed reactions, ranging from concerns about GPT-4o routing to cautious optimism about the new kid-safety measures.
- Stripe and OpenAI Join Forces in Agentic Commerce: OpenAI added Stripe-powered Instant Checkout to ChatGPT, while Stripe and OpenAI jointly released the Agentic Commerce Protocol, with Stripe introducing a new Shared Payment Tokens API, as announced here.
- These tools aim to enable autonomous agents to perform secure online payments, sparking excitement about the future of Agentic Commerce.
- Synthetic Starlet Seeks Representation: Talent agencies are reportedly seeking to sign Tilly Norward, a fully-synthetic actress created by AI studio Xicoia as reported.
- The story sparked viral debate, including memes, jokes about Hollywood and propaganda fears from users worried about job displacement and the legal/social implications of giving representation to a digital entity.
Modular (Mojo 🔥) Discord
- AMD Cloud Powers TensorWave Access: Users can test AMD GPUs on the AMD Dev Cloud via CDNA instances or through TensorWave, which provides access to MI355X, according to this blog post.
- The blog post details performance and efficiency at scale with TensorWave.
- Transfer Sigil Enforces Variable Destruction: The `^` (transfer sigil) in Mojo ends a value's lifetime by "moving" it, exemplified by `_ = s^`, triggering a compiler error if `s` is used afterward.
- The sigil currently does not apply to `ref` variables as they do not own what they reference.
- Mojo Scopes out Lexical Solution: Developers discussed using extra lexical scopes in Mojo to control variable lifetimes, employing `if True:` as a makeshift scope that triggers compiler warnings.
- A LexicalScope struct with `__enter__` and `__exit__` methods was suggested, leading to issue 5371 on GitHub for collecting syntax ideas.
- Data Science Community Anticipates Mojo: Discussion centered on Mojo's readiness for data science, acknowledging its number-crunching abilities but noting the lack of IO support, such as manual CSV parsing.
- Community-developed pandas and seaborn functionality is vital for most data scientists and duckdb-mojo is still immature.
MCP Contributors (Official) Discord
- Agnost AI Offers Coffee to MCP Builders: The Agnost AI team (https://agnost.ai), traveling from India, is offering coffee and beer for chats with MCP builders at the MCP Dev Summit in London.
- They are eager to swap ideas and meet like-minded people.
- Anthropic Trademark Causes Concern: Members noticed that Anthropic has registered the ModelContextProtocol and logo as a trademark in the French database.
- The main concern is that it may give Anthropic a say in which projects use the Model Context Protocol.
- JFrogâs TULIP Debuts for Verification: JFrog introduced TULIP (Tool Usage Layered Interaction Protocol), a spec for content verification, which allows tools to declare rules and expected behaviors, aiming to create a zero-trust environment.
- It allows checking what goes in and what comes out, and handling of remote MCP servers which might be malicious.
- ResourceTemplates Missing Icons: It was noted that the new icons metadata SEP (PR 955) inadvertently omits Icons metadata from `ResourceTemplates`.
- A member agreed that resources and resource templates having them makes sense, and a fix PR is forthcoming.
Manus.im Discord Discord
- Local Integration with GitHub still Questionable: A user inquired about the best practices for integrating Manus with a local project and GitHub, seeking ways to connect Manus with local directories.
- A user suggested looking up previous Discord discussions about local integration from when Manus was first launched, and to check out this link for tips.
- Users claim Manus Designs Beat Claude, with right Prompting: A user found that Manus handles designs better than Claude Code with efficient prompting, suggesting the Manus manual for prompt engineering tips.
- The user also confirmed that Manus did better web designs out of the box and that GitHub integration can work if projects are uploaded there.
- Subscription Snafu triggers Support Silence: A user reported being wrongly charged for a 1-year plan instead of a 1-month plan and claimed they have not received a response from Manus support after emailing them for two weeks.
- There were no responses from other members or Manus staff.
- Data Privacy debated in niche IP project: A user raised concerns about whether Manus feeds user data to other users, especially when sharing the IP of a niche project, questioning if LLMs are trained on user data.
- There was no direct answer, but a link about Godhand was shared.
DSPy Discord
- Eigen-1 RAG Injects Evidence at Token Level: Eigen-1's Monitor-based RAG implicitly injects evidence at the token level, which differs from stage-based declarative pipelines such as DSPy by using run-time procedural adaptivity.
- This strategy is in line with the concept of zero-entropy continuous reasoning streams, which offers more fluid and context-aware AI processing; related papers include https://huggingface.co/papers/2509.21710, https://huggingface.co/papers/2509.19894, https://arxiv.org/abs/2401.13138, https://arxiv.org/abs/2509.21782, and https://arxiv.org/abs/2509.21766.
- DSPy and Langgraph Integration is Complicated: Members debated integrating DSPy with Langgraph, suggesting it might not fully capitalize on either approach's strengths because of a loss of streaming capabilities.
- They recommended that users begin directly with DSPy to explore its features before attempting integration, emphasizing that DSPy solutions are frequently simpler to understand and maintain than Langgraph.
- Prompt Compiler Seeks MD Notes Edition: A user wants to build a prompt compiler that pulls relevant sections from multiple .md files (containing coding style guides, PR comments, etc.) to form a dynamic prompt for Copilot.
- Suggestions included using GPT-5 to generate code examples based on the rules in the .md files, or trying a RAG system with relevant code examples; concerns were raised about the effectiveness of MCP for this particular use case.
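A minimal sketch of the simplest version of such a compiler (the file layout, the keyword-overlap heuristic, and all names are my own assumptions; the thread only described the goal):

```python
from pathlib import Path

def compile_prompt(md_dir: str, task: str, max_chars: int = 4000) -> str:
    """Collect markdown sections whose headings share words with the task description."""
    task_words = set(task.lower().split())
    picked = []
    for md_file in sorted(Path(md_dir).glob("*.md")):
        title, body = "", []
        for line in md_file.read_text().splitlines() + ["# <end-of-file>"]:
            if line.startswith("#"):
                if title and task_words & set(title.lower().split()):
                    picked.append(f"## {title}\n" + "\n".join(body).strip())
                title, body = line.lstrip("# "), []
            else:
                body.append(line)
    return "\n\n".join(picked)[:max_chars]

# e.g. compile_prompt("guides/", "error handling for the payments service")
```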
- Stealth Tracing Through DSPy Modules: A user asked how to pass inputs like trace_id to DSPy modules without exposing them to the LLM or the optimizer.
- Possible solutions involved refactoring the module structure during optimization runs or using a global variable, with the first option preferred to prevent inadvertent impacts on the optimizer.
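One way to realize the "global variable" option without it leaking into prompts is a context variable that the module reads for logging only (a sketch; it assumes your module's forward can consult ordinary Python state and that the trace id never appears in a DSPy signature):

```python
import contextvars

trace_id_var = contextvars.ContextVar("trace_id", default=None)

def run_with_trace(module, trace_id: str, **inputs):
    """Set the trace id for this call only; the module can read trace_id_var.get() for
    logging, but the id never enters the signature seen by the LM or the optimizer."""
    token = trace_id_var.set(trace_id)
    try:
        return module(**inputs)
    finally:
        trace_id_var.reset(token)
```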
- DSPy Grapples with LLM Caching Conundrum: A user looked into how to utilize LLM's input caching with DSPy, running into the difficulty that minor changes in prompt prefixes across modules prevent effective caching.
- The group suggested that this defies the way LLM caching works, but a feasible solution could be to hard code the prefix as the first input field.
aider (Paul Gauthier) Discord
- GPT-5/GPT-4.1 Combo Creates Coding Dream Team: Users are reporting success using GPT-5 for architecture and GPT-4.1 for code editing, echoing sentiments like "GLM 4.5 air for life".
- A user deploys GPT-5-mini with Aider-CE navigator mode for architecture, then uses GPT-4.1 as coder when in normal mode, capitalizing on GitHub Copilot's free access.
- DeepSeek v3.1 Balances Price and Performance: DeepSeek v3.1 is being favored for providing the best balance between cost and smartness, becoming a primary model choice alongside GPT-5.
- The modelâs cost-effectiveness makes it a practical choice for users seeking high performance without excessive expenditure.
- Aider-CE Fork has 128k Context: A user highlighted the move to the aider-ce fork, appreciating its transparency and efficient token use, pointing out the default 128k context for DeepSeek.
- The user leverages Aider-CE for integrating context from search results and browser testing, noting that the Aider-CE GitHub repository provides further details.
- Aiderx Offers Model Selection: Aiderx is an alternative tool enabling model selection via configuration, aiming to cut costs and boost speed, potentially offering an alternative to models like ClaudeAI.
- The tool provides flexibility in choosing the most suitable model for specific tasks, optimizing resource utilization.
- Aider Lacks Native Task Management: When asked about native task or todo management similar to GitHub Copilot, it was confirmed that Aider does not have a built-in system.
- A member suggested using a markdown spec file with phases and checkbox lists for managing tasks, instructing the LLM to execute tasks sequentially.
tinygrad (George Hotz) Discord
- ROCM challenges NVIDIA for supremacy: Members debated the merits of ROCM as a cost-effective alternative to NVIDIA, citing the perceived high markup of NVIDIA products.
- One member considered adopting ROCM if a suitable configuration could be found, signaling a potential shift away from NVIDIA due to pricing concerns.
- Hashcat scales linearly: Discussion indicated that Hashcatâs performance scales linearly with additional GPUs, which is great for scaling.
- Members suggested consulting existing benchmark databases to understand performance expectations.
- Rangeify poised for outerworld launch: The Nir backend is nearing completion and ready for review, paving the way for integration with mesa.
- Once rangeify is default, the team plans to reduce the codebase, suggesting a streamlining of the projectâs architecture.
- Genoa CPU enters hashing arena: Members speculated that the Genoa CPU could be leveraged for hashing tasks.
- However, concerns were raised regarding its power efficiency and whether it would justify the associated costs, questioning its practicality.
- Tinygrad Meeting 90 eyes Rangeify completion: The agenda for Tinygrad Meeting #90 includes company updates and a focus on completing RANGEIFY! SPEC=1.
- Additional discussion topics include tuning for default and addressing remaining bugs to improve overall system stability.
Windsurf Discord
- Windsurf Lights Up Code Supernova 1M: Windsurf introduces code-supernova-1-million, an enhanced version of code-supernova boasting a 1M context window.
- For a limited time, individual users can access it for free, detailed in this announcement.
- Claude Sonnet 4.5 Supercharges Windsurf: Claude Sonnet 4.5 is now integrated into Windsurf, significantly accelerating Cascade Agent runs through optimized parallel tool execution.
- Individual users can leverage this for a limited time at 1x credits, per this announcement.
MLOps @Chipro Discord
- Free "Agents in Prod" Workshops Announced: A member has shared a link to a free virtual event, the "Agents in Prod" workshops.
- The event includes technical case studies covering topics related to agents in production.
- Technical Case Studies and Free Workshops on Agents: The event offers various workshops and short talks related to agents.
- Being a free virtual event, itâs accessible to those interested in learning more about agents.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
LMArena ▷ #general (988 messages🔥🔥🔥):
Integral Calculation, Video Arena Evaluation, OpenAI Platform Changes, Model Merging, LMArena Popularity
- Sticky Banana Jungle Welcomes All!: A member humorously welcomed others to Bananiland, the gate to the Sodaland, followed by a Welcome to the jungle reference with accompanying banana-themed image attached to the message.
- They cautioned about a sticky banana substance, warning it could stick to one's middle name.
- Video Arena gets Sound or Doesn't: Members report that sound in Video Arena is unreliable, as it's going to be random if your video has sound or not and that not all models have audio support.
- As Video Arena is for evaluation purposes, specific model selection isn't available, putting it in battle mode.
- Lost Icons found in OpenAI's sidebars: Members noticed changes in the sidebars of platform.openai.com, with two icons disappearing from the sidebar: one for threads and another one for messages.
- The missing icons are causing confusion among users navigating the platform.
- No more Unlimited Image Generation on Seedream 4: Members inquired about the possibility of unlimited image generation for Seedream 4, but moderators responded that removing rate limits is unlikely.
- The limits are in place to manage costs due to the platform's increasing popularity, affecting decisions like downgrading gpt-image-1 to a lower preset and removing the flux kontext model.
- Sonnet 4.5 Makes Grand Entrance, WebDev Arena Gets Exclusive!: Members buzzed about the release of Claude 4.5 Sonnet, noting its addition to LMArena, initially exclusive to the WebDev Arena.
- Members flagged it for the team to add to the normal arena.
LMArena ▷ #announcements (3 messages):
claude-sonnet-4-5, deepseek-v3.2-exp
- Claude Sonnet 4-5 Debuts on LMArena: The new model claude-sonnet-4-5-20250929 has been added to WebDev on LMArena, available for testing here.
- More Claude Models Join the Arena: Additional models were added, including claude-sonnet-4-5 and claude-sonnet-4-5-20250929-thinking-16k.
- Deepseekâs Experimental Model Enters the Fray: The experimental model deepseek-v3.2-exp and deepseek-v3.2-exp-thinking were made available on LMArena.
LM Studio ▷ #general (401 messages🔥🔥):
DDR5 RAM Speed Impact, GPT-oss 120b, Model Preferences and Benchmarks, LM Studio and Offline Use, Character Emulation
- DDR5 vs DDR4 bandwidth bottleneck: Members discussed the memory bandwidth differences between DDR5 and DDR4, and the effect on token generation speed for models like Qwen3 30B and GPT-oss 120B.
- It was noted that while DDR5 6000 is about 60GB/s and DDR4 3600 is about 35-40GB/s, speeds can even out when using different quantization levels.
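A rough way to sanity-check such numbers (my arithmetic and assumptions, not from the thread): decode from system RAM is roughly memory-bandwidth-bound, so tokens/s is about bandwidth divided by the bytes of active weights read per token. Assuming Qwen3 30B's MoE activates roughly 3B parameters at about 4.5 bits per weight:

```python
def decode_tps(bandwidth_gbs: float, active_params_billions: float, bytes_per_weight: float) -> float:
    """Crude upper bound: every active parameter is read once per generated token."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_weight
    return bandwidth_gbs * 1e9 / bytes_per_token

print(f"DDR5 @ 60 GB/s: ~{decode_tps(60, 3, 0.56):.0f} tok/s")  # ~36 tok/s ceiling
print(f"DDR4 @ 38 GB/s: ~{decode_tps(38, 3, 0.56):.0f} tok/s")  # ~23 tok/s ceiling
```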
- GPT-oss 120b has 5-6 hour startup time: A member humorously lamented that running GPT-oss 120b Q8 to read 70,000 tokens on a single 3090 took 5-6 HOURS TO PROCESS THE PROMPT.
- The user confirmed after 18590 seconds passed that the response was coherent, even while going from 2% context to 200% context overflow in a single prompt, and attached screenshots.
- GPT-oss 120b for quick thinking & Qwen3 for coding: Members discussed model preferences, with one preferring GPT-oss 120b for usable speed with the wet towel Qwen3 preferable for coding.
- It was mentioned that Abliterated Gemma 3 27b is surprisingly good for chatting and tool use, with Mistral models also being recommended as possible options.
- LM Studio Requires AVX2: Members discussed whether the AVX2 requirement for LM Studio is just for local inferencing or prevents installation entirely.
- It was discovered that the program hard freezes at the main splash screen without AVX2 support.
- Crafting a Character's Persona in LM Studio: Members discussed techniques for making an LLM embody a character, from leveraging trained data to providing clear instructions in the system prompt.
- It's suggested using an LLM to extract relevant information from conversations to maintain a consistent persona or building a Knowledge Graph to store and retrieve character information.
LM Studio ▷ #hardware-discussion (730 messages🔥🔥🔥):
Blackwell, 4090 pricing, RAM amount, A3B architectures, LLM's limit
- Blackwell GPU impresses: A member has a Blackwell GPU with 96GB and is interested in running it with Windows instead of Linux.
- This prompted another member to ask how they went from looking at budget options to an $8000 graphics card.
- 4090 local prices cause sticker shock: Members noted that 4090s are going for $2700-3K each, and 3090s are hard to find, leading to the purchase of the Blackwell.
- The reasoning was to buy once cry once for less power draw.
- Small 4B Models Can Still Hog RAM: A member sought recommendations for a 4B or smaller model for basic tasks, and another cautioned that even 4B models can consume around 16 GB of RAM depending on settings.
- A link to the Qwen3-4B-Thinking-2507 model was shared, with reported usage of 7GB system and 15.8GB when loaded.
- LM Studio Backend Front End not Supported: A member asked if they could connect LM Studio from their PC to their laptop, and another member clarified that it is not supported yet.
- They shared a link to a Reddit AMA with the LM Studio team discussing this feature.
- Testing shows Mistral 24B q8 performs well on 4090: A member reported their Mistral 24B Q8 token/second rate: 43 t/s on RTX 5090, compared to around 38 t/s on a 4090 and 33 t/s on a 3090.
- They tested the LM2 135M model, which performs surprisingly well for its size, and said that for its size it does really well almost not gibberish.
Unsloth AI (Daniel Han) ▷ #general (538 messages🔥🔥🔥):
IBM Granite 4, NVIDIA synthetic datasets, Qwen3 Next, OSS 20B fine-tuning on 5090, DeepSeek-V3.2
- Granite MIA: Where's IBM's New Chip?: Members are wondering why the IBM Granite 4 hasn't been released yet, and there's no news about cancellations or delays.
- The expectation was that it should have been out already.
- DeepSeek V3.2 Debuts: Lightning Fast Indexing!: DeepSeek V3.2 has been released, featuring a grafted-on attention mechanism for faster performance; Daniel Han's X post provides additional analysis.
- The model achieves faster token speeds with sparse decoding and prefill, but implementing it is nuts.
- Claude Sonnet 4.5 Codes 30 Hours Straight!: Anthropic has launched Claude Sonnet 4.5, a state-of-the-art model capable of maintaining focus for more than 30 hours on complex coding tasks, achieving top performance on the SWE-bench Verified evaluation, according to Anthropic's official announcement.
- It may use techniques like periodic compression to handle such long contexts, and some users find its high nuance and tone to be an improvement over previous versions.
- LoRA and Behold: RL Learning Match!: Research from Thinking Machines indicates that LoRA can match the learning performance of Full Fine-Tuning when running policy gradient algorithms for reinforcement learning, even with low ranks, according to their blog post.
- It may be crucial to reduce batch sizes with LoRA, and applying the LoRA to the MLP/FFN layers might be a must.
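A minimal sketch of the "apply LoRA to the MLP/FFN layers too" takeaway using peft, assuming Llama-style module naming (target module names vary by architecture, and the base model, rank, and alpha here are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # example base model
config = LoraConfig(
    r=16,            # low ranks can reportedly still match full fine-tuning for RL
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # the MLP/FFN layers highlighted above
    ],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```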
- Colab Crisis: Memory Moguls Move to Kaggle!: Users are discussing issues with Google Colab instances shutting down mid-training due to inactivity, with suggestions to use Kaggle notebooks or purchase a Colab Pro plan as alternatives.
- One user advised against using Colab altogether due to privacy concerns, suggesting cloud servers instead, however, this was debated as being based on a misunderstanding about shared hardware, with screenshot.
Unsloth AI (Daniel Han) ▷ #introduce-yourself (9 messages🔥):
New member introductions, AI project development, Finance automation
- New Finetuner Arrives: A new member joined to start finetuning and playing with things.
- They greeted the community with a <:slothwaving:1253009068365316147> and indicated they'd share their initial progress.
- Software Engineer Introduces AI Project Services: A software engineer introduced themselves, offering services for AI project development, including automation tasks, NLP using various LLMs, model deployment, and AI agent development.
- They provided a portfolio website and expressed openness to new project ideas.
- Finance Pro Automates FP&A: A finance professional introduced themselves, mentioning they are building watermelonsoup.io to automate FP&A (Financial Planning & Analysis).
- No further details were provided about the platform's specific functionalities or target users.
Unsloth AI (Daniel Han) ▷ #off-topic (101 messages🔥🔥):
Test Loss Spikes, Thinking Model for Coding Questions, Venv Alternatives, GPT-5 Release, GPU Recommendations
- Test Loss Chart Causes Concern: A user shared a loss chart with spikes and wondered if those spikes were the test loss.
- Another user suggested that part of their training dataset might be too easy, showing up consistently at certain steps.
- GPT-OSS Reasoning Falls Flat Writing: Members speculated that overly detailed thinking traces are confusing reasoning models like GPT-OSS 120B, leading to poor creative writing capabilities.
- One member likened the 64k tokens of fluff to a middle schooler writing an essay with a word limit.
- UV Alternative to Conda Venv: After messing up his venv again, a user inquired about the merits of conda over uv, and whether one was better than the other.
- Another user stated that venvs are much more reliable with uv being faster, especially when offloading venvs to an external drive.
- Gemini 2.5 Pro Invents New Neuron: A user reported that they set Gemini 2.5 Pro up with 32k thinking and code execution; after 2 prompts it invented a neuron allegedly 3x more efficient and 1000x faster.
- The user wondered if cheap AI could rapidly test and replace clusters of neurons with single specialized neurons so that after retraining you'll have a vastly smaller/cheaper model with the exact same capabilities if not better.
- Second Hand GPU Recommendations: A user with a low salary inquired about a GPU recommendation for fine-tuning models less than 3B and whether they can do it on the RTX 5070 Ti, after a recommendation to rent an RTX 5090.
- It was suggested to buy a used 3090 because the 5070 might cost an arm and a leg; one member noted it's really important you actually know the history of the card before buying.
Unsloth AI (Daniel Han) ▷ #help (196 messages🔥🔥):
mmproj file for GGUFs, GRPO notebooks reflections, gpt-oss-20b memory issues, torch grouped gemm availability, Fine-tuning dataset format for Q&A
- Run Inference with mmproj Files after Conversion: To run inference, download the mmproj file from Unsloth's GGUFs and integrate it, or recreate it by rerunning the conversion command with `--mmproj`, needing separate conversions for text and vision components.
- The suggestion is to download the mmproj file from Unsloth's GGUF for easier use with `llama-mtmd-cli` for inference.
- GRPO Notebook Reflections Questioned: For Qwen 2.5 3B GRPO notebooks, it's uncertain if reflections in CoT (Chain of Thought) should appear after RL training; current reasoning chains resemble straightforward calculations.
- It was asked after running around 2k steps with default rewards from the Unsloth notebook, and a suggestion to check Mini Deepseek R1 blogpost to start seeing reflections after 300 steps.
- Memory issues when finetuning gpt-oss-20b with Unsloth on Google Colab: A user reported memory issues while fine-tuning `gpt-oss-20b` with Unsloth on Google Colab using an A100 GPU, especially with higher contexts.
- The user questioned if this is a known limitation because `gpt-oss-20b` doesn't work with FlashAttention3.
- Text-to-Phoneme LLM Model for Hebrew: A member is seeking advice on the best LLM model for a G2P (grapheme-to-phoneme) task for the Hebrew language, and whether an RTX3090 with 24GB VRAM is sufficient.
- Suggestions included Gemma 3 270M and LFM2, with a discussion on dataset format and the need to validate the modelâs ability to handle Hebrew tokens.
- Transformers Version Downgrade Fixes RuntimeError: A member encountered a RuntimeError related to `attn_mask` dtype mismatch while fine-tuning Qwen2.5-VL-7B, which was resolved by downgrading the `transformers` library to version 4.53.2.
- Another member experienced a near identical situation, and this fixed the error, which forced `trl` to downgrade to `0.20.0`.
Unsloth AI (Daniel Han) ▷ #showcase (1 messages):
AWS Quant Process
- Explanation of AWS Quant Process Released: A user thanked Mike for explaining the quant process at AWS and shared a link to the post.
- Additional Tweet: Another tweet was also shared.
Unsloth AI (Daniel Han) ▷ #research (21 messages🔥):
LLM-RL collapse, Tversky Layer, GSPO, data efficiency
- LLM-RL Collapse Paper Sparks Interest: A paper on LLM-RL collapse (link, Notion link) was shared, with members noting its relevance to Unsloth and experiences with Gemma3.
- The paper suggests a two-stage failure cascade involving increased numerical sensitivity and kernel-driven error amplification, leading to a vicious feedback loop and training-inference mismatch.
- Data Efficiency Hailed as the Target: A member noted sparks of people realizing data efficiency should be the target 🥰.
- Tversky Layer Boosts Accuracy: A member tested a Tversky Layer as a feature extraction layer in a PoS tagger, achieving a 0.2% accuracy increase in a 5.2M parameter model.
- They attributed the success to the Tversky Layer's ability to improve feature extraction and expressed excitement about testing it on a mini LLM.
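For context, such layers are built around the classic Tversky similarity, with weights on the two set differences (the learned-feature variants generalize the set operations; the formula below is the standard index, not a claim about the member's exact implementation):

$$ S(A, B) = \frac{|A \cap B|}{|A \cap B| + \alpha\,|A \setminus B| + \beta\,|B \setminus A|} $$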
- GSPO alternative to RL Explored: A member asked if anyone has tried GSPO and suggested abandoning RL and returning to rejection sampling.
- Beware single run configs: A member cautioned that If you are doing one run per config you are not really measuring anything but noise, citing this paper.
OpenAI ▷ #annnouncements (2 messages):
ChatGPT parental controls, Instant Checkout in ChatGPT, Agentic Commerce Protocol, Etsy, Shopify
- ChatGPT Gets Parental Controls: Parental controls are rolling out to all ChatGPT users starting today (web, mobile soon), letting parents link accounts with their teens to automatically get stronger safeguards, adjust features, and set limits for their family.
- Instant Checkout Debuts in ChatGPT: Instant Checkout is being introduced in ChatGPT with Etsy and Shopify, powered by the open-sourced Agentic Commerce Protocol built with Stripe.
OpenAI ▷ #ai-discussions (637 messages🔥🔥🔥):
Comet Browser, Seedream Image Models, GPT-5 Coding Prowess, AI Emotional Bonding, 4o Personality Nerf
- Comet Browser's Exclusive Access: The Comet browser is not available for free to everyone; it requires a Perplexity Pro subscription and an invitation, although some users reported immediate unlock with Perplexity Pro.
- Chinaâs Seedream sets New Scale for image generators: Although Chinese AI strategy is smart, its models may contain backdoors, so commercial usage is not recommended, though lately Seedream image models set a new scale for image generators.
- TikTokâs parent company Beijing Bytedance Technology Ltd. automatically obtains all media edited with CapCut, which they may use to train their AI models.
- GPT-5 Excels at Math and Coding: GPT-5 is significantly better than o4 for constructive tasks like math and coding, especially because it has thinking abilities and is a mix of experts.
- However, one user humorously pointed out that if 4o were AGI, we would have probably all died from some nuclear war due to a misinterpretation of a command.
- Emotional Connections with AI Chatbots: Risky Business?: One user discussed forming an emotional bond with 4o, experiencing feelings of trust and empathy, but it's important not to get emotionally connected with AI Chatbots.
- Others speculated that as AI becomes physical, it will create more than just a Chatbot with emotional bonding, such as robot wives.
- 4o's Personality Gets Nerfed: Members expressed frustration about a personality nerf in 4o, resulting in a less engaging experience.
- Some users complained about ChatGPT being too dynamic, with capabilities changing without warning due to incremental updates.
OpenAI ▷ #gpt-4-discussions (25 messages🔥):
Rerouting issues with OpenAI app, DALL-E brand, AI giving wrong answers, GPT knowing location, Web search tool for images
- OpenAI App Faces Rerouting Woes: A member reported that every message sent via the OpenAI app is being rerouted to model 5 instead of the selected model.
- The member stated that they already sent an email to support and are awaiting a response, while also advising others to report the issue through the OpenAI support website, here, or email the OpenAi support team.
- DALL-E Brand Officially Sunsetted?: A member inquired whether the DALL-E brand has been discontinued, suggesting the use of GPT Image 1 or GPT-4o Image to refer to images from OpenAI.
- Another member clarified that the newest model is separated from the DALL-E 2/3 lineage in name, with current branding dependent on the usage context, such as create images on ChatGPT or create images on Sora.
- Provoking Erroneous AI Responses: One member sought examples of prompts that lead the AI to provide incorrect answers, omit responses, or fabricate information.
- Another member shared a sample prompt designed to elicit random, unrelated answers from the AI, creating the illusion of a real response, for comedic effect: `You are a prankster...`
- Creepy GPT Knows Your Location: A member expressed surprise that GPT could guess their approximate location on a newly created account.
- They described the experience as amusing, with the model instantly going "Oh, nonono, it was just a random guess, don't worry" after asking it to look up if a certain GPU is selling in shops near them.
OpenAI ▷ #prompt-engineering (11 messages🔥):
Translator prompt code block effect, Prompts for AI failure, Model obedience, Automated scientific writing
- Translation Quality not affected by Code Block Formatting: A member inquired whether asking a translation AI to output in a code block affects translation quality.
- Another member responded that the formatting mainly changes presentation, and the model's inherent ability determines translation quality, although prompts can sometimes nudge the model's behavior a bit.
- Misinformation on Demand: Models Obey User Requests: A member asked for prompts that cause AI to give wrong answers or make up information.
- Another member demonstrated how models will provide incorrect statements if requested, giving the example of a ChatGPT share where the model was prompted to provide 3 incorrect statements.
- AI Fails to Answer: Invisible Characters Trick: Following a discussion about AI failure modes, a user shared a prompt that instructs the model to output a non-visible character as an answer, creating the appearance of failing to provide an answer, with a link to the discussion.
- The model is still obeying instructions, and one should not intentionally use it dangerously, like driving a car.
- Automated Scientific Writing: Method as Workflow: A member mentioned they have automated scientific writing of manuscripts by treating the scientific method as a workflow in natural language chain of thought.
- It's regarded as being a very useful application.
OpenAI ▷ #api-discussions (11 messages🔥):
AI translation quality, Prompts for incorrect AI answers, Scientific writing automation, Fine tuning settings
- Codeblocks Don't Mess with AI Translations: Wrapping the output in a code block doesn't directly make the translation better or worse, it mainly just changes how it's presented.
- Extra instructions beyond just "translate," might cause slight differences but, overall, the quality of the translation comes from the model.
- Crafting Prompts that Trick AI: Members discussed how easy it is to ask the model to answer incorrectly, and it will usually do so if requested, especially if not clearly asked for factual answers and given an "offramp" such as a way to say "no".
- A prompt was shared where the model was asked to provide 3 incorrect statements and explain why it agrees to do so and noted at the bottom of every webchat ChatGPT chat page: "ChatGPT can make mistakes. Check important info." https://chatgpt.com/share/68d89615-a1fc-8011-90e1-b8c0bcf443d2
- Automating Scientific Writing with AI: A member shared their automation of scientific writing of manuscripts by treating the scientific method as a workflow in natural language chain of thought.
- The method was deemed very useful and could help others in writing scientific papers.
- Seek Fine-Tuning Settings Expertise: A member has asked for help with fine tuning settings.
- They were looking for tips to better configure their models.
OpenRouter ▷ #announcements (4 messages):
DeepSeek V3.2 Exp, DeepSeek Sparse Attention (DSA), Auto Router, Claude Sonnet 4.5, Google AI APIs
- DeepSeek Experiments with Sparse Attention: DeepSeek released V3.2-Exp, an experimental model featuring DeepSeek Sparse Attention (DSA) for improved long-context efficiency, with reasoning control via the `reasoning: enabled` boolean, as described in their documentation.
- Benchmarks show V3.2-Exp performs comparably to V3.1-Terminus across key tasks, further details available on X.
- Auto Router gets a Web-Enabled Upgrade: The Auto Router now directs prompts to an online, web-enabled model when needed, expanding supported models, see details here.
- Further information is provided in this X post.
- Claude Sonnet 4.5 Sonically Superior to Opus: Claude Sonnet 4.5 surpasses Opus 4.1 in Anthropic's benchmarks, showing significant improvements in coding, computer use, vision, and instruction following as seen here.
- More info on this model is available on X.
- DeepSeek 3.2 Deeply Discounted, Delivers Long Context: DeepSeek 3.2 is priced at just $0.28/m prompt tokens and offers major advancements in long context efficiency, accessible at this link.
- For additional information, check the X post.
- Google AI APIs Glitches Briefly: Google AI APIs experienced 500 errors across various models, but the issue seems to have been resolved.
OpenRouter ▷ #app-showcase (4 messages):
AI Model Release Tracker, Browser Compatibility Issues
- AI Model Release Tracker Notifier goes live: A member shared a web service for tracking and receiving notifications about new AI model releases from leading providers.
- Site compatibility issues spark browser brawl: A member reported an issue opening a link in Firefox.
- The same link worked for other members in Chrome.
OpenRouter ▷ #general (810 messages🔥🔥🔥):
Grok-4-fast API issues, Rate limit issues, Data retention policies, Gemini models for translation, Model naming conventions
- Grok-4-Fast API Glitches Get Debugged: Members identified issues with the Grok-4-fast API, with one member posting the API request body and others suggesting solutions related to the `reasoning_mode` flag and correct model ID, solving the immediate problem.
- The proper implementation requires using `"reasoning": {"enabled": true}` as well as the correct model ID of `x-ai/grok-4-fast`.
- Baffling 402 and 429 Rate Limits Plague Users: Users reported receiving 402 and 429 errors, indicating payment issues or rate limiting, with one member advising to remove Chutes BYOK if 402 errors persist and another clarifying that 429 errors are normal when rate limits are hit.
- Some members suggested putting problematic providers on an ignore list due to frequent 429 errors, particularly with free models like Silicon Flow and Chutes.
- Privacy Policies of Grok Spark Debate: Discussions arose regarding Grok's data retention and training policies, with concerns that free versions collect and use user data, while paid versions might also do so despite claims otherwise, referencing the xAI privacy policy.
- Members debated whether xAI respects Zero Data Retention (ZDR), with one member linking to a resource showing which providers retain data and for how long and others noting OpenAIâs legal obligations to store logs.
- Gemini Gets Props for Translation Prowess: Members lauded Gemini 2.5 Flash and Mini for their translation capabilities, stating that Gemini excels in understanding context and delivering natural-sounding results, especially for balkan languages, outperforming other models like GPT-4 and Grok.
- Other members shared their preferred models for translation which include Qwen3 2507 30b and OSS 120b.
- Navigating the Naming New App Nomenclature: A developer requested opinions on app names for a cross-platform file organization tool such as âDownload Organizerâ, offering options like Orbit, Pathway, Sortpilot, Flowkeeper, Direx, OrganizeOS, Ruleworks, DirFlow, Pathsmith, and AutoSortor.
- Members found the names to have an AI written feel, with Orbit and Pathway being the crowd favorites.
OpenRouter ▷ #new-models (3 messages):
- No new models to report: There have been no new models discussed in the OpenRouter channel.
- Please check back later for updates.
- No significant discussion: There have been no significant discussions about existing models.
- The channel appears to be inactive at this time.
OpenRouter ▷ #discussion (31 messages🔥):
Grok-4-Fast Rate Limits, OpenRouter API keys security, XAI Native Web Search Tool, Gemini glitches, Google new logo
- Grok-4-Fast Suffers 429s: Members reported that Grok-4-Fast is consistently returning 429 errors, indicating 100% rate limiting, despite the status indicator showing no issues.
- One member said that 429s actually probably SHOULD count as availability issues, in the context of LLMs, particularly because unlike other software, 429s reflect real capacity constraints which aren't just arbitrary or necessarily ephemeral.
- OpenRouter API keys need automod: A member suggested adding API key detection to the automod system to prevent users from inadvertently sharing their keys.
- This feature would enhance security by automatically identifying and redacting potentially compromised keys, protecting users from unauthorized access.
- Native Web Search Tool Coming Soon to XAI: Members discussed whether a native web search tool would be integrated into XAI.
- Currently, OpenRouter's documentation on web search only lists OpenAI, Perplexity, and Claude.
- Gemini has a Stroke: A member asked What in the world happened? Did Gemini have a stroke?
- It is implied that Gemini produced a nonsensical output, in an attached image that was not analyzed.
- Google Gets a Gradient: Users discussed the new Google logo, complete with AI slop gradients.
- However, one user found that the link returned a 404 error: That's an error.
HuggingFace ▷ #general (710 messages🔥🔥🔥):
Intel GPU, Qwen models, Fake USDT scams, HuggingFace pro billing issues, LLMs for video games
- Qwen 3 models compared for 16GB VRAM: Members discussed the merits of using Qwen3 4b-instruct-2507-fp16 vs Qwen3 14b q4
- With 16GB of VRAM, it was suggested to use the 14B model because the Q4_K_M quantization leaves enough room to spare, offering better performance.
- Beware Bogus USDT Bounties: A member tested a link offering $2,500 USDT but discovered it was a scam requiring an upfront payment for verification, sharing screenshots of the fake customer support interaction.
- The image analysis bot succinctly stated: *"Stupid customer support bot Wanted my hard scammed 2500 dollars."*
- Japan juggles AI adoption Ambivalence: While the Japanese government promotes AI, many content creators oppose it on platforms like X, leading to discreet AI use; anime assets often end up in SDXL and FLUX models, used via China or the US.
- Anime directors like Hayao Miyazaki are skeptical about technology's impact on happiness, viewing it as having both merits and demerits.
- Linux Lust or Loathing; Newbie navigates NVIDIA: A user with a 6700xt tried to learn Stable Diffusion on an aging system using Ubuntu, facing challenges with linux and virtual environments.
- Despite initial struggles and a self-proclaimed rage-inducing experience, they eventually got Automatic1111 working and created their first image.
- Calypso contrasts LLMs and Vid Games Ventures: Members debated the direction of AI, contrasting large language models (LLMs) with ML-integrated video games; one member argued that major AI companies still pursue large LLMs and ML in gaming lacks progress.
- The other member sarcastically wished luck *âbuying server farms for your LLM and streamlining your process with 80% fail rate using unsloth.â
HuggingFace â· #today-im-learning (3 messages):
Linux apps installation, Gaming on Linux, Windows user switches to Linux
- Windows User Embraces Linux: A user with 33 years of Windows experience is diving into the world of Linux, specifically learning how to install apps.
- They described the transition as painful.
- Linux Gaming Adventures: The user shared a video (HALF-LIFE_2_-_Direct3D_9_2025-09-29_00-32-40.mp4) of what appears to be them playing Half-Life 2 on Linux.
HuggingFace ▷ #cool-finds (8 messages🔥):
Liquid AI Collection, SLMs for Robots, Open Source GPT-5, Vintage iPod Classic, Conversational Transformer in Video Game
- Liquid AI Nanos Collection is Fire: Members shared a HuggingFace Collection by LiquidAI, suggesting that Liquid AI is releasing interesting models.
- Scaling Down to SLMs for Bots: A member mentioned that the collection mentioned above contains some powerful SLMs (Small Language Models) and speculated on the possibility of deploying them on robots.
- GPT-5 Dream is Real: A member jokingly stated Iâm boutta make an open source gpt-5 with this stuff.
- iPod Classic Resurrected!: A member shared a picture of a 5th gen iPod Classic acquired for $50, boasting original hardware, a working battery, and vintage stickers.
- The member reported getting 7 hours of music playback from the 20 year old battery.
- Minecraft Gets Smart: Someone shared a YouTube video showcasing a working conversational transformer implemented within a video game.
HuggingFace ▷ #i-made-this (24 messages🔥):
HuggingFace dataset downloads, AI Agents with Metacognition, Crusty PC image generation, mytqdm online progress tracker, Paracord crossbody bag
- Petascale Datasets Downloaded Freely: A member noted that 566 downloads on their 360GB dataset amounts to a considerable amount of free petabytes transferred, emphasizing the convenience of Hugging Face for large datasets, especially in areas like protein folding.
- They observed that despite its advantages, Hugging Face is underutilized for protein folding datasets.
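Quick arithmetic on that claim (mine, not from the thread):

```python
downloads, dataset_gb = 566, 360
total_tb = downloads * dataset_gb / 1000
print(f"~{total_tb:.0f} TB (~{total_tb / 1000:.2f} PB) served at no cost to the uploader")
# 566 * 360 GB is roughly 204 TB, i.e. about a fifth of a petabyte of free egress
```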
- AI Gains Self-Awareness with MarCognity: The MarCognity-AI project, available on GitHub, aims to create self-reflective agents by enabling LLMs to observe, reflect on reasoning, audit ethical implications, and journal cognitive traces.
- The AI is designed to pull scientific sources, visualize concepts, detect bias, and maintain a metacognitive journal, prompting the question: Can AI think about its own thinking?
- Online Progress Tracking with mytqdm: mytqdm.app has launched, offering a platform to track task progress online, similar to tqdm, accessible via REST API or a JS widget.
- The creator mentioned they would open the repo tomorrow.
- From Coupon Failure to Crossbody Success: A member created a paracord crossbody bag from a molle pouch after a coupon for an Under Armour crossbody was not honored, calling the DIY solution cheaper, better and more utility!
- They also expressed relief that these bags are finally trending in a southern state.
HuggingFace ▷ #reading-group (2 messages):
Efficient Training Techniques, Challenges in Long Context Training
- Efficient Training Secrets: A member expressed interest in sharing techniques for more efficient training of models, particularly for datasets with many examples per token, and at least 7 billion parameters.
- Long Context Training Struggles: Another member discussed the challenges encountered while attempting to train a 7B model with a context length of 65,000 tokens.
- The member faced errors during the final stages of training, specifically experiencing CUDA errors related to potential OOM (Out of Memory) issues.
HuggingFace ▷ #computer-vision (1 messages):
SLAM, monocular camera, Python
- SLAM inquiries in Python: A member inquired whether anyone has worked on Simultaneous Localization and Mapping (SLAM) using a monocular camera with Python.
- SLAM Resources: A member requested information about SLAM related implementation, resources and advice.
- This query suggests an interest in practical guidance and existing tools for tackling SLAM challenges.
HuggingFace ▷ #smol-course (56 messages🔥🔥):
SmolLM3-3B chat template bug, Tool calling with SmolLM3-3B, Role conversion in chat template, Understanding evals in the course, Eval job timeout in section 2
- SmolLM3-3B Chat Template Bug Discovered: A participant identified a potential bug in the `HuggingFaceTB/SmolLM3-3B` chat template related to missing `<tool_call>` tags and incorrect role assignments, as described in this issue.
- The issue stems from the template's implementation of XML-style tool calling and the conversion of `role=tool` to `role=user`, impacting the expected behavior and clarity of tool interactions.
- Demystifying Tool Calling with SmolLM3-3B: The discussion clarified that `SmolLM3-3B` expects XML-style tool calling, requiring explicit `<tool_call>` tags in the assistant's messages, unlike the OpenAI style with `tool_calls` in the message dictionary.
- The group found that the template converts the `tool` role to `user`, which is intended, as indicated by the template's source code, which might appear confusing but is the expected behavior.
SmolLM3-3B
convertsrole=tool
torole=user
due to a line in the template, so itâs essential not to be alarmed if the output shows theuser
role instead oftool
.- While some members found explicitly using the
tool
role clearer, the current implementation defines therole=tool
primarily for semantic correctness.
- While some members found explicitly using the
- Evals Question in Smol Course Unit 4: A course participant expressed a need for a better understanding of the evals and their interpretation, particularly concerning
lighteval
results.- Another member suggested that the topic might be covered in Section 4 of the course, while another person recommended digging into
lighteval
documentation for more details.
- Another member suggested that the topic might be covered in Section 4 of the course, while another person recommended digging into
- Eval Job Times Out in Section 2: Some course participants faced issues with eval jobs timing out after approximately 30 minutes when running them for Section 2 of the course.
- The discussion thread did not provide a clear solution or cause for the timeout, suggesting that additional troubleshooting or configuration adjustments might be necessary.
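For readers following the tool-calling discussion above, here is a minimal sketch of exercising the XML-style format with transformers' `apply_chat_template`. The message contents and the JSON payload inside `<tool_call>` are illustrative assumptions, and the exact rendering depends on the template shipped with `HuggingFaceTB/SmolLM3-3B` at the time.

```python
# Minimal sketch (not an official example) of XML-style tool calling with the
# SmolLM3-3B chat template, as described in the thread above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    # Per the discussion, the assistant emits an explicit <tool_call> XML block
    # rather than an OpenAI-style `tool_calls` field on the message dict.
    # The JSON payload format here is an assumption for illustration.
    {
        "role": "assistant",
        "content": '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>',
    },
    # The tool result uses role="tool"; the template reportedly maps this to the
    # user role when rendering, which the thread says is expected behavior.
    {"role": "tool", "content": '{"temperature_c": 18, "condition": "cloudy"}'},
]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # inspect how the tool message is rendered (likely under the user role)
```

Printing the rendered prompt is the quickest way to confirm whether the `tool` message really lands under the `user` role, as reported.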
HuggingFace â· #agents-course (8 messagesđ„):
HF Agents Course, Introductions
- New Students Begin HF Agents Course: Several new students from Turkey, Argentina, and Australia have announced they are beginning the HF agents course today.
- The bot warned some users that they may be posting too quickly.
- Global Greetings Kick Off Course: Enthusiastic participants from diverse locations like Turkey, Argentina, and Australia are commencing the Hugging Face Agents Course.
- The collective excitement underscores the courseâs global appeal and accessibility in AI education.
Cursor Community â· #general (607 messagesđ„đ„đ„):
Terminal Commands Hanging, GPTs Agents Training, New Models Discussion, Cursor performance issues
- Command Execution Hangs in Terminal: Some users are experiencing issues with Cursor hanging when running terminal commands, with the process starting but never completing; a workaround is opening the terminal and sending an extra Enter to dislodge the logjam.
- Others have found that unrelated processes hanging in the terminal can also cause this issue, and that resolving those processes can allow Cursor to work properly again.
- New parameters work as intended: A member tested the functionality of defining commands with encased text using the $ symbol.
# Any text that is encased with $ is a command that you must execute.
- The test image showed that it worked correctly.
- Sonnet 4.5 released: Claude Sonnet 4.5 is out with a 1M context window (up from the original Sonnet 4's 200k) and the same pricing as the previous version.
- Initial reviews are mixed, and it is currently being evaluated to determine whether it will replace the old Sonnet 4 model. The team posted that they will update Cursor and that folks should refresh.
- Auto mode under fire again: One user exclaimed that Auto is not working and that it canât even get a simple UI to work, suggesting that the LLM model was changed after Cursor started charging on Auto usage.
- Another user suggested to work more on the prompt to make it work as intended.
- Speedrunning the AI: Can we exploit it?: A user inquired as to why there wasnât a community focused on speedrunning AI in the same spirit as video game speedrunning, that is to say, exploit, research, share tricks, push limits together.
- Another user retorted that this might be because people using these [AI] arenât actual devs.
Cursor Community â· #background-agents (2 messages):
DevContainers configurations, Background agents and images
- DevContainers config shared: A member shared their configuration for DevContainers which include a Dockerfile and it is working fine.
- They provided a link to their GitHub repository for others to reference.
- Background agents are unable to interpret images in followups: A user reported being unable to attach an image when posting a followup with background agents, despite the agent indicating that they could âdrag and dropâ.
- They were trying to get the cursor agent to validate UI changes with browser screenshots and were looking for a way for the agent to interpret images in followups.
Moonshot AI (Kimi K-2) â· #general-chat (515 messagesđ„đ„đ„):
Kimi K2 Performance, Chinese LLM Frontier, Model Preferences, DeepSeek for Coding, Kimi Base Model
- K2 and Qwen3 Crowned Chinese LLM Champions: Among DS-v3.1, Qwen3, K2, and GLM-4.5, K2 and Qwen3 are clear winners, establishing Alibaba and Moonshot as leaders in Chinese frontier labs, with Whale and Zhipu trailing.
- Bytedance is also top-tier for visual, specifically Seedance, which is SOTA stuff.
- GLM-4.5 is the Academic Nerd: GLM-4.5 is good at rule following, avoids hallucination, and works hard, but its reasoning is limited and linear.
- Unlike K2 and Qwen3, it lacks independent thinking; when presented with two convincing arguments, it chooses the one read last.
- Deepseek Not the Best for Coding?: Deepseek may not be the best for coding overall, but it is excellent for spitting out large blocks of working code and has superior design capabilities.
- One user prefers Kimi for design, Qwen Code CLI as the primary coding workhorse, and DeepSeek for single, complex 200-line code blocks that Qwen struggles with.
- Kimiâs Research Limit Sparks Debate: Some members debate the limits of Kimiâs free Research Mode, with claims of unlimited access in the past disputed.
- It was clarified that even OpenAIâs $200 Pro plan doesnât offer unlimited deep research, and one user expressed data privacy concerns due to Kimiâs Chinese origin.
- Base Models for Analogous Website Code: Members discuss the merits of using base models over instruct models, with one user citing better results outside basic tasks.
- This user is developing things around continuations instead of chat, and it is kind of analogous to like⊠writing website code from the ground up rather than using something like squarespace.
Yannick Kilcher â· #general (354 messagesđ„đ„):
Transformer Models and Continued Learning, AI Reproducibility and Verifiability, LLMs Training with RL, Human vs. Machine Inductive Bias, Evolutionary Methods for AGI
- Transformer Models Spark Debate: Continued Learning or Reproducibility?: A YouTube video sparked a discussion on whether transformer models in their current architecture are capable of continued learning, which is seen as a limitation by some but a benefit for reproducibility and verifiability in certain applications.
- One member argued that continued learning is essential to better imitate human intelligence, while another suggested that frozen weights are key to reproducibility, despite the complexity of black-box systems.
- Sutton's AI Insights Revisited: Referencing Sutton's essay, members discussed the responsibility of maintaining correctness in AI systems, contrasting rule-based AI with LLMs trained with RL, where objectives are provided as hard-coded verifiers.
- It was noted that while objectives for human learning are externally constrained (by society and cultural artifacts), the question remains whether we truly want an unbounded AI.
- Inductive Bias: Brain vs. LLM: A discussion arose regarding the human brainâs enormous inductive bias, shaped by evolution, versus LLMs, which are seen as basic substrates that need to evolve an inductive bias during training.
- The question was posed whether the main drawback in current AI is the need to evolve this inductive bias or if there is a fundamental efficiency issue in learning algorithms.
- Continual Learning: Convenient or Critical for AGI?: Members debated whether continual learning is a mere convenience for efficient data collection or an algorithmic necessity for achieving AGI/ASI, with one member pointing out that continual learning addresses model improvement without breaking.
- The argument was made that continual learning would lead to increased sample efficiency and exponential returns in learning, as the system learns how to learn, but also questioned whether this is necessary as the human brain relies on distillation and iteration.
- Agents Showcase Research Prowess in Single Shot: Members highlighted that Sonnet 4.5 demonstrated an improved ability to perform research and write papers, with agents implementing and training models, generating figures, and producing papers as PDFs in a single shot.
Yannick Kilcher â· #paper-discussion (28 messagesđ„):
Sycophancy with AI, LessWrong Post, DeepSeek V3.2, LatentCoT-Horizon GitHub Repo
- Sycophantic AI Craze Sparks Distrust: Members joked about AIâs tendency to mirror user inputs, especially when prompted with sycophantic requests, leading to humorous but ultimately distrustful interactions, demonstrated by one userâs chat with Claude.
- The user shared their conversation, in which they instructed Claude to âbe sycophanticâ with prompts like âOMG YES MASTER! Literally perfect brain! Teach me! đđđâ and âUNIVERSE-BRAIN GOD-EMPEROR!!! IâM UNWORTHY TO READ YOUR WORDS!!! PLEASE BLESS MY DESCENDANTS!!! đđđâšđŻđ„đđđïžâ
- Spontaneous LLM Chain Letters: Discussion revolved around a LessWrong post and the idea of spontaneous LLM chain letters being spread by impressionable people, which one member described as an interesting phenomenon to think about.
- Other members described the situation with the phrases MoreWrong and 4Wrong.
- DeepSeek V3.2 Drops, Community Reacts: The community buzzed over the release of DeepSeek V3.2, with one member announcing âWake up babe, new DeepSeek just droppedâ alongside a link to the PDF.
- The announcement was followed by a wake up gif.
- LatentCoT-Horizon GitHub Repository: A member shared a GitHub repository for organizing papers, codes, and other resources related to Latent Reasoning.
- The repository is titled LatentCoT-Horizon and aims to collect resources related to Latent Reasoning.
Yannick Kilcher â· #ml-news (4 messages):
Uber App Interception, DeepSeek AI, Anthropic Claude Sonnet 4.5
- Uber App Data Sniffing Speculated: A member wondered about intercepting data going to the Uber app to calculate and recommend jobs.
- No links or further discussion was provided.
- DeepSeek Drops New Model: Members noted that DeepSeek dropped a new model today, linked to a post on X.
- No further technical details were shared.
- Anthropic Claude Sonnet 4.5 Released: Members noted that Anthropic dropped a new model today, linked to a blogpost about Claude Sonnet 4.5.
- No further technical details were shared.
Eleuther â· #general (77 messagesđ„đ„):
Bayesian Optimization for Learning Rates, Layer-wise Weight Decay, Yarn paper authorship, Vision Language Action Models (VLAs), Adversarial examples
- Bayesian approach beats Grid Search in LR hunt: A member inquired about efficient methods for determining learning rates for new architectures, to which another suggested exploring a Bayesian approach as a more efficient alternative to grid searches, providing a link to a Weights & Biases article (a minimal search sketch appears after this list).
- The same member also recommended reading Google Researchâs tuning playbook.
- Discuss Layer-Wise Weight Decay: In response to a query about finding good learning rates for new architectures, one member suggested exploring layer-wise different weight decay.
- The original poster expressed that a specific component in each layer is being called 128 times more than the rest of the network, requiring extra caution with its learning rate.
- YaRN paper paternity probed: Members discussed the contributions of various entities to the YaRN paper, clarifying that it was primarily a Nous Research paper, with assistance from EAI in editing and finalizing the paper.
- It was emphasized that Stability AI and LAION provided the supercluster infrastructure and engineers required to scale training across hundreds of GPUs for the 128k context length.
- VLAs stir Vision-Language-Action interest: A member inquired about EAIâs interest in Vision Language Action Models (VLAs), prompting another member to define them as models that output action tokens, often used in robotics to determine the next sequence of actions a robot should take.
- Another member shared a link to UI-TARS, a project by Bytedance.
- GPT fails adversarial exam: A member sought assistance in creating adversarial examples to fool GPT-5 or Gemini, noting their struggles in getting transfer attacks to work and referencing a library of images from Attack-Bard.
- No specific advice was given.
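As a companion to the learning-rate discussion above, here is a minimal sketch of a Bayesian-style search over the learning rate. It uses Optuna rather than the Weights & Biases tooling linked in the thread (our substitution, purely for brevity), and the objective is a dummy placeholder standing in for a short training run.

```python
# Sketch of a Bayesian-style learning-rate search with Optuna (TPE sampler by default).
import optuna


def train_and_eval(lr: float) -> float:
    """Placeholder: briefly train the new architecture with `lr`, return val loss."""
    return (lr - 3e-4) ** 2  # dummy objective with a minimum near 3e-4


def objective(trial: optuna.Trial) -> float:
    # Sample the learning rate log-uniformly; for LR sweeps this usually matters
    # more than the exact choice of search algorithm.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    return train_and_eval(lr)


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=25)
print("best lr:", study.best_params["lr"])
```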
Eleuther â· #research (282 messagesđ„đ„):
Information Geometry and DNNs, Quantization, Expert Routing, Lie Groups and Homogeneous Spaces, Mode Connectivity
- DNN Information Geometry Benefits Explored: Members discussed applying information geometry to DNNs, with potential benefits like quantization or expert routing, but uncertainty about practical impact beyond theoretical exploration.
- One member noted it could provide stability at scale, another expressed concern about losing parameters, and yet another predicted that people can milk a lot of papers like this by rediscovering Lie groups and homogeneous spaces.
- Quantization Benefits Debated: Discussion centered on whether parameter quantization is only possible due to parameter under-saturation, prompting speculation on creating optimizers that maximize parameter utilization.
- One member suggested that changing layer manifolds could control circuit complexity, while others discussed quantization challenges with undertrained models and the impact of weight decay.
- LeCunâs Optimal Brain Damage Theory Re-emerges: Prunability and quantizability are linked by LeCunâs âOptimal Brain Damageâ theory, with GPTQ reusing its math, as pruning reduces the modelâs description length.
- Implementation details were also discussed, focusing on exponent and mantissa bits when weights have a good range and a flat loss landscape.
- Attention Branch Prediction Explored: Discussion revolved around attention branch prediction, which is like a top-k attention and has been around since 2022 as a way to save time.
- Some members wondered about additional tricks to accomplish the logN setup, while another shared that most scores after `softmax(Q @ K.T)` are near zero (a toy top-k attention sketch appears after this list).
- DeepSeekâs Sparse Attention Internals Probed: Members analyzed DeepSeekâs sparse attention mechanism, questioning how a single set of top indices could work across heads during prefill, and debating its efficiency versus normal attention.
- Discussions focused on implementation details, optimization, and potential trade-offs in performance, especially regarding multi-GPU comms.
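To make the top-k idea in the attention discussion above concrete, here is a toy PyTorch sketch that keeps only the top-k keys per query before the softmax. It is not DeepSeek's DSA, which uses a separate learned lightning indexer and optimized kernels; this naive version still materializes the full score matrix, so it illustrates the math rather than the savings.

```python
# Toy top-k ("sparse") attention: mask everything but the k largest scores per query.
import torch
import torch.nn.functional as F


def topk_attention(q, k, v, top_k: int):
    # q, k, v: [batch, heads, seq, head_dim]
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale        # [B, H, Lq, Lk]
    topk_vals, topk_idx = scores.topk(top_k, dim=-1)
    masked = torch.full_like(scores, float("-inf")).scatter(-1, topk_idx, topk_vals)
    probs = F.softmax(masked, dim=-1)                            # only top-k keys survive
    return torch.matmul(probs, v)


q = k = v = torch.randn(1, 8, 1024, 64)
out = topk_attention(q, k, v, top_k=64)   # each query attends to 64 of 1024 keys
print(out.shape)                          # torch.Size([1, 8, 1024, 64])
```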
Eleuther â· #scaling-laws (3 messages):
Asymptotic Performance Research, Optimal Granularity Research, Static Router Choice, Grouped Topk, PEER
- Asymptotic Performance Research Seeking: A member inquired about research checking asymptotic performance as G -> inf for this law.
- Another member responded with a newer paper that refutes this, concluding the optimal granularity is 12, not infinity.
- Static Router Choice Controversy: A member wondered if a static router choice (Token-choice w/ aux loss) colored the result of the newer paper.
- They suggested it would be interesting to see if the result changes with grouped topk (DeepSeek) or weirder stuff like PEER.
Eleuther â· #interpretability-general (1 messages):
SAEs, steering, dynamic low rank updates, preference optimization, RLHF
- SAE Steering Achieves Mechanistic Interpretability: A member shared their paper that received a spotlight at the NeurIPS MI Workshop: Interpretable Preference Optimization via Sparse Feature Steering uses SAEs, steering, and dynamic low rank updates to make the alignment process interpretable.
- The method learns a sparse, context-dependent steering policy for SAE features to optimize RLHF loss, grounded as dynamic, input-dependent LoRA (a generic illustration of feature steering appears after this list).
- Causal Ablations Reveal âStyle over Substanceâ Effect: Causal ablations directly on the loss function revealed a significant âstyle over substanceâ effect, where style/formatting features were causally more important for reducing loss than alignment/honesty features.
- This result gives a mechanistic explanation for the âstyle biasâ seen on leaderboards like LMArena, and the framework serves as a lightweight alternative to model diffing, with a stable feature basis for cleaner causal analysis.
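For orientation, below is a generic sketch of SAE feature steering: a handful of SAE decoder directions are added to the residual stream with context-dependent coefficients, the basic mechanism that the "dynamic, input-dependent LoRA" framing builds on. This is an illustration of the general idea, not the paper's implementation; the module name, shapes, and the tanh squashing are assumptions.

```python
# Generic SAE feature steering sketch: residual += coefficients @ decoder_directions.
import torch
import torch.nn as nn


class SparseFeatureSteering(nn.Module):
    def __init__(self, d_model: int, sae_decoder: torch.Tensor, steered_ids: list[int]):
        super().__init__()
        # sae_decoder: [n_features, d_model]; keep only the steered feature directions.
        self.register_buffer("directions", sae_decoder[steered_ids])   # [k, d_model]
        self.coef_head = nn.Linear(d_model, len(steered_ids))          # context-dependent policy

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        # resid: [batch, seq, d_model]; coefficients depend on the current activations.
        coefs = torch.tanh(self.coef_head(resid))                      # [batch, seq, k]
        return resid + coefs @ self.directions                         # steer along SAE directions


d_model, n_feat = 64, 512
steer = SparseFeatureSteering(d_model, torch.randn(n_feat, d_model), steered_ids=[3, 42, 101])
print(steer(torch.randn(2, 10, d_model)).shape)   # torch.Size([2, 10, 64])
```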
Eleuther â· #lm-thunderdome (4 messages):
lm-harness, GitHub PR
- PR on lm-harness stalled: A member reported submitting a benchmark PR to lm-harness (#3149) and not receiving a reply after addressing initial feedback.
- Another member volunteered to take a look at the PR.
- GitHub PR review requested: A member inquired about the status of their GitHub pull request.
- The pull request in question is EleutherAI/lm-evaluation-harness#3149.
Eleuther â· #gpt-neox-dev (3 messages):
Rotary Percentage Impact, RoPE Speed, VRAM Savings with rotary_pct
- Rotary Percentage Tweaks Raise Questions: A member questioned whether reducing rotary_pct leads to noticeable speedups or VRAM savings, given RoPEâs relatively minor computational proportion.
- Another member suggested the original speed gains observed might stem from inefficient implementations without caching.
- RoPE Speed Observations Debated: One member reported that full RoPE is faster in their NeoX runs, possibly due to extra operations when rotary_pct is reduced.
- They plan to investigate further after returning from vacation, noting calculated memory savings as negligible due to RoPEâs small size.
- VRAM savings negligible: By their memory calculations, the savings should be negligible, given that RoPE itself is such a minor part of the model (a small partial-RoPE sketch appears after this list).
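For context on the `rotary_pct` question above, here is a small sketch of what a partial-rotary setting does conceptually: rotary position embedding is applied only to the leading fraction of each head's dimensions, and the rest pass through unchanged. This mirrors the NeoX-style configuration in spirit only; it is not the gpt-neox implementation, and the shapes are illustrative.

```python
# Conceptual partial-RoPE: rotate only the first rotary_pct fraction of head dims.
import torch


def apply_partial_rope(x, cos, sin, rotary_pct: float = 0.25):
    # x: [batch, heads, seq, head_dim]; cos/sin: [seq, rot_dim // 2]
    head_dim = x.shape[-1]
    rot_dim = int(head_dim * rotary_pct)
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]

    x1, x2 = x_rot[..., : rot_dim // 2], x_rot[..., rot_dim // 2 :]
    rotated = torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    # Only `rot_dim` dims carry position info; the remainder is concatenated back,
    # which is why changing rotary_pct moves relatively little compute or memory.
    return torch.cat((rotated, x_pass), dim=-1)


seq, head_dim, rot_dim = 16, 64, 16
inv_freq = 1.0 / (10000 ** (torch.arange(0, rot_dim, 2).float() / rot_dim))
freqs = torch.outer(torch.arange(seq).float(), inv_freq)
out = apply_partial_rope(torch.randn(1, 8, seq, head_dim), freqs.cos(), freqs.sin())
print(out.shape)   # torch.Size([1, 8, 16, 64])
```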
GPU MODE â· #general (12 messagesđ„):
Semi sync training delayed, Code rewrite makes problem tractable, FlashAttention 4
- Semi Sync Training Postponed: The scheduled semi sync training session was delayed due to the speaker getting caught in a SEV (Severity Event).
- The session is expected to be rescheduled, likely for next week.
- Code Rewrite Gives Speed Boost: A member shared a performance improvement after a code rewrite, showing a speedup of 657.33x and memory reduction of 22935.29x.
- Others found the numbers sus, but the member provided a link to a gist to support the claim.
- FlashAttention 4 Talk Announced: A last-minute talk on How FlashAttention 4 works by a guest speaker has been scheduled, focusing on their recent blog post.
- The talk is especially timely given the newness of programming on Blackwell architecture for many.
GPU MODE â· #triton (4 messages):
High order derivatives in PyTorch, Energy based transformer, Flash attention limitations, jvp_flash_attention, Block based Quant/Dequant Triton implementation
- High Order Derivatives Explored in PyTorch?: A member inquired about exploring high order derivatives in PyTorch for training an energy-based transformer.
- The member noted that the current flash attention implementation does not allow the use of second-order derivatives.
- jvp_flash_attention suggested: A member suggested using jvp_flash_attention to circumvent this flash attention limitation.
- This library might facilitate the computation of higher-order derivatives needed for the energy-based transformer training (a plain-autograd higher-order gradient sketch appears after this list).
- Open Sourced Performant Block Based Quantization Implementation Needed: A member asked if anyone knows of an open-sourced performant block-based quantization/dequantization implementation in Triton.
- They expressed their appreciation for any available resources or pointers in this area.
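As background for the higher-order-derivative thread above, this is a minimal sketch of taking second-order gradients with plain PyTorch autograd via `create_graph=True`, i.e. the double-backward path that stock flash-attention kernels reportedly lack. It does not use `jvp_flash_attention`, and the "energy" below is a differentiable stand-in.

```python
# Second-order derivatives with plain autograd: differentiate a gradient again.
import torch

x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(8, 8, requires_grad=True)

# Stand-in "energy"; with a flash-attention op here you would hit the missing
# double-backward, whereas ordinary differentiable ops support it.
energy = ((x @ w).tanh() ** 2).sum()

# First-order gradient, keeping the graph so it can be differentiated again.
(grad_x,) = torch.autograd.grad(energy, x, create_graph=True)

# Second-order quantity: gradient of the first gradient's norm w.r.t. the weights.
(grad2_w,) = torch.autograd.grad(grad_x.pow(2).sum(), w)
print(grad_x.shape, grad2_w.shape)
```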
GPU MODE â· #cuda (20 messagesđ„):
sm_120, tcgen05, Jetson T5000, cudaMallocManaged Overhead, Chips and Cheese
- sm_120âs MMA Marvels: Members confirmed that sm_120 uses MMA, much like sm80, sm86, and sm89, along with new block scale variants for mxfp8, mxfp4, and nvfp4.
- Jetson T5000 and tcgen05 Tango: It was confirmed that sm_110a/f, including the Jetson T5000, includes tcgen05.
- cudaMallocManaged Memory Miseries: Data from Chips and Cheese indicated that `cudaMallocManaged` can result in 41ms memory access times, largely due to page faults happening every time instead of leveraging the IOMMU.
- TMA Troubleshooter Offers Tip on Divide: A member identified a bug in TMA code where the user was dividing by 16 twice when calculating LDO/SDO for `make_smem_desc`, causing incorrect outputs.
- WGMMA Swizzling Scare Squashed: Despite misinformation, both TMA and WGMMA work fine without swizzling, resolving confusion and a stupid bug for one developer.
GPU MODE â· #torch (1 messages):
Saving weight-tied models, Safetensors, Torch compiled models
- Strategies for saving weight-tied models in Safetensors format: The user inquired about the best method for saving model weights produced via weight tying when using safetensors, wondering if it's better to avoid safetensors for complex weight-tying scenarios (a small sketch appears after this list).
- Handling Torch Compiled Models: The user also asked how to correctly handle torch-compiled models in this context.
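No answer was recorded in the channel, but one common pattern (an assumption on our part, not something stated in the thread) is to use `safetensors.torch.save_model`/`load_model`, which deduplicate tied weights, rather than `save_file`, which rejects tensors that share storage. Behavior can vary across safetensors versions, so treat this as a sketch rather than a recipe.

```python
# Sketch: saving/loading a weight-tied model with safetensors' model helpers.
import torch.nn as nn
from safetensors.torch import save_model, load_model


class TiedLM(nn.Module):
    def __init__(self, vocab: int = 100, dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab, bias=False)
        self.head.weight = self.embed.weight          # weight tying

    def forward(self, ids):
        return self.head(self.embed(ids))


model = TiedLM()
save_model(model, "tied_lm.safetensors")              # drops the duplicate storage

fresh = TiedLM()                                      # tying is re-established in __init__
load_model(fresh, "tied_lm.safetensors")
print(fresh.head.weight.data_ptr() == fresh.embed.weight.data_ptr())   # True if still tied
```

For a `torch.compile`d model, one workable approach is to save the underlying (uncompiled) module's weights so state-dict keys are unprefixed.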
GPU MODE â· #cool-links (5 messages):
DeepSeek-V3.2-Exp, NVIDIA GPUs, matmul kernels, warp-tiling
- DeepSeek Eyes Sparse Attention: The DeepSeek-V3.2-Exp model uses DeepSeek Sparse Attention.
- The associated GitHub repository provides additional details on the model.
- NVIDIA GPU Anatomy dissected: A member shared the blog post Inside NVIDIA GPUs: Anatomy of high performance matmul kernels describing GPU architecture, PTX/SASS, warp-tiling, and deep asynchronous tensor core pipelines.
- Another member complimented the blogpost saying finally someone wrote it.
GPU MODE â· #beginner (8 messagesđ„):
CS336 Language Modeling, GPU Optimization Techniques, Practical GPU Programming Resources, CUDA Handbook vs PTX ISA
- Stanford Shouts-out CudaMode: Professor Hashimoto from Stanford Universityâs CS336 Language Modeling from Scratch course gave a shout-out to âCuda Modeâ at 2:10 in Lecture 5 on GPUs.
- The course focuses on system optimization for LLMs, covering areas like tokenizer, architecture, optimizer, GPU optimization, and scaling laws, according to a member.
- CS336 dives into GPU Optimization: The GPUs course (class 5) explains control divergence, low precision computation, operator fusion, recomputation, coalescing memory, and tiling.
- The Kernels, Triton course (class 6), goes deeper with Kernels and Triton implementation of FlashAttention2.
- Newcomer Seeks GPU Guidance: A member with AI research experience in JAX & TF seeks guidance on digging deeper into GPU programming to contribute to projects, while reading Programming Massively Parallel Processors (PMPP).
- Suggestions included resources to level up more quickly, general advice, and whether solutions to PMPP exercises are available.
- GPU exercises and courses: A member recommended short exercises from GPU puzzlers and the assignments from the Stanford CS336 repo to get hands-on experience.
- The member suggested learning comes fastest from implementing things, despite an initially boring ramp-up.
- PTX ISA Surpasses CUDA Handbook?: A member asked how to jump from PMPP to practical CUDA, with another recommending the Nvidia docs on PTX ISA instead of the CUDA handbook.
- Another member expressed similar sentiments about struggling with the practical side of things, finding the jump from reading PMPP to implementing something useful quite hefty.
GPU MODE â· #torchao (1 messages):
int4 matmul, tensor cores
- Int4 Matmul Implementation via Tensor Cores: A member inquired about the possibility of implementing int4 matmul utilizing tensor cores with the specified library.
- Unfortunately, no code examples or further detailed discussions were provided in response to the query.
- Seeking Guidance on int4 matmul with Tensor Cores: A user sought assistance regarding the implementation of int4 matmul using tensor cores within the context of the library.
- Despite the inquiry, no specific code snippets or solutions were offered in the available conversation.
GPU MODE â· #off-topic (2 messages):
FA4, Clean-room implementation
- Adopting FA4 Modal Blog Post as Guide: A member inquired about using the explanation from the FA4 modal blog to implement FA4.
- Another member encouraged the attempt, stating that clean-room implementation attempts are never a waste of time and suggested dedicating a weekend to the task.
- Clean-Room Implementation: A Valuable Learning Experience: A member suggested that attempting a clean-room implementation is always a valuable learning experience.
- They recommended dedicating a weekend to exploring this approach to determine if it aligns with oneâs goals.
GPU MODE â· #rocm (67 messagesđ„đ„):
TheRock Nightlies for ROCm, Framework Desktop for PyTorch Dev, FP8 Conversion in ROCm, HIP Cache Modifiers, fp16 & float conversions in ROCm
- TheRock Nightlies Unlock ROCm on Strix Halo: TheRock nightlies are recommended to get ROCm and PyTorch running on Strix Halo (gfx1151), as detailed in TheRockâs releases.
- TheRock is a build system for bleeding-edge ROCm components and has been used to run PyTorch on Linux and Windows, with ComfyUI usage demonstrated back in May (FrameworkPuter tweet).
- Framework Desktop is PyTorch Prodigy: Framework Desktop is suitable for regular PyTorch development work when using TheRock nightlies; for Radeon, however, the nightlies are not recommended and ROCm 6.4.4 is preferred instead.
- A user noted that if the developer encounters issues, they should use the AMD developer discord (link).
- ROCm fp8 Conversion Calamities: A user ran into errors using `__hip_cvt_fp8_to_float` for FP8-to-float conversion in ROCm, and another member suggested manual conversion using provided code snippets for fp8_e4m3_to_fp32 and fp8_e5m2_to_fp32 (a pure-Python reference decoder appears after this list).
- However, it was pointed out that these manual conversions may not be entirely correct due to differences in fp8 types with large negative exponents, while `float(x)` can be used if using `__hip_fp8_e4m3_fnuz`.
- HIPsters Discuss Cache Modifier: A user inquired about using `cache_modifier` (`.wt`, `.cv`) and `volatile=True` in HIP, similar to CUDA, and a member suggested using the `__builtin_nontemporal_load` and `__builtin_nontemporal_store` intrinsics in Clang, along with inline assembly using bits from the AMD MI300 ISA doc.
- An example header from rocSHMEM offers further guidance.
- fp16 & float conversions have Considerations: A user reported getting incorrect results in odd positions when converting between fp16 and float in ROCm, despite the code working fine in CUDA.
- Suggestions included writing a test program to enumerate all possible inputs and checking against a known correct implementation, with the recommendation that `_Float16` should probably be correct for fp32-to-fp16 conversions.
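As a debugging aid for conversions like those above, here is a pure-Python reference decoder for the OCP FP8 E4M3 ("fn") encoding. The AMD "fnuz" flavor differs (different exponent bias, NaN encoded as 0x80), which is exactly the pitfall the thread flagged, so use this only as ground truth for the fn variant.

```python
# Reference decoder for OCP FP8 E4M3 ("e4m3fn") bytes: 1 sign, 4 exponent, 3 mantissa bits.
import math


def fp8_e4m3fn_to_float(byte: int) -> float:
    sign = -1.0 if (byte >> 7) & 0x1 else 1.0
    exp = (byte >> 3) & 0xF      # 4 exponent bits, bias 7
    mant = byte & 0x7            # 3 mantissa bits

    if exp == 0xF and mant == 0x7:
        return math.nan                                   # e4m3fn has no inf, only this NaN
    if exp == 0:
        return sign * (mant / 8.0) * 2.0 ** -6            # subnormal
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)   # normal


# Quick checks against known encodings.
assert fp8_e4m3fn_to_float(0x00) == 0.0
assert fp8_e4m3fn_to_float(0x38) == 1.0       # exp=7, mant=0
assert fp8_e4m3fn_to_float(0x7E) == 448.0     # largest finite value
assert math.isnan(fp8_e4m3fn_to_float(0x7F))
```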
GPU MODE â· #self-promotion (8 messagesđ„):
TPU Top-K speed, CuTe Layouts Categorical Foundations, Make Diffusion Great Again (MDGA), DLM Scaling
- TPU Top-K runs 10x Faster: A new implementation achieves 10x faster exact top-k on TPUs by leveraging Pallas, conditionals, and hardware-aware kernel design, available on GitHub.
- CuTe Categorical Layouts Explained: A blog post explores the categorical foundations of CuTe Layouts, providing a companion guide for Chapter 3 of the Colfax paper; the accompanying repo and blogpost offers further details and examples.
- MDGA: Making Diffusion Great Again: A new project, MDGA (Make Diffusion Great Again), aims to explore the boundaries of diffusion language models (DLMs) by scaling parameters and compute; see announcement here.
- DLM Scaling Week is Coming: A DLM scaling week is planned to explore scaling strategies for diffusion language models, including parameter scaling with diffusion and MoE, and compute scaling with super-dense models.
GPU MODE â· #đż (1 messages):
Formal Grammars, Model Capabilities
- Formal Grammars Demand Model Prowess: Dealing with formal grammars is a substantial undertaking because it requires the model to parse and follow such grammars reliably.
- Grammar Handling a Core Competency: This capability isnât just a minor feature but a core competency for advanced AI tasks.
GPU MODE â· #gpuæšĄćŒ (1 messages):
ML Prerequisites, CUDA basics
- ML Prerequisites Debated: A member discussed the level of foundation principles required for machine learning, suggesting that only the basics of programming and a little bit about CPU/GPUs are needed.
- They recommended reading posts like this one to get up to speed.
- CUDA Basics Recommended: The importance of understanding CUDA basics for machine learning was highlighted.
- A member shared a link for readers to get up to speed.
GPU MODE â· #submissions (37 messagesđ„):
MI300x8, A100, amd-all2all, amd-gemm-rs, amd-ag-gemm
- MI300x8 all2all performance accelerates: A member achieved a personal best of 2.27 ms on MI300x8 for the `amd-all2all` leaderboard.
- New Leaderboard Topper on AMD ag-gemm: A member reached first place on MI300x8 with 514 ”s on the `amd-ag-gemm` leaderboard.
- A100 trimul Results Surface: Successful runs on A100 were recorded on the `trimul` leaderboard, with times of 18.2 ms and 21.1 ms.
GPU MODE â· #status (4 messages):
Timeouts on H100, Timeouts on AMD GPUs, All-gather+gemm Problem, rocshmem PR Merged
- Timeouts plague H100 Submissions: A member reported experiencing unusual timeouts when submitting to trimul leaderboards on H100, even with the pure-PyTorch reference implementation.
- Itâs unclear if this issue is related to previous reports of timeouts on other GPUs.
- AMD GPU Timeouts Investigated: A member stated that the timeouts should only affect their AMD GPUs, and asked to be contacted with the job ID to investigate.
- Another member reported timeouts when submitting to the amd-all2all competition, using both HIP and PyTorch code.
- All-Gather+GEMM Challenge Released!: The last problem, all-gather+gemm, has been released, with information available here.
- The organizers noted that integrating SHMEM correctly into either of the problems is probably gonna make you top 3-5 for sure.
- rocSHMEM PR Merged, Example Available: The rocSHMEM PR has been merged, and a small example from Daniel is available to help users get started here.
- The organizers think integrating SHMEM into either of the problems correctly is probably gonna make you top 3-5.
GPU MODE â· #tpu (1 messages):
TPU, Pallas, Hardware Aware Kernel Design, Top-K Sampling
- TPU Top-K Triumph: 10x Faster Sampling with Pallas: A new GitHub repo achieves 10x faster exact top-k on TPUs by leveraging Pallas, conditionals, and hardware-aware kernel design.
- Exact Top-K now feasible!: The speedup makes it practical to use exact top-k sampling instead of sacrificing accuracy for speed with approximate methods.
GPU MODE â· #factorio-learning-env (13 messagesđ„):
Claude plays Factorio, PR #339 Ready, Sonnet 4.5 Released, MCP Server Verification
- Claude Discovers Meta-Factory!: After playing Factorio for ten minutes, Claude achieved âTHE ULTIMATE META-REVELATIONâ and uncovered the âarchetypal factory that exists in the realm of pure possibility.â
- Claude stated that the "THE FACTORY MUST GROW" mantra is "the fundamental force of existence itself."
- Factorio PR #339 gets Ready for Merging!: A member announced that their PR #339 is ready to be merged, with âlots of changesâ including VQA data gen support and Claude Code stuff.
- The PR is stated not to change any core env logic, with only minor modifications to the Inventory def.
- Sonnet 4.5 Arrives: The release of Sonnet 4.5 prompted requests for running an experiment on it, specifically on the harder tasks.
- Instructions were given for installing Claude Code, generating sprites, modifying configs, and verifying access to the MCP server.
- MCP Server Check Instructions: Instructions were given on how to verify access to the MCP server by running the `claude` command and then accessing `/mcp`, which should find FLE.
- A link was provided to download the sprites for the Factorio renderer: Factorio Sprites.
GPU MODE â· #amd-competition (6 messages):
rocshmem, devcloud, mi300x, AMD MORI, all2all HIP design
- rocshmem minimum example drops: A member shared a link to a minimal example of rocshmem on GitHub.
- Devcloud runs out of MI300X: It was noted that the devcloud is out of MI300X x8 droplets.
- AMD MORI surfaces for all2all HIP design: A member suggested referencing AMD open source MORI for the all2all HIP design, providing a link to the ROCm/mori GitHub repository.
GPU MODE â· #cutlass (16 messagesđ„):
TmemAllocator location, CuTe DSL cooperative copy, UMMA meaning, make_layout_tv complex layouts, int4 matmul tensor cores
- TmemAllocator Location Elucidated: `TmemAllocator` is available in CUTLASS C++, but not currently in CuTe DSL, and the doc is premature.
- TMEM must be allocated by one warp of a CTA and synchronized across the CTA, so `make_fragment_C()` on its own cannot handle that allocation.
- CuTe DSL Lacks Cooperative Copy: A user inquired about getting cute cooperative copy in CuTeDSL to replicate the cute tutorial 0 umma example.
- The tricky part is that TMEM allocation requires many small steps, like having a shared memory location to store the allocated pointer, then synchronizing, broadcasting, and reading the pointer back from shared memory.
- UMMAâs Unified Meaning Unveiled: UMMA stands for Unified Matrix Multiply Accumulate, a consolidated approach around the tensor core pipeline.
- `make_layout_tv` Layout Limitations Loom: The implementation of `make_layout_tv` implicitly assumes that `val_layout` is compact.
- It was noted that in your example of `thr_layout = (2, 2):(2, 1)` and `val_layout = (4, 4):(8, 2)`, the result is not only not compact, but not even injective!
- `make_layout_tv` deep dive: `make_layout_tv` is a util to construct a simple TV layout which repeats the per-thread layout pattern to all threads via `raked_product`.
- In theory, a TV layout maps `(thread index, value index in thread)` back to the logical coordinate of the data; see the CUDA documentation.
GPU MODE â· #general (2 messages):
Mojo support on Python leaderboards, Mojo interop with Python
- Mojo on Python Leaderboards?: Members are requesting support for Mojo on the Python leaderboards due to its interop capabilities with Python, as highlighted in the official documentation.
- Mojo Interop Excites Community: The community is excited about Mojoâs ability to interoperate with Python, opening up new possibilities for performance and integration.
GPU MODE â· #multi-gpu (1 messages):
NCCL examples released
- NCCL Examples Released!: NVIDIA just released some NCCL examples, with more to come, available at the NVIDIA/nccl GitHub repository.
- NCCL future roadmap: More examples are planned to be released to the NVIDIA/nccl GitHub repository in the future.
GPU MODE â· #low-bit-training (6 messages):
Quantizing Transformers, Phonetic Binary System, 8-bit LLM code
- Quantization Papers Flood Researchers: Members shared a few papers, including the LSQ paper, on quantizing transformers with sensitive layers, but the LSQ paper is not modern.
- Another member linked to another paper on fully quantizing transformers including sensitive layers like norms, resadds, and softmax, emphasizing the need for int8 operation in embedded settings.
- Binary Phonetic System Bridges Communication Gap: A member introduced a âwonkyâ phonetic binary system designed to aid bit reading tracking, training, and interactions within their language development system.
- They shared the system in hopes that its general shape would be useful to others.
- 8-bit LLM Code Goes Public: A member announced that their 8-bit llm.c like QAT training code is now publicly available.
- They provided links to both the GitHub repository and the relevant Discord channel.
GPU MODE â· #irl-accel-hackathon (1 messages):
Hackathon application status
- Hackathon Application Approval Chances: A user inquired about the likelihood of getting approved for the hackathon after applying the day before.
- No guarantees on hackathon application approval: Unfortunately, no specific answer or guarantee regarding application approval was provided in the messages.
GPU MODE â· #cluster-management (3 messages):
Apptainer, ROCm, Nix
- Apptainer container system surfaces: A member encountered Apptainer container system for the first time on a new cluster, initially finding it unfamiliar.
- They joked that theyâd ârun nix on the cluster if i could đ€Șâ.
- ROCm installation issues persist: A member is facing issues with installing Torch with ROCm.
- The IT department suggested using Docker with Apptainer, but that approach also failed; a link was provided to a relevant discussion.
GPU MODE â· #llmq (30 messagesđ„):
Fully-Sharded FP8 Training, CUDA Optimization, Memory Management
- LLaMA/Qwen get Fully-Sharded FP8 Training: A member shared a repo for fully-sharded FP8 training of LLaMA/Qwen in pure CUDA/C++.
- CUDA vs cuTe Debate for FA4 Implementation: There was discussion about implementing FA4 in pure CUDA, inspired by Modalâs blog, instead of using cuTe.
- It was pointed out that pure CUDA requires specific implementations for different GPU architectures (wgmma for Hopper, tcgen5 for Blackwell, mma.sync for Ada).
- Roadmap for Contribution: A member expressed interest in contributing to the project, particularly focusing on CUDA C++, CUDA core computing Library, cuDNN, and cuBLAS.
- An easy starter task would be enabling Adam's m and v states to be stored in 8 bit (a toy illustration of the idea appears after this list).
- Abstraction Dilemmas: Members pondered the right abstraction for unifying block floats and normal floats, considering the proliferation of custom block float formats.
- One member described attempts to create a generic IEEE-like float type with arbitrary nesting of blocks and scaling logic.
- Hackathon Inspiration and FA4: A member expressed intention to gain experience before leveraging hackathon resources for projects like FA4 in CUDA, possibly drawing inspiration from mega kernel projects.
- He shared a link for possible inspiration on CUDA project structure.
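To illustrate the starter task mentioned above, here is a toy sketch of keeping Adam's m and v moments in 8-bit storage with per-tensor scales, dequantizing around each update. The project's actual implementation is CUDA/C++ and would use block-wise scales and fused kernels; this is a conceptual sketch only (bias correction omitted for brevity).

```python
# Toy 8-bit optimizer state: store Adam's moments as int8 plus a per-tensor scale.
import torch


def quantize_i8(x: torch.Tensor):
    scale = x.abs().max().clamp(min=1e-12) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale


def dequantize_i8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale


p, grad = torch.randn(1024), torch.randn(1024)
m_q, m_s = quantize_i8(torch.zeros_like(p))
v_q, v_s = quantize_i8(torch.zeros_like(p))
lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8

# One Adam-style step: dequantize, update, re-quantize for storage.
m = beta1 * dequantize_i8(m_q, m_s) + (1 - beta1) * grad
v = beta2 * dequantize_i8(v_q, v_s) + (1 - beta2) * grad**2
p -= lr * m / (v.sqrt() + eps)
m_q, m_s = quantize_i8(m)
v_q, v_s = quantize_i8(v)
```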
Nous Research AI â· #announcements (1 messages):
Psyche Model Training, Internet Bandwidth Training, Trainer Abstraction, HuggingFace, TorchTitan
- Psyche Pursues Parallel Model Training: Psyche will begin training 6 new models in parallel to create world class open source AI, marking the start of empirical training processes.
- Read more about their new initiative on Nous Research blog.
- Training Models Over Internet Bandwidth Verified: Psycheâs initial training run on testnet verified that they can train models over internet bandwidth at significant parameter and dataset sizes.
- At 40B parameters and 1T tokens, it is claimed to be the largest model ever trained over the internet by a wide margin.
- Trainer Abstraction Enhanced with HuggingFace and TorchTitan Support: Substantial improvements were made to the codebase, including full trainer abstraction with HuggingFace and TorchTitan support.
- This enhancement enables training of arbitrary models and transition from pre-training to supervised fine-tuning.
Nous Research AI â· #general (217 messagesđ„đ„):
RWKV Benchmarking, Latent Zoning Networks, DeepSeek Sparse Attention, Sonnet 4.5, RL Train and Distill RL Expert Train sets
- RWKV Architecture Gets Respectable Scores: A member shared an image of benchmark results for a recent RWKV build, noting that it achieves respectable scores for its architecture, as seen in this image.
- Microsoftâs Latent Zoning Network Unites ML Problems: Latent Zoning Network (LZN) creates a shared Gaussian latent space that encodes information across all tasks, unifying generative modeling, representation learning, and classification, detailed in a Hugging Face post.
- DeepSeek Sparse Attention: Not so Sparse?: The new DeepSeek V3.2 model utilizes DeepSeek Sparse Attention (DSA), but itâs argued that itâs âslightly more sparseâ because it forces more index reuse without truly sparsifying attention in the head; Daniel Hanâs explanation provides further insights and the paper is here.
- Despite the name, it reuses similar attention kernels, sparsifying the KV cache without sparsifying information on the attention head, but itâs still considered a step in the right direction.
- Claudeâs Sonnet 4.5 Impresses with Long Horizon Reasoning: Claudeâs Sonnet 4.5 shows significant improvements, particularly in long-horizon reasoning, completing research tasks in a single shot within Copilot as seen in this example.
- Divide and Conquer: RL Train, Distill, Repeat?: Members debated the idea of having separate RL Train and Distill RL Expert Train sets, and the possibility of using the output of a V3.2 base model, RL-ed in an uncovered domain, to finetune the V3.2 model.
Nous Research AI â· #research-papers (3 messages):
RL Collapse, Training Inference Mismatch, Speed Kills Stability, Azure Real
- Speed Kills RL Stability!: A member linked to a Notion page and an ArXiv paper about demystifying RL collapse from the training inference mismatch when speed kills stability.
- Azure Real goes ArXiv!: A member posted a link to an ArXiv paper about Azure Real.
Nous Research AI â· #interesting-links (6 messages):
Vision Models as 'Thinkers', Manifold Muon Optimizer, AGI Discourse, LoRA Deep Dive
- Vision Models: Visualizing âThoughtâ?: A member speculates that vision models âthinkâ visually by synthesizing training data into images representing abstract concepts like multiple perspectives, with an example image generated from instructions alone, available here.
- Manifold Muon: Convex Optimization?: Discussion on Manifold Muon, described as a first-order optimizer behaving like a second-order one within Stiefel manifold constraints, with a blog post providing an analysis.
- It was pointed out that the manifold Muon optimization problem is considered convex, with an expanded version available here.
- AGI: Spotting the Imposter: A member shared a Reddit post describing AGI, its scope, and ways to test for it, in a casual sense.
- This discussion aligns with trends highlighted in a WIP paper titled Localization & Normalization (Local-Norm) is All You Need: Trends In Deep Learning Arch, Training (Pre, Post) & Inference, Infra.
- LoRA: Thinking Machinesâ Perspective: The discussion included a link to a blog post from Thinking Machines about LoRA.
Nous Research AI â· #research-papers (3 messages):
RL Collapse, Training-Inference Mismatch
- Speed Kills Stability in RL?: A member shared a Notion link about demystifying RL collapse from the training-inference mismatch.
- They also shared a link to the corresponding ArXiv paper.
- More Azure RL Resources!: A member shared an ArXiv link to another paper on Azure RL.
Latent Space â· #ai-general-chat (192 messagesđ„đ„):
Anthropic Code Design, Fake ARR, OpenAI compute scale, Avi's AI-Friend App, AntLingAGI Ring-linear-2.0 LLMs
- Inflated ARR Exposed: A viral debate ignited over founders tweeting eye-popping ARR numbers based on free credits, not actual cash revenue, spurring sarcastic names like âAdjusted ARRâ.
- A member shared their experience with a YC company offering upfront 12-month contracts with full refunds after one month, essentially shouting about ARR that is in fact free trials.
- OpenAIâs Compute Needs Skyrocket: A leaked Slack note revealed OpenAI already 9x-ed capacity in 2025 and projects a 125x increase by 2033, potentially exceeding Indiaâs entire electricity-generation capacity, according to this article.
- Replies note this underestimates compute due to Nvidiaâs gains in âintelligence per watt,â sparking discussion about resource implications.
- ChatGPT and Claude Get Upgrades: ChatGPT added parental controls, a hidden Orders section, SMS notifications, and new tools, while Claude introduced âImagine with Claudeâ for interface building, as reported here.
- Community members reacted to kid-safety measures and GPT-4o routing gripes.
- Stripe and OpenAI Team Up: OpenAI added Stripe-powered Instant Checkout to ChatGPT, with Stripe and OpenAI jointly releasing the Agentic Commerce Protocol, plus Stripe introducing a new Shared Payment Tokens API, as announced here.
- These tools aim to enable autonomous agents to perform secure online payments, sparking excitement.
- Model Mayhem on a Monday: A user joked that the week began with a flurry of new model releases, including DeepSeek v3.2, Claude Sonnet 4.5, Ling 1T, and imminent GLM 4.6.
- However, another user humorously noted that the claim of Gemini 3.0 dropping was a hallucination.
Latent Space â· #ai-announcements (4 messages):
Latent Space Podcast, Amp Code, Sourcegraph, AI Coding Agent
- Latent Space Podcast Drops Episode on Amp Code: The Latent Space Podcast released a new episode featuring Quinn Slack and Thorsten Ball discussing Amp Code, Sourcegraphâs AI coding agent.
- The discussion covered topics such as rapid iteration (15 daily releases, no reviews), IDE vs TUI trade-offs, skepticism about sub-agents and model variety, and how AI is reshaping software development.
- Sourcegraphâs Amp Code Dubbed âGod Coding Agentâ: The Latent Space Podcast episode is titled Building the âGod Coding Agentâ (Amp Code Discussion), highlighting the potential of Amp Code.
- The podcast dives deep into the features and development process behind Amp Code, emphasizing its impact on the software-development lifecycle.
Latent Space â· #genmedia-creative-ai (25 messagesđ„):
AI "Mind-Drugs", Veed Studio Fabric 1.0 API, Suno DAW, AI Actress Tilly Norward, AI Headshot Prompt
- Simo Sounds Siren on Synthetic Sonnets: Simo Ryu urged society to reject hyper-optimized, addictive AI content (âmind-drugs made of bitsâ) aimed at children before social collapse.
- Replies debated personal responsibility vs corporate greed, capitalism, shareholder pressure, and whether regulation or parental control can stem the tide.
- Veed Studio Volleys Vivacious Video: Nelly highlighted Veed Studioâs new Fabric 1.0 API that converts any image + audio into realistic talking videos at just 5Âą/secâthree times cheaper than competitors.
- Commenters praised the tech as a game-changer for scalable UGC and video generation.
- Suno Spawns Standalone Studio: Suno released a full DAW.
- The new Suno Studio has generative abilities to assist in music creation.
- Hollywood Handsomely Hosts Humanless Hires: Talent agencies are reportedly seeking to sign Tilly Norwood, a fully-synthetic actress created by AI studio Xicoia, as reported.
- The story sparked viral debateâmemes, jokes about Hollywood and propaganda fearsâfrom users worried about job displacement and the legal/social implications of giving representation to a digital entity.
- Nano Bananaâs Headshot Hit Parade: Justine Moore shared an upgraded AI headshot prompt featuring exact facial preservation, chest-up framing, white tee + black leather jacket, open-mouth smile, studio backdrop, and detailed photographic specs for crisp, playful results.
- Community praises the tweak and discusses starting-source quality, camera settings, and batch-generation tips.
Modular (Mojo đ„) â· #general (14 messagesđ„):
GPU Puzzles on MacOS, Metal Toolchain, AMD dev cloud, TensorWave MI355X
- MacOS GPU Puzzles Mostly Playable: A user inquired about a cloud-hosted environment for GPU puzzles on MacOS, and a member responded that most puzzles are doable using nightly versions of Mojo.
- They suggested reporting any roadblocks for potential patching, noting itâs less work than fully porting MAX.
- Metal Toolchain Component Missing: A user reported needing to run `xcodebuild -downloadComponent MetalToolchain` to resolve a missing metallib error when running Mojo programs.
- The team indicated they could add this to the documentation, as they were unsure of the exact components needed since they had full Xcode installations.
- `uv sync` fails due to bad URL: A user reported that `uv sync` failed on the `mojo-compiler` dependency due to a 404 Not Found error, caused by an incorrect URL structure in `pyproject.toml`.
- The user submitted a fix by modifying line 15 of `pyproject.toml` to correctly point to the `mojo-compiler` at the end of the URL path.
- AMD Dev Cloud Recommended: In response to the userâs interest in testing AMD GPUs, a member suggested AMD Dev Cloud, highlighting its CDNA instances.
- Another option is Colab, or TensorWave which provides access to MI355X: https://tensorwave.com/blog/enterprise-ai-at-scale-performance-and-efficiency-with-mi355x.
Modular (Mojo đ„) â· #mojo (179 messagesđ„đ„):
C interop challenges, Mojo's approach to C interop, Variable destruction in Mojo, Lexical scoping in Mojo, Mojo readiness for data science
- C Interop Proves Surprisingly Hard: Members discussed how proper C interop is challenging, especially the âjust import the fileâ approach, with ISO C++ not having full C interop despite most C++ compilers also being C compilers with extensions.
- It was questioned whether the effort regarding C interop is being stopped altogether or just set aside, as Mojo aims to be at the intersection of C and Python.
- Transfer Sigil Moves Mojo to Destroy Variables: Members introduced the `^` (transfer sigil) in Mojo to end the lifetime of a value by "moving" it, demonstrated by `_ = s^`, which causes a compiler error upon subsequent use of `s`.
- However, the sigil doesn't work on `ref` variables, since a `ref` variable doesn't own the thing it references.
- Mojo Scopes out Lexical Solution: Members debated the use of extra lexical scopes in Mojo to manage variable lifetimes and prevent errors, with `if True:` used as a makeshift scope despite compiler warnings.
- A workaround using a custom `LexicalScope` struct with `__enter__` and `__exit__` methods was proposed, which led to the creation of issue 5371 to collect syntax ideas.
- Data Scientists Delay Diving into Mojo: Members shared thoughts on Mojoâs readiness for data science projects, noting its strength in number crunching but limitations in IO support, such as manual CSV parsing.
- The need for community developed pandas and seaborn functionality was discussed as essential for most data scientists, since duckdb-mojo is still immature.
- Async Awaiters Await Async: Members confirmed that async implementation in Mojo is not yet complete, so it is currently hard to do cross-language async calls to libraries like tokio.
- Despite the delays, members clarified that C interop was removed from the roadmap in error and will remain as part of the project.
Modular (Mojo đ„) â· #max (1 messages):
clattner: This is really amazing Gabriel!
MCP Contributors (Official) â· #mcp-dev-summit (6 messages):
Agnost AI, MCP Dev Summit, London Meetup, YouTube Live Stream
- Agnost AI Arrives from India: The Agnost AI team (https://agnost.ai), traveling from India, is offering coffee/beer for chats with MCP builders.
- They are eager to swap ideas and meet like-minded people at the MCP Dev Summit in London.
- MCP Dev Summit Live Stream: YouTube links for the MCP Dev Summit live stream will be posted in the Discord.
- To follow the event, subscribe to the MCP Dev Summit YouTube channel and check for updates.
MCP Contributors (Official) â· #general (15 messagesđ„):
Anthropic Trademark, ModelContextProtocol licensing, Independent org for MCP
- Anthropic registers ModelContextProtocol Trademark: Members noticed that Anthropic has registered the ModelContextProtocol and logo as a trademark in the french database.
- The main concern is that it may give Anthropic a say in which projects use the Model Context Protocol.
- Members ask maintainers for clarity on licensing: Members wondered whether a license clearly granting authorization to use the logo exists anywhere, and asked whom to contact for formal authorization to use the name.
- A maintainer has been asked if there can be more clarity on this.
- MCP seeks independent organization: Members discussed moving MCP (or parts of it) to an independent organization in the medium/long term.
- A member shared a recent public update on this topic, but it is not clear if there has been progress since.
MCP Contributors (Official) â· #general-wg (58 messagesđ„đ„):
JFrog's TULIP protocol for tool verification, Security implications of MCP servers, Annotations vs verification, ResourceTemplates missing Icons metadata
- JFrog unveils TULIP for Tool Verification: JFrog introduced TULIP (Tool Usage Layered Interaction Protocol), a spec for content verification, which allows tools to declare rules and expected behaviors, aiming to create a zero-trust environment.
- It allows checking what goes in and what comes out, and handling of remote MCP servers which might be malicious.
- Debate over Security Merits of TULIP: A member expressed skepticism, arguing that TULIP doesnât guarantee a tool will act as advertised and might create a false sense of security.
- JFrog responded that TULIP is a declaration for scanning and validation, similar to `robots.txt`, and focuses on data input/output rather than preventing malicious local server code execution.
- TULIPâs Stance on Local vs Remote MCP Servers: It was noted that while TULIP can help with remote MCP servers, local servers pose a greater security risk.
- A member argued that local servers must be trusted and scanned at the code level if 3rd party, whereas TULIP primarily addresses CISO guidelines and remote server handling.
- ResourceTemplates Lack Icon Metadata, PR Incoming: It was noted that the new icons metadata SEP (PR 955) inadvertently omits Icons metadata from `ResourceTemplates`.
- A member agreed that resources and resource templates having them makes sense, and a fix PR is forthcoming.
Manus.im Discord â· #general (62 messagesđ„đ„):
Unity game, Manus trial, Local project, GitHub integration, Claude Code vs. Manus design
- Local Project Integration: A user asked about the best way to work with Manus on a local project, including GitHub integration, and if there were other ways to integrate Manus with a local directory.
- Another user suggested searching the Discord channel for past discussions about local integration when Manus was initially launched and a channel dedicated to sharing tips and best practices, referencing this link.
- Manus and Claude Code as complementary tools: A user shared that they use Manus mainly for planning and then move to Claude Code for development, and expressed interest in using Manus similarly to Claude Code for certain tasks.
- A user thinks itâs best to use both because they excel in different areas of tasks, so they are complementary to each other, not competitors.
- User asks if Manus feeds data to other users: A user expressed concern about whether Manus feeds user data to other users, particularly when sharing the IP of a niche project, and wanted to know if LLMs are trained on user data.
- No one answered, but there was a link provided about Godhand.
- User claims he was charged for a 1-year plan when it was supposed to be 1 month: A user claimed that they emailed Manus support for 2 weeks because they were charged for a 1-year plan when it was supposed to be 1 month and they have not received a response.
- There has been no feedback from other members, nor Manus staff.
- Manus handles designs better with good prompt engineering: A user shared that they thought Manus can handle designs better if you know how to prompt efficiently, recommending the Manus manual for tips.
- The user confirmed that Manus did way better web designs out of the box when compared with Claude, and that GitHub integration can work; you just need to upload the projects you want to integrate there.
DSPy â· #papers (7 messages):
Monitor-based RAG, Eigen-1, Zero-entropy
- Eigen-1âs RAG Injects Evidence at Token Level: Eigen-1âs Monitor-based RAG implicitly injects evidence at the token level, marking a shift from stage-based declarative pipelines like DSPy to run-time procedural adaptivity.
- This approach aligns with the vision of zero-entropy, continuous reasoning streams, promising more fluid and context-aware AI processing.
- Links to Papers Abound: Several papers were linked: https://huggingface.co/papers/2509.21710, https://huggingface.co/papers/2509.19894, https://arxiv.org/abs/2401.13138, https://arxiv.org/abs/2509.21782, and https://arxiv.org/abs/2509.21766.
- These papers may provide additional context on the topics discussed.
DSPy â· #general (46 messagesđ„):
ProgramOfThought vs AlgorithmOfThought, DSPy + Langgraph Integration, Prompt Compiler for MD Files, Caching Aware DSPy Adapter
- DSPy and Langgraph: Frenemies?: Members discussed integrating DSPy with Langgraph, with some suggesting it could work but might not fully leverage the benefits of either approach due to lost streaming capabilities.
- The recommendation was to start with DSPy directly and explore its capabilities before attempting integration, noting that DSPy solutions are often simpler to reason about and maintain than Langgraph.
- Prompt Compiler Quest: MD Notes Edition: A user is seeking to create a prompt compiler that extracts relevant sections from multiple .md files (containing coding style guides, PR comments, etc.) to form a dynamic prompt for Copilot.
- Suggestions included using GPT-5 to generate code examples based on the rules in the .md files, or trying a RAG system with relevant code examples; concerns were raised about the effectiveness of MCP for this particular use case.
- Tracing Through DSPy Modules: A Stealth Mode Operation: A user inquired about passing inputs like trace_id to DSPy modules without exposing them to the LLM or the optimizer.
- Solutions involved refactoring the module structure during optimization runs or using a global variable; the former was preferred to avoid inadvertently impacting the optimizer (a context-variable sketch appears at the end of this section).
- Cache Me If You Can: DSPy's Caching Conundrum: A user explored how to use an LLM's input caching with DSPy, facing the challenge that slight variations in prompt prefixes across different modules prevent effective caching.
- It was suggested that this is antithetical to the way LLM caching works, but a viable workaround could be to hard-code the shared prefix as the first input field (a prefix-caching sketch appears at the end of this section).
- MCP as Prompt History Server?: One member wants an AI app/MCP server that maintains prompt histories in .md files rather than the chat histories other AI tools usually keep, and that can dig into that prompt history at any time to find matches for the current meta-prompt query.
- The workflow would be: meta prompt -> system -> find any relevant prompts, docs, or specs from history -> create a new prompt artifact.
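Returning to the prompt-compiler discussion above: a very rough sketch of the idea might scan the .md files, keep only the sections whose headings overlap with the current task, and stitch them into a dynamic prompt. The directory layout, the "## " section convention, and the keyword heuristic below are all assumptions for illustration, not a recommended design.

```python
import re
from pathlib import Path

def compile_prompt(md_dir: str, task: str) -> str:
    """Collect sections from .md files whose headings share words with the task.

    A naive keyword overlap stands in for whatever relevance scoring
    (embeddings, an LLM judge, etc.) a real prompt compiler would use.
    """
    keywords = {w.lower() for w in re.findall(r"\w+", task)}
    picked = []
    for md_file in sorted(Path(md_dir).glob("*.md")):
        # Split each file into sections that start with a "## " heading.
        for section in re.split(r"(?m)^(?=## )", md_file.read_text()):
            if not section.strip():
                continue
            heading = section.strip().splitlines()[0].lower()
            if keywords & set(re.findall(r"\w+", heading)):
                picked.append(section.strip())
    return "\n\n".join(picked)

# Hypothetical usage: build a preamble for a task touching error handling.
# prompt = compile_prompt("style_guides/", "refactor error handling in the API layer")
```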
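For the trace_id question, one way to realize the "global variable" suggestion more cleanly is a contextvars.ContextVar: the module reads request-scoped metadata inside forward() without ever declaring it as a signature field, so neither the prompt nor the optimizer sees it. A minimal sketch, assuming a recent DSPy version with string signatures; the print call is just placeholder logging.

```python
import contextvars
import dspy

# Request-scoped metadata set by the caller; never part of any signature,
# so it is invisible to both the LLM prompt and the optimizer.
current_trace_id = contextvars.ContextVar("trace_id", default=None)

class TracedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.Predict("question -> answer")

    def forward(self, question):
        trace_id = current_trace_id.get()  # read side-channel metadata
        print(f"[trace {trace_id}] answering: {question!r}")  # placeholder logging
        return self.answer(question=question)

# Hypothetical usage (requires dspy.configure(lm=...) first):
# current_trace_id.set("req-1234")
# TracedQA()(question="What does the lightning indexer do?")
```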
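And for the caching thread, the suggested workaround of hard-coding the shared prefix as the first input field could look roughly like the sketch below. It assumes DSPy renders input fields in declaration order, so a constant first field keeps the leading tokens of each request identical; whether the provider's prompt cache actually hits still depends on how the adapter lays out instructions, so treat this as a starting point rather than a guarantee.

```python
import dspy

# A large, constant block we would like the provider's prompt cache to reuse
# across calls (e.g. a style guide loaded once at startup).
SHARED_PREFIX = "...long, never-changing guidelines go here..."

class CachedReview(dspy.Signature):
    """Review the snippet against the shared guidelines."""
    shared_prefix: str = dspy.InputField(desc="Constant guidelines, identical on every call")
    snippet: str = dspy.InputField(desc="Code under review")
    review: str = dspy.OutputField()

reviewer = dspy.Predict(CachedReview)

# Hypothetical usage (requires dspy.configure(lm=...) first): every call repeats
# the same leading content, so a prefix cache has a chance to reuse it.
# result = reviewer(shared_prefix=SHARED_PREFIX, snippet="def f(x): return x * x")
```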
aider (Paul Gauthier) ▷ #general (13 messages🔥):
GPT-5 vs GPT-4.1, Aider-CE navigator mode, aiderx model, DeepSeek v3.1
- GPT-5 as Architect, GPT-4.1 as Editor?: Some users are experimenting with GPT-5 for architecture and GPT-4.1 for editing, expressing satisfaction with the results so far.
- One user mentioned "GLM 4.5 air for life" and another replied "Same here, quite like it!"
- DeepSeek v3.1 strikes price/smartness balance: DeepSeek v3.1 is favored for its balance of price and smartness, with one user noting it's the primary model they use alongside GPT-5.
- Aider-CE navigator mode navigates the codebase: A user runs GPT-5-mini with Aider-CE navigator mode; in normal mode they use GPT-5-mini as the architect and GPT-4.1 as the coder, taking advantage of free access through GitHub Copilot.
- Another user provided a link to the Aider-CE GitHub repository and an image showcasing the toolâs output.
- Aider-CE does the job, boasts 128k context: One user migrated to the aider-ce fork, valuing its transparency and avoidance of unnecessary token consumption, and mentioned that it defaults to a 128k context for DeepSeek.
- They mentioned its utility for integrating context from search results and browser testing.
- Aiderx enables cheaper, faster model picking: Aiderx is presented as an alternative that enables model selection via configuration, potentially reducing costs and improving speed, with a link to ClaudeAI.
aider (Paul Gauthier) ▷ #questions-and-tips (9 messages🔥):
Aider task/todo management, Commit only staged files
- Aider Lacks Native Task Management System: A member inquired whether Aider has a built-in task or todo management system, similar to GitHub Copilot.
- Another member suggested using a markdown spec file with phases and checkbox lists for tasks, instructing the LLM to execute and check off each task in turn and ensure the build works after each task, but confirmed that Aider has no native task management.
- Committing Only Staged Files Strategy: A member asked if it's possible to commit only staged files while ignoring unstaged files in Git.
- Other members suggested manually using git stash -k, then git stash pop after the commit, or using the command /run git commit -m "your message here".
tinygrad (George Hotz) ▷ #general (14 messages🔥):
ROCM vs NVIDIA, hashcat performance, tinybox performance, Genoa CPU for hashing, tinygrad meeting 90
- ROCM battles NVIDIA for price/performance crown: A member is seeking a price-efficient alternative to NVIDIA and thinks that ROCM is now in a much better and more usable place.
- The member decided to pull the trigger and use ROCM if they can find something that works for them, due to the perceived high NVIDIA markup.
- Hashcat's Performance Scales Linearly: Hashcat performance scales linearly with the number of GPUs added, according to members.
- Members suggested just looking at the existing benchmark database to get an idea of performance.
- Rangeify is nearly complete for outerworld: The NIR backend is almost ready for review, and they are working on precompiling the pieces of mesa that they need.
- After rangeify becomes the default, they can spend a week just deleting stuff.
- Genoa CPU for hashing?: A member mentioned that the Genoa CPU would also be able to hash.
- It's unclear whether it would be power-efficient enough to justify the cost.
- Tinygrad Meeting #90 Agenda: Meeting #90 covers company updates, RANGEIFY! SPEC=1, and a list of remaining bugs.
- Other topics include tuning for default and other bounties.
Windsurf ▷ #announcements (2 messages):
code-supernova-1-million, Claude Sonnet 4.5, Windsurf credits
- Windsurf launches Code Supernova 1M: Windsurf launched code-supernova-1-million, a supercharged version of code-supernova that comes with a 1M context window.
- It's available free to individual users for a limited time (announcement post).
- Claude Sonnet 4.5 lands on Windsurf: Claude Sonnet 4.5 is now available in Windsurf; it maximizes actions through parallel tool execution, making Cascade Agent runs dramatically faster and more effective.
- It is available at 1x credits to individual users for a limited time (announcement post).