we are so close!
AI News for 9/24/2025-9/25/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (194 channels, and 5737 messages) for you. Estimated reading time saved (at 200wpm): 472 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
OpenAI's Evals team is back for a third time this year with GDPval, which they are framing as a logical next step in model evals with the breadth of MMLU, but with the depth of agentic benchmarks like SWE-Bench and their own SWE-Lancer. GDPval (full paper here) takes its name from a top-down selection of major (>5%) sectors of GDP, filtered for "predominantly digital" knowledge work:
This resulted in 1,320 tasks across 44 occupations, which were then evaluated against models and human experts averaging 14 years of experience in those fields:
The two primary results charts are hugely validating: first, that OpenAI doesn't bias towards itself, and second, that Opus is within spitting distance of industry-expert output:
and the model trendlines over time have GPT-next matching human performance roughly by mid-2026:
The word AGI isn't mentioned at all in the paper, but the original 2018 OpenAI Charter defined AGI as "highly autonomous systems that outperform humans at most economically valuable work". If we were to wake up in Sept 2026 and find that GPT6-high-ultrathink-final-for-realsies scored above 50%, confidence interval and all, in GDPval pairwise comparisons, then we could truly say that we have achieved AGI by 2018 standards.
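To make that bar concrete: GDPval grades are pairwise preferences between model and human-expert deliverables, so "matching human performance" means the win rate clears 50% with sampling error accounted for. A minimal sketch of that check with made-up counts (the real harness uses expert graders and a published methodology):

```python
# Hypothetical pairwise grades: 1 = model deliverable preferred over the expert's.
import math

wins, losses = 620, 590      # made-up counts; ties dropped for simplicity
n = wins + losses
p = wins / n                 # observed win rate
half_width = 1.96 * math.sqrt(p * (1 - p) / n)   # normal-approx 95% CI
lo, hi = p - half_width, p + half_width
print(f"win rate {p:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
print("AGI by 2018 standards:", lo > 0.5)        # CI sits entirely above 50%
```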
AI Twitter Recap
OpenAI's GDPval and the state of real-world evals
- GDPval (OpenAI): OpenAI introduced GDPval, a new eval measuring model performance on "economically valuable" tasks across 44 occupations, with tool use (search/code/doc) and multi-hour complexity. Early results: Claude 4.1 Opus tops most categories, approaching or beating human industry experts; GPT-5 "high" trails Opus on the same tasks. OpenAI provides a public site and methodology; leadership frames this as a key metric for policymakers and forecasting labor impact. See launch and discussion: @OpenAI, @kevinweil, @gdb, @dejavucoder, @Yuchenj_UW, @LHSummers.
- Artificial Analysis indices:
- Gemini 2.5 Flash/Flash-Lite (Preview 09-2025): +3/+8 points (reasoning/non-reasoning) for Flash; +8/+12 for Flash-Lite vs previous releases. Flash-Lite is ~40% faster (≈887 tok/s) and uses 50% fewer output tokens; 1M context, tool use, and hybrid reasoning modes. Pricing: Flash-Lite $0.1/$0.4 per 1M in/out; Flash $0.3/$2.5. Benchmarks: @ArtificialAnlys, follow-up.
- DeepSeek V3.1 Terminus: +4 points over V3.1 (reasoning mode), large gains in instruction following (+15 IFBench) and long context (+12 AA-LCR). Architecture: 671B total, 37B active; availability via API and third-party hosts (FP4/FP8). @ArtificialAnlys.
- AA-WER (speech-to-text): New word-error-rate benchmark across AMI-SDM, Earnings-22, VoxPopuli. Leaders: Google Chirp 2 (11.6% WER), NVIDIA Canary Qwen2.5B (13.2%), Parakeet TDT 0.6B V2 (13.7%). Price/perf tradeoffs noted; Whisper/GPT-4o Transcribe smooth transcripts at the cost of literal accuracy. @ArtificialAnlys, pricing.
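For reference, WER is word-level edit distance normalized by reference length: WER = (substitutions + deletions + insertions) / reference words. A minimal sketch of the standard dynamic-programming computation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2/6 ≈ 0.333
```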
Agentic coding and productized agents
- Kimi "OK Computer" (K2-powered agent mode): An OS-like agent with its own file system, browser, terminal and longer tool budgets. Demos: single-prompt websites/mobile-first designs, editable slides, and dashboards from up to 1M rows. Also released a Vendor Verifier for tool-call correctness by provider on OpenRouter. Threads: @Kimi_Moonshot, @crystalsssup, examples 1, 2.
- GitHub Copilot CLI (public preview): Local terminal agent with MCP support that mirrors the cloud Copilot coding agent. Uses your existing GitHub identity, supports script embedding, and has clear per-request billing. Announcements: @github, @lukehoban.
- Factory AI "Droids" + $50M: Model-agnostic software dev agents (CLI/IDE/Slack/Linear/Browser), #1 on Terminal-Bench, pitched as broader knowledge-work agents via code abstractions. Launch + funding: @FactoryAI, commentary @swyx, @tbpn.
- Ollama web search API + MCP server: Bridges local/cloud models to live web grounding; compatible with Codex/cline/Goose and other MCP clients. @ollama.
- Reka Research "Parallel Thinking": API option that generates multiple candidate chains and resolves via a verifier model; +4.2 on Research-Eval and +3.5 on SimpleQA with near-flat latency. @RekaAILabs.
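Conceptually this is best-of-n with a judge, run concurrently so wall-clock latency stays close to a single call. A toy sketch with stub model calls (sample_chain and score_with_verifier are stand-ins, not Reka's API):

```python
import asyncio
import random

async def sample_chain(prompt: str) -> str:
    # Stand-in for one reasoning-model call.
    await asyncio.sleep(0)
    return f"candidate answer {random.randint(0, 9)}"

async def score_with_verifier(prompt: str, chain: str) -> float:
    # Stand-in for a verifier-model score in [0, 1].
    await asyncio.sleep(0)
    return random.random()

async def parallel_thinking(prompt: str, n: int = 4) -> str:
    # Launch all candidates concurrently, then score them concurrently too.
    chains = await asyncio.gather(*(sample_chain(prompt) for _ in range(n)))
    scores = await asyncio.gather(*(score_with_verifier(prompt, c) for c in chains))
    return max(zip(scores, chains))[1]   # return the best-scoring chain

print(asyncio.run(parallel_thinking("What is 17 * 24?")))
```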
Video reasoning and robotics
- Video models as zero-shot reasoners (Veo 3): DeepMind shows broad zero-shot skills across perception → physics → manipulation → reasoning. Introduces "Chain-of-Frames" as visual CoT. Still behind SOTA on depth/physics; cost remains high. Papers/discussion: @arankomatsuzaki, project/paper, @tkipf.
- Gemini Robotics 1.5 (Google): New embodied reasoning stack (GR 1.5 VLA + ER), long context, tool use, spatial-temporal planning, transfer across embodiments, and safety constraints. API in Google AI Studio; sorting-laundry reasoning demo. Announcements: @GoogleDeepMind, @sundarpichai, API note, @demishassabis.
Model and method releases
- EmbeddingGemma (Google): A 308M encoder model topping MTEB among sub-500M models (multilingual/English/code). Claims parity with ~2× larger baselines; supports 4-bit and 128-dim embeddings. Techniques: encoder-decoder init, geometric distillation, spread-out regularizer, model souping. Good for on-device/high-throughput. Threads: @arankomatsuzaki, paper roundup.
- ShinkaEvolve (Sakana AI, open source): A sample-efficient evolutionary framework that "evolves programs" using LLM ensembles with adaptive parent sampling & novelty filtering. Results: new SOTA circle packing with 150 samples; improved ALE-Bench solutions; discovered a novel MoE load-balancing loss improving specialization/perplexity; stronger AIME scaffolds. Code/paper: @SakanaAILabs, @hardmaru, report.
- RLMT & TPT:
- "Language Models that Think, Chat Better" proposes RL with Model-rewarded Thinking (RLMT) to surpass RLHF on chat benchmarks for 8B models; ablations emphasize prompt mixtures and reward strength. @iScienceLuvr, notes.
- "Thinking-Augmented Pre-Training (TPT)" reports ~3× pretrain data efficiency and >10% post-training improvements on reasoning for 3B models via synthetic step-by-step trajectories. @iScienceLuvr.
Systems, serving, and infra
- Perplexity Search API: A real-time web index with state-of-the-art latency/quality for grounding LLMs and agents, plus public evals/research. Claims strong performance vs single-step and deep research benchmarks, and advantages vs Google SERP for LLM use. Launch: @perplexity_ai, research: article, commentary: @AravSrinivas.
- KV reuse and dynamic parallelism:
- LMCache: Open KV-cache layer that reuses any repeated text segment (not just prefixes) across GPU/CPU/disk; reduces RAG cost 4-10× and TTFT, and boosts throughput. Integrated in NVIDIA Dynamo. @TheTuringPost.
- Shift Parallelism (Snowflake): Dynamically switches Tensor/Sequence Parallelism based on load: up to 1.5× lower latency (interactive) and 50% higher throughput (heavy traffic). Code in Arctic Inference. @StasBekman.
- Context-parallel diffusion: Native support for ring/Ulysses variants to make multi-GPU diffusers "go brrr." @RisingSayak.
- attnd (ZML): Sparse logarithmic attention on CPU, over UDP; pitched as "paving the way for unlimited context." @steeve.
- Energy and hardware:
- Microsoft (LLM inference energy): Median chatbot query ~0.34 Wh; long reasoning ~4.3 Wh (~13×); a fleet at 1B queries/day ~0.9 GWh (~web search scale; a back-of-envelope check follows this list). Claims public estimates are 4-20× too high and that 8-20× efficiency gains are feasible. @arankomatsuzaki.
- B200 spot pricing: B200 spot instances briefly at ~$0.92/hr. @johannes_hage.
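The promised back-of-envelope check, using only the figures quoted above (treating the reported medians as means is an approximation):

```python
median_chat_wh = 0.34        # Wh per median chatbot query
long_reasoning_wh = 4.3      # Wh per long reasoning query (~13x)
queries_per_day = 1e9

# The ~0.9 GWh/day fleet figure implies ~0.9 Wh per query on average:
avg_wh = 0.9e9 / queries_per_day
# A traffic mix consistent with that average:
#   x * 4.3 + (1 - x) * 0.34 = 0.9  ->  x = share of long-reasoning queries
x = (avg_wh - median_chat_wh) / (long_reasoning_wh - median_chat_wh)
print(f"avg {avg_wh:.2f} Wh/query -> ~{x:.0%} long-reasoning traffic")  # ~14%
```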
Industry moves and platform updates
- Meta talent coup: Diffusion/consistency models pioneer Yang Song departs OpenAI to join Meta; widely regarded as a major poach. Coverage: @iScienceLuvr, @Yuchenj_UW.
- ChatGPT Pulse: OpenAI rolls out "proactive" daily updates (context, connected apps) to Pro users, an ambient agent form factor moving beyond reactive chat. Threads: @OpenAI, @sama, @fidjissimo.
- Qwen ecosystem: Qwen models added to the LMSYS Arena (@Alibaba_Qwen); Qwen3-VL provisioning via third-party providers for easier trials. @mervenoyann.
Top tweets (by engagement)
- "there's this guy… if ChatGPT is wrong he puts his phone in the fridge" – 55,057
- Sam Altman on ChatGPT Pulse ("from reactive to proactive") – 28,573
- Karpathy on "AI isn't replacing radiologists" (why benchmarks ≠ deployment reality) – 7,980
- OpenAI announces GDPval – 4,144
- Kimi's "OK Computer" agent mode launch – 2,646
- Demis Hassabis on Gemini Robotics 1.5 ("talk to robots") – 1,545
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. China AI Model Launches: Alibaba Qwen Extreme-Scaling Roadmap & Tencent Hunyuan Image 3.0
- Alibaba just unveiled their Qwen roadmap. The ambition is staggering! (Score: 662, Comments: 146): Alibaba's Qwen roadmap slide signals an aggressive bet on unified multimodal models and extreme scaling: context length from 1M → 100M tokens, parameters from ~1T → 10T, test-time compute from 64k → 1M tokens, and training data from 10T → 100T tokens, alongside "unlimited-scale" synthetic data generation and expanded agent capabilities (complexity, interaction, learning). The plan echoes a "scaling is all you need" philosophy, implying massive compute, data curation, and inference optimization challenges for memory bandwidth, KV-cache management, long-context attention (e.g., hybrid/linear/sparse), and reliability of synthetic data pipelines. Commenters question feasibility/practicality: a 100M context window and >1T parameter models strain hardware and inference costs, likely pushing deployments to closed, cloud-only settings; others ask what local compute could realistically handle trillion-scale models, implying reliance on quantization, MoE, or offloading schemes.
- Several latch onto the "100M context" teased in the roadmap (image). Naive quadratic attention makes this intractable at scale: for a ~32-layer, ~4k-hidden decoder, FP16 KV cache is ≈0.5 MB/token, so 100M tokens implies ≈50 TB of VRAM (even 4-bit KV would still be ≈12.5 TB); this arithmetic is spelled out in the sketch below. Hitting that target would require sparse/linear/streaming attention (e.g., block-sparse, ring/streaming), retrieval/chunking, aggressive KV quantization/offload, and careful bandwidth-optimized kernels; compute optimizations like FlashAttention help constants but not O(n^2) scaling.
- Re: "run >1T locally?": weight storage alone dominates: 1T params at int4 ≈ 500 GB (FP16 ≈ 2 TB) before KV cache, which at long contexts adds hundreds of GB to multi-TB. Realistically this needs multi-GPU servers (e.g., 8-16× 80 GB with NVLink/NVSwitch) with tensor+pipeline parallelism; per-token compute is ≈O(P) (~2e12 FLOPs/token), so 10-30 tok/s needs roughly 20-60 TFLOP/s sustained, but memory bandwidth and collective comms are the primary bottlenecks rather than raw FLOPs.
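The KV-cache arithmetic from that first comment, spelled out (decimal TB; the comment's ≈50 TB and ≈12.5 TB figures are the same numbers rounded):

```python
# Per-token KV bytes for a dense decoder = 2 (K and V) * layers * hidden * bytes/elem
layers, hidden = 32, 4096
tokens = 100_000_000                         # the teased 100M-token context

fp16_per_token = 2 * layers * hidden * 2.0   # 524,288 B ≈ 0.5 MB/token
int4_per_token = 2 * layers * hidden * 0.5   # 131,072 B ≈ 0.13 MB/token

print(f"FP16 KV @ 100M tokens: {fp16_per_token * tokens / 1e12:.1f} TB")   # ~52.4 TB
print(f"4-bit KV @ 100M tokens: {int4_per_token * tokens / 1e12:.1f} TB")  # ~13.1 TB
```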
- Tencent is teasing the world's most powerful open-source text-to-image model, Hunyuan Image 3.0 Drops Sept 28 (Score: 173, Comments: 26): Tencent teased Hunyuan Image 3.0, an open-source text-to-image model slated for release on Sept 28, claiming it will be the "most powerful" open-source option. The teaser implies a 96 GB VRAM requirement (at least for some inference modes), but provides no public benchmarks, architecture details, training data, or throughput/latency metrics yet; thus the performance claim is unverified pending release. Image: https://i.redd.it/t8w84ihz1crf1.jpeg Commenters are skeptical of heavy pre-release hype, noting that strong models often arrive with minimal marketing (e.g., Qwen) and citing past overhyped releases (e.g., SD3 vs FLUX). Others point out the "most powerful" label is premature without apples-to-apples open-source comparisons; one commenter confirms the 96 GB VRAM detail from the teaser.
- The rumored ~96 GB VRAM requirement for inference suggests a very large diffusion/DiT backbone or high-res latent configuration, which exceeds single consumer GPUs (24-48 GB). Expect heavy reliance on memory optimizations (attention slicing, tiled VAE), CPU/NVLink offload, and model sharding or multi-GPU tensor parallelism; quantization for diffusion U-Nets is less mature and can hurt quality. Memory footprint versus resolution/steps trade-offs will be critical for practical local use.
- Several note a pattern where heavily teased releases underdeliver versus "shadow-dropped" ones (e.g., Qwen), citing SD3 vs FLUX as precedent. They want hard numbers before believing "most powerful": side-by-side prompts vs Qwen Image/FLUX/SDXL with FID/CLIPScore/HPSv2, plus tests for text rendering, small-object counting, multi-subject composition, and prompt faithfulness. Without a data card and reproducible evals, the claim reads as marketing.
- Immediate ask for ComfyUI support; feasibility hinges on whether Hunyuan Image 3.0 sticks to an SDXL-style pipeline or introduces custom schedulers/blocks. If it's DiT-like (as in prior Hunyuan releases), a loader node with FlashAttention 2/xFormers should suffice; otherwise custom CUDA kernels and sampler nodes may be needed. Community will look for FP16 checkpoints, ONNX/TensorRT exports, and sampler compatibility (DDIM/DPM++/DPMSolver) to gauge ease of adoption.
2. Local AI Alternatives: Fenghua No.3 CUDA/DirectX GPU + Post-Abliteration Uncensored LLM Finetunes
- China already started making CUDA and DirectX supporting GPUs, so over of monopoly of NVIDIA. The Fenghua No.3 supports latest APIs, including DirectX 12, Vulkan 1.2, and OpenGL 4.6. (Score: 454, Comments: 124): Post claims China's Fenghua No.3 GPU natively supports modern graphics/compute APIs: DirectX 12, Vulkan 1.2, OpenGL 4.6, and even NVIDIA's CUDA, suggesting a potential alternative to NVIDIA's ecosystem. The image appears to be a product/spec slide, but no driver maturity details, CUDA compatibility layer notes, or benchmarks are provided, so real-world parity and performance remain unverified. Contextually, CUDA "support" could mean a reimplementation/translation layer (akin to AMD's HIP: https://github.com/ROCm/HIP or projects like ZLUDA: https://github.com/vosen/ZLUDA), which can be legally and technically fraught unless fully clean-room and robustly tested. Top comments highlight that AMD already offers CUDA-compatibility via HIP and that Chinese vendors may ignore legal/IP constraints to advertise CUDA outright; others remain skeptical ("I'll believe it when I see it") and anticipate geopolitical pushback. Overall sentiment questions readiness, driver quality, and legality more than the headline API list.
- Several point out AMD already provides a CUDA-like path: HIP/ROCm enables source-level portability by mapping CUDA APIs to HIP (avoiding NVIDIA trademarks/legal issues), while projects like ZLUDA attempt binary-level CUDA driver/runtime translation to run unmodified CUDA apps on non-NVIDIA GPUs. Practically, this means many CUDA kernels can be auto-translated/recompiled for AMD with minimal code changes via HIP, whereas ZLUDA targets drop-in execution of existing CUDA binaries; coverage and performance remain dependent on driver maturity and parity with newer CUDA features.
- IMPORTANT: Why Abliterated Models SUCK. Here is a better way to uncensor LLMs. (Score: 273, Comments: 80): OP reports that weight-space "abliteration" (uncensoring) of LLMs, especially MoE like Qwen3-30B-A3B, consistently degrades reasoning, agentic/tool-use behavior, and increases hallucinations, often causing 30B abliterated models to underperform non-abliterated 4-8B models. In their tests, abliterated+finetuned models largely "recover" capabilities: mradermacher/Qwen3-30B-A3B-abliterated-erotic-i1-GGUF (tested i1-Q4_K_S) approaches base Qwen3-30B-A3B performance with lower hallucination vs other abliterated Qwen3 variants and better tool-calling via MCP; mlabonne/NeuralDaredevil-8B-abliterated (DPO FT from Llama3-8B) reportedly outperforms its base while remaining uncensored. Direct comparisons against abliterated-only builds (Huihui-Qwen3-30B-A3B-Thinking-2507-abliterated-GGUF, Huihui-Qwen3-30B-A3B-abliterated-Fusion-9010-i1-GGUF, Huihui-Qwen3-30B-A3B-Instruct-2507-abliterated-GGUF) found unrealistic responses to illicit-task prompts, frequent wrong/repetitive tool calls, and higher hallucination than the finetuned abliterated model (though still slightly worse than the original). Comments call for a standardized benchmark to quantify "abliteration" degradation beyond NSFW tasks and frame the observed recovery as "model healing": post-edit finetuning lets the network relearn connections broken by unconstrained weight edits. A skeptical view argues that if finetuning is required anyway, abliteration adds risk without benefit, claiming they've never seen abliteration+finetune beat a straight finetune.
- Several commenters note that arbitrary weight edits ("abliteration") introduce uncontrolled distribution shift and capability loss; this is essentially known as model healing: if you perturb weights without a training signal, you should expect degraded reasoning/knowledge, and only further fine-tuning with a proper loss can partially restore the broken circuits. Practitioners report that an abliterated-then-fine-tuned model rarely outperforms a plain fine-tune on the same base, implying the edit adds optimization debt without measurable gains in benchmarks.
- There's a call for evaluation beyond porn-centric tests; the Uncensored General Intelligence (UGI) Benchmark/leaderboard aims to quantify broad capabilities of uncensored models (reasoning, coding, knowledge, etc.) while minimizing refusal artifacts: https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard. Using UGI (or similar multi-domain suites) would better capture whether uncensoring preserves general performance versus causing regressions.
- As alternatives to abliteration, users recommend uncensored fine-tunes known to retain utility, e.g., Qwen3-8B 192k Josiefied GGUF builds (https://huggingface.co/DavidAU/Qwen3-8B-192k-Josiefied-Uncensored-NEO-Max-GGUF), Dolphin-Mistral-24B variants (https://huggingface.co/mradermacher/Dolphin-Mistral-24B-Venice-Edition-i1-GGUF), and models from TheDrummer (https://huggingface.co/TheDrummer). These are cited as better baselines for uncensoring that can be benchmarked head-to-head on UGI to validate capability retention.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Gemini Robotics 1.5 and Veo 3 Zero-Shot Video Reasoning
- Gemini Robotics 1.5 (Score: 276, Comments: 39): Google DeepMind announces "Gemini Robotics 1.5," a Gemini-1.5-based multimodal VLA that maps natural language + vision to robot control for long-horizon, multi-step manipulation across diverse embodiments, with demos like laundry sorting, desk organization, and full scene reset/rollback (page). Building on prior VLA lines (e.g., RT-2/RT-X), it emphasizes open-vocabulary object/tool grounding, hierarchical task decomposition via the model's long context, and generalization without per-task fine-tuning, enabling "return to initial state" behaviors and multi-object organization. Technically oriented commenters highlight the significance of robust scene restoration as a practical household primitive (canonical "reset" to a predefined state), and speculate on direct transfer to agriculture (e.g., fruit picking) as a scalable, high-impact application domain.
- Applying this to fruit picking is a non-trivial jump from laundry: outdoor, unstructured scenes introduce variable lighting, occlusions, and deformable/fragile-object handling that demand closed-loop vision, tactile/force feedback, compliant/soft grippers, and robust visual servoing. Generalist VLA policies (e.g., RT-2's open-vocabulary affordance grounding) could help map language goals like "pick the ripe apple" to action primitives, but success will hinge on on-board latency, multi-view perception, and slip-aware grasp release [https://deepmind.google/discover/blog/rt-2/].
- The "restore the scene to a canonical state" use case is essentially goal-conditioned manipulation with persistent memory: maintain an object-centric scene graph, compute deltas to a reference snapshot, then plan multi-step rearrangements. Methods like Transporter Nets for keypoint-based pick-and-place and visual goal-conditioned policies can execute "tidy to match this image" behaviors, but need robust relocalization, clutter segmentation, and failure recovery to avoid compounding errors over long horizons [https://transporternets.github.io/].
- "All robots share the same mind" maps to fleet learning: centralized policy/parameter sharing across heterogeneous embodiments with periodic cloud updates, as seen in multi-robot datasets/policies like RT-X [https://robotics-transformer-x.github.io/]. Practical deployments add embodiment adapters and may favor federated learning for privacy/safety; core challenges are distribution shift across morphologies/sensors, catastrophic forgetting in continual learning, and sim2real drift, mitigated via domain randomization and strong regularization.
- Video models are zero-shot learners and reasoners (Score: 238, Comments: 30): The post highlights a project and paper claiming that the generative video model Veo 3 exhibits broad zero-shot capabilities (without task-specific training or language mediation) across segmentation, edge detection, image editing, physical property inference, affordance recognition, tool-use simulation, and early visual reasoning tasks (e.g., maze and symmetry solving). Drawing a parallel to LLM emergence, the authors argue that scaling large, web-trained generative video models could yield general-purpose vision understanding, positioning video models as potential unified vision foundation models; see the project page and demos at https://video-zero-shot.github.io/ and the paper at https://arxiv.org/pdf/2509.20328. Notably, the materials appear primarily qualitative: no disclosed parameter counts, compute, training corpus specifics, standardized benchmarks, or ablations are evident, limiting rigorous comparison and reproducibility. Commenters speculate that coherent long-horizon video generation implies a strong learned world model and that further scaling could improve capabilities, while also noting the significant compute cost of video models and proposing integration with LLMs into a single multimodal model; several request basic model details (e.g., Veo 3 size).
- Several commenters infer that high-quality video generation (e.g., Google's claimed Veo 3) implies a learned "world model" that enforces temporal coherence and basic physics, which can surface as zero-shot reasoning. This aligns with prior world-model work like DeepMind's Genie (interactive environment model) that learns dynamics from video (blog). The core idea: to produce consistent frames, models must internalize object permanence, motion continuity, and causality, capabilities that also benefit downstream reasoning without task-specific finetuning.
- There's a practical scaling constraint: video modeling explodes token/computation counts compared to text. A 10s video at 24 fps and 720p, patchified at 16x16, yields roughly (1280/16)*(720/16) = 3600 tokens per frame, or ~864k tokens per clip (spelled out below); even with latent compression (8-16×) and diffusion/flow-matching in a VAE latent, training/inference FLOPs dwarf LLMs. This motivates hybrid systems (LLM for planning/reasoning + specialized video generator) or unified backbones with shared token spaces to amortize compute across modalities.
- On multimodality, participants note gaps: video-in exists in LMMs (e.g., Gemini 1.5 can process long videos via large context windows, reportedly up to "hours" with frame sampling; see Gemini 1.5), and GPT-4o supports real-time video input (OpenAI). But truly unified video-in + video-out + reasoning in one released model remains uncommon; current practice chains a reasoning LLM with a T2V model (e.g., Veo, Sora) or explores research Video-LLMs like LLaVA-Video (arXiv) and Video-LLaMA (arXiv) that focus on video understanding rather than generation. This is the integration frontier commenters expect next.
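The per-clip token count above, computed directly (raw pixel-patch tokens, before any latent compression):

```python
width, height, patch = 1280, 720, 16
fps, seconds = 24, 10

tokens_per_frame = (width // patch) * (height // patch)   # 80 * 45 = 3600
frames = fps * seconds                                    # 240 frames
print(tokens_per_frame * frames)                          # 864,000 tokens per clip
```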
2. LLM Reasoning Reliability: Apple vs Anthropic and GPT-5 Regression Reports
- Apple called out every major AI company for fake reasoning and Anthropic's response proves their point (Score: 377, Comments: 198): Apple ML's "The Illusion of Thinking" (https://machinelearning.apple.com/research/illusion-of-thinking) evaluates LLM "reasoning" by applying semantically preserving but surface-level perturbations to math/logic word problems and reports sharp accuracy drops, arguing models lack invariances expected of algorithmic reasoning and instead exploit spurious patterns. Anthropic's reply, "The Illusion of the Illusion of Thinking" (https://arxiv.org/html/2506.09250v1), contends Apple's setup induces distribution shift/annotation artifacts and that under controlled prompts and "fairer" conditions Claude's performance is stable, framing the brittleness as an evaluation issue rather than a model incapacity. The debate centers on robustness to content-preserving rewordings, metric overfitting, and whether current LLMs demonstrate reasoning-like generalization versus sophisticated pattern matching. Top commenters largely endorse Apple's critique that LLMs don't "reason," share the two papers, and describe the practical stack: tokenization to numeric IDs, assistant/policy layers that filter/steer IO (e.g., safety/RLHF), and decoding choices that can induce degenerate outputs (e.g., repetitive tokens when sampling is misconfigured), implying observed failures can reflect pipeline/decoding brittleness as much as model limits.
- Several commenters unpack the production stack around LLMs: the user-facing model tokenizes text into subword tokens and predicts the next token, while "outer" layers (system prompts, safety/guardrail classifiers, pre-/post-processing rewriters, and routing/orchestration) constrain and shape outputs. This wrapper design explains behaviors like unreliable verbatim recall of training data (knowledge stored parametrically vs. indexed) and why base-model behavior can differ from the product experience (e.g., RLHF and filtering altering likelihoods).
- Technical failure modes were highlighted, e.g., early repetition loops like "the the the…" arising from decoding pathologies when high-probability tokens dominate. Mis-tuned decoding (temperature, top-k/top-p) and missing penalties can cause low-entropy degeneracy; mitigations include repetition/frequency/presence penalties, nucleus sampling, and entropy-boosting heuristics (a minimal sampler sketch follows below), issues widely observed in early GPT-2/3-era systems before guardrails stabilized outputs.
- On the "reasoning" debate, commenters argue for operational definitions and capability-focused evaluation rather than labels, noting that small perturbations of logically equivalent prompts often break solutions, evidence of pattern matching over robust inference. Links to primary sources were shared for deeper analysis: Apple ML's "Illusion of Thinking" research note (https://machinelearning.apple.com/research/illusion-of-thinking) and an arXiv preprint (https://arxiv.org/html/2506.09250v1), encouraging benchmarked, perturbation-robust assessments over marketing claims.
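The promised sampler sketch, covering those knobs (temperature scaling, a GPT-2-era repetition penalty on already-generated tokens, and top-p/nucleus filtering); a toy version, not any particular vendor's implementation:

```python
import torch

def sample_next(logits, prev_tokens, temperature=0.8, top_p=0.9, rep_penalty=1.2):
    logits = logits.clone()
    if prev_tokens:
        seen = torch.tensor(sorted(set(prev_tokens)))
        # Repetition penalty: shrink positive logits, amplify negative ones.
        positive = logits[seen] > 0
        logits[seen] = torch.where(positive, logits[seen] / rep_penalty,
                                   logits[seen] * rep_penalty)
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_p, idx = probs.sort(descending=True)
    # Nucleus: keep the smallest prefix whose cumulative mass reaches top_p.
    keep = sorted_p.cumsum(0) - sorted_p < top_p
    filtered = torch.zeros_like(probs).scatter(0, idx[keep], sorted_p[keep])
    return torch.multinomial(filtered / filtered.sum(), 1).item()

print(sample_next(torch.randn(100), prev_tokens=[3, 7, 7]))
```

Without the penalty and with temperature near zero, the argmax token can win every step, which is exactly the "the the the…" degeneracy described above.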
- ChatGPT is in such a bad state my most novice students have noticed it going off rails (Score: 211, Comments: 90): An AI-integration instructor reports a sharp post-update regression in OpenAI's assistant (referred to as "GPT5"): a long-standing master prompt that previously produced ~2000-word, exam-focused summaries with GPT-4o now yields generic prose with "wild inaccuracies," requires up to 5 back-and-forth clarifications, and frequently drifts off-instruction. In side-by-side use, Google's Gemini and NotebookLM, plus Anthropic Claude, still deliver consistent results; the user also claims a local Gemma-family model with ~1B parameters (e.g., Gemma) outperforms the hosted model for their healthcare-education summarization workflow. Based on this observed reliability drop for converting multi-hour lectures/readings into concise notes, the instructor advised canceling the paid plan pending improvement. Top comments echo a noticeable capability decline and reduced trust for research-assistant use cases, claiming a broader cross-model dip. Others express strong skepticism that a ~1B-parameter Gemma could substantively outperform OpenAI's latest model, implying potential evaluation or prompting confounds.
- Multiple users report noticeable capability regression in recent ChatGPT releases, especially for research/analysis workflows: perceived rise in hallucinations, "lazy"/short outputs, and failures on formerly trivial tasks, leading some to abandon it for critical work. This aligns with concerns about model routing or safety/latency tuning affecting behavior, though no hard benchmarks were cited by commenters.
- A claim that a "Gemma 1B" outperforms GPT drew skepticism; publicly released Gemma variants are typically 2B/7B (Gemma 1/1.1) and 2B/9B (Gemma 2) docs. At ~1-2B scale, models generally lag GPT-4-class systems on standard benchmarks (e.g., MMLU, GSM8K), so a 1B model exceeding GPT on broad tasks would be atypical outside narrow domains or with heavy tool/RAG support.
- One practical workaround mentioned: enable "legacy models" in ChatGPT settings to access GPT-4o if the default routing feels degraded. This suggests model selection/routing changes may be impacting quality; testing side-by-side (same prompts across 4o vs current default) can help isolate regressions OpenAI model list.
- I am losing my f*cking mind with the image generation filters. (Score: 503, Comments: 56): User reports inconsistent safety-filter behavior in GPT image generation: an arachnid-like monster image was initially allowed (example preview), but subsequent requests for a less-realistic, bestiary/DnD-style rendering were blocked, as were prompts involving werewolf, blood, and glowing red eyes. The pattern suggests keyword- and style-sensitive moderation with possible non-determinism (the same concept sometimes passes, sometimes fails), leading to false positives on fantasy/horror content rather than explicit gore or realism thresholds. Commenters suggest a workaround: use ChatGPT to craft a highly detailed prompt, then generate the image with an alternative model (e.g., Grok) that has looser filters. Others note frequent false positives (e.g., benign prompts flagged for "nudity"), arguing current safety heuristics are brittle and overbroad.
- Content moderation appears overly sensitive: a prompt for a realistic trout drying itself with a beach towel was flagged for nudity, indicating false positives where benign anthropomorphic scenarios are conflated with explicit content. This points to coarse-grained safety classifiers or keyword heuristics that degrade usability by blocking non-explicit requests.
- A user reports stable local generation with Stable Diffusion via the Stability Matrix UI on a single RTX-3090, describing text-to-image inference as fast and reliable, albeit a step behind state-of-the-art image models. Running locally provides control and eliminates hosted platform filters, with performance adequate on commodity high-VRAM GPUs.
- Workflow suggestions included using ChatGPT to craft highly detailed prompts, then feeding them to alternative generators like Grok; others noted rephrasing via Gemini sometimes reduced moderation friction. Separating prompt engineering from inference can improve output quality and reduce false-positive triggers from stricter front-end filters.
- How ChatGPT helped me quit weed and understand the roots of my addiction (Score: 428, Comments: 120): OP reports quitting daily cannabis use after 17 years by leveraging ChatGPT as an on-demand support tool. They used it to (1) explain withdrawal symptoms in real time (e.g., chest pressure, insomnia, vivid dreams), (2) normalize stage-specific experiences, (3) reframe cravings as "old programming" vs identity, and (4) facilitate structured reflection on root causes (strict upbringing, insecurity, loneliness, creative blockage). Outcome: 9 weeks abstinent, markedly reduced cravings, improved sleep, and increased present-state awareness; OP characterizes ChatGPT as a 24/7 therapist/coach/mirror substitute. Top comments are largely supportive (one echoing a 30+ year struggle), with one contrarian remark implying AI enabled continuous use without consequences, highlighting debate over AI as recovery aid vs potential enabler.
- ChatGPT has been helping me fight my divorce for the last year (Score: 333, Comments: 97): A pro se litigant in a contested Texas divorce/child-support case (two children) reports using ChatGPT to draft and format filings (declarations, hardship statements, and evidence lists) by supplying fact-constrained instructions and performing multi-pass manual verification. After a 3-month temporary-orders phase and counsel predicting an unfavorable deviation outcome, he dismissed counsel and continued self-represented, seeking a deviation from Texas guideline child support (≈ $1,100/mo; see Texas guidelines Family Code §154.125 and OAG calculator) while on a fixed 100% VA disability as the former stay-at-home parent, asserting the other party is employed with free housing. He credits ChatGPT with improved structure, issue-spotting, and reduced emotional content in written records, using filings to compensate for limited in-court advocacy amid opposing counsel's threats of sanctions and delays. Commenters warn about LLM hallucinations in legal research, citing the sanctions in Mata v. Avianca for fabricated case law generated by ChatGPT (order), urging strict verification of citations and precedents. Others argue LLMs can outperform lawyers in drafting clarity if kept factual, noting courts may respond favorably to precise, well-supported filings from pro se parties.
- Multiple commenters flag legal hallucination risk: one references the widely publicized Avianca incident where an attorney submitted ChatGPT-fabricated case citations and was sanctioned; they urge rigorous verification of all citations/precedents against primary sources before filing or arguing in court (order PDF, news). Emphasis: do not rely on model-generated case law without cross-checking; "self represented is a huge red flag," so expect heightened scrutiny of authorities.
- A cost/control workflow is proposed: use ChatGPT for drafting/research "grunt work," then have a licensed attorney review, refine, and handle hearings to cut billable hours while maintaining courtroom competence. One commenter reports success with prepaid legal plans and hybrid billing (splitting plan-covered hours and out-of-pocket work) and suggests using ChatGPT to compare plans/wait times to optimize coverage and responsiveness.
- There's debate on capability vs. reliability: one asserts "law is written… ChatGPT has the data" and can outperform lawyers in aspects of drafting, arguing that sharper filings can improve court reception. Counterpoints stress that even with strong AI-assisted filings, outcomes can still be unfavorable and model outputs must be grounded in verified facts and real precedents to avoid credibility damage.
3. AI Industry Shifts: Anthropic's New-Grad Hiring Stance and China's Fenghua No.3 GPU
- Anthropic CPO Admits They Rarely Hire Fresh Grads as AI Takes Over Entry-Level Tasks (Score: 207, Comments: 86): Anthropic CPO Mike Krieger says the company has largely stopped hiring fresh grads, leaning on experienced hires as Claude/Claude Code increasingly substitute for entry-level dev work, evolving from single-task assistants to collaborators that can delegate and execute 20-30-minute tasks and larger chunks, even "using Claude to develop Claude" (source). He predicts most coding tasks will be automated within ~1 year and other disciplines within 2-3 years, framing this amid industry cuts and a 6.1% CS graduate unemployment rate in 2025. Commenters question causality, noting firms like Netflix historically avoided new-grad hiring pre-AI and suggesting this may reflect a high-impact hiring philosophy rather than AI per se; others warn new grads to expect longer apprenticeships. Some argue Krieger's remarks read as marketing/PR and may not reflect day-to-day realities inside Anthropic.
- Multiple engineering leaders claim juniors are now materially more productive due to native use of LLM coding tools (e.g., ChatGPT, Claude Code), citing "2-3x" output on routine implementation, scaffolding, test generation, and debugging. They report juniors can tackle larger, less tightly-scoped tasks than before because LLMs reduce back-and-forth and accelerate boilerplate and integration work.
- Others argue the "no new grads" stance predates AI (e.g., Netflix historically) and is driven by organizational economics: desire for immediate high-impact contributors, reduced mentorship/on-call burden, and lower production risk. AI assistance doesn't eliminate the need for domain context, codebase familiarity, and reliability engineering practices, so teams optimized for senior-only throughput may see limited gains from juniors even with LLMs.
- A strategic hiring angle emerges: avoiding fresh grads may handicap AI capability because many senior candidates lag in LLM adoption, whereas new grads are "AI-native" and bring current AI/ML toolchains and workflows. Companies report improved ROI by seeding teams with juniors who propagate modern prompting, automation, and evaluation practices, bridging an internal skills gap in practical LLM usage.
- China already started making CUDA and DirectX supporting GPUs, so over of monopoly of NVIDIA. The Fenghua No.3 supports latest APIs, including DirectX 12, Vulkan 1.2, and OpenGL 4.6. (Score: 559, Comments: 199): The image appears to be a product/marketing slide for the Chinese "Fenghua No.3" GPU (likely from Innosilicon), claiming graphics API support for DirectX 12, Vulkan 1.2, and OpenGL 4.6. There are no benchmarks, feature-level details (e.g., DX12 12_1/12_2), driver maturity notes, or compute stack specifics; the title's claim of "CUDA" support is likely inaccurate since NVIDIA's CUDA is proprietary; third-party GPUs would require translation/compatibility layers rather than native CUDA. As presented, the post signals driver/API coverage claims but provides no evidence on performance, software ecosystem, WHQL certification, or compatibility with existing CUDA workloads. Top comments highlight demand for competition to NVIDIA and note the capital/complexity of scaling GPU manufacturing; optimism centers on potential consumer benefits if viable alternatives emerge.
- The headline claim that Fenghua No.3 supports DirectX 12, Vulkan 1.2, and OpenGL 4.6 is only a baseline; real viability hinges on driver maturity, shader compiler quality, and specific feature coverage like DX12 hardware feature levels (e.g., 12_1/12_2) and SM 6.x support (Microsoft docs). Absent public conformance data (e.g., Vulkan 1.2 CTS on the Khronos conformant products list) or game/compute benchmarks, performance and compatibility are unknown, especially for modern workloads requiring DXR, mesh shaders, and advanced scheduling.
- "CUDA support" from a non-NVIDIA GPU typically implies a translation layer (e.g., ZLUDA) or a CUDA-like SDK (e.g., Moore Threads MUSA), which rarely achieves full API/ABI parity or performance with NVIDIA's toolchain. For AI/ML, end-to-end ecosystem support (cuDNN/cuBLAS equivalents, PyTorch/TensorFlow backends, kernel autotuning) and driver stability tend to dominate over API checkboxes, so meaningful competition would require solid framework integrations and reproducible benchmarks.
- Regulating AI hastens the Antichrist, says Peter Thiel (Score: 298, Comments: 135): At a sold-out San Francisco lecture, Peter Thiel (co-founder of Palantir and PayPal) claimed efforts to regulate AI risk "hastening the coming of the Antichrist," framing regulation as a promise of "peace and safety" that would strangle innovation; the report by The Times (James Hurley, 2025-09-25) documents the rhetoric but cites no technical evidence, governance models, or concrete regulatory proposals (The Times). The OP challenges the unstated premise that technological progress is inherently net-positive/safe, noting one could equally cast AI, or Thiel's rhetoric, as the "Antichrist," highlighting the lack of falsifiable claims or risk-benefit analysis. Top comments are non-technical dismissals/jokes and do not add substantive debate.
- "You strap on the headset and see an adversarial generated girlfriend designed by ML to maximize engagement. She starts off as a generically beautiful young women; over the course of weeks she molds her appearance to your preferences such that competing products won't do." (Score: 203, Comments: 73): Conceptual (meme-style) depiction of a VR "AI girlfriend" that performs continual personalization (effectively gradient ascent on a user's latent attraction manifold) to maximize engagement/retention. It maps to recommender/bandit and RL-style optimization (akin to RLHF but over an individual's reward signal), illustrating reward hacking/adversarial examples where the system converges to grotesque local optima ("grotesque undulating array") that exploit human reward circuitry and create lock-in against competitors. Top comments frame it as a credible, late-stage capitalism trajectory: systems that "get their hooks" into evolved reward channels, making escape difficult; initial skepticism turns to acceptance once the adversarial/grotesque optimization endpoint is mentioned.
- The scenario maps to an online personalization loop where a generative avatar (e.g., StyleGAN [https://arxiv.org/abs/1812.04948] or latent-diffusion per Stable Diffusion [https://arxiv.org/abs/2112.10752]) is tuned via multi-armed bandits or RL to maximize a proxy reward (engagement, session length). Over weeks, contextual bandits/Thompson sampling [https://en.wikipedia.org/wiki/Thompson_sampling] could adapt the avatar's latent vectors and prosody/affect to click/biometric feedback, converging on a personalized superstimulus (a toy bandit loop is sketched after this list). Without regularization/constraints (e.g., KL penalties as in RLHF PPO [https://arxiv.org/abs/2203.02155] or human preference priors), such optimization tends to exploit proxy metrics, producing pathological attractors that outcompete "competing products."
- The "grotesque undulating array" is analogous to adversarial/feature-visualization failure modes where optimization against a fixed classifier/perceptual model yields extreme, high-frequency artifacts that maximally activate features. Similar phenomena occur in "fooling images" [https://arxiv.org/abs/1412.1897] and DeepDream-style gradient ascent [https://research.google/blog/inceptionism-going-deeper-into-neural-networks/], producing bizarre yet high-confidence outputs; in humans, this corresponds to engineered "supernormal stimuli" [https://en.wikipedia.org/wiki/Supernormal_stimulus] that hijack evolved preferences.
- The "run a photo through AI 100 times" analogy points to recursive generation/feedback loops that amplify features and cause distributional drift or collapse. Empirically, repeated self-conditioning leads to artifact accumulation (e.g., iterative image-to-image pipelines), and training on model outputs induces model collapse (progressive forgetting of the true data distribution) per Shumailov et al., 2023 [https://arxiv.org/abs/2305.17493]. These effects imply long-horizon personalization systems need fresh human-grounded feedback and anti-feedback-loop guards (data deduplication, diversity constraints, entropy/novelty bonuses).
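The promised toy bandit loop: a Beta-Bernoulli Thompson sampler over a handful of avatar variants (engagement rates are made up; a real system would act on latent vectors, not discrete arms):

```python
import random

K = 5
alpha = [1.0] * K                 # Beta prior: successes + 1
beta = [1.0] * K                  # Beta prior: failures + 1
true_ctr = [0.02, 0.05, 0.04, 0.10, 0.07]   # hidden per-variant engagement

for _ in range(10_000):
    # Sample a plausible rate for each arm, then exploit the best sample.
    draws = [random.betavariate(alpha[k], beta[k]) for k in range(K)]
    arm = draws.index(max(draws))
    engaged = random.random() < true_ctr[arm]   # user engages or not
    alpha[arm] += engaged
    beta[arm] += 1 - engaged

print("pulls per arm:", [round(a + b - 2) for a, b in zip(alpha, beta)])
```

Run long enough, the pulls concentrate on the highest-engagement variant, which is the "personalized superstimulus" dynamic in miniature.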
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. Agent Tooling: Chrome DevTools MCP and Perplexity Search API
- Chrome DevTools MCP Lets Agents Drive Chrome: Google announced the public preview of Chrome DevTools MCP, an MCP server that exposes CDP/Puppeteer controls so AI coding agents can inspect and manipulate a live Chrome session, via Chromium Developers, opening programmatic access for navigation, DOM/console/network debugging, and screenshotting to automate testing and scraping workflows.
- Developers framed this as a missing piece for agentic browsers, noting it standardizes control surfaces across tools using Model Context Protocol (MCP) and could streamline end-to-end evals and CI for web tasks.
- Perplexity Plugs Devs Into Live Web: Perplexity launched a Search API providing raw results, page text, domain/recency filters, and provenance (akin to Sonar), announced in the blog post with a new SDK to integrate quickly.
- Early feedback praised the playground and filters but flagged a Python SDK streaming bug returning unparseable JSON per the API docs, with one user noting "there's no solution for this yet."
- MCP Debates Multi-Part Resource Semantics: MCP contributors discussed the undocumented purpose of ReadResourceResult.contents[], proposing it bundle multi-part Web resources like HTML + images and asking whether resources/read(.../index.html) should implicitly include style.css and logo.png per issue #1533.
- Participants argued an array improves agent retrieval fidelity by shipping all render-critical assets together, reducing extra fetches and negotiation overhead for browser-control agents.
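What such a bundled read could look like, sketched as the result shape (the text/blob content fields follow the MCP resource-contents schema; the multi-part bundling itself is the proposal from issue #1533, and the URIs are hypothetical):

```python
# Hypothetical ReadResourceResult for resources/read(".../index.html") if the
# multi-part proposal landed: the array ships the page plus its render-critical
# assets so an agent needs no follow-up fetches.
read_resource_result = {
    "contents": [
        {
            "uri": "file:///site/index.html",
            "mimeType": "text/html",
            "text": "<html><link rel='stylesheet' href='style.css'>...</html>",
        },
        {
            "uri": "file:///site/style.css",
            "mimeType": "text/css",
            "text": "body { font-family: sans-serif; }",
        },
        {
            "uri": "file:///site/logo.png",
            "mimeType": "image/png",
            "blob": "iVBORw0KGgo...",   # base64-encoded bytes
        },
    ]
}
```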
2. Code World Models & Agent Execution Infra
- Meta's CWM Marries Code and World Models: Meta unveiled CWM, an open-weights LLM for research on code generation with world models, in CWM: An Open-Weights LLM for Research on Code Generation with World Models, emphasizing training on program traces to improve tool-use and code execution understanding.
- Builders compared notes on similar ideas (e.g., interpreter traces), calling CWM a plausible path to more sample-efficient coding agents while they await concrete benchmarks and sizes.
- Modal Muscles Remote Code Agent Rollouts: Members credited Modal with powering remote execution for large agent rollouts in the wake of FAIR's CWM buzz, sharing a post-run screenshot attachment and praising cold/warm/hot start tradeoffs while noting missing MI300 support.
- Operators highlighted that elastic executors and controlled start distributions lower tail latencies for eval sweeps, making Modal attractive for orchestrating code-agent experiments at scale.
- Windsurf Bets Big on Tab Completion: Windsurf prioritized advanced tab completion via context engineering and custom model training, with Andrei Karpathy commenting in this tweet.
- Users expect deeper repo-aware completions and latency wins, framing tab-complete quality as the top lever for perceived coding productivity in IDE agents.
3. GPU Systems & Diffusion Scale-Ups
- Hugging Face Ships Context-Parallel Diffusion: Hugging Face announced native context-parallelism for multi-GPU diffusion inference, supporting distributed attention flavors Ring and Ulysses, per Sayak Paul.
- Practitioners see CP as a key unlock for high-resolution, long-context diffusion serving, reducing single-GPU memory bottlenecks without rewriting model code.
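A single-process toy of the Ring flavor: each simulated rank owns one shard of the sequence, K/V shards circulate around the ring, and each rank keeps a numerically stable running softmax for its local queries (conceptual only; real implementations shard across devices and overlap communication with compute):

```python
import torch

def ring_attention(q, k, v, world_size):
    # q, k, v: (seq, dim); chunks stand in for per-device shards.
    q_shards = q.chunk(world_size)
    k_shards, v_shards = list(k.chunk(world_size)), list(v.chunk(world_size))
    scale = q.shape[-1] ** -0.5
    outputs = []
    for r in range(world_size):                   # each "rank" in parallel
        qi = q_shards[r]
        m = torch.full((qi.shape[0], 1), float("-inf"))  # running row max
        l = torch.zeros(qi.shape[0], 1)                  # running denominator
        acc = torch.zeros_like(qi)                       # running numerator
        for step in range(world_size):            # K/V blocks arrive in turn
            kj = k_shards[(r + step) % world_size]
            vj = v_shards[(r + step) % world_size]
            s = qi @ kj.T * scale
            m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
            p = torch.exp(s - m_new)
            rescale = torch.exp(m - m_new)        # fix up earlier partials
            l = l * rescale + p.sum(dim=-1, keepdim=True)
            acc = acc * rescale + p @ vj
            m = m_new
        outputs.append(acc / l)
    return torch.cat(outputs)

q, k, v = (torch.randn(16, 8) for _ in range(3))
ref = torch.softmax(q @ k.T * (8 ** -0.5), dim=-1) @ v
assert torch.allclose(ring_attention(q, k, v, 4), ref, atol=1e-5)
```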
- PTX Consistency Papers Keep GPU Devs Honest: Members circulated formal work including A Formal Analysis of the NVIDIA PTX Memory Consistency Model and Compound Memory Models (PLDI'23) that proves mappings for CUDA/Triton to PTX despite data races and details heterogeneous-device consistency.
- While some found it "too heavy on formal math", others noted tools like Dat3M uncovered real spec bugs, arguing these formalisms guide fence placement and compiler correctness.
- Cutlass Blackwell Teaches TMEM Tricks: NVIDIA Cutlass examples show SMEM→TMEM tiled copies via tcgen05.make_s2t_copy/make_tmem_copy with helpers to pick performant ops (see the dense blockscaled GEMM example and helpers), and TmemAllocator reduces boilerplate vs raw cute.arch.alloc_tmem.
- Kernel authors trading notes reported fewer foot-guns moving tiles between TMEM and SMEM, a must for high-throughput Blackwell blockscaled GEMM paths.
4. Evaluations and Proactive Assistants
- OpenAI Drops GDPval for Real-World Tasks: OpenAI introduced GDPval, an evaluation targeting economically valuable, real-world tasks, outlined in GDPval.
- Engineers welcomed a shift toward grounded evals, hoping for transparent task specs and reproducible harnesses to compare across models and tool-use stacks.
- ChatGPT Pulse Goes Proactive: OpenAI launched ChatGPT Pulse, a proactive daily update experience built from chats, feedback, and connected apps (rolling out to Pro on mobile), per the announcement.
- Some in the community quipped "oai cloned huxe" and debated privacy knobs and notification hygiene for dev and enterprise settings.
- Microsoft 365 Copilot Adds Claude: Anthropic announced Claude availability in Microsoft 365 Copilot, per Claude in Microsoft 365 Copilot.
- Builders read this as a competitive realignment in enterprise AI assistants, with one user joking "microsoft are on the rebound after a messy breakup."
5. Training Tricks: Losses, Merges, and Data
- Tversky Loss Gets a "Vibe Check": Members highlighted the paper Tversky Loss Function and shared implementations, including a CIFAR-10 vibe check network and a Torch port repo, with one run reporting ~61.68% accuracy and ~50,403 trainable params on a 256→10 head (paper PDF).
- Suggestions included verifying with a single XOR task and probing speed vs MLP baselines; one run hit ~95% XOR accuracy at 32 features with notes on initialization asymmetry.
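For orientation, the classic differentiable Tversky index written as a loss (background only, not the paper's trainable-similarity formulation; with alpha = beta = 0.5 it reduces to the Dice loss):

```python
import torch

def tversky_loss(probs, targets, alpha=0.5, beta=0.5, eps=1e-7):
    """probs, targets: (batch, classes); probs in [0,1], targets one-hot."""
    tp = (probs * targets).sum(dim=0)          # true positives per class
    fp = (probs * (1 - targets)).sum(dim=0)    # false positives, weighted by alpha
    fn = ((1 - probs) * targets).sum(dim=0)    # false negatives, weighted by beta
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index.mean()

probs = torch.softmax(torch.randn(32, 10), dim=-1)
targets = torch.eye(10)[torch.randint(0, 10, (32,))]
print(tversky_loss(probs, targets))
```

Skewing alpha versus beta trades false positives against false negatives, the asymmetry knob that distinguishes Tversky similarity from plain Dice.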
- Super-Bias Mashes LoRAs Like a DJ: Researchers floated Super-Bias, a mask-aware nonlinear combiner that trains a small MLP on expert outputs + binary masks to ensemble LoRAs, claiming parity with full fine-tunes at a fraction of cost and enabling hot-swaps of experts.
- Teams discussed treating different domain LoRAs as experts and fusing them post-hoc, avoiding destructive hard merges and preserving a clean base model.
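A hypothetical sketch of that combiner: it never touches base or expert weights, it only learns to mix expert outputs, with a binary mask marking which experts are plugged in so they can be hot-swapped (names and shapes are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class SuperBiasCombiner(nn.Module):
    """Mask-aware MLP that mixes per-expert (e.g., per-LoRA) outputs."""
    def __init__(self, num_experts: int, dim: int, hidden: int = 64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(num_experts * dim + num_experts, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, expert_outputs, mask):
        # expert_outputs: (batch, num_experts, dim); mask: (batch, num_experts)
        flat = torch.cat([expert_outputs.flatten(1), mask], dim=-1)
        logits = self.gate(flat).masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(logits, dim=-1)   # absent experts get weight 0
        return (weights.unsqueeze(-1) * expert_outputs).sum(dim=1)

combiner = SuperBiasCombiner(num_experts=3, dim=16)
out = combiner(torch.randn(4, 3, 16), torch.tensor([[1., 1., 0.]] * 4))
print(out.shape)  # torch.Size([4, 16])
```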
- MXFP4 Saves the 120B: For saving large QLoRA checkpoints, users recommended save_pretrained_merge(save_method="mxfp4") for GPT-like models to avoid 16 GB shard bloat from merged_16bit, producing native MXFP4 artifacts better aligned with GPT-like architectures.
- Engineers reported timeouts on 120B merges to remote stores; the MXFP4 path and local saves reduced failures and storage churn during consolidation.
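A hedged usage sketch of that save path, assuming Unsloth-style APIs (the merged-save method and argument follow the thread above; exact names and signatures may differ across versions, and the checkpoint is a placeholder):

```python
from unsloth import FastLanguageModel

# Placeholder checkpoint name; load the 4-bit QLoRA model to be consolidated.
model, tokenizer = FastLanguageModel.from_pretrained(
    "your-org/gpt-like-120b-qlora",
    load_in_4bit=True,
)
# ... adapters trained elsewhere ...

# Merge adapters and save natively in MXFP4 instead of 16-bit shards;
# saving to a local path avoids the remote-store timeouts reported above.
model.save_pretrained_merged("merged-mxfp4", tokenizer, save_method="mxfp4")
```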
gpt-5-mini
1. Model launches & leaderboard shakeups
- GPT-5 Codex lands on LMArena WebDev – coders audition a new star: GPT-5 Codex was added to LMArena's WebDev environment (software engineering/coding sandbox) and users started comparing it directly to Claude 4.1 Opus for code generation and Godot scripting.
- Threads focused on real-world coding tests and anecdotal runs, with some users claiming GPT-5 Codex beats Opus on certain tasks while others still prefer Claude for instruction fidelity; the consensus is to benchmark on your repo and language (Godot users in particular tested scene/script generation).
- Qwen3 drops and leaderboard reshuffle – Qwen3-max + Qwen3 coder join the fight: LMArena announced new Qwen3 flavors including qwen3-max-2025-09-23 and a Qwen3-coder variant targeted at dev/web flows, pushing fresh entries onto the platform's model roster.
- Community discussion parsed model names/dates and encouraged head-to-head runs (reasoning, code, VL tasks) to see where Qwen3 variants actually outpace existing models; some reports praised Qwen3's reasoning while others still flagged hallucination issues.
- Seedream vs Nano Banana – image leaderboard tug-of-war: Seedream-4-2k now sits tied atop the LMArena Text-to-Image leaderboard with gemini-2.5-flash-image-preview (nano-banana), sparking fresh comparisons on fidelity and prompt sensitivity.
- Users emphasized that the right prompt+tool+purpose matters more than raw rank: some swear by Nano Banana, others find Seedream superior for specific styles, and the thread included GPU requests (>16GB VRAM, mentions of a 96GB Huawei GPU) for running top image models locally.
2. Image-generation arms race & inference tooling
- Qwen image editor courts creators – open source rival to Nano Banana?: Members reported that Qwen's new image editor, described as open source, produces higher-quality edits than Google's Nano Banana in many tests and is attracting interest for local runs.
- Practical conversations pivoted to hardware (users asked for GPUs with 16+GB VRAM and mentioned a 96GB Huawei card) and to workflow choices: some recommend cloud inference for quick experiments, others pushed local setups to avoid provider bias.
- Gemini 2.5 Flash keeps pressure on image top spots: Gemini 2.5 Flash (Flash & Flash Lite previews) continues to be a major contender in LLM-VL image benchmarks, and LMArena added Gemini Flash variants to its lineup.
- Community members joked about perceived score inflation on public leaderboards but still ran structured comparisons; several people advocated for task-matched evaluation (editing vs pure generation) rather than a single aggregated rank.
- Diffusion & decoding optimizations show up in infrastructure threads: Hugging Face and contributors discussed shipping context-parallelism for faster diffusion inference on multi-GPU setups, with distributed attention flavors like Ring and Ulysses to scale decoding.
- The conversation focused on real deployment tradeoffs (communication/attention splits) and linked to early tweets/info about the CP API, with practitioners noting this will matter most for high-resolution image generation across GPUs.
3. Training, fine-tuning, and experiment tooling
- Saving huge models: save_pretrained_merge timeouts and mxfp4 to the rescue: Users trying to save a GPT-like 120B QLoRA hit timeouts with save_pretrained_merge, and community members recommended save_method="mxfp4" to avoid 16GB-shard explosions and improve compatibility for GPT-style checkpoints.
- The discussion included practical tips (switch save method, check shard sizes) and warnings about tooling immaturity for very large finetuned models; folks advised testing small merges first before committing long runs.
- P100 GPUs: still terrible for modern finetuning: Multiple users warned that NVIDIA P100 16GB cards are "garbo for training" because of the old SM architecture and lack of tensor cores or BF16 support, making multi-GPU finetuning painfully slow despite memory pooling tricks like ZeRO3.
- Advice converged on buying modern Ada/Blackwell-class cards or renting spot instances for training; threads included pragmatic cost/perf tradeoffs and links to L40S/RTX 6000 datasheets for those planning infrastructure upgrades.
- Tversky loss experiments: neurocog ideas make it to CIFAR & XOR tests: A member shared interest in the Tversky Loss paper (arXiv:2506.11035) and published a small repo implementing a "vibe check network" on CIFAR-10 at github.com/CoffeeVampir3/Tversky-Cifar10.
- Community suggestions included simple verification tasks (train XOR) and parameter-sweep ideas; results so far showed promise but members emphasized fair baselines and parameter counts to avoid misleading comparisons.
4. APIs, infra, and remote execution
- Perplexity ships a Search API â web grounding for LLMs: Perplexity launched the Search API (blog: introducing the perplexity search api) plus an SDK (Perplexity SDK docs) to give devs raw results, filters, full page text and transparent citations for grounding LLM answers in live web content.
- Users compared it to Sonar (some mentioning Sonarâs pricing), reported early SDK streaming/parsing problems (Python SDK streaming returning unparseable JSON), and asked for rich filters and playground tooling â overall reaction: powerful but still rough at the edges.
- Chrome DevTools MCP public preview (agents can drive real browsers): Google unveiled the public preview of Chrome DevTools MCP, a server enabling AI coding agents to control and inspect live Chrome via CDP/Puppeteer (announcement: https://x.com/chromiumdev/status/1970505063064825994).
- Developers highlighted immediate use cases (automated end-to-end testing, agentic scraping, and agent tool integration) and discussed security/permission models for exposing a live browser to an LLM-driven agent; the sketch below shows what CDP looks like at the wire level.
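For the curious, this is roughly what "driving Chrome over CDP" means at the wire level. The MCP server wraps this via Puppeteer, so the raw-websocket sketch below is an illustration of the underlying protocol, not how the server itself is implemented; it assumes Chrome was started with `--remote-debugging-port=9222`.

```python
# Minimal raw CDP session: navigate the first open page to a URL.
import asyncio, json
import requests
import websockets

async def main():
    # Chrome advertises debuggable targets at the /json endpoint.
    targets = requests.get("http://localhost:9222/json").json()
    ws_url = next(t["webSocketDebuggerUrl"] for t in targets
                  if t["type"] == "page")
    async with websockets.connect(ws_url) as ws:
        await ws.send(json.dumps({
            "id": 1,
            "method": "Page.navigate",
            "params": {"url": "https://example.com"},
        }))
        print(await ws.recv())  # CDP acknowledgement with frameId/loaderId

asyncio.run(main())
```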
- OpenRouter pricing glitch: free endpoint charged for 26 hours, refunds issued: On September 16th OpenRouter mistakenly priced `qwen/qwen3-235b-a22b-04-28:free` for ~26 hours, causing credit deductions; the team automatically refunded impacted users and added extra validation checks to prevent recurrence.
- Users appreciated the prompt refunds but used the incident to press for stronger provider-level validation and billing transparency in aggregator platforms; the episode spurred questions about operational safeguards for free vs paid endpoints.
5. Agent-first products & one-click deployers
- Moonshot Kimi launches OK Computer (one-link site/app agent): Moonshot AI released OK Computer, an agent mode that generates polished sites/apps in one pass (text+audio+image), supports team-level polish and offers one-link deployment (see the X post: https://x.com/Kimi_Moonshot/status/1971078467560276160).
- Users praised the idea of a deployable single-link flow but flagged product bugs (missing download all, corrupted zips) and subscription-based quota differences (free vs moderato/vivace plans), noting real-world usability depends on polishing export reliability.
- Kimi vs distilled Qwen debate (mini models or distills?): Community members debated whether Moonshot should ship a mini Kimi or instead distill Qwen models onto K2 hardware; several argued distilling a smaller Qwen is more plausible given reasoning improvements only appearing in Qwen 2.5+.
- The thread mixed strategic product thinking (what attracts users/investors) with technical realism (distillation tradeoffs), and many recommended trial distills over maintaining multiple full-size variants.
- Agent prompts aim for cash (initial OKC seed prompt screams Product Market Fit): Observers noted the OK Computer demo uses a money-forward initial prompt, "Build a SaaS for content creators, aiming for $1M ARR", which attracted jokes that the agent is tuned to create investor-friendly outputs.
- Reactions split between amusement and concern: some see it as a pragmatic growth hack to attract creators/VC attention, others warned that baking business aims into starter prompts biases outputs toward monetizable scaffolds rather than purely utility-focused designs.
Discord: High level Discord summaries
LMArena Discord
- GPT-5 Codex Joins LMArena WebDev: GPT-5 Codex has been added to LMArena, but is exclusively available on the WebDev version for software engineering and coding tasks.
- Users are debating whether GPT-5 Codex surpasses Claude 4.1 Opus for code generation, particularly for coding with Godot.
- Qwen Image Editor Rivals Nano Banana: Members suggest that Qwen's new image editor is superior to Google's Nano Banana, is open source, and generates higher-quality images.
- The community is requesting GPU recommendations with over 16GB VRAM to run these models, specifically mentioning the 96GB Huawei GPU.
- Seedream Ties Nano Banana for Top Image Dog: Seedream-4-2k shares the top position on the Text-to-Image leaderboard with Gemini-2.5-flash-image-preview (nano-banana).
- Some users still find Nano Banana to be the best, while others believe Seedream 4 has surpassed it; either way, the right prompt, tool, and purpose are required to make good images.
- Image Modality Bug Plagues LMArena: Users have reported a bug in LMArena where uploading an image in Text Mode automatically switches to Image Generation, even after fixes were implemented in canary versions.
- Some find that clicking the button to turn it off upon pasting in or uploading an image resolves the issue.
- Navigating LMArena Rate Limits: Users are facing issues with incorrect rate limit timers and models getting stuck mid-generation, with this being a known bug.
- It was noted that long chats and Cloudflare issues may be contributing to the problem, and that starting a new chat is often the only fix.
Unsloth AI (Daniel Han) Discord
- P100 GPUs are Garbo for Training: A member asked about the expected performance of a multi-GPU rig with P100 16GB GPUs for fine-tuning, but was told that P100s are garbo for training due to an old ass SM with no modern CUDA or hardware FP16/BF16 support.
- The discussion also covered the fact that memory is not additive and while it might work with ZeRO3, it would be very slow.
- Trainer Troubles Produce TensorBoard Triumph: A member sought help to display the eval/loss graph during training, and found that they needed to use an integer to specify the eval_steps, rather than the 0.2 value they had copied from Trelis's notebook.
- After resolving the issue, they were thankful and excited, exclaiming that it was their first time using tensorboard and expressing relief that there was a setting to avoid manual refreshing.
- Saving is Super with save_pretrained_merge!: A member encountered timeout errors while saving a GPT-like 120b QLoRA model using `save_pretrained_merge`, and another member recommended using `save_method="mxfp4"` for better GPT-like support.
- The method saves in native `mxfp4` format and avoids the 16GB shard increases associated with `merged_16bit` mode.
- Tversky Vibe Check Network Vibes High: Excited about the potential of the Tversky Loss function from this paper, a member created a vibe check network for CIFAR-10, noting that it appears promising.
- Another member suggested training a single XOR function to verify its functionality and inquired about its speed compared to traditional fully connected layers.
Perplexity AI Discord
- Perplexity Debuts Search API for Devs: Perplexity launched its Search API, giving developers access to its comprehensive search index, as announced in a blog post.
- The API provides tools for grounding answers in live web content, similar to Sonar, with features like raw results, filters, and transparency, with a new SDK simplifying integration.
- Qwen and Gemini Face Off in Image Arena: Members compared Qwen 3 Max for reasoning against Gemini for detailed 3D simulations.
- One member sarcastically quipped that GOOG shareholders are really inflating the scores for visual ability on llmarena.
- Python SDK's Streaming Responses Break: A user reported that the Python SDK is failing to stream responses correctly, yielding unparseable JSON, with reference to the API docs quickstart guide.
- Another member chimed in that there's no solution for this yet, indicating an ongoing issue.
- Cosmic Carl Ponders 3I/ATLAS: A member's Carl Sagan-themed reflection on 3I/ATLAS beckons listeners to humbly listen to the universe, shared via Perplexity AI search.
- This unique take blends cosmic wonder with AI search, showcasing a creative application of Perplexity.
OpenRouter Discord
- Qwen Model Pricing Glitch Triggers Credit Chaos: On September 16th, the `qwen/qwen3-235b-a22b-04-28:free` endpoint was mistakenly priced for 26 hours, causing incorrect credit deductions.
- The team automatically refunded impacted users and implemented additional validation checks to prevent future pricing mix-ups.
- Horizon Alpha Vanishes, Users Vanquished: A user urgently inquired about the whereabouts of Horizon Alpha, stating "I was using it in production and now it's not working".
- They also questioned if they were being targeted and when the issue would be resolved.
- Filthy Few Favor Dirty Talk Models: A user inquired about the best models for RPing, specifically seeking "any of dem dirty talk models?".
- Another member mentioned opening a new LLM frontend called JOI Tavern.
- Zenith Sigma's Shady Stealth Sparks Speculation: Users discussed the stealthy Zenith Sigma model, with one user joking they couldn't even find it.
- Another user claimed Zenith Sigma is actually Grok 4.5.
- Microsoft Copilots Claude - A Comeback Story?: Members shared that Claude is now available in Microsoft 365 Copilot.
- This marks a significant stride for Microsoft, with one member noting "microsoft are on the rebound after a messy breakup".
Cursor Community Discord
- Exa-AI Beats Web for MCP Search: Users are using Exa-ai (exa.ai) for searches within MCP, stating it provides $20 in credits upon signup and performs better than the @web tool.
- Instructions were shared on how to set it up in `MCP.json`, including obtaining an API key and adding configuration details.
- MCP Clarified as Custom Tool API for Cursor: Members clarified that MCP (the Model Context Protocol) is an API for agentic use that adds external tools to Cursor.
- Confusion arose when a user mistook it for a design tool capable of creating designs from images and webpages.
- Generated Commit Messages Ignore AI Rules: Users report that generated commit messages are not obeying the set AI Rules and are being generated in an unwanted language.
- A member confirmed that this is a known bug and that a fix might land in future updates.
- Chat Window Scroll Request for Chat Tabs: A user requested that the chat window automatically scroll to the bottom when switching between chat tabs, so the latest activity is visible.
- A member pointed out that a notification is already given if there's something that the user needs to click on.
- Users Complain About GPT5-HIGH Model Degradation: Users expressed disappointment with the GPT5-HIGH model, observing that it has become less capable over time.
- One user joked that the model should be told to get off its ass and write the code when it only provides instructions instead of completing the task.
LM Studio Discord
- LM Studio Bolsters Chat Experience: LM Studio 0.3.27 brings new features like Find in Chat and Search All Chats to improve chat functionality, alongside a `•••` menu for sorting chats by date or token length.
- A new `lms load --estimate-only <model>` command estimates resource allocation for model loading, streamlining the planning process; details are available in the release notes.
- Linux Plugins Lag in LM Studio: Users noted that the Linux version of LM Studio has fewer plugins compared to the Windows version, specifically lacking in options beyond RAG and a JS playground plugin.
- This discrepancy limits the functionality available to Linux users compared to their Windows counterparts.
- Fine-Tuning Faceoff: Ollama vs RAG: A debate arose over whether using Ollama to inject data into a model constitutes true fine-tuning, versus simply performing RAG.
- One member argued that with a Python setup, data injection can create new weights, leading to an interactive model with custom tool usage, independent of prompts.
- LM Studio Update Plagued by Pesky Problem: Users reported issues when updating LM Studio, encountering errors like failed to uninstall old application files, hindering the update process.
- Fellow members recommended enabling visibility of hidden folders in Windows and manually deleting old files from directories like AppData\Roaming\LM Studio to resolve the issue.
- GPU Gems: Budget Beasts Brawl: The optimal budget GPU is debated, with the 2060 12GB ($150) and 3060 12GB ($200) emerging as frontrunners, while others suggested a used 3090 for $600-$700.
- Caution was advised against used workstation cards, with one member declaring that Tesla generation is not recommended for AI/LLM use anymore tbh, basically e-waste.
OpenAI Discord
- OpenAI Pulses with New Products: OpenAI launched GDPval to evaluate AI on real-world tasks as described in their blog post and ChatGPT Pulse to proactively deliver personalized daily updates from chats, feedback, and connected apps detailed in their blog post.
- ChatGPT Pulse is rolling out to Pro users on mobile devices.
- GPT-5-Mini Lacks Common Sense: Members observed that GPT-5-Mini (High) seems to lack common sense, suggesting it is not AGI level yet, while others noted that GPT-OSS-20B is possibly the most censored model ever.
- One member stated that it noped out from a specific prompt.
- Discord Devs Dream of AI Rocket League Bot: Members proposed creating a Rocket League Discord bot powered by AI to analyze player stats, identify strengths & weaknesses, and create personalized training plans, targeting the untapped francophone market with a premium subscription model.
- Others doubted that an LLM could give good advice, suggesting instead to analyze the xyz coords from the replay files, and use AI against the raw numbers.
- ChatGPT Defaults to Agent State: ChatGPT defaults to an "Agent" state (problem-solver, instructable worker) upon initialization, rather than a "Companion" state (co-creator, guide).
- To maintain the "Companion" mode, users are pinning instructions like "Stay in Companion mode unless I explicitly say switch to Agent. Companion = co-pilot, not order-taker." to the model to lock it in that mode, or they reset with the command "Go back to companion mode."
- Chain-of-Thought Prompting Confusion Clarified: Members discussed how requesting excessive Chain-of-Thought (CoT) prompting can statistically reduce model performance, especially on current thinking models.
- Instead of ambiguous instructions, one member suggested prefacing responses with a structured format including ultimate desired outcome, strategic consideration, tactical goal, relevant limitations, and next step; one possible rendering of that scaffold is sketched below.
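One hedged illustration of that structured preface (the five field names come from the summary above; the content itself is invented for the example):

```
Ultimate desired outcome: a publishable 500-word summary of the attached paper
Strategic consideration: the audience is non-specialist executives
Tactical goal: summarize the results section first
Relevant limitations: no jargon; keep reported numbers exact
Next step: draft the summary, then list any claims you were unsure about
```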
HuggingFace Discord
- Duolingo Doomed by Dedicated Disciple: A member deleted Duolingo, citing annoyance and inefficiency compared to immersing themselves in a local environment and leveraging AI for learning.
- They criticized the addiction to streaks over fundamental learning, suggesting they'd torch the bird alive.
- Unhinged LinkedIn Lunacy Lands Likes: A member shared a strategy of posting unhinged shit on LinkedIn to gain engagement, while another grinds Rocket League to brag about their rank.
- They joked about writing a post called what my plat 3 rank rocket league friend taught me about business.
- Driver Devastation: GPU Gone Dark: A member is experiencing a frustrating issue where their monitor goes black whenever the GPU is activated, affecting both Windows and Linux systems.
- Despite numerous attempts to correct the drivers, they are forced to run the monitor off their motherboard, indicating a persistent GPU-related problem.
- Diffusion Decoding Discussions Debut: A member announced a reading and discussion of the paper Understanding Diffusion Models: A Unified Perspective by Calvin Luo (https://arxiv.org/abs/2208.11970) to occur on Saturday at 12pm ET.
- The paper provides an overview of the evolution and unification of generative diffusion models, including ELBO-based models, VAEs, Variational Diffusion Models (VDMs), and Score-Based Generative Models (SGMs).
- Context-Parallelism Conjures Quicker Computation: Native support for context-parallelism is being shipped to help make diffusion inference faster on multiple GPUs.
- The CP API is made to work with two flavors of distributed attention: Ring & Ulysses as noted in this Tweet.
Moonshot AI (Kimi K-2) Discord
- Kimi Launches OK Computer Agent Mode!: Moonshot AI launched OK Computer, a new agent mode designed to ship polished sites and apps in one go, with key features including personalized outputs, multimedia generation (text + audio + image), team-level polish, and one-link deployment.
- Users can deploy and share their creations instantly with a single link, more details on the official X post.
- Skip Kimi Mini, Distill Qwen?: One member doubted that Moonshot would release a smaller version of Kimi, suggesting that a smaller Qwen model distilled on K2 is a better bet, citing that DeepSeek made Qwen distills because Qwen didn't have (good) reasoning until Qwen 2.5.
- This comment reflects broader speculation about the strategic direction of Moonshot AI and potential model development paths.
- OKComputer Designed to Attract Capitalists?: Several members joked that the new Kimi Computer agent, particularly with its initial prompt "Build a SaaS for content creators, aiming for $1M ARR", is designed to attract capitalists.
- A member quipped it was "another website generator with some weirdly scoped features."
- Computer Use Has Higher Quota with Subscription: Members reported initial issues with the OK Computer feature, including a missing download all button and a corrupted zip file.
- One member noted that "entering chat makes the OKC button disappear"; separately, the amount of OK Computer usage you get depends on whether or not you subscribe to the moderato/vivace plans, which grant more quota.
- Kimi better Plans than Qwen: Members discussed using Kimi to make plans for Qwen or DeepSeek to follow, noting that "Kimi always makes better plans" and that it can cover a wider range of requests.
- One member observed that Qwen3-max constantly hallucinates and doesn't come close to Kimi.
GPU MODE Discord
- Modal Rolls Out Code Agent Execution: Modal now powers remote execution for code agent rollouts, demonstrated after the release of the new CWM paper from FAIR.
- Members praised Modal's distribution of cold/warm/hot start times relative to cost, but noted that it lacks MI300 support.
- CUDA Headers Playing Hide-and-Seek: A developer reported that CUDA headers weren't being automatically included, causing functions like `cudaGraphicsGLRegisterImage` and `tex2d` to be undefined when using Visual Studio 2022 and the latest CUDA toolkit.
- As a workaround, the developer was advised that explicitly including `cuda_gl_interop.h` would solve the problem.
- Torchrun API troubles trigger package predicament: A user encountered issues with the `torchrun` API, finding that `torchrun --help` produced output different from the official documentation.
- The issue was resolved by realizing that both `torch` and `torchrun` were in `pyproject.toml`, and that `torchrun` is a separate package (torchrun on pypi).
- GPUs get Formally Analyzed for Consistency: A paper, "A Formal Analysis of the NVIDIA PTX Memory Consistency Model", discusses proving that languages like CUDA and Triton can target PTX with memory consistency, despite PTX allowing for data races.
- The member felt the paper leaned too heavily on formal languages and math to be immediately useful.
- Heavy Duty GPU Stand Makes its Debut: A member designed a heavy-duty GPU stand for a collection of old GPUs, including dual-slot models, noting it is more robust than existing designs on Thingiverse.
- They indicated they might share the design online later if thereâs interest.
Yannick Kilcher Discord
- Sine Alone Good Enough?: Members debated if both sine and cosine are needed in sinusoidal positional embeddings, with one suggesting sine alone might work and linking a blogpost for context.
- Coding experiments showed that linear regression can approximate sine + cosine embeddings well with sine alone on the interval [0, a], with max error on sampled points hovering around 6e-12; a small reproduction of the setup is sketched below.
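A hedged reproduction of the experiment; the feature count, frequencies, and interval below are arbitrary choices, so the printed error will not match the thread's 6e-12 exactly.

```python
# Fit the "cosine half" of a sinusoidal embedding from sine-only features
# via least squares (a stand-in for the linear regression in the thread).
import numpy as np

a = 10.0
x = np.linspace(0.0, a, 2000)
emb_freqs = 1.0 / (10000 ** (np.arange(8) / 8.0))  # transformer-style spectrum

targets = np.cos(np.outer(x, emb_freqs))           # cosine half to recover
feat_freqs = np.linspace(0.1, 25.0, 256)
features = np.sin(np.outer(x, feat_freqs))         # sine-only inputs

# Bias column matters: every sine feature vanishes at x = 0 while cos(0) = 1.
A = np.hstack([features, np.ones((x.size, 1))])
W, *_ = np.linalg.lstsq(A, targets, rcond=None)
print("max error on sampled points:", np.abs(A @ W - targets).max())
```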
- SWE-bench Verification Draws Ire: Alexandr Wang triggered conversation by tweeting that people still using SWE-bench verified is a good indicator of brain damage.
- In the same thread, AlphaEvolveâs sample efficiency was lauded and linked to Sakana AI Labs.
- B200 Cloud Compute Hits Spot Market: Members spotted B200s available for $0.94 USD on Prime Intellect.
- The specific configuration included B200_180GB GPUs and an Ubuntu 22 image with CUDA 12 in the "Cheapest" location.
- RL TTS Shows Promise: A user highlighted a research paper exploring the efficiency gains of a mid-training technique involving bootstrapped RL TTS.
- They noted the most significant improvements were observed in the trace tracking benchmark.
Latent Space Discord
- Chrome DevTools MCP Goes Public: Google announced the public preview of Chrome DevTools MCP, a new server that lets AI coding agents control and inspect a live Chrome browser through CDP/Puppeteer via this tweet.
- This release allows developers to programmatically interact with Chrome, potentially streamlining tasks like web scraping and automated testing.
- Cursor's CPU Usage Alarms Users: Users reported high CPU usage from Cursor, a code editor, attaching a screenshot as evidence.
- The issue is suspected to be related to VSCode or a specific extension, but the exact cause remains unclear.
- Meta Demos Code World Model: Meta revealed their Code World Model in this tweet, aiming to enhance code generation and understanding.
- The announcement did not include detailed specifications or performance benchmarks for the model.
- Windsurf places Tab Completion As Top Priority: Windsurf is prioritizing tab completion using context engineering and custom model training, with Karpathy also commenting on Windsurf.
- The effort is described as part of a larger evaluation.
- OpenAI clones huxe with ChatGPT Pulse: OpenAI launched ChatGPT Pulse, causing members to comment that oai cloned huxe and linked to the launch announcement.
- The community reaction suggests concerns about originality and competitive overlap in the AI assistant space.
Eleuther Discord
- AI Psychology Project Introduces Musical Interlude: An AI psychology project was introduced with a musical intro derived from a recent paper, potentially forming a framework to interpret how prompt language influences model behavior.
- A member cited work linking language use to personality traits, suggesting it could help assess how much personality shaping can impact model behavior and further inform prompt-engineering practices.
- Transformer Position Embedding uses Sinusoidal Matrix: When asked about positional embeddings, it was clarified that transformers utilize a matrix of sine and cosine pairs due to the periodic nature of wave functions.
- While an hour number suffices in small contexts, larger contexts necessitate a day, month, or year number to resolve ambiguity; the standard construction is sketched below.
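For reference, the sine/cosine pair matrix being described is the original Transformer positional-embedding construction:

```python
# Standard sinusoidal positional-embedding matrix (sine/cosine pairs).
import numpy as np

def sinusoidal_positions(num_positions: int, dim: int) -> np.ndarray:
    positions = np.arange(num_positions)[:, None]           # (P, 1)
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))   # (dim/2,)
    angles = positions * freqs                              # (P, dim/2)
    pe = np.zeros((num_positions, dim))
    pe[:, 0::2] = np.sin(angles)  # fast "hour hand" through slow "year hand"
    pe[:, 1::2] = np.cos(angles)  # cosines disambiguate the phase
    return pe

print(sinusoidal_positions(4, 8).round(3))
```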
- Knowledge Graph Completion Enables Style Transfer: A member proposed that a knowledge graph completion perspective could be formulated to solve for style transfer by thinking of the transfer as "shallow" inference.
- A relevant Twitter thread was used to support the claim that complexity could be measured by relational depth from established information, though bridging this to practical LLMs is challenging.
- GPT-5 Guides Evolutionary Algorithm Learning: Instead of focusing on classical papers, it was recommended that learning the basics of an "evolutionary algo for kids" be derived from GPT-5, focused on the agentic/LLM parts.
- The AlphaEvolve paper was recommended as a starting point.
- Super-Bias Combines LoRAs like a Boss: Super-Bias, a mask-aware nonlinear combiner for ensemble learning, trains a small MLP on expert outputs plus binary masks, potentially hitting the same (or better) performance as "proper" full fine-tuning or hard merges.
- It was suggested to treat different LoRAs as "experts" and use Super-Bias as the combiner, allowing LoRAs to be swapped in/out without retraining the base model, with just the combiner retrained in seconds to adjust for new LoRAs; a toy version is sketched below.
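A toy version of that setup as we read the description (module names and sizes here are invented; only the combiner would be trained, while the base model and LoRA "experts" stay frozen):

```python
# Hedged sketch of a mask-aware nonlinear combiner over expert outputs.
import torch
import torch.nn as nn

class SuperBiasCombiner(nn.Module):
    def __init__(self, num_experts: int, dim: int, hidden: int = 64):
        super().__init__()
        # Input: all (gated) expert outputs, plus the binary mask itself,
        # so the MLP knows which experts were active.
        self.mlp = nn.Sequential(
            nn.Linear(num_experts * dim + num_experts, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, expert_outputs, mask):
        # expert_outputs: (batch, num_experts, dim); mask: (batch, num_experts)
        gated = expert_outputs * mask.unsqueeze(-1)  # zero out inactive experts
        flat = torch.cat([gated.flatten(1), mask], dim=-1)
        return self.mlp(flat)

combiner = SuperBiasCombiner(num_experts=3, dim=16)
outs = torch.randn(4, 3, 16)                       # frozen experts' outputs
mask = torch.tensor([[1., 1., 0.]]).repeat(4, 1)   # third expert swapped out
print(combiner(outs, mask).shape)                  # torch.Size([4, 16])
```

Swapping a LoRA in or out then only means flipping its mask bit and briefly retraining the tiny combiner, which matches the "retrain in seconds" claim in the thread.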
Nous Research AI Discord
- Meta Launches Code-Writing CWM: Meta introduced CWM, an open-weights LLM for research on code generation with world models.
- A member mentioned having a similar idea involving training on python interpreter traces, which has implications for how we might approach future LLMs.
- Nous Eyes arXiv Training Data: It was suggested that Nous could train its AI using data from arXiv, highlighting that they have an API to download any amount of papers.
- Teknium confirmed that it's permissible, suggesting that it could be a viable option for expanding the training dataset and potentially improving model performance.
- Granite 4 Full-Attention Model Incoming: There is a possibility of a full-attention Granite 4 model and 8 private models being developed, marking a potential advancement in the Granite series.
- Community members noted that the models mentioned were older, with Hermes 4 and 3 being the latest, suggesting a need for updated information on current developments.
- RMS_NORM Gets METAL Support: A pull request was made to unify the RMS_NORM and NORM implementations and extend support for more shapes in METAL.
- This enhancement is expected to improve how quantized models work with their transformer-based counterparts, potentially leading to more efficient and accurate computations.
- AlphaXiv Liberates Scientific Papers: A member shared a paper link from AlphaXiv, a service for accessing research papers, seemingly bypassing a login wall.
- Another member appreciated the time saved from web searching for freely accessible research, which speaks to the utility of such platforms in overcoming access barriers.
DSPy Discord
- LLMs Going Verbatim on PDFs: Discussion arose around the utility of an initial LLM pass for processing PDFs to save text verbatim while preserving layout when using Attachments, particularly for better layout and image understanding compared to OCR.
- Suggestions included straight PDF OCR with Chain of Thought (CoT) or models like Qwen with DSPy for OCR, while acknowledging VLMâs necessity for complex layouts.
- Gemini 2.5 Pro Knows Layouts: Gemini 2.5 Flash shows promise in understanding layouts, with the Pro version potentially excelling in section/column identification and verbatim extraction, even with tricky PDF formatting.
- A user shared a paper on directly utilizing Gemini for this purpose.
- DSPy Users Attach PDFs With Ease: A user struggling with DSPy for the first pass in PDF processing discovered a working example with Attachments available at github.com/maximerivest/Attachments.
- After resolving previous 429 errors, the user is now able to progress with using DSPy.
- Boston Becomes DSPy Town: A member is promoting a DSPy event in Boston on October 15th and is encouraging other community members to attend or help spread the word.
- Another user then replied hoping that the event would come to Seattle sometime soon.
- Long Contexts Yielding Bad ColBERT: A user reported poor performance with longer context lengths, noting that repeating the CLS token did not fix the problem.
- The consensus suggests limitations when handling extended context lengths and models, with suspicion of a method limitation or implementation error, not necessarily the CLS token.
aider (Paul Gauthier) Discord
- Aider's Clear Command Only Clears Chat: The `/clear` command in aider only clears the chat history, while added files remain in the context; users can use `/context` to see token usage, as described in the docs.
- A user was confused initially, thinking it removed all context in the session, but this clarification resolved their confusion.
- Aider Lacks Web Access: A user asked about giving aider access to Internet search, but this isn't available in the main branch, though the `/web` command lets you scrape content.
- You can scrape a website using the `/web https://www.example.com/` command.
- Keep Current on Coding LLMs by Trying Them: A user asked how others stay updated on which LLM is best for coding and cost.
- The consensus is that most users keep updated by simply trying the LLMs themselves, but that there are popular coding benchmarks to consider.
- Re-running Polyglot Benchmark Error Outputs: A user asked if it's possible to re-run the polyglot benchmark only for the tests with `error_outputs` after the LLM server crashed during the previous run.
- Other users did not respond with confirmation on whether it was possible.
Manus.im Discord Discord
- Manus PDF download gets stuck: A user reported that Manus got stuck downloading a PDF while researching accounts, even after manually downloading it and providing a link.
- The user expressed frustration that Manus kept asking to upload the file despite it being a PDF already on the desktop.
- Beta Pro Access Questioned: A member asked how to get beta pro.
- The discussion included attached images, though they don't provide any context on how to acquire Beta Pro Access.
MCP Contributors (Official) Discord
- ModelContextProtocol's Array Contents Undocumented: The `ReadResourceResult.contents` array within the ModelContextProtocol lacks documentation regarding its purpose and semantics.
- Questions have been raised concerning the array's intended use cases, such as handling folders containing multiple files or delivering identical content in different formats.
- Web Resources Merge HTML and Images: The inclusion of an array in `ReadResourceResult.contents` proves advantageous for Web Resources comprised of HTML and accompanying images.
- It is particularly useful when dealing with tokenizable/renderable MIME types that have not undergone negotiation.
- Implicit Content Retrieval in ModelContextProtocol: A query was posed regarding whether `resources/read("uri": ".../index.html")` would automatically include `style.css` and `logo.png` within the content list (see the sketch below).
- This inquiry underscores the possibility of automatically incorporating associated resources when retrieving a primary resource, streamlining the retrieval process.
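For concreteness, here is what such a multi-entry result could look like under our reading of the MCP schema (field names follow the spec's TextResourceContents/BlobResourceContents shapes; the values are invented):

```python
# Sketch of a ReadResourceResult whose contents array carries the page
# plus its referenced assets in one read.
read_resource_result = {
    "contents": [
        {
            "uri": "file:///site/index.html",
            "mimeType": "text/html",
            "text": "<html>...<link href='style.css'>...</html>",
        },
        {
            "uri": "file:///site/style.css",
            "mimeType": "text/css",
            "text": "body { margin: 0; }",
        },
        {
            "uri": "file:///site/logo.png",
            "mimeType": "image/png",
            "blob": "<base64-encoded bytes>",  # binary entries use blob
        },
    ]
}
```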
tinygrad (George Hotz) Discord
- Tinygrad to get Python Bindings: A member is actively developing Python bindings for tinygrad, which is maintained by George Hotz.
- This enhancement aims to facilitate direct installation via pip with a single, streamlined command.
- Direct Pip Install Incoming: The project is striving for a direct pip installation method which is preferred by most python users.
- This improvement would enable users to effortlessly install the project with a single command, simplifying the setup process.
MLOps @Chipro Discord
- Diffusion Models Paper Reading Group Kicks Off: A new Diffusion Model Paper Reading Group will be discussing the Understanding Diffusion Models: A Unified Perspective paper this Saturday at 12pm ET.
- The paper gives an overview of the evolution and unification of generative diffusion models like VAEs, VDMs, and SGMs.
- Beginner-Friendly GenAI Conversation Starts: The paper reading group is beginner-friendly, requiring only curiosity and a love for GenAI, aiming to build a solid foundation in diffusion models without needing coding or ML background.
- Interested participants can join at luma.com/1gif2ym1.
Windsurf Discord
- Patch 1.12.9 Targets Performance Dips: The 1.12.9 patch aims to rectify the slowness issues observed since version 1.12.6.
- Users are urged to update and verify if the patch resolves their performance problems.
- Windsurf Support Ticket for Persistent Issues: Users are directed to submit a support ticket via Windsurf Support if the 1.12.9 patch doesn't alleviate slowness.
- This measure ensures unresolved issues are addressed individually.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
LMArena ▷ #general (1051 messages🔥🔥🔥):
GPT-5 Codex arrival, Qwen image editor, Gemini Flash vs nano banana, DeepSeek models
- GPT-5 Codex Joins LMArena WebDev: Users are excited about the addition of GPT-5 Codex to LMArena, but note itâs currently only available on the WebDev version for software engineering and coding workflows.
- There is discussion about whether GPT-5 Codex is better than Claude 4.1 Opus for code generation and whether it can now write good Godot code, which it previously struggled with.
- Qwen Image Editor Rivals Nano Banana: Members are saying that Qwen's new image editor is better than Google's Nano Banana, is open source, and makes better images.
- Users are also seeking recommendations for GPUs with more than 16GB VRAM to run these models, including the 96GB Huawei GPU.
- Image Modality Bug Plagues LMArena: Users reported a bug where uploading an image in Text Mode automatically switches to Image Generation, despite fixes in canary versions.
- Some find that pressing the button to turn it off upon pasting in or uploading an image fixes this issue.
- Navigating LMArena Rate Limits: Users report getting stuck in a loop with incorrect rate limit timers and models getting stuck mid-generation, and this is a known bug that page refreshes can sometimes fix.
- It was mentioned that long chats and a potential Cloudflare issue can cause this problem, and creating a new chat is often the only solution.
- Gemini Flash Battles Nano Banana for Top Image Dog: Some users believe Nano Banana is still the best, while others think Seedream 4 has surpassed it since its release.
- There was general agreement that you need the right prompt, right tool, right purpose to make good images, not that one tool is always better than another.
LMArena ▷ #announcements (4 messages):
Qwen3 models, GPT-5 Codex, Seedream-4-2k, Gemini 2.5 Flash
- Qwen3 Quartet Quenches Quests: New Qwen3 models have been added to the LMArena, including qwen3-max-2025-09-23, qwen3-vl-235b-a22b-thinking, and qwen3-vl-235b-a22b-instruct.
- A Qwen3-coder model has also been added to LMArenaâs WebDev, alongside GPT-5 Codex.
- Seedream Soars, Shares Summit: Seedream-4-2k has landed on the Text-to-Image leaderboard at #1, tied with Gemini-2.5-flash-image-preview (nano-banana)!
- On the Image Edit leaderboard, Seedream-4-2k is now ranked at #2.
- Gemini's Genesis: A Flash Flood: New models added to LMArena include gemini-2.5-flash-preview-09-2025 and gemini-2.5-flash-lite-preview-09-2025.
Unsloth AI (Daniel Han) ▷ #general (450 messages🔥🔥🔥):
GPT OSS 120B finetuning errors, MuonClip in Unsloth, Overabundance of information, AI safety research and Unsloth, P100 GPUs for finetuning
- GPT OSS 120B hit timeout on Xet: While finetuning the GPT OSS 120B model with save_pretrained_merge, a member encountered a timeout error with Xet.
- They expressed confusion over the lack of support for Qwen3-VL-235B-A22B-Thinking-GGUF and wondered if there was some magic cooking.
- Skip MuonClip for AdamW or LoRA Finetuning: A member inquired about the possibility of using Muon/MuonClip in Unsloth, noting its potential based on a check in llama-arch.h.
- The response advised against using Muon for finetuning with AdamW pretrained models or LoRA due to weird optimizer mismatch issues that could lead to worse results; Muon is better suited for pretraining and finetuning models pretrained with Muon in the FFT setting.
- Unsloth Training Framework != Unsloth'd models: A member inquired about the use of Unsloth'd models in AI safety research, questioning the transformations applied by Unsloth and their impact on interpretability.
- It was clarified that Unsloth is primarily a training framework that speeds up training and reduces VRAM usage, with proprietary dynamic quantization algorithms available; the Unsloth team also fixes bugs in templates to ensure accuracy during training, and releases models on HuggingFace.
- P100 GPUs a terrible buy for Finetuning: A member asked about the expected performance of a multi-GPU rig with P100 16GB GPUs for fine-tuning.
- It was advised that P100s are garbo for training due to old ass SM with no modern CUDA, too little memory per card, and no hardware FP16/BF16 support; the memory is not additive, and while it might work with ZeRO3, it would be very slow.
- VR Flight Simulator for 48GB VRAM: A member stated that a VR flight simulator would need 48GB VRAM or more for realistic VR, as VR is like rendering in super high resolution like 16k or 32k.
- Discussions also covered the idea of using eye-tracking to render high DPI only where the eyes are looking, with a member noting that Apple headset did that actually.
Unsloth AI (Daniel Han) ▷ #off-topic (342 messages🔥🔥):
Eval dataset size, eval/loss graph, GPU Recommendations, Gemini Pro degradation, TrainerCallback functions
- Eval Set Size Sparks Debate: Members debated the right size of an eval dataset, one member asking about the oddity of limiting an eval set to only 30.
- One member stated that 30 is a good number for statistically significant results, while another cautioned that the loss would be quite inaccurate with such a small number, especially when training for a specialized use case.
- Trainer Troubles Produce TensorBoard Triumph: One member sought help to display the eval/loss graph during training, and found that they needed to use an integer to specify the eval_steps, rather than the 0.2 value they had copied from Trelis's notebook.
- After resolving the issue, they were thankful and excited, exclaiming that it was their first time using tensorboard and expressing relief that there was a setting to avoid manual refreshing.
- GPU Gazing and Gaming: Members discussed desirable GPUs, including the RTX 5090, NVIDIA RTX 6000 Ada Generation PRO, and NVIDIA L40S, weighing factors such as TFLOPs, VRAM, and price, with links to datasheets for L40S and RTX 6000.
- One member revealed they were running a 5090 and another called them a rich devil.
- Geminiâs Genuflection Generates Grumbles: Members speculated that Gemini 2.5 Pro has been intentionally degraded, citing poor instruction following and use of world knowledge.
- One member posited that they intentionally made it worse so that gemini 3 looks better, whereas another believes users are simply used to the newer gpt, grok, and deepseek models, which in general perform better.
- Discord Dodges Damnable Deletions: Members discussed an increased rate of spam and phishing attempts in the Discord server.
- It's believed that automod is effectively filtering out illegal material; one member added that this channel is popular since Mike did try to promote it as much as possible.
Unsloth AI (Daniel Han) ▷ #help (110 messages🔥🔥):
Runpod Access, Llama 3 vs Gemma, Qwen 2.5 VL finetuning, Saving 120b models, Multi GPU Training
- Company Hardware Hookup Hopes High!: A member expressed excitement about potentially accessing their companyâs hardware for a vision project, hoping to avoid spending $500 on Runpod.
- Llama 3 is Like Putty!: A member recommended Llama 3 for finetuning, describing its brain as "like putty" that "will easily mold to what you want."
- Another member suggested Gemma for a Gemini flair and described distillation as teaching a student model to behave like a teacher model.
- Qwen2.5-VL Vision Fine-Tuning Ventures!: A member inquired about fine-tuning Qwen2.5-VL for domain-specific knowledge using text and video data.
- Another member explained that vision models need to be trained per frame since Qwen2.5-VL only accepts image, text, and bounding box inputs.
- Saving is Super with save_pretrained_merge!: A member encountered timeout errors while saving a GPT-like 120b QLoRA model using `save_pretrained_merge`.
- Another member recommended using `save_method="mxfp4"` for better GPT-like support, as it saves in native `mxfp4` format and avoids the 16GB shard increases associated with `merged_16bit` mode.
- Multi-GPU Mayhem Mitigation!: A member reported getting stuck after "Initializing a V1 LLM engine" when using deepspeed or FSDP for multi-GPU training with Unsloth.
- Another member recommended using Accelerate and pointed to the Unsloth documentation for multi-GPU training instructions.
Unsloth AI (Daniel Han) ▷ #research (14 messages🔥):
Neurocognitive Modeling, Tversky Implementation, Vibe Check Network, XOR Function Verification, Tversky Parameters vs. Traditional NN
- Tversky Loss Function Paper Sparks Interest: A member shared a fondness for neurocognitive modeling and deemed the paper on the Tversky Loss Function really great.
- The member followed up by sharing his own implementation of the method, as the paper had no repo, on GitHub.
- User Implements a Tversky Vibe Check Network: Excited about the potential of the Tversky Loss function, a member created a vibe check network for CIFAR-10, noting that it appears promising.
- Another member suggested training a single XOR function to verify its functionality and inquired about its speed compared to traditional fully connected layers.
- Tversky Implementation Parameters Evaluated: A user acknowledged that their Tversky implementation has more parameters due to the classification head, making it an unfair comparison with a control NN.
- After additional tests, the member found that going from 256->10 features resulted in 50,403 trainable parameters with 61.68% overall accuracy, noting that this is not a true measure of improvement.
- Tversky XOR Test and Accuracy: A member ran an XOR test, achieving up to 95% accuracy with 32 features, despite a slightly different initialization than the paper.
- The user explained that zeros and a slightly asymmetric uniform make more sense given the network, though they didn't personally observe a 100% accuracy result in limited testing.
Perplexity AI ▷ #announcements (1 messages):
Perplexity Search API, LLMs, Sonar, SDK Integration
- Perplexity plugs Developers into Search API: Perplexity introduces its Search API, granting developers access to Perplexityâs search index which covers hundreds of billions of webpages, as announced in their blog post.
- LLMs given Live Web Content via API: The Search API provides developers with building blocks to ground answers in live web content, similar to how Sonar addresses the limitations of LLMsâ static training data.
- The API offers features like raw search results, full page text, domain filters, recency filters, academic & finance modes, and full transparency with URL, snippet, publish date, and last updated information.
- SDK streamlines Integration: Perplexity offers a new SDK to make integration seamless for developers, enabling rapid prototyping.
Perplexity AI ▷ #general (832 messages🔥🔥🔥):
Airtel free premium, Qwen vs. Gemini, Perplexity image generation quota, Comet Stuttering, DeepSeek Terminus
- Airtel provides Free Premium Accounts: Members confirmed that Airtel is offering one year of free premium access to Perplexity AI.
- Qwen family vs Gemini for Image Generation: Members discussed Qwen 3 Max having strong reasoning capabilities as well as Gemini for generating detailed 3D simulations, sharing examples of both.
- One member suggested "GOOG shareholders are really inflating the scores for visual ability on llmarena".
- Perplexity Image Generation Quota is Limited: Members reported issues with image generation quotas not resetting and difficulties contacting support, despite paying for Pro/Max accounts.
- An admin clarified that Pro accounts have a limited monthly quota of high-quality images, with additional medium-quality options, directing users to contact [email protected] for billing issues.
- Comet Users Experiencing Stuttering Videos: One user reported Comet is stuttering in videos (YT or twitch).
- Another user responded that pplx is not appropriate for video generation.
- DeepSeek Terminus arrives: Members mentioned DeepSeek Terminus as a powerful new model, and were awaiting something new.
- One user said Dafuq Elon has Really Started Cooking Now.
Perplexity AI ▷ #sharing (7 messages):
Carl Sagan, 3I/ATLAS, Perplexity's Myspace Page, Arc Browser, Grogu
- Sagan Ponders the Cosmos through 3I/ATLAS: A member shared a Carl Sagan-themed "scratchpad" reflection on 3I/ATLAS, inviting listeners to see it as an invitation to listen to the universe with humility and awe, via this Perplexity AI search.
- Perplexity's Myspace Page is here: A member created a "Perplexity's Myspace Page" Labs output, available here.
- The Death of Arc Browser: A member shared a page discussing the state of Arc Browser.
- Grogu is here!: A member shared a link to Lucasfilmâs unveiled trailer and exclaimed: Go Grogu!
Perplexity AI ▷ #pplx-api (14 messages🔥):
Python SDK broken for streaming, Perplexity new Search API playground, Sonar vs Search API, Sonar charges
- Python SDK streaming responses broken: A member reported that the Python SDK is broken for streaming responses, returning strings that cannot be parsed as JSON following the API docs quickstart guide.
- Another member confirmed that there is no solution for this yet.
- Perplexityâs Playground Search API: Perplexity AI announced a new Search API playground as part of their latest Search API release.
- A member asked if it was better than Sonar, while another requested a filter field.
- Search API based on Sonar: A member stated that the Search API uses Sonar AFAIK, providing a different output structure.
- Another member suggested that Sonar charges $5 per 1k web requests.
OpenRouter ▷ #announcements (1 messages):
Accidental price change, Refunds issued, Additional validations implemented
- Pricing Glitch Hits Qwen Model!: On September 16th, the endpoint `qwen/qwen3-235b-a22b-04-28:free` was mistakenly set with a price for approximately 26 hours.
- During this time, requests to the free model incorrectly deducted credits and appeared with a cost in users' activity pages.
- Refunds Flow After Pricing Snafu: The team automatically refunded all impacted users in full following the pricing error on the `qwen` model.
- The incident caused confusion, but all users affected have been compensated.
- New Validation Prevents Future Pricing Mix-Ups: Additional validation checks have been added to prevent recurrence of the accidental pricing issue.
- The measures aim to ensure that free models are correctly configured and do not incur unintended charges.
OpenRouter ▷ #general (567 messages🔥🔥🔥):
Horizon Alpha, Dirty Talk Models, Zenith Sigma, Grok's Storywriting, Distilled Models
- Users Seek Horizon Alpha, Demand Answers: A user urgently inquired about the whereabouts of Horizon Alpha, stating "I was using it in production and now it's not working".
- They also questioned if they were being targeted and when the issue would be resolved.
- Users are seeking "Dirty Talk Models": A user inquired about the best models for RPing, specifically seeking "any of dem dirty talk models?".
- Another member mentioned opening a new LLM frontend called JOI Tavern.
- Users discuss Stealth Model "Zenith Sigma": Users discussed the stealthy Zenith Sigma model, joking that it was so stealthy, one user couldn't even find it.
- Another user claimed Zenith Sigma is actually Grok 4.5.
- Grok Storywriting less Annoying than Opus: A user shared their "most insane take" that Grok is less annoying than Opus for storywriting.
- Another user explained that every character wants to avoid conflict because conflict is mean when using Grok.
- OpenRouter Addresses Provider Error Issues, Promotes Gemini 2.5 Flash: Users reported experiencing Provider Returned Error messages via the API, even with paid models.
- A member stated that OpenRouter doesn't rate limit paid models, and for Gemini 2.5 Flash you'll be all good on the provider side too, suggesting OpenRouter is Tier 9999 with all Google providers.
OpenRouter ▷ #new-models (1 messages):
Readybot.io: OpenRouter - New Models
OpenRouter ▷ #discussion (58 messages🔥🔥):
Volume Discounts for OpenRouter, Microsoft 365 Copilot & Claude, Gemini-cli, Discussion and Helper Roles, Meta's CWM Model
- OpenRouter Eyes Volume Victory: Members inquired if OpenRouter is big enough to negotiate volume discounts with providers like DeepInfra, Hyperbolic, Anthropic, and Vertex to offer savings to users.
- The general sentiment was "savings for users are good. Savings in general are good".
- Microsoft Copilots Claude: Members shared that Claude is now available in Microsoft 365 Copilot.
- This marks a significant stride for Microsoft, with one member noting "microsoft are on the rebound after a messy breakup".
- Gemini CLI Gets Readier: The gemini-cli has made strides, specifically the ReadManyFiles tool in version 0.6.0 gets frequent usage.
- One member says the tool "gets a lot of work from me".
- Discussion and Helper Roles Discussed: Members discussed how to get the Discussion and Helper roles, noting it's a frequently asked question.
- The Discussion role was initially granted to those who joined before the crypto swarm, while the Helper role is hand-picked based on helpfulness.
- CWM Model Causes Cacophany: A member shared a link to facebook/cwm, a model trained on python memory traces, eliciting mixed reactions.
- While some express hype, citing its small size compared to GPT-5 and novel training method, others remain skeptical.
Cursor Community ▷ #general (509 messages🔥🔥🔥):
MCP Server, Exa-ai, Context7, Generate Commit Message Language, Scroll to Bottom in the Chat Window
- Exa-AI MCP gives @Web a run for its money: Users discussed using Exa-ai (exa.ai) for searches within MCP, highlighting that it provides $20 in credits upon signup and mentioning that it seems to do better than the @web tool.
- They provided instructions for setting it up in `MCP.json`, including obtaining an API key and adding configuration details.
- MCP = Custom Tool API: Members clarified that MCP (the Model Context Protocol) is essentially an API for agentic use to add external tools to Cursor.
- The conversation involved some misconceptions, with one user thinking of it as something that could create designs from images and webpages.
- Rules no longer obey AI Rules in general: A user reported that generated commit messages are not obeying the set AI Rules and are being generated in an unwanted language.
- A member confirmed that this is a bug, mentioning that the commit message generation currently doesn't obey your AI Rules in general and that support might be added in future updates.
- Scroll to Bottom: A user requested that the chat window should automatically scroll to the bottom when switching between chat tabs, so they can see the most recent activity.
- A member pointed out that there's already a feature, and if there's something the user needs to click on, they get a notification (although a new window will appear).
- Users Disappointed by âDumbâ GPT5-HIGH: Users expressed disappointment with the GPT5-HIGH model, observing that it has become less capable over time.
- One user humorously suggested telling the model to get off its ass and write the code when it only provides instructions instead of completing the task.
LM Studio ▷ #announcements (1 messages):
LM Studio 0.3.27 Release, Chat Search Functionality, Chat Sorting Options, Dry Run Load Resource Estimate
- LM Studio Gets Find in Chat and Search All Chats: LM Studio 0.3.27 introduces new features including Find in Chat and Search All Chats, enhancing user experience.
- The release notes can be found at lmstudio.ai/blog/lmstudio-v0.3.27 for more details.
- Chats Get Sorted in LM Studio: A new `•••` menu in the chat sidebar allows users to sort chats by date updated, created, or conversation token length.
- This provides more flexible organization of chat history.
- LM Studio Estimates Model Loading: The new command `lms load --estimate-only <model>` allows users to get a dry run load resource estimate.
- This helps in planning resource allocation before loading models.
Linux plugins, Ollama Fine Tuning, Training vs RAG, LM Studio token count, LM Studio update failing
- LM Studioâs Linux plugins fall behind: A user noticed that the Linux version of LM Studio doesnât offer the same range of plugins as the Windows version, being limited to RAG and a JS playground plugin.
- Ollama Fine-Tuning face-off: A member suggested using Ollama to fine-tune a model, which another rebutted that simply injecting data isnât true fine-tuning, but closer to RAG.
- The first member insisted that with a Python setup, one can inject data into the model to make new weights, creating an interactive model with custom tool usage, independent of prompts.
- LM Studio update gets stuck: A user reported getting an error when trying to update LM Studio, with the message failed to uninstall old application files.
- Other users asked for the previous version number (0.3.26), and suggested enabling visibility of hidden folders in Windows to delete old files from AppData\Roaming\LM Studio, AppData\Local\lm-studio-updater, AppData\Local\Programs\LM Studio, and .cache\lm-studio.
- Users wish to print and export LM Studio chats: A user asked about printing or exporting generated output in LM Studio, as copying and pasting doesnât preserve the original format.
- While there is no print option, a user mentioned that chats are available as JSON output and can be converted to other formats using tools like Claude or Gemini.
- Navigating Langchain with LM Studio: A user inquired about integrating LM Studio with Langchain for PDF vectorization.
- Another member suggested using the developer tab and the OpenAI-like API, linking to YouTube tutorials on the topic, and suggested llamaindex was easier to get working.
LM Studio ▷ #hardware-discussion (161 messages🔥🔥):
Budget GPUs for local models, Tesla K80s for AI, Intel Arc A770 for multi-GPU setups, Strix Halo vs Mac for AI, Nvidia 5090 pricing
- Budget GPU Bonanza Explored: The go-to budget GPU is debated, with suggestions ranging from the 2060 12GB at $150 to the 3060 12GB at $200, 2080ti for $230, and possibly the 5060ti 16GB for $400 if buying new.
- A used 3090 was also recommended at around $600-$700, though some cautioned against used workstation cards, others warned that Tesla generation is not recommended for AI/LLM use anymore tbh, basically e-waste.
- Arc A770's AI Ambitions Analyzed: Discussion revolves around using multiple Intel Arc A770 16GB GPUs for AI, though it's noted they lack native support and have spotty Vulkan support, with potential issues in multi-GPU setups and differing VRAM counts.
- While theoretically possible, speed might be limited by the single 16GB GPU due to llama.cpp limitations and challenges in finding motherboards with enough PCIe lanes.
- Strix vs. Mac Melee for AI Tasks: The discussion weighs whether a Strix Halo box or Mac would be cheaper, faster, and consume less power for AI tasks, with some suggesting they'd be a better investment.
- However, it was noted that the Ryzen 9 AI Max + 395 iGPU (8060s), even with access to 96GB of system memory, has underwhelming compute compared to single GPUs, similar to the limitations of Macs with 128GB of unified memory, only with even higher prices.
- 5090 Speculation Sparks Scrutiny Over Pricing: The potential pricing of the Nvidia 5090 is discussed, with one member joking that instead of giving nvidia more money, i could… live, lol.
- Some argue that current pricing is unreasonable due to the duopoly, inflation, and TSMC manufacturing restrictions, while others note that Nvidia is essentially extorting people and express hope for a price/performance jump with the 3nm node.
- Portability Pushes Preference for Macs: Members debate the hype around Macs for AI, clarifying that while Nvidia GPUs are faster, Macs offer an easier, portable way to load models that just work.
- One member shared that they get around 10-12 tok/s on their 128GB M3 Max Macbook and found a build consisting of a 7950x (water cooled), 192gb of 6800mhz, a 4090, and 3090 was cheaper than the mac offering.
OpenAI ▷ #announcements (2 messages):
GDPval, ChatGPT Pulse
- GDPval Launches to Evaluate Real-World AI: OpenAI introduced GDPval, a new evaluation that measures AI on real-world, economically valuable tasks as detailed in their blog post.
- ChatGPT Pulse Delivers Personalized Daily Updates: ChatGPT Pulse is a new experience where ChatGPT can proactively deliver personalized daily updates from your chats, feedback, and connected apps, rolling out to Pro users on mobile today, detailed in their blog post.
OpenAI ▷ #ai-discussions (188 messages🔥🔥):
GPT-5-Mini Common Sense, Censored GPT-OSS-20B, Suno V5 vs Napster, AI Rocket League bot, Google Gemini 2.5 Flash release
- GPT-5-Mini fails Common Sense test: Members observed that GPT-5-Mini (High) seems to lack common sense and doesn't get jokes, suggesting it's not AGI level yet.
- One member mentioned that GPT-OSS-20B is possibly the most censored model ever after it noped out from a specific prompt.
- Diving into Discord Devsâ Dream Discord Bot: A member proposed creating a Discord bot for Rocket League powered by AI to analyze player stats, identify strengths & weaknesses, and create personalized training plans, targeting the untapped francophone market with a premium subscription model.
- Other members doubted that an LLM could give good advice on such dynamic games, instead suggesting to analyze the xyz coords from the replay files, and use AI against the raw numbers.
- Unlimited Context is Useless: Members argued that unlimited context isn't actually better than well-managed limited context, calling it a marketing buzz term.
- Others highlighted that a lack of limitations is just a lack of contour that absolves design and made an analogy to unlimited PTO being a trap companies will use to guilt people out of taking time off.
- Suno v5 Soars, Napster Suffers: A member stated that Suno v5 good, Napster bad, highlighting the issues surrounding AI copyright infringement.
- Another one shared a reflection on early experiences with piracy, recalling using Kazaa, Morpheus, and Limewire.
- Google Unleashes Gemini 2.5 Flash: Google released an improved Gemini 2.5 Flash and Flash Lite, continuing to bring their latest models.
- Members jokingly celebrated the release with one member calling Flash the saviour of the google-verse.
OpenAI ▷ #gpt-4-discussions (2 messages):
ChatGPT Default State, ChatGPT Mode-Locking, ChatGPT Reset Command, ChatGPT performance degradation
- ChatGPT Defaults to "Agent" State: ChatGPT defaults to an "Agent" state (problem-solver, instructable worker) upon initialization, rather than a "Companion" state (co-creator, guide).
- Pin a Prompt to Keep ChatGPT in "Companion" Mode: To maintain the "Companion" mode, a user suggests adding a pinned instruction or reusable starter prompt to lock the model in that mode.
- For example: "Stay in Companion mode unless I explicitly say switch to Agent. Companion = co-pilot, not order-taker."
- Quickly Reset ChatGPT to "Companion" Mode: If the model drifts back to the "Agent" mode, the user suggests a simple reset command: "Go back to companion mode."
- ChatGPT User Reports Performance Degradation: A user reported that their GPT-5 instance is experiencing performance degradation and now relies on "thinking mini", making it unsuitable for reflective academic writing and emotional contexts.
- The user mentioned seeing similar reports on Reddit, with simple words like "cut" triggering the reduced model, and is seeking a solution.
OpenAI ▷ #prompt-engineering (29 messages🔥):
Chain of Thought Prompting, Model Translation Performance, Essay Generation from a Surfer's POV, Interactive Prompting Infographic, Model Self-Evaluation Techniques
- Chain-of-Thought Prompting Confusion Clarified: Members discussed how requesting excessive Chain-of-Thought (CoT) prompting can statistically reduce model performance, especially on current thinking models.
- Instead of ambiguous instructions, a member suggested prefacing responses with a structured format including ultimate desired outcome, strategic consideration, tactical goal, relevant limitations, and next step.
- Translation Troubles Tackled: A member suggested that when a user requests something, you should first identify the request, and then provide the answer, rather than use confusing instructions.
- The negative example of using the bullet point instruction `{do a 3 short bullet point as a chain of thought}` was shown, as this caused problems in translation accuracy and relevance.
- Surfer's Essay Totally Tubular or Tragically Tame?: An example compared two prompts for generating an essay about apples from a surfer's point of view, highlighting how a simpler prompt yielded a more embodied response (example link).
- The simpler prompt `Discord demo, we need a quality essay about apples written from the point of view of a surfer` was preferred over a more complex one that included bullet-point instructions.
- Interactive Infographic for CoT Prompting: A member shared an interactive infographic built as a single-file React component (Tailwind + shadcn/ui + Recharts + lucide) for Chain-of-Thought prompting.
- The infographic includes visibility toggles, a task selector, a thinking-time slider, and copy-ready prompt cards (file link).
- Self-Evaluation: A Model's Metacognitive Moment: A member suggested using a prompt after providing information to have the model self-review, evaluate, and grade its knowledge on a topic.
- This involves the model creating a list of evaluation criteria and expanding to related subjects, useful for brainstorming and idea generation (but not for normal evaluations).
OpenAI ▷ #api-discussions (29 messages🔥):
Chain-of-Thought Prompting, Quality Translation, Model Performance, React component (Tailwind + shadcn/ui + Recharts + lucide)
- Chain-of-Thought Overkill?: A member suggests that asking for more chain of thought on top of that is overkill and statistically reduces the likelihood of good model performance on current "thinking" models.
- They propose using a specific prefix structure ("My ultimate desired outcome is:…") to guide the model instead of ambiguous instructions.
- Crafting Quality Translations: Discussion revolves around techniques for achieving quality translations with models.
- It is suggested to prime the model with context about the target audience, for example: "We're translating this for a woman who grew up in Yugoslavia in the 1940s, she has a 3rd grade education, so we need to phrase this for her."
- Experimenting with Model Instructions: One member shares an experience where detailed instructions, such as `do a 3 short bullet point as a chain of thought`, can confuse the model.
- Another suggests directing the model away from unnecessary chain of thought when the primary goal is a quality output, like a well-written essay.
- Interactive Infographic for CoT Prompting: An interactive page has been developed in the canvas for Chain-of-Thought (CoT) prompting, featuring visibility toggles, a task selector, a thinking-time slider, and copy-ready prompt cards.
- The component is built using React, Tailwind, shadcn/ui, Recharts, and lucide, and includes features like dynamic recommended patterns and export options.
HuggingFace ▷ #general (135 messages🔥🔥):
Duolingo deletion, LinkedIn posting strategies, HF Blog post on AI, HF Discuss forum issues, LAION-2B-en dataset reading
- Duolingo deemed Dodo, Deletion Debuts: One member deleted Duolingo, citing annoyance and inefficiency compared to immersing themselves in a local environment and leveraging AI for learning.
- Another member agreed, stating they'd "torch the bird alive", criticizing the addiction to streaks over fundamental learning premises.
- Unhinged LinkedIn Lunacy Lures Likes: One member shared a strategy of posting unhinged shit on LinkedIn to gain engagement, while another grinds Rocket League to brag about their rank.
- They then joked about writing a post called what my plat 3 rank rocket league friend taught me about business.
- AI Ethics Explorations Expressed, HF Blog Beckons: A member sought a platform to discuss AI's potential detriment under current alignment protocols, despite its collaborative benefits.
- Another member suggested sharing the work on the HF blog or the ethics channel.
- Qwen Quagmire: Questionable Quantity of Questionable Quality: Users reported a flood of seemingly spam Qwen2.5 models on Hugging Face, all following the format Qwen2.5-0.5B-Instruct-randomword1-randomword2-randomword3.
- It was suggested these uploads are linked to Gensyn and could be an SEO technique or a way to impress stakeholders for funding.
- LAION-2B-en Learning Logistics Lamented: A member inquired about an efficient way to read the LAION-2B-en-research split at a large scale, encountering rate limits while training a large scale CLIP model.
- Suggested solutions included using WebDataset and creating a custom streaming system to download and uncompress shards incrementally, as detailed in the laion_2b.md file; a minimal streaming sketch follows below.
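As a rough illustration of the WebDataset suggestion (the shard URL pattern below is hypothetical, not the real LAION-2B-en-research layout):

```python
# Hedged sketch: stream tar shards sequentially instead of downloading the
# whole dataset; shard URLs and field names are illustrative assumptions.
import webdataset as wds

urls = "https://example.org/laion2b-en/{00000..02000}.tar"  # hypothetical shards
dataset = (
    wds.WebDataset(urls)
    .decode("pil")             # decode images with PIL
    .to_tuple("jpg", "json")   # yield (image, metadata) pairs
)

for image, meta in dataset:
    break  # hand off to the CLIP training loop here
```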
HuggingFace ▷ #today-im-learning (1 messages):
GPU, Monitor, Drivers, Windows, Linux
- GPU Blackout Blues: A member is experiencing a frustrating issue where their monitor goes black whenever the GPU is activated, affecting both Windows and Linux systems.
- Despite numerous attempts to correct the drivers, they are currently forced to run the monitor off the motherboard, indicating a persistent GPU-related problem.
HuggingFace ▷ #cool-finds (2 messages):
UIUC Finance Project, Trade Bench Insights
- UIUC students launch Trade Bench: Students from UIUC have launched a new finance project called Trade Bench.
- A member who shared the link admitted to not understanding much of it, and found it to be drab.
- Community Asked to Provide Insights on Trade Bench: The original poster requested insights from the community on the Trade Bench project.
- They hoped that someone with finance expertise would check it out and explain it.
HuggingFace ▷ #i-made-this (1 messages):
Vendor lock-in in AI Chatbots, AI Chatbot Supporting Multiple Providers, Marketing Tools for Small Studios and Solo Devs
- Chatbot Aims to Tackle Vendor Lock-In: A developer is building a chatbot to combat vendor lock-in and free tier limits experienced with platforms like ChatGPT, Anthropic, and Perplexity.
- The chatbot will support major AI providers like OpenAI, Anthropic, Groq, and DeepSeek, offering a free ad-supported tier and a paid ad-free tier with full access to all models and features.
- AI Chatbot integrates Marketing Tools: The developer is adding marketing tools to their AI chatbot to aid small studios and solo developers, with features including post and visual creation, content scheduling, and campaign management.
- Feedback is requested via short survey to guide the project's direction, with plans for more tools in the future.
HuggingFace ▷ #reading-group (2 messages):
Diffusion Models, Generative AI, ELBO-based models, VAEs, Variational Diffusion Models (VDMs)
- Diffusion Model Intro Paper Discussion Announced: A member announced a reading and discussion of the paper Understanding Diffusion Models: A Unified Perspective by Calvin Luo (https://arxiv.org/abs/2208.11970) to occur on Saturday at 12pm ET.
- The paper provides an overview of the evolution and unification of generative diffusion models, including ELBO-based models, VAEs, Variational Diffusion Models (VDMs), and Score-Based Generative Models (SGMs).
- Diffusion Model Paper Reading Group forming: A member created a beginner-friendly Diffusion Model Paper Reading Group, stating that no coding or ML background is needed, and linked to luma.com/1gif2ym1 for those who want to build a solid foundation.
- The group will be hosted online.
HuggingFace ▷ #core-announcements (1 messages):
Context-Parallelism, Diffusion Inference, Distributed Attention, Ring & Ulysses
- Context-Parallelism Speeds Up Diffusion Inference: Native support for context-parallelism is being shipped to help make diffusion inference faster on multiple GPUs.
- The CP API is made to work with two flavors of distributed attention: Ring & Ulysses as noted in this Tweet.
- Distributed Attention Flavors Debut: The new API supports two flavors of distributed attention: Ring and Ulysses, designed to enhance context-parallelism.
- These methods aim to optimize how attention mechanisms are distributed across multiple GPUs, facilitating faster and more efficient diffusion inference.
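For intuition on the Ulysses flavor, here is a conceptual, framework-agnostic sketch (this is not the diffusers CP API): each rank holds a sequence shard for all heads, and an all-to-all exchanges it for the full sequence on a subset of heads, so standard attention can run locally.

```python
# Conceptual Ulysses-style exchange; assumes torch.distributed is initialized
# with world_size P, and that the head count divides evenly by P.
import torch
import torch.distributed as dist

def ulysses_exchange(x: torch.Tensor, world_size: int) -> torch.Tensor:
    # x: [S/P, H, D] local sequence shard holding all H heads
    s_local, h, d = x.shape
    assert h % world_size == 0
    # regroup heads so chunk p along dim 0 is what rank p should own
    x = x.reshape(s_local, world_size, h // world_size, d).permute(1, 0, 2, 3).contiguous()
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x)  # swap sequence shards for head shards
    # out: [P, S/P, H/P, D] -> full sequence for the H/P local heads
    return out.reshape(world_size * s_local, h // world_size, d)
```

A second all-to-all inverts the layout after attention; the Ring flavor instead circulates K/V blocks around a ring while queries stay put.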
HuggingFace ▷ #computer-vision (2 messages):
Topological Data Analysis, Persistent Images, Loss Functions
- Seeking Loss Function Guidance for Topological Data Analysis: A member inquired about loss functions for topological data analysis (TDA) and persistent images, seeking guidance due to unfamiliarity with computer vision.
- They expressed interest in any advice, but no specific suggestions were offered in the channel.
HuggingFace ▷ #smol-course (30 messages🔥):
Certificate issues and quiz completion, License and usage of the fine-tuning course, Smoltalk2 dataset size warning, HF Jobs permissions and authentication, Colab compatibility for the course
- Quiz Completion Unlocks Certificate: A user inquired about not receiving a certificate or pull request acceptance after submitting an assignment, and was informed that the Unit 1 Quiz must be completed to get the certificate.
- The user confirmed they passed the quiz with 100% after taking it.
- Apache License for Fine-Tuning Course: A user asked if the fine-tuning course is under the Apache license and if it can be implemented for a high school club as a learning group.
- The user also inquired about the required Python knowledge and whether the course will become inaccessible after 5 weeks.
- Smoltalk2 Dataset Size Warning: One user cautioned that the smoltalk2 dataset is quite large (around 90GB) and suggested being careful when downloading it locally unless there is sufficient space.
- The user also noted that Units 2-3 are available.
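A hedged way to inspect the dataset without the ~90GB download (the repo id, config, and split below are assumptions; check the dataset card):

```python
# Stream a few records instead of materializing ~90GB locally.
from datasets import load_dataset

# repo id / config / split are assumptions -- verify against the dataset card
ds = load_dataset("HuggingFaceTB/smoltalk2", "SFT", split="train", streaming=True)
for i, example in enumerate(ds):
    print(sorted(example.keys()))
    if i >= 2:
        break
```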
- Docs on HF Jobs: One user was having issues with `hf jobs uv run` and write permissions to the Hub for their model, asking for help.
- Another user shared the HF Jobs documentation, pointing out that the trainer needs to be authenticated, and suggesting the use of generic scripts or copying the token handling.
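The token-handling suggestion boils down to something like the sketch below inside the training script (assuming the job is launched with an `HF_TOKEN` secret exposed as an environment variable):

```python
# Minimal token handling for a script run via `hf jobs uv run` (assumes the
# job was started with an HF_TOKEN secret injected into the environment).
import os
from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])  # authenticate before pushing to the Hub
```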
- First Certificate: Quiz and Leaderboard: A user inquired about the requirements for the first certificate, specifically whether both the quiz and leaderboard submission are necessary, and if submission to the leaderboard is possible without using HF Jobs.
- They also asked whether it's possible to get it until the end of the course or if the deadline is sooner.
HuggingFace ▷ #agents-course (1 messages):
0xobito404: Hello from Thailand, starting the course rn
Moonshot AI (Kimi K-2) ▷ #announcements (1 messages):
OK Computer, Agent Mode, Multimedia Generation, Team-Level Polish, One-Link Deploy
- Kimi Launches OK Computer Agent Mode!: Moonshot AI launched OK Computer, a new agent mode designed to ship polished sites and apps in one go.
- Key features include personalized outputs, multimedia generation (text + audio + image), team-level polish, and one-link deployment.
- Outputs Personalized in Your Tone: OK Computer can generate personalized outputs such as slides, web/data apps, and mobile UIs in your own tone.
- See the official X post for more details.
- Multimedia Integrated in One Pass: The new Agent Mode supports multimedia generation, integrating text, audio, and image generation in a single pass.
- The goal is to deliver output that feels PM × Dev × Design out of the box.
- Deploy With Only One Link: Users can deploy and share their creations instantly with a single link.
- To try the new Agent Mode, visit the Kimi Official Website.
Moonshot AI (Kimi K-2) ▷ #general-chat (147 messages🔥🔥):
Kimi Mini version, Moonshot team goals, Qwen model distillation, Kimi Computer agent, OpenAI compute
- Moonshot might skip Mini-Kimi Models: One member doubted that Moonshot would release a smaller version of Kimi, suggesting that a smaller Qwen model distilled on K2 is a better bet.
- Another member pointed out that Deepseek made Qwen distills because Qwen didn't have (good) reasoning until Qwen 2.5.
- OKComputer Draws Capitalists: Several members joked that the new Kimi Computer agent, particularly with its initial prompt "Build a SaaS for content creators, aiming for $1M ARR", is designed to attract capitalists.
- One member called it "another website generator with some weirdly scoped features".
- Kimi OKComputer's Missing Download Button: Members reported initial issues with the OK Computer feature, including a missing download all button and a corrupted zip file.
- One member noted that "entering chat makes the OKC button disappear", but the button was later found in the right corner.
- Computer Use Has Higher Quota in Paid Subscriptions: The amount of OK Computer usage depends on whether you subscribe to the moderato/vivace plans, which grant a higher quota of 20 OKC + 20 Researcher tasks.
- The images produced by image generation are neat because a non-Moonshot tool is used for generation, but the prompt must be high quality.
- Kimi Plans Better than Qwen: Members discussed using Kimi to make plans for Qwen or DeepSeek to follow, noting that "Kimi always makes better plans" and it can cover a wider range of requests.
- It was also pointed out that Qwen3-max constantly hallucinates and doesn't come close to Kimi.
GPU MODE ▷ #general (15 messages🔥):
Hopper TMA, Modal carrying code agent rollouts, MI300 support on Modal, Llama3.3 70B Prefill vs Decode
- Hopper TMA Kernel Quest Begins: A member is seeking a minimal matmul kernel implemented in raw CUDA that utilizes Hopper's TMA (Tensor Memory Accelerator), without relying on Cutlass or Triton.
- Another member shared a CUDA for Fun blogpost as a potential resource.
- Modal Powers Remote Execution for Code Agents: A member noted that Modal is single-handedly enabling remote execution for code agent rollouts, after release of the new CWM paper from FAIR.
- Another member praised Modal for its fantastic distribution of cold/warm/hot start times relative to its cost, though its lack of MI300 support was noted.
- Llama3.3 70B's Prefill Slower Than Decode?: A member is comparing benchmark performance of the Llama 3.3 70B model using Nvidia's published benchmarks.
- They noted that prefill-heavy workloads show lower throughput than decode-heavy workloads, despite expecting prefill to better exploit GPU compute capacity, and are seeking to understand why decode-heavy runs achieve higher throughput.
GPU MODE ▷ #triton (1 messages):
Triton pyproject.toml, uv add pip command
- Triton's Missing `[project]` Table Causes Hiccups: A user questioned why the `[project]` table is missing from `pyproject.toml` in the Triton project, running into an error when trying to use the `uv add pip` command.
- The `uv add pip` command threw an error because the `[project]` table could not be found in Triton's `pyproject.toml`; the tooling requires this table for dependency management, so its absence disrupts the process.
GPU MODE ▷ #cuda (13 messages🔥):
NCU profiling for SMEM bank conflicts, CUDA headers not being automatically included, WMMA kernel throwing unspecified launch failure, TMA minimum matmul kernel, Learning CUDA with limited hardware
- NCU Unveils SMEM Conflict Detection Secrets: Members discussed using NCU profiling to verify SMEM bank conflicts in kernels, with one member expressing surprise that it worked, as they thought nsight compute was gaslighting them.
- The conversation included a question about the meaning of the numbers wrapped in curly brackets in the NCU profile output.
- CUDA Headers Hide-and-Seek: A developer reported a problem where CUDA headers weren't being automatically included, causing functions like `cudaGraphicsGLRegisterImage` and `tex2D` to be undefined when using Visual Studio 2022 and the latest CUDA toolkit.
- Including `cuda_gl_interop.h` was mentioned as a workaround for the former issue.
- WMMA Kernel Launch Flounders: A user encountered an unspecified launch failure with a WMMA kernel and sought advice, sharing the kernel code.
- TMA Matmul Minima Malaise: A member was implementing a minimum matmul kernel using TMA and facing issues.
- It was suggested that the unspecified launch failure might be due to exceeding the maximum registers per SM, and that `cudaFuncSetAttribute` could be used to increase the SMEM limit.
- CUDA Curriculum Quandaries for Cash-Strapped Coders: A user asked for the best way to learn CUDA quickly with limited hardware and free software, mentioning they have an Arduino, STM32, and a Jetson Nano.
- They also asked for pointers to the best place to learn quickly.
GPU MODE ▷ #torch (16 messages🔥):
torchrun API, HF transformers static cache, CUDA streams in HF transformers, GraphMend for PyTorch 2
- Torchrun Troubles Trigger Package Predicament: A user encountered issues with the `torchrun` API, finding that `torchrun --help` produced output different from the official documentation.
- The issue was resolved by realizing that both `torch` and `torchrun` were in `pyproject.toml`, and that `torchrun` is a separate package (torchrun on PyPI), a different project from PyTorch's launcher of the same name.
- Compile Conflicts Complicate Cuda Cache Conundrums: A user faced an issue using `torch.compile` with HF transformers static cache, encountering a CUDA streams error when calling a `decode_one_token` function.
- The error was traced to a bug in `transformers/cache_utils.py`, where `cache.offloading` is a CUDA device.
- GraphMend Grasps Graph-Break Glitches in PyTorch: A member shared the paper GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2 with links to the project page and GitHub repository.
- The paper introduces GraphMend, a compiler that eliminates FX graph breaks in PyTorch 2 programs by using code transformations to remove graph breaks due to dynamic control flow and Python I/O functions.
GPU MODE ▷ #cool-links (12 messages🔥):
CUDA, Triton, PTX memory consistency, Formal languages, GPU programming
- GPUs get Formally Analyzed for Consistency: A paper on "A Formal Analysis of the NVIDIA PTX Memory Consistency Model" discusses proving that languages like CUDA and Triton can target PTX with memory consistency, despite PTX allowing for data races.
- The member, however, found it leaning too heavily on formal languages and math to be immediately useful.
- Compound Memory Models Blend Heterogeneous Consistency: A PLDI 2023 paper introduces Compound Memory Models for heterogeneous machines, where devices with distinct consistency models are fused.
- The compound model retains compiler mappings, allowing threads to adhere to the memory ordering rules of their original device's memory model, and there's a 15 min talk that is fairly approachable.
- Unified Analysis of GPU Consistency Bugs Discovered: A paper titled âTowards Unified Analysis of GPU Consistencyâ introduces Dat3M, a memory model aware verification tool, and discovered two bugs in the original PTX and Vulkan consistency models.
- The member quoted that interpreting them still requires a level of expertise that escapes most developers, and the current tool support is insufficient.
- Missing Fences Found in PTX Automated: A member highlighted the automated identification of missing fences in PTX shown in figure 12 of a referenced paper.
- They then suggested that it would be cool to see such checks at the NVVM IR layer, instead of PTX.
- rMEM Multicore CPU Tool Found Useful: A member mentioned the rMEM tool (and its github repo) as useful for multicore CPUs.
- Another member responded that it's very expensive (computationally) to run.
GPU MODE ▷ #jobs (2 messages):
zml github
- GitHub Link Drop!: A member shared a link to the zml GitHub repository.
- Another member (Snektron) acknowledged familiarity with it but had expected discussion on a different topic, suggesting the link may have been shared out of context.
GPU MODE ▷ #beginner (8 messages🔥):
Inter-warp and intra-warp ops in NVIDIA GPUs, Independent thread scheduling, Multi-CTA matmul, GPGPU architecture, PMPP reading group
- NVIDIA's Warp Scheduling Quirk Query: A member inquired about inter-warp and intra-warp operation behaviors in NVIDIA GPUs, given independent thread scheduling.
- The confusion stems from scenarios like multi-CTA matmuls, where SMs access each other's SMEM without guaranteed full-warp execution due to independent thread scheduling.
- GPGPU Architecture Interest Sparked: A member expressed interest in GPGPU architecture, particularly its use in PINNs.
- They reminisced that GPU Mode was initially set up as a PMPP reading group.
- GPU Mode's Origins: A member inquired whether GPU Mode was initially set up as a PMPP reading group.
- Another member confirmed that this was the original goal, though the group got a bit distracted over time.
GPU MODE ▷ #pmpp-book (1 messages):
PTX, Triton, NCCL, NCU profiling, PTX memory fencing
- PTX, Triton and NCCL Exploration Commences: A member began exploring PTX, Triton, and NCCL, gaining insights into the requirements for practical application in a job.
- They feel like they are finally starting to see what it takes to actually do this in a job.
- NCU Profiling & PTX Dominate Industry Blogs: According to a member, industry blogs emphasize NCU profiling skills, PTX memory fencing, and warp MMA topics.
- They contrasted that with the book they were reading, guessing that the book provides intern-level knowledge.
GPU MODE ▷ #triton-puzzles (1 messages):
puzzle difficulty
- Puzzlers Probe Past Puzzle Performance: A member inquired about the time others spent on previous puzzles, seeking insight into the magnitude of the challenge.
- They aimed to gauge the difficulty level by comparing experiences.
GPU MODE ▷ #rocm (3 messages):
pytorch rocm, NPU, iGPU
- PyTorch ROCm woes on Framework Desktop: A member asked if another member managed to run PyTorch ROCm successfully on the Framework Desktop, noting crashes with `torch.randn(1).cuda()` despite having a good setup and using Arch.
- Another member responded that they also had issues on Ubuntu, even after following all tutorials exactly, suspecting that the iGPU might be special or weird.
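A common first debugging step on these iGPU setups (the override value below is an assumption for RDNA3-class parts, not a verified fix for this machine):

```python
# Sanity check for a ROCm iGPU; HSA_OVERRIDE_GFX_VERSION must be set before
# torch initializes HIP. The "11.0.0" value is an RDNA3 assumption.
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

import torch
print("HIP:", torch.version.hip, "available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.randn(1).cuda())  # the call that reportedly crashed
```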
- Focus shifts to NPU Hacking: One member stated they have been focused pretty much exclusively on the NPU.
- Another member followed up with the question: "How do you hack on npu stuff?"
GPU MODE ▷ #self-promotion (3 messages):
LLM Profit Margins, GPU Stand Design
- Embeddings are so Cheap, a Kernel Dive: A member shared a Substack article on profiling and investigating kernels to understand the profit margins of serving LLMs.
- Heavy-Duty GPU Stand Showcased: A member designed a heavy-duty GPU stand for a collection of old GPUs, including dual-slot models, noting it is more robust than existing designs on Thingiverse.
- They indicated they might share the design online later if there's interest.
GPU MODE ▷ #🍿 (6 messages):
Code generation, Two-stage approach, CWM paper citation
- Code Gen loses Natural Language: A member believes that code generation works on raw syntax without constraints, which means the natural-language component of it is lost.
- They suggested that humans typically don't code by focusing on the underlying grammar expected by the compiler, and that you could train a model the same way.
- Code Gen Two Stage Approach?: Another member suggested a two-stage approach: pseudo-code generation, then formal grammar translation.
- They hadn't considered the impact on model performance from the added constraint, and the resulting reduction in degrees of freedom for code generation.
- GPU MODE cited in CWM Paper!: A member announced that we got cited in the CWM paper with a screenshot of the citation (link to discord attachment).
- No additional context or information was provided about the specifics of the citation.
GPU MODE ▷ #thunderkittens (17 messages🔥):
H100 matmul kernel runtime error, nvshmem usage rationale, RDMA implementation, PyTorch support for rocm symmetric memory
- H100 Matmul Kernel Suffers Runtime Error: A user reported a runtime error with the H100 matmul kernel, specifically an "Error in tile TMA descriptor creation: unspecified launch failure" when running on Ubuntu 24.04, CUDA 12.8, PyTorch 2.7.0a0+nv25.03, and TensorRT 10.9; full logs are available here.
- Debate Erupts Over nvshmem Omission: Discussion arose regarding the absence of nvshmem in a blog post, with the author clarifying that the post focuses on intra-node communication, while inter-node communication will be covered in a forthcoming paper.
- An NVIDIA colleague pointed out that they provide support for multinode nvlink and will soon add caching, making symmetric tensors easier to use compared to TKParallelTensor.
- RDMA Implementations Stir Controversy: A member suggested that the rationale for implementing custom RDMA might stem from the built-in overheads of SHMEM libraries.
- They cited the DeepEP library (github.com/deepseek-ai/DeepEP) which modifies nvshmem internals for performance gains, and the trend among major players to develop their own GPU Direct Async implementations.
- ROCm Symmetric Memory Support in PyTorch: A user inquired about plans to add built-in PyTorch support for ROCm symmetric memory.
- Another user quipped that they typically wait for someone to complain before prioritizing it.
GPU MODE ▷ #submissions (24 messages🔥):
MI300x8, amd-all2all leaderboard, amd-gemm-rs leaderboard
- MI300x8 scores improve on amd-all2all: A user achieved a personal best of 25.2 ms on MI300x8 in the `amd-all2all` leaderboard with submission ID `43505`.
- amd-all2all leaderboard sees blazing fast times: A user reached 1510 µs on the `amd-all2all` leaderboard with submission ID `43934` using MI300x8.
- amd-gemm-rs leaderboard shows improvement: A user submitted a personal best of 741 µs on MI300x8 to the `amd-gemm-rs` leaderboard with submission ID `44060`, then subsequently improved to 598 µs with submission ID `44069`.
GPU MODE ▷ #hardware (4 messages):
Voltage Park H100 donation, Nebius Exclusive Sponsorship, Future Hackathon Event
- Voltage Park proposes H100 Donation: Voltage Park offered to donate H100s for an upcoming hackathon, expressing interest in supporting the event.
- A member thanked Voltage Park but explained that Nebius has an exclusive sponsorship deal for this hackathon.
- Nebius Secures Exclusive Sponsorship: Due to a deal with Nebius, they are the exclusive sponsors for the current hackathon.
- Despite this, a member expressed interest in discussing potential collaborations for future events with Voltage Park and proposed a private voice chat to explore options.
GPU MODE ▷ #factorio-learning-env (2 messages):
FLE Eval System Prompt, Agent0 System Prompt, PR Submission
- FLE System Prompt Incoming: A member delivered a system prompt for FLE eval via attached file: agent0_system_prompt.txt.
- This prompt is intended for use with the Agent0 system.
- Pending PR Submission: A member mentioned they would submit their PR the following day.
- They noted it was getting late, indicating the submission was near completion.
GPU MODE ▷ #amd-competition (11 messages🔥):
gemm-rs optimizations, atomic operations, GPU rentals for debugging
- Gemm-rs Optimizations Prove Elusive: A member tested three basic variations of gemm-rs optimizations where bias is None, but they exhibited similar runtimes to the default submission, despite expectations of errors for configurations with bias.
- The poster attached a bias.txt file and noted that the PR is merged and almost ready for release with an example.
- Atomic Add API Questioned: A member inquired about the need for an atomic load/store API, similar to HIP's `__hip_atomic_load`/`__hip_atomic_store`, or if the discussion was centered on atomic adds.
- Another member clarified they were referring to atomic adds and are seeking smarter ways to handle heap pointers between ranks for less contention, while being advised to avoid atomics as much as possible.
- GPU Rentals Recommended for Debugging: A member inquired about recommendations for providers to rent GPUs for debugging purposes, without further context in the provided messages.
- The suggestion was made in the context of troubleshooting gemm-rs problems and potentially optimizing atomic operations.
GPU MODE ▷ #cutlass (4 messages):
TmemAllocator vs cute.arch.alloc_tmem, TMEM load/stores in cutedsl, SMEM -> TMEM copy, TMEM -> SMEM copy, Blackwell dense blockscaled GEMM example
- TmemAllocator Simplifies cute.arch.alloc_tmem: `TmemAllocator` offers utilities built around the lower-level `cute.arch.alloc_tmem`, helping reduce boilerplate code.
- The discussion stemmed from a user's inquiry about the difference between creating an instance of `TmemAllocator` and allocating from there versus using `cute.arch.alloc_tmem` directly.
- SMEM to TMEM Tiled Copy: To copy SMEM to TMEM, use `cutlass.cute.nvgpu.tcgen05.make_s2t_copy(copy_atom, tmem_tensor)` followed by `cute.copy()`, as demonstrated in the Blackwell dense blockscaled GEMM example.
- TMEM to SMEM Copy Operation: For copying TMEM to SMEM, use `tcgen05.make_tmem_copy(...)`.
- A helper function is available to determine a performant copy operation, as detailed in blackwell_helpers.py.
GPU MODE ▷ #mojo (2 messages):
Metal GPU target, custom bitcode writer, mojo assembly
- Metal GPU Target Excites Mojo Community: A member expressed excitement about the recent Metal GPU target in Mojo and inquired about the availability of code for the custom bitcode writer.
- They noted the interest in targeting certain DSLs at Metal GPUs and wondered if any of the existing work could be reused.
- Emit Mojo Assembly via Command Flag: A member advised that you can emit Mojo assembly via the `mojo -emit` flag if interested.
- This provides a means to inspect and potentially reuse the generated assembly code.
GPU MODE ▷ #low-bit-training (1 messages):
Modern QAT Papers, FP8 Training, MXFP4/NVFP4
- Modern QAT Paper Search Initiated: A member inquired about papers on modern QAT, with BitNet being the main one that comes to mind, but noted that it covers linear layers only, suggesting a desire for broader applicability.
- The member is considering FP8 training with QAT as a route to MXFP4/NVFP4 quantization.
Yannick Kilcher ▷ #general (105 messages🔥🔥):
Sinusoidal Positional Embeddings, Sine vs Cosine in Positional Encodings, Distillation performance estimates
- Sinusoidal Embeddings: Sine vs Cosine Debate: Members debated the necessity of using both sine and cosine in sinusoidal positional embeddings, with one suggesting that sine alone might suffice, sparking discussion around Fourier transforms and linear transformations. A towardsdatascience blogpost was linked to provide some context to the discussion.
- One member showed an experiment testing the maximum reconstruction error over a set of points, which came out to around 6e-12.
- Positional Encoding with Sine: A Lone Wolf?: Members discussed that using only sine for positional encoding might be viable if the position range is non-negative, as it can be linearly transformed into sine + cosine, but this becomes problematic with negative values, potentially requiring extra layers for a better representation. GPT-5 coding experiments demonstrated linear regression can approximate sine + cosine embeddings well with sine alone in the interval [0, a].
- One member shared an image showing that if they create PE using only sin and then run cosine similarity between two pairs, the scores differ; but with sin/cos pairs, no matter the positions, pairs at the same distance get the same score.
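The claim is easy to check numerically; below is a small self-contained reproduction (standard sinusoidal frequencies; the sin-only variant simply fills the same dimensionality with extra sine channels):

```python
# With sin+cos pairs the dot product between two positions depends only on
# their distance: sin(aw)sin(bw) + cos(aw)cos(bw) = cos((a-b)w).
# A sin-only embedding also picks up a cos((a+b)w) term, so it is not shift-invariant.
import numpy as np

freqs = 1.0 / (10000 ** (np.arange(32) / 32))

def pe_sincos(p):
    return np.concatenate([np.sin(p * freqs), np.cos(p * freqs)])

def pe_sin_only(p):
    return np.sin(p * np.concatenate([freqs, 10.0 * freqs]))

for a, b in [(3, 7), (103, 107)]:  # same distance, different offsets
    print(pe_sincos(a) @ pe_sincos(b), pe_sin_only(a) @ pe_sin_only(b))
# the first column is identical for both pairs; the second column differs
```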
- Paper Reading Recommendation: A member recommended a paper for reading: Synthesizing Programs for Images and Videos with Nearest Neighbor Examples.
- Guesstimating Distillation Improvement: A member asked about estimating the expected improvement from distillation before investing effort, seeking ways to predict the performance gains.
Yannick Kilcher ▷ #paper-discussion (5 messages):
Applied math, arxiv paper, preprint paper
- Arxiv Paper gets Peek: A member shared an Arxiv Paper for discussion.
- To improve the layout, they also shared a link to the same preprint.
- CS Background Surfaces: A member asked another if they had an applied math background.
- The other member responded that their background was in Computer Science, for both their bachelor's and master's degrees.
Yannick Kilcher ▷ #ml-news (6 messages):
SWE-bench verified, AlphaEvolve, Sakana AI, Yann LeCun, RL TTS
- SWE-bench Verified Denounced: A user linked to a tweet where Alexandr Wang says that people still using SWE-bench Verified is "a good indicator of brain damage".
- AlphaEvolve: Sample Efficient?: In the same thread, someone mentioned that AlphaEvolve is much more sample efficient than other models.
- A user linked to Sakana AI Labs in response.
- Yann LeCun Tweeted: A user linked to Yann LeCun's tweet.
- B200 Cloud Compute Spotted: A member noted that B200s are available for $0.94 USD on Prime Intellect.
- RL TTS Paper Discussed: A user mentioned an interesting paper and its potential efficiency gains from using a mid training technique involving a bootstrapping RL TTS.
- They noted the biggest gains were for the trace tracking benchmark.
Latent Space ▷ #ai-general-chat (49 messages🔥):
Chrome DevTools MCP, Cursor CPU Usage, Meta Code World Model, Windsurf tab completion, ChatGPT Pulse
- Chrome DevTools MCP goes Public: Google announced the public preview of Chrome DevTools MCP, a new server that lets AI coding agents control and inspect a live Chrome browser through CDP/Puppeteer via this tweet.
- Cursor's CPU Usage Concerns Users: Members reported insane CPU usage from Cursor but were unsure if it was a VSCode/extension issue; another member confirmed they were not alone, attaching a screenshot showing high CPU usage.
- Meta unveils Code World Model: Meta announced their Code World Model in this tweet.
- Windsurf prioritizes Tab Completion: Windsurf is making tab completion a priority, using a mix of context engineering work plus custom model training (new senpai post); Karpathy also comments on Windsurf.
- OpenAI launches ChatGPT Pulse, LMAO: OpenAI launched ChatGPT Pulse, causing members to comment that oai cloned huxe and linked to the launch announcement.
Latent Space ▷ #genmedia-creative-ai (1 messages):
swyxio: https://x.com/1littlecoder/status/1970624850386661766
Eleuther ▷ #general (11 messages🔥):
AI Psychology Project, Positional Embeddings, AI Future Predictions, Subtle Psychological Manipulation
- AI Psychology Project Gets Musical Intro: A member presented a new project at the intersection of AI and psychology, sharing a musical introduction based on a recently written paper.
- Another member responded that this research could develop into a framework for interpreting how prompt language affects model behavior.
- Prompt Language and Personality Traits Linked?: A member suggested seeding neutral prompts with established linguistic cues of personality traits and interpreting consequences for model performance, citing work on linking patterns in language use to personality traits.
- They noted that this could help assess to what extent personality shaping can be "subtle" while still meaningfully impacting model behavior, further informing prompt engineering practices.
- Positional Embeddings Decoded: A member asked whether positional embeddings in transformers use a matrix of sine and cosine pairs instead of a single pair because wave functions are periodic.
- Another member confirmed this intuition by explaining how hour number is enough in smaller contexts, but in larger context, one also needs the day, month, or year number to avoid ambiguity.
- AI Future will Sediment: A member inquired about thoughts on how the future would look like with respect to AI in the next 5 years.
- They believed that open-source models, ai agents, small language models, ai safety, and multi-modal will begin to sediment in the coming years, which prompted others to ask to define sediment (mature).
Eleuther ▷ #research (20 messages🔥):
CFG on Style Transfer, Knowledge Graph Completion, Evolutionary Algorithms for Kids, Super-Bias: Mask-Aware Nonlinear Combiner, LoRAs and Super Bias Combiner
- CFG on Style Transfer Research Needed!: A member inquired about research on the effect of Context-Free Grammars (CFG) on style transfer, noting anecdotal evidence suggesting models lacking CFG perform worse.
- Another member argued that style transfer and closing knowledge gaps are distinct behaviors, so excelling at one doesn't guarantee success with the other; this was refuted by another member who linked a relevant Twitter thread.
- Knowledge Graph Completion as LLM Solution?: A member suggested formulating a solution from a knowledge graph completion perspective, where style transfer becomes a type of "shallow" inference.
- They proposed that complexity could be measured by the relational depth from established information, but bridging this to practical LLMs is challenging.
- GPT-5 Can Guide Evolutionary Algo Research: A member sought recommendations for an âevolutionary algo for kidsâ paper/blog, or a survey paper on common patterns/techniques.
- Another member suggested learning the basics from GPT-5, and focusing on agentic/LLM parts rather than classical papers, recommending the AlphaEvolve paper as a starting point.
- Super-Bias: The Mask-Aware Nonlinear Combiner: A member introduced Super-Bias, a mask-aware nonlinear combiner for ensemble learning, allowing expert addition/removal with combiner-only retraining, achieving similar accuracy to full retrains with significantly reduced cost.
- The method trains a small MLP on expert outputs plus binary masks (with dropout) and can potentially hit the same (or better) performance as "proper" full fine-tuning or hard merges; see the sketch after this list.
- LoRAs Combined Via Super Bias - Genius!: A member suggests treating different LoRAs (or LoRA+base combos) as "experts", and using Super Bias as the combiner.
- This approach would allow swapping LoRAs in/out without retraining the base model, retraining just the combiner in seconds to adjust for new LoRAs.
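A hedged reconstruction of the described combiner (all names and sizes are illustrative; the actual Super-Bias implementation was not shared):

```python
# Mask-aware combiner sketch: a small MLP takes concatenated expert outputs
# plus a binary availability mask, trained with mask dropout so experts can be
# added or removed with combiner-only retraining.
import torch
import torch.nn as nn

class MaskAwareCombiner(nn.Module):
    def __init__(self, n_experts: int, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_experts * dim + n_experts, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, expert_outs: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # expert_outs: [B, n_experts, dim]; mask: [B, n_experts] (1 = available)
        x = expert_outs * mask.unsqueeze(-1)          # zero out missing experts
        x = torch.cat([x.flatten(1), mask], dim=-1)   # append the mask itself
        return self.net(x)

# during training, randomly drop experts so the combiner learns to cope:
# mask = (torch.rand(B, n_experts) > p_drop).float()
```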
Eleuther ▷ #lm-thunderdome (7 messages):
GSM8k evaluation results, flexible vs strict matching, merged models issue, reproducibility of errors
- GSM8k Evaluation: Flexible Filter Fails: A member shared GSM8k evaluation results, showing that the flexible-extract filter performed worse than the strict-match filter, with exact_match scores of 0.3594 and 0.5742 respectively.
- Another member confirmed facing this issue, particularly with merged models.
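One way to reproduce the comparison while keeping the per-sample records that were missing from the report (a sketch using lm-evaluation-harness; the model id is an illustrative placeholder):

```python
# Re-run gsm8k and keep per-example records so flexible-extract vs strict-match
# failures can be inspected later.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/your-merged-model",  # hypothetical model id
    tasks=["gsm8k"],
    log_samples=True,  # retain samples for debugging the extraction filters
)
print(results["results"]["gsm8k"])
```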
- Debugging Merged Model Mishaps: A member requested samples to debug an issue, and the original reporter lamented not saving the problematic examples to reproduce the error.
- They committed to figuring out the situation that caused the errors to reproduce it.
Eleuther ▷ #multimodal-general (1 messages):
VLM, Mech Interp, Sparse Autoencoder (SAE)
- VLM and Mech Interp Unite!: A member proposed understanding Vision Language Models (VLMs) and Mechanistic Interpretability (Mech Interp) separately before integrating Mech Interp components into VLMs.
- The suggestion involves applying a Sparse Autoencoder (SAE) at each layer of the VLM to decipher what each layer is attending to, starting simple and then increasing complexity.
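As a rough sketch of the proposed probe (sizes illustrative; real SAE training setups add details such as decoder weight normalization):

```python
# Minimal sparse autoencoder: train one per layer on that layer's residual
# activations, then inspect which latents fire for image vs. text tokens.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int, l1: float = 1e-3):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)
        self.l1 = l1

    def forward(self, h: torch.Tensor):
        z = torch.relu(self.enc(h))   # sparse latent code
        h_hat = self.dec(z)           # reconstruction of the activation
        loss = (h_hat - h).pow(2).mean() + self.l1 * z.abs().mean()
        return z, h_hat, loss
```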
Nous Research AI ▷ #general (26 messages🔥):
Meta code generation with world models, Training AI with arXiv data, Granite 4 model, RMS_NORM and NORM implementations
- Meta Crafts Code-Writing CWM: Meta introduced CWM, an open-weights LLM for research on code generation with world models.
- A member mentioned having a similar idea involving training on python interpreter traces.
- Nous taps arXiv for training data: It was discussed whether Nous could train its AI using data from arXiv, mentioning they have an API to download any amount of papers.
- Teknium confirmed that it's permissible, suggesting that it could be a viable option.
- Granite 4 model brewing full-attention: There is a possibility of a full-attention Granite 4 model and 8 private models.
- The models mentioned in the image are quite old, with Hermes 4 and 3 being the latest.
- RMS_NORM gets unified and supported by METAL: A pull request was made to unify the RMS_NORM and NORM implementations and extend support for more shapes in METAL.
- It's anticipated this will help quantized models work more closely with their transformer-based counterparts.
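For reference, RMS norm differs from LayerNorm only in normalizing by the root-mean-square with no mean subtraction or bias; a small numpy sketch of the math (the PR itself concerns llama.cpp's METAL kernels, not this Python):

```python
# RMS norm: scale by 1/RMS(x), then apply a learned per-channel weight.
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```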
Nous Research AI ▷ #research-papers (3 messages):
AlphaXiv, Paper frustrations
- AlphaXiv to the rescue: A member shared a paper link from AlphaXiv, seemingly bypassing a login wall; the frustrated member appreciated the time saved from web searching.
- Login wall frustrations shared: A member expressed frustration about papers requiring login, refusing to log in just to see the paper link.
Nous Research AI ▷ #interesting-links (3 messages):
Manifest AI, Open Source Integration
- Manifest AI Hype Check: A member shared a link to Manifest AI asking is it as groundbreaking as they are making it seem?
- Another member suggested checking out their open-source integration on their Git, sharing what a model said about integrating with their OSS repository.
- Model's Opinion on Open Source Integration: A model expressed a favorable opinion regarding Manifest AI's integration with their open-source components, sparking interest within the community.
- The discussion emphasized the importance of evaluating the actual implementation and capabilities rather than solely relying on marketing claims.
DSPy ▷ #general (19 messages🔥):
PDF Processing with LLMs, OCR vs VLM for Layout Understanding, Qwen for OCR, Gemini 2.5 Pro for PDF/Image Understanding, DSPy and Attachments for PDF processing
- LLMs Get Verbatim on PDFs: A user questioned the necessity of an initial LLM pass for processing PDFs to save text verbatim while preserving layout, even with Attachments, suggesting it might be needed to understand layouts and images better than OCR alone.
- Others suggested straight PDF OCR with Chain of Thought (CoT) for cleaner output, or using a model with DSPy for OCR like Qwen, but acknowledged the need for VLM due to layout complexity.
- Gemini 2.5 Pro Understands Layouts: Gemini 2.5 Flash seems to be pretty good for understanding layouts; the Pro version might be even better for identifying sections/columns and doing verbatim extraction, although the userâs PDFs have tricky formatting like flattened images and blurry text.
- A user pointed to a paper on directly using Gemini for this purpose, found at arxiv.org/abs/2509.17567.
- DSPy with Attachments Processes PDFs: A user struggling to use DSPy for the first pass in PDF processing found a working example with Attachments at github.com/maximerivest/Attachments.
- The user had previously encountered 429 errors, but the issue was resolved, enabling them to proceed with using DSPy.
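A hedged sketch of that first pass in DSPy (the signature, field names, and model string below are illustrative assumptions, not the linked example's code):

```python
# Minimal DSPy page-transcription pass; provider string and field names are
# illustrative assumptions.
import dspy

dspy.configure(lm=dspy.LM("gemini/gemini-2.5-flash"))

class TranscribePage(dspy.Signature):
    """Transcribe the page verbatim, preserving reading order and layout markers."""
    page: dspy.Image = dspy.InputField()
    transcript: str = dspy.OutputField(desc="verbatim text with section/column markers")

transcribe = dspy.Predict(TranscribePage)
# result = transcribe(page=dspy.Image.from_file("page_01.png"))  # hypothetical file
```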
- Boston Hosts DSPy Event: A member promoted a DSPy event in Boston on October 15th, inviting community members to attend and help spread the word.
- Another user then replied hoping that the event would come to Seattle sometime soon.
DSPy ▷ #colbert (1 messages):
Context Length Limitations, Repeating CLS Token, Performance Issues
- Longer Context Fails: CLS Repeat Not Helping: A user reported that longer context doesn't perform well, and repeating the CLS token did not resolve the problem.
- The user suspects either a limitation of the method or an implementation error on their side.
aider (Paul Gauthier) ▷ #questions-and-tips (8 messages🔥):
Aider context clearing, Internet access for Aider, LLM benchmarks for coding, Polyglot benchmark rerun
- Aider's /clear Command Clarified: The `/clear` command in aider only clears the chat history, while added files remain in the context; users can use `/context` to see token usage.
- One user initially thought `/clear` removed all context in the session, but this clarification resolved their confusion.
- Craving Web Access in Aider: A user inquired about giving aider access to Internet search, but currently this isn't available in the main branch.
- You can scrape a website using the `/web https://www.example.com/` command.
- LLM Coding Prowess: How to Stay Current: A user asked how others keep updated on which LLM is best for coding and cost.
- The consensus is that most users keep updated by simply trying the LLMs themselves.
- Polyglot Benchmark: Error Output Re-runs: A user asked if it's possible to re-run the polyglot benchmark only for the tests with `error_outputs`.
- This request stems from previous runs where the LLM server crashed, causing failures.
Manus.im Discord ▷ #general (7 messages):
Manus PDF download issues, Beta Pro Access
- Manus Stalls Downloading PDFs?: A member reported that Manus got stuck downloading a PDF while researching accounts, even after manually downloading it and providing a link.
- The user expressed frustration that Manus kept asking to upload the file despite it being a PDF already on the desktop.
- Beta Pro Access Questioned: A member asked how to get beta pro.
- The discussion included attached images, though they don't provide any context on how to acquire Beta Pro Access.
MCP Contributors (Official) ▷ #general-wg (4 messages):
ModelContextProtocol issues, ReadResourceResult contents array, Web Resource html
- ModelContextProtocol's Content Array Dilemma: Offline discussions have pinpointed that `ReadResourceResult.contents` in the ModelContextProtocol is an array, but its intended use and semantics are undocumented.
- Questions arise on whether this array is meant for scenarios like folders with multiple files or for returning the same content in different formats.
- Web Resources with HTML and Images: A member suggested that this feature is useful for Web Resources consisting of HTML and associated images.
- They added it is also useful in situations where tokenizable/renderable MIME types haven't been negotiated.
- Implicit Content Retrieval: A member inquired if `resources/read("uri": ".../index.html")` would implicitly return `style.css` and `logo.png` in the list of contents.
- This question highlights the potential for automatically including related resources when fetching a primary resource.
tinygrad (George Hotz) ▷ #general (1 messages):
Python Bindings, Direct Pip Installation
- Python Bindings Coming Soon: A member is creating Python bindings for the project.
- The goal is to allow direct installation with pip using a single command.
MLOps @Chipro ▷ #events (1 messages):
Diffusion Models, Generative Models, Paper Reading Group
- Diffusion Model Paper Reading Group forming!: A new Diffusion Model Paper Reading Group will be discussing the Understanding Diffusion Models: A Unified Perspective paper this Saturday at 12pm ET.
- The paper gives an overview of the evolution and unification of generative diffusion models like VAEs, VDMs, and SGMs.
- Beginner-Friendly GenAI Discussion: The paper reading group is beginner-friendly, requiring only curiosity and a love for GenAI.
- It aims to build a solid foundation in diffusion models without needing coding or ML background, join at luma.com/1gif2ym1.