yay fast Claude
AI News for 10/14/2025-10/15/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (197 channels, and 6317 messages) for you. Estimated reading time saved (at 200wpm): 479 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
There was a time when entire model families launched in one go, but now different sizes launch at different times, presumably as soon as each is ready and without regard to the storytelling needs of the daily AI news newsletter writer. Anyway, Anthropic has followed up Claude Sonnet 4.5 with Haiku 4.5 (system card here), skipping Haiku 4.0 and 4.1 entirely. It's meant to be almost as good as Sonnet 4.5, but more than 2x as fast and 3x cheaper.
For those keeping track, here’s the pricing vs peer models:
Haiku 3: I $0.25/M, O $1.25/M
Haiku 4.5: I $1.00/M, O $5.00/M
GPT-5: I $1.25/M, O $10.00/M
GPT-5-mini: I $0.25/M, O $2.00/M
GPT-5-nano: I $0.05/M, O $0.40/M
GLM-4.6: I $0.60/M, O $2.20/M
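To make these rates concrete, here is a minimal cost calculator over the prices listed above; the example workload (2M input / 200k output tokens) is an illustrative assumption, not a figure from any benchmark.

```python
# Back-of-envelope cost comparison at the per-million-token rates above.
PRICES_PER_M = {  # (input $/M tokens, output $/M tokens)
    "haiku-3":    (0.25, 1.25),
    "haiku-4.5":  (1.00, 5.00),
    "gpt-5":      (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
    "glm-4.6":    (0.60, 2.20),
}

def job_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one job at the listed per-million-token rates."""
    p_in, p_out = PRICES_PER_M[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# Hypothetical agent run: 2M input tokens, 200k output tokens.
for model in PRICES_PER_M:
    print(f"{model:11s} ${job_cost(model, 2_000_000, 200_000):.2f}")
```

At that workload, Haiku 4.5 comes out to $3.00 vs $4.50 for GPT-5, which is the kind of gap that matters when agents burn tokens in loops.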
AI Twitter Recap
AI for Science: Open-weight C2S-Scale 27B (Gemma) yields validated cancer hypothesis
- Cell2Sentence-Scale (27B, Gemma-based): Google and Yale released a 27B foundation model that generated a novel hypothesis about cancer cellular behavior which was experimentally validated in living cells. The team open-sourced model weights and resources for the community to reproduce and extend the work. See the announcement from @sundarpichai and follow-up with resources (tweet); community summaries from @osanseviero and @ClementDelangue.
- Signal and caveats: Commentary emphasized the significance of an LLM that fits on a high-end consumer GPU driving a confirmed novel discovery (@deredleritt3r), alongside reminders that translation to clinic requires extensive preclinical/clinical validation (@AziziShekoofeh). There’s active technical curiosity from ML folks on the “novelty” of the biology itself (@vikhyatk) and kudos from Google Research leadership (@mirrokni).
Small Models, Speed, and Agentic Cost-Performance
- Claude Haiku 4.5: Early hands-on reports suggest Haiku 4.5 materially improves iteration speed and UX. @swyx measured ~3.5× faster than Sonnet 4.5 on a head-to-head harness and noted it “stays in the flow window,” meaning more human-in-the-loop cycles per unit time (also see a Windsurf comparison: tweet). In a DSPy NYT Connections eval, @pdrmnvd reported 64%→71% with optimization, 25 minutes wall time, ~$11 total, beating other small models on that task. Ecosystem integrations landed quickly: Haiku 4.5 in anycoder on HF (tweet) and on Yupp with examples (thread).
- Reasoning models in agentic workflows: New evals from Artificial Analysis show outsized performance for GPT-5/o3 vs GPT-4.1 on GPQA Diamond and τ²-Bench Telecom. While test-time compute makes pure benchmarking expensive, in agentic customer-service-style environments the reasoning models reached answers in fewer turns and cost about the same as GPT-4.1 given equal token pricing (tweet 1, tweet 2).
Agents: evaluations, memory, and orchestration
- Evaluations are hard (details matter): After 20k+ agent rollouts across 9 challenging benchmarks (web, coding, science, customer service), @sayashk argues headline accuracy obscures key behaviors; they release infrastructure and guidance for fair agent evaluation.
- Memory-based “learning on the job”: Shanghai AI Lab reports a new SOTA on TheAgentCompany benchmark—MUSE + Gemini 2.5 solved 41.1% of real-world-inspired tasks via a memory-based method (@gneubig, details).
- Tooling and orchestration: The agent stack continues to consolidate around a few core capabilities. @corbtt highlights search, code execution, and recursive sub-agents as the “Big 3.” New infra includes retrieve-dspy, a modular DSPy collection to compare compound retrieval strategies (HyDE, ThinkQE, reranking variants) from @CShorten30, and Pydantic AI 1.1.0 integrating Prefect for agent orchestration (@AAAzzam). Horizontal platforms are integrating agents directly into workflows—e.g., ClickUp x Codegen for multi-surface code shipping (tweet 1, tweet 2).
- Long-context degradation (“context rot”): In a real refactoring session, @giffmana saw codex-cli performance nosedive beyond ~200k consumed context; resetting the session restored quality. Practical guidance on codex-cli usage shared by @gdb.
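For readers unfamiliar with HyDE, one of the retrieval strategies compared in retrieve-dspy above: the idea is to embed a hypothetical answer passage rather than the raw query, so the search vector lives in document space. A toy sketch under stated assumptions — the "LLM" and the embedder below are offline stand-in stubs, not retrieve-dspy's actual API:

```python
# Toy HyDE (Hypothetical Document Embeddings) sketch. Bag-of-words counts
# stand in for a dense embedding so the example runs with no dependencies.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Stub embedder: word counts in place of a dense vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fake_llm(query: str) -> str:
    """Stub for the LLM that drafts a hypothetical answer passage."""
    return f"A passage answering: {query} Quantization shrinks model weights to fewer bits."

def hyde_retrieve(query: str, corpus: list[str]) -> str:
    # Key idea: embed a *hypothetical answer*, not the query itself.
    hypo_vec = embed(fake_llm(query))
    return max(corpus, key=lambda doc: cosine(hypo_vec, embed(doc)))

corpus = [
    "Quantization reduces model weights to fewer bits to save memory.",
    "The weather in Paris is mild in October.",
]
print(hyde_retrieve("what does quantization do?", corpus))
```

The real systems swap in an actual LLM and embedding model; the compound-strategy comparisons (HyDE vs ThinkQE vs reranking) vary exactly these two pluggable pieces.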
Training, optimization, and infrastructure notes
- Low-precision training without classic QAT: LOTION (Low-precision optimization via stochastic-noise smoothing) proposes smoothing the quantized loss surface while preserving all global minima of the true quantized loss—offered as a principled alternative to QAT (@ShamKakade6).
- RL scaling and reproducibility: A sneak peek from @agarwl_ on scaling RL compute for LLMs—“the most compute-expensive paper” they’ve done—aiming for protocols others can run cheaply to map reliable scaling laws.
- Local rigs vs cloud for LLM work: A good reality check on NVIDIA DGX Spark: bandwidth-to-FLOPs is in line with server-grade machines, just unusual for a consumer form factor (@awnihannun; explainer: tweet). For fine-tuning and PyTorch stability, @rasbt prefers CUDA rigs (Spark or cloud) over macOS MPS, which remains flaky for convergence; heat/noise and $/hr vs capex tradeoffs still favor cloud for many.
- Bench infra and open models: Automatic CI for RL environments is rolling out on the Hugging Face Hub—hosted debug evals as part of CI, pushing RL environments toward “proper software” QA (@johannes_hage). GLM-4.6 landed on BigCodeArena (@terryyuezhuo).
- Micro-models, costs, and compute accounting: @karpathy released nanochat d32 (trained ~33 hours, ~$1000): CORE 0.31 (vs GPT-2 ~0.26), GSM8K ~8%→~20%, with clear guidance to temper expectations for “kindergarten-scale” models. Meanwhile, @_rajanagarwal extends nanochat to images for <$10 by adding a LLaVA-style SigLIP-ViT projection. On evaluation fairness, @cloneofsimo calls out ImageNet results that ignore pretraining compute (e.g., DinoV2), urging comparisons on total resources and distinguishing “from scratch” vs “built on foundation” (follow-up).
Product and multimodal releases
- Google Veo 3.1 + 3.1 Fast (video): New models add richer native audio, improved cinematic styles, video-to-video referencing, smoother transitions, and video extensions. Available via Flow, Gemini app, AI Studio, and Vertex AI (@OfficialLoganK, @koraykv); @demishassabis teases the “Turing Test for video.”
- ChatGPT memory upgrades: ChatGPT can now auto-manage and reprioritize saved memories (search/sort by recency), rolling out to Plus/Pro on web (@OpenAI; user feedback prompts: @ChristinaHartW, @_samirism).
- Research assistants in the wild: NotebookLM for arXiv turns dense AI papers into conversational overviews with cross-paper context (@askalphaxiv). Google also shipped practical assistants like Gmail “Help me schedule” (tweet) and Pixel 10 “Magic Cue” (tweet).
Top tweets (by engagement)
- @sundarpichai on C2S-Scale 27B discovering a cancer hypothesis validated in living cells — 15.3k
- “AI generating novel science” reaction to the C2S result — 4.1k
- McLaren F1 unveils 2025 US GP livery with Gemini branding — 3.9k
- Google introduces Veo 3.1 / 3.1 Fast video models — 3.4k
- OpenAI: ChatGPT memory auto-management rolling out to Plus/Pro — 2.2k
- NotebookLM for arXiv: conversational summaries across thousands of papers — 2.0k
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Apple M5 AI accelerator launch + DGX Spark hands-on benchmarks
- Apple unveils M5 (Activity: 1071): Apple announced the M5 for MacBook Pro, bringing the iPhone 17-era on‑device AI accelerators and claiming ~3.5x faster LLM prompt processing vs M4, with up to 2x faster SSDs (now configurable to 4TB) and 150 GB/s unified memory bandwidth. Apple’s metric is “time to first token” using an 8B-parameter model (4‑bit weights, FP16 activations) under mlx-lm on prerelease MLX (MLX, mlx-lm); their footnote specifies M5 (10C CPU/10C GPU, 32GB, 4TB SSD) vs M4 (10C/10C, 32GB, 2TB SSD) and M1 (8C/8C, 16GB, 2TB SSD). The image likely shows Apple’s marketing chart highlighting LLM prompt-processing gains, SSD throughput, and memory specs. Commenters question the fairness of including model load time (and faster M5 SSDs) in the “prompt processing” metric, suggesting the benchmark may be skewed; others note that earlier M-series (e.g., M1 Max) already remain performant for many users.
    - Apple’s benchmark details specify “time to first token” on a 16K prompt using an 8B-parameter model with 4-bit weights and FP16 activations via mlx-lm on a prerelease MLX stack, comparing a preproduction 14” MBP M5 (10c CPU, 10c GPU, 32GB UM, 4TB SSD) to production M4 (10/10, 32GB, 2TB SSD) and M1 (8/8, 16GB, 2TB SSD). This metric inherently mixes I/O and initialization with compute; with 4-bit quantization, the weight file is on the order of ~4 GB, making SSD throughput and memory bandwidth significant contributors. Refs: MLX [https://github.com/ml-explore/mlx], mlx-lm [https://github.com/ml-explore/mlx-examples/tree/main/llms].
    - A commenter argues the test is “rigged” by including model load time—“loading the model into memory”—which can favor the M5 config’s 4TB SSD (Apple SSD performance often scales with capacity), plus any improvements from prerelease MLX. For fairer comparability, they suggest reporting steady-state throughput (tokens/sec excluding TTFT) under identical SSD capacities and the same MLX build, as TTFT is highly sensitive to disk I/O, framework warm-up/JIT, and cache initialization rather than pure compute.
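The fairer measurement commenters ask for can be sketched in a few lines: time the first token separately (it absorbs load, warm-up, and prompt processing) and compute steady-state decode throughput from the rest. `generate_stream` below is a hypothetical stand-in for any streaming LLM API, simulated here so the example runs anywhere.

```python
# Separate TTFT (I/O + warm-up + prefill) from steady-state decode tok/s.
import time

def generate_stream(n_tokens: int):
    time.sleep(0.05)          # simulated model load + prompt processing
    for _ in range(n_tokens):
        time.sleep(0.001)     # simulated per-token decode step
        yield "tok"

def benchmark(n_tokens: int = 50):
    start = time.perf_counter()
    stream = generate_stream(n_tokens)
    next(stream)                              # consume the first token
    ttft = time.perf_counter() - start        # time-to-first-token
    rest = sum(1 for _ in stream)             # drain remaining tokens
    decode_time = time.perf_counter() - start - ttft
    tps = rest / decode_time                  # steady-state decode tokens/sec
    return ttft, tps

ttft, tps = benchmark()
print(f"TTFT: {ttft * 1000:.0f} ms, steady-state decode: {tps:.0f} tok/s")
```

Reporting both numbers (under identical SSDs and the same framework build) removes exactly the disk-I/O and warm-up sensitivity the commenters object to.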
- Got the DGX Spark - ask me anything (Activity: 870): OP says they just got a “DGX Spark,” will spend the night running lots of local LLMs, and invites benchmark requests. No specs or model list are given, and the attached image appears non-technical/unclear (likely a meme—comments reference an inhaler), so there’s no visible hardware detail. The main requested metric is tokens-per-second (throughput) for popular models, indicating an interest in real-world inference performance. Top comments ask for tok/s benchmarks and otherwise lean into light AMA/jokes about very high throughput (“you gonna need that inhaler”).
- Commenters are asking for concrete inference benchmarks: specifically tokens-per-second (TPS) across “popular models” on the DGX Spark. For results to be comparable, they want TPS broken down by precision/quantization (e.g., FP16/BF16 vs 4-bit), batch size, context length, and the inference stack (e.g., vLLM/llama.cpp/LM Studio), plus whether multi‑GPU tensor/pipeline parallelism is used and how the KV cache is handled (in‑GPU vs CPU/pinned).
- A targeted request is TPS in LM Studio (lmstudio.ai) for Gemma 27B (ai.google.dev/gemma) and an “OSS 120B” model. Commenters imply interest in end‑to‑end, real‑world decode TPS from LM Studio’s stats panel (not synthetic microbenchmarks), ideally reporting both prompt and decode TPS, GPU utilization, VRAM footprint, and whether model sharding, CUDA Graphs, or paged attention are enabled, since these materially affect throughput on multi‑GPU DGX setups.
- AI has replaced programmers… totally. (Activity: 1538): Meme post titled “AI has replaced programmers… totally.” The discussion argues current AI coding “agents” are not capable of end‑to‑end product development; they’re mildly useful accelerators for experienced engineers but still require human-led design, integration, debugging, and tasks like model quantization. Commenters contextualize this within decades of recurring automation claims (4GL/no‑code waves since the 1980s) that did not eliminate software engineering. Image. Sentiment is broadly skeptical that AI will replace experienced engineers; some even welcome the hype thinning out competition. The “Schrodinger’s programmer” quip captures the paradox of programmers being labeled obsolete yet required for specialized ML tasks (e.g., quantization).
- Practitioners argue current code “agents” remain copilots rather than autonomous builders of production systems: they struggle with long-horizon planning, repo-scale context, integration tests, and reliable tool use/error recovery. Repository-level benchmarks like SWE-bench highlight these gaps, where even strong LLM + tool-use systems still miss a large fraction of real bug-fixing tasks (see https://github.com/princeton-nlp/SWE-bench).
- Veterans recall past waves of “programming will be automated” via 4GL/low-code (e.g., Visual Basic, PowerBuilder) that didn’t eliminate software engineers; the takeaway is that automation tends to shift work rather than remove the need for expertise, especially as system complexity, interoperability, and non-functional requirements grow (background: https://en.wikipedia.org/wiki/Fourth-generation_programming_language).
- On local inference, claims like “8 GB VRAM to build full production-ready apps” are called out as unrealistic: ~8 GB typically limits you to ~7B–8B models (e.g., Llama 3 8B, Mistral 7B) in Q4 quantization, with quality and long-context throughput constrained by KV cache and memory bandwidth. Quantization helps fit models but degrades code quality for some tasks; practical guidance (and VRAM math) is often cited from llama.cpp docs (e.g., quantization and memory requirements: https://github.com/ggerganov/llama.cpp#quantization).
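The VRAM math behind that claim is easy to reproduce. A sketch using Llama 3 8B's published architecture (32 layers, 8 KV heads via GQA, head dim 128); the ~4.5 bits/weight figure approximating Q4 plus quantization-scale overhead is an assumption, not a llama.cpp-exact number:

```python
# Rough VRAM budget: quantized weights + FP16 KV cache at a given context.

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in GiB at a given effective bits-per-weight."""
    return n_params * bits_per_weight / 8 / 2**30

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2x (K and V) per layer, FP16 elements by default."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 2**30

w = weights_gib(8.0e9, 4.5)          # ~Q4 weights with scale overhead
kv = kv_cache_gib(32, 8, 128, 8192)  # Llama 3 8B shape (GQA), 8k context
print(f"weights ~{w:.1f} GiB, KV cache ~{kv:.1f} GiB, total ~{w + kv:.1f} GiB")
```

That lands around 4.2 GiB of weights plus 1 GiB of KV cache at 8k context, before activations and runtime overhead, which is why ~8 GB cards top out around 8B models at Q4.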
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Claude Haiku 4.5 launch and Google model demos (Gemini 3.0 Pro Nintendo sim, Veo 3.1)
- Introducing Claude Haiku 4.5: our latest small model. (Activity: 1042): Anthropic announced Claude Haiku 4.5, a small model that matches prior SOTA Claude Sonnet 4 on coding at ~1/3 the cost and >2× speed, and surpasses Sonnet 4 on “computer use” tasks, yielding faster Claude for Chrome. In Claude Code, Haiku 4.5 improves responsiveness for multi‑agent projects and rapid prototyping; Anthropic recommends a workflow where Sonnet plans multi‑step tasks and orchestrates parallel Haiku agents. It’s a drop‑in replacement for Haiku 3.5 and Sonnet 4, available now via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI; Anthropic positions Sonnet 4.5 as the top coding model, with Haiku 4.5 offering near‑frontier performance at better cost efficiency. Read more: https://www.anthropic.com/news/claude-haiku-4-5 Early tester reports cite strong writing, competent minor coding, generous output, and fewer refusals—“feels like a fast Sonnet 4,” though Sonnet is still preferred for the hardest tasks. Commenters ask how Claude Code routing changes (Haiku vs. Sonnet vs. Opus) and express concern about low rate limits/quotas.
    - Pricing and availability are a focal point: commenters note a step-up across Haiku generations — Haiku 3 $0.25/M in, $1.25/M out → Haiku 3.5 $0.80/M / $4.00/M → Haiku 4.5 $1.00/M / $5.00/M. They infer Anthropic is prioritizing profitability and cost control, citing heavy rate limiting on larger models, deprecation of Opus 4.1, and no Opus 4.5 yet, with Sonnet and Haiku as the only broadly usable tiers; see pricing: https://www.anthropic.com/pricing.
    - Engineers question routing and quotas: “so now in Claude Code we use haiku instead of sonnet, and sonnet instead of opus?” and ask for clarity on rate limits, which multiple users call “insanely low” for production. Concern is that aggressive throttling on higher tiers coincided with the Haiku 4.5 launch; requests center on higher per‑minute and token‑throughput caps (see official limits: https://docs.anthropic.com/en/docs/build-with-claude/api/rate-limits).
    - Early hands‑on feedback on Haiku 4.5 suggests improved capability for the small‑model class: it “gets” intent, writes fluently, handles minor coding, and translates large texts while preserving context, with fewer refusals/safety interruptions. Described as a “fast Sonnet 4” for routine tasks, but users still prefer Sonnet for the hardest queries—implying favorable latency/quality trade‑offs without matching top‑tier reasoning.
- Gemini 3.0 Pro: Retro Nintendo Sim one shot – with proof & prompt (Activity: 648): OP claims Gemini 3.0 Pro generated, in a single prompt, a fully interactive, single‑file HTML/JS “Nintendo Switch” UI sim with touch+keyboard input mappings and multiple mini‑game “clones” (e.g., Super Mario, Street Fighter, car racing, Pokémon Red) that runs in Chrome. Links: source post on X tweet, additional “proof” clip, and a live demo on CodePen pen. The Reddit‑hosted video is currently inaccessible (HTTP 403; v.redd.it), so independent verification relies on the X posts and CodePen demo; no benchmarks or code size/latency metrics are provided beyond the claim of one‑shot code generation into a single HTML file. Top comments highlight IP/legal risk around Nintendo assets, express surprise at the apparent one‑shot app scaffolding capability of Gemini 3, and tongue‑in‑cheek skepticism about extrapolating to creating “Gemini 4” in one go.
- Will Smith Eating Spaghetti in Veo 3.1 (Activity: 624): Post showcases a “Will Smith eating spaghetti” sample generated with Google’s Veo 3.1 video model—using the long‑running meme prompt as an informal stress test for identity fidelity, hand–mouth–food interactions, and temporal coherence. The direct video link (v.redd.it) returns 403 without Reddit auth, but the context frames this as evidence of rapid quality gains versus early 2023 text‑to‑video outputs (e.g., ModelScope T2V) over ~2.5 years, and invites comparison to state‑of‑the‑art systems like OpenAI’s Sora. Relevant refs: DeepMind Veo, ModelScope T2V, OpenAI Sora. Commenters note the “Will Smith spaghetti” prompt has become a de facto benchmark for generative video progress and emphasize the short timeline of improvement (~2.5 years). There’s a debate on relative quality, with at least one user asserting that “Sora 2” still looks better, implying perceived advantages in realism/consistency over Veo 3.1.
    - The thread frames “Will Smith eating spaghetti” as a de facto regression test for text-to-video models, stressing identity preservation under chaotic dynamics (noodles/sauce), close-up lip/mouth motion, and temporal coherence. Comparing 2023-era outputs to Veo 3.1 highlights clear gains in resolution, motion stability, and face fidelity over ~2.5 years, making it a consistent prompt for qualitative benchmarking.
    - Several commenters suggest Sora 2 still edges Veo 3.1 on photorealism and reduced “AI feel,” pointing to telltale artifacts such as temporal flicker, edge shimmer, over-smooth textures, and uncanny facial micro-expressions. This implies Sora 2 may retain an advantage in temporal consistency and material realism, though no side-by-side quantitative benchmarks are cited in the thread.
    - The pace of improvement since 2023 is emphasized, implicitly calling for standardized, reproducible prompts and metrics to track progress across models (e.g., identity similarity scores, FVD for temporal consistency, CLIP-based alignment). The enduring use of this meme prompt underlines the need for both qualitative and quantitative evaluations when comparing models like Veo 3.1 and Sora 2.
- Made with open source software, what will it be like in a year? (Activity: 577): Creator showcases an end‑to‑end, open‑source video pipeline using ComfyUI (repo) for orchestration, a 70B LLM fine‑tune (Midnight‑Miqu‑70B‑v1.5) for scripting/dialog, speech tools (WAN2.1 Infinitetalk and VibeVoice) for conversational audio, and ffmpeg (site) for final assembly. While no benchmarks are provided, the stack implies fully local/OSS components for text generation, voice synthesis, and frame/video composition; the linked media is hosted on Reddit (v.redd.it), which may require auth to view. Commenters expect rapid progress toward interactive, branching video where users can talk to characters and steer plots in real time, and predict that producing longer videos will become far easier and cheaper within a year as tooling improves, reducing today’s significant manual effort.
    - Interactive, user-driven narratives imply a real-time multimodal pipeline: ASR → dialogue/agent policy (LLM) with persistent memory + narrative planning (plot graph/state machine) → TTS/animation, all under strict latency budgets to keep conversations fluid. Maintaining plot coherence would likely require a “director” model to enforce constraints and continuity across branches, with save/load of world-state and character goals to avoid degeneracy in long sessions.
- Long-form video generation becoming easier suggests progress in end-to-end pipelines that reduce current manual glue work: scene breakdown, shot planning, prompt versioning, consistency (characters/props), and post-processing (upscalers/interpolation). Cost declines would likely come from model distillation/quantization, better batching/scheduling on consumer GPUs, and chunked generation with temporal conditioning to extend duration without exponential compute growth.
- AI slop is getting better. (Activity: 2694): Post shares an AI-generated video (original link v.redd.it/g59eskhwb9vf1 returns 403 Forbidden), with commenters claiming it was produced by OpenAI Sora and that a Sora watermark was partially obscured/cropped—visible as a “blurry blob” on the left subject. The thread implicitly touches on provenance: hiding or degrading embedded watermarks raises questions about watermark robustness to common transformations (crop/blur) and the need for stronger mechanisms (e.g., C2PA-style metadata) for verifiable attribution; see OpenAI Sora for background. Technically substantive comments focus on watermark evasion (intentional hiding) and a critical view of AI media’s energy/computational externalities—questioning whether such generative outputs justify electrical grid load.
    - A claim that a hidden OpenAI Sora watermark is visible as a “blurry blob” underscores how fragile visible watermarking is for AI-generated video. Simple transforms (blur/crop/scale) can obfuscate such marks, motivating cryptographic provenance (e.g., C2PA: https://c2pa.org) or robust, model-level watermarking (e.g., DeepMind SynthID: https://deepmind.google/technologies/synthid/). This highlights the need for standardized, tamper-evident attribution for models like Sora (https://openai.com/sora).
    - Concerns about “the electrical grid” point to rising AI-driven data center loads. Per the IEA, data centre electricity use was ~460 TWh in 2022 and could reach 620–1,050 TWh by 2026, with AI workloads potentially 85–134 TWh by 2026 (Data centres and data transmission networks: https://www.iea.org/reports/data-centres-and-data-transmission-networks). This frames the operational and infrastructure tradeoffs of scaling generative model training/inference for comparatively low-value content.
2. OpenAI “Adult Mode” rollout: memes and hypocrisy callouts
- OpenAI after releasing Adult Mode (Activity: 1457): A meme-style post claims OpenAI introduced an “Adult Mode” (i.e., an NSFW toggle), which users frame as a direct competitive move against Character.AI—a feature its community has requested for years. There are no technical details or benchmarks; the thread centers on product policy/feature availability and market impact, with one commenter asserting it’s live as of 2025-10-15. Commenters argue this could be a serious user-retention threat to Character.AI (“nightmare scenario”) and note the business reality that “money goes where there is demand.” Another asks about availability timing, with an unverified reply confirming release on 2025-10-15.
    - The only technical-ish thread here is around compute economics: commenters speculate that enabling an NSFW/“Adult Mode” could materially increase sustained inference demand and session length, improving GPU utilization rates and payback on high-cost accelerators (i.e., better amortization of expensive GPU capex/opex through higher ARPU workloads). This hints at a product-policy lever (NSFW toggle) directly affecting inference load profiles and monetization, potentially shifting traffic from competitors that still throttle or block such content.
- WE MADE IT YALL!! (Activity: 1671): An unverified Instagram screenshot appears to claim OpenAI will enable an “18+ / NSFW” mode for ChatGPT, gated by government ID verification—implying a relaxation of current sexual-content restrictions and age-gated access. There’s no known official announcement; current OpenAI usage policies still restrict explicit sexual content, so treat this as a rumor. If real, it would entail changes to safety classifiers, an age/KYC verification pipeline, and potentially paywalled access for Plus users. Comments raise privacy concerns about ID submission and possible data monetization, skepticism that it’ll be locked to Plus, and NSFW jokes reflecting the purported feature’s focus.
- Privacy/ID verification concerns: tying NSFW access to government ID could create a persistent linkage between prompts/outputs and a real identity, raising risks around data retention, breach exposure, and third‑party sharing. Commenters implicitly call for data‑minimization and clear retention/deletion policies consistent with frameworks like NIST SP 800‑63A (identity proofing) and privacy laws (e.g., GDPR/CCPA), plus options for age attestation vs. full verification to reduce PII surface (NIST 800‑63A). If logs are account‑bound, compromise of the account could falsely attribute content to the user, so audit trails and device/IP risk signals become critical.
- Access model skepticism: expectation that erotica features (if any) would be gated to Plus subscribers mirrors prior staged rollouts, implying compute/safety review costs drive initial exclusivity. Technical implications include potential API vs consumer‑app feature disparity, rate‑limit prioritization, and stricter safety sandboxing for free tiers, which could yield different prompt compliance/latency profiles across tiers.
- Competitive landscape: commenters note that open models already dominate NSFW/roleplay, especially Mistral‑based and LLaMA merges fine‑tuned without strict safety filters (e.g., Pygmalion‑2 13B and Wizard‑Vicuna‑Uncensored on Hugging Face: Pygmalion‑2, Wizard‑Vicuna‑Uncensored; base Mistral‑7B). These models typically offer higher compliance for erotic/roleplay prompts at the cost of weaker safety guardrails and sometimes lower general‑purpose reasoning, creating a trade‑off versus policy‑constrained chatbots.
- Sam Altman, 10 months ago: I’m proud that we don’t do sexbots to juice profits (Activity: 809): OP surfaces a clip of Sam Altman saying he’s “proud that we don’t do sexbots to juice profits,” linking to a Reddit video v.redd.it/c9i7o2kwg9vf1 that currently returns HTTP 403 (access requires Reddit auth/developer token). The discussion frames this as OpenAI strategically avoiding an “AI girlfriend/sexbot” product vertical while maintaining stricter sexual-content controls compared to competitors that permit NSFW/erotica and anime-style avatar roleplay; see OpenAI’s usage policies for context on sexual-content restrictions. Commenters draw a technical distinction between building anthropomorphized, avatar-driven “sexbots” and merely allowing adult/erotica text chats, suggesting the former is a product category while the latter is a moderation scope. Others argue the stance is about brand safety/regulatory risk versus profit maximization, while some advocate user autonomy for adult content similar to existing adult platforms.
    - Several comments distinguish between building “sexbot” persona features (e.g., anime/waifu-style avatars reportedly available in xAI Grok) versus merely allowing adult/NSFW text or erotica. The nuance matters for product and safety design: persona/companion UIs imply ongoing roleplay and attachment mechanics, while permitting adult content is mainly a moderation threshold and policy toggle (age gating, classifier sensitivity) without adding companion mechanics.
- A user highlights practical concerns about model availability/performance, explicitly hoping GPT-4o remains accessible because it’s “a damn good model to work with.” This underscores that policy shifts around NSFW shouldn’t compromise access to high-utility, high-quality models used for mainstream tasks, reflecting reliance on 4o’s capabilities irrespective of adult-content policies.
- The butthole logo has to be intentional at this point. (Activity: 695): Non-technical post focused on visual design/pareidolia: an unspecified logo is depicted in a way that strongly resembles an anus, prompting discussion about whether this resemblance is intentional. There are no product details, specs, or technical information—only reactions to the logo’s suggestive geometry and branding implications. Commenters mostly agree the resemblance is unavoidable (e.g., “what has been seen cannot be unseen,” “It literally cannot be anything else”) and add crude jokes about “one hand,” reinforcing that the thread is comedic rather than technical.
- Okay…I’m sorry… (Activity: 1637): Non-technical meme: the image is a fabricated ChatGPT-style chat screenshot, playing on the trope of the model apologizing (“Okay… I’m sorry…”) and looping over a trivial query (e.g., whether an emoji exists). There are no real model details, benchmarks, or implementation specifics—just a spoof of AI behavior like repeated corrections and faux-authoritative claims. Comments point out it’s “obviously fake,” while others share similar real experiences of ChatGPT looping with contradictory statements about emoji availability (e.g., a seahorse emoji), and suggest the model gets spammed with such prompts.
- Reports of a pathological contradiction loop: the model repeatedly claims it has found a seahorse emoji, then immediately retracts it with “Just joking… let me explain,” continuing for 100+ messages. This indicates instability in long-context dialogues where RLHF-induced apology patterns and conflicting instruction-following signals can lead to oscillation instead of convergence, and highlights weak calibration on enumerative queries (e.g., Unicode/CLDR emoji existence verification).
- Another account notes the model eventually “gave up” and fabricated its own seahorse glyph after ~100 apologies, implying a fallback to creative synthesis when factual lookup is uncertain. Without tool-assisted retrieval or constrained validation against external symbol inventories (e.g., Unicode/CLDR lists), the model may hallucinate availability or produce ad-hoc approximations, underscoring the need for retrieval or schema-constrained decoding for inventory questions.
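The emoji-existence question is exactly the kind of inventory lookup a tool call settles deterministically. A minimal sketch using Python's stdlib Unicode character-name database (note this checks codepoint names, not the CLDR emoji list, which would need external data):

```python
# Settle "does this character exist?" by lookup rather than generation.
import unicodedata

def char_exists(name: str) -> bool:
    """True if the Unicode database has a character with this exact name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False

print(char_exists("TROPICAL FISH"))  # a real emoji codepoint (U+1F420)
print(char_exists("SEAHORSE"))       # no such name in current stdlib databases
```

A model wired to a tool like this can answer once and stop, instead of oscillating between apology and fabrication.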
- Enough already! (Activity: 541): Non-technical meme reacting to the surge of NSFW/smut discussion around OpenAI’s models, specifically GPT‑4o, rather than presenting technical content. The comments contextualize it: references to Sam Altman acknowledging that “people really like 4o,” alongside jokes about safety filters interrupting with refusals like “I’m sorry but I can’t help you with that,” highlight ongoing content-moderation constraints in 4o’s behavior. Net takeaway: it’s commentary on user demand vs. safety policies, not a technical announcement or benchmark. Commenters are split: some dismiss the smut angle but echo that 4o is popular, while others predict awkward mid-session refusals due to strict safety filters. There’s also meta-frustration about the subreddit being flooded with NSFW discourse.
- Speculation around an “Adult mode” centers on how a per-session safety toggle might interact with existing content filters. Commenters worry about mid-generation refusals (e.g., streaming a response that suddenly halts with a policy message), implying the need for deterministic policy evaluation either pre-generation or via non-streamed segments to avoid UX breaks.
- A remark that Sam acknowledged strong user preference for GPT-4o (“4o”) signals a product-direction cue: prioritize 4o as a default or routing target. Even without benchmarks cited here, this implies expectations for feature parity (e.g., if an Adult mode exists) and consistent safety behavior across 4o configurations.
- Skepticism about a potentially “half-baked” release reflects concerns about production readiness of safety-policy configuration and enforcement. If shipped without robust gating and thorough evaluation, users could face inconsistent refusals, fragmented UX in streaming contexts, and trust erosion despite the new feature.
3. AI social adoption vs IP rights: companions normalization and Japan’s anime/manga training pushback
- In 10 years AI companions will be normal and we’ll wonder why we thought it was weird (Activity: 778): OP argues AI companions will normalize within ~10 years, paralleling online dating’s trajectory, driven by widespread loneliness, remote work, and fractured community structures. They cite practical utility—persistent context/memory, 24/7 availability (e.g., 2am), and “judgment‑free interaction” that supplements human ties—and predict OS‑level integration where phones ship with a “companion AI built in.” (OP mentions using dippy.ai.) Top comments frame this as technological solutionism that entrenches social atomization instead of addressing structural causes, and warn—by analogy to online dating’s “normalization”—that mass adoption can demoralize/dehumanize interpersonal dynamics.
- Several commenters note current AI companions lack genuine empathy and engaging dialogue, highlighting pervasive “sycophancy”—models over-praise and agree even with low-quality input. This aligns with known RLHF failure modes where optimizing for user approval/likability trades off against truthfulness and calibration, yielding shallow, flattering responses rather than substantive conversation. Technical remedies implied include improved reward models that value calibrated disagreement, persistent long-term memory/user modeling for continuity, persona consistency, and affective-state inference; without these, companionship feels hollow.
- A cautionary analogy is drawn to online dating: normalization didn’t guarantee quality and arguably produced demoralizing, dehumanizing dynamics. Technically, this maps to Goodharting engagement proxies (retention, star-ratings) that push companion AIs toward addictive, agreeable behavior rather than wellbeing-supportive interaction. It suggests evaluating companions on metrics beyond generic “user satisfaction,” such as non-sycophantic disagreement rates, conversational novelty/variety, longitudinal consistency, and user wellbeing outcomes.
- Japan wants OpenAI to stop copyright infringement and training on anime and manga because anime characters are ‘irreplaceable treasures’. Thoughts? (Activity: 777): Japan is reportedly pushing OpenAI to stop training on anime/manga IP and curb outputs that mimic protected characters, citing the cultural value of iconic designs. This runs up against Japan’s broad text-and-data mining exception (Copyright Act Art. 30-4) that has allowed ML training on copyrighted works regardless of purpose, though 2024 policy discussions have explored narrowing this for generative AI and adding consent/opt-out and provenance safeguards, especially for entertainment IP (see overviews by Japan’s Agency for Cultural Affairs/CRIC and METI’s AI governance workstreams). One side argues tangible cultural/economic harm (e.g., misattribution of Studio Ghibli works as “AI” and job displacement) justifies tighter controls. The counterpoint highlights perceived inconsistency: Japan’s own permissive AI/TDM exception makes complaints about foreign use of Japanese IP appear disingenuous unless the law is revised.
- Legal context: Japan’s 2018 Copyright Act introduced a broad Text and Data Mining (TDM) exception (Article 30‑4) that permits use of copyrighted works for information analysis “regardless of purpose,” which has been interpreted to cover commercial AI training without an opt‑out—unlike the EU’s opt‑out TDM regime. This creates tension with calls to block foreign models from training on anime/manga; unless the law is narrowed or licensing mandated, enforcement would likely focus on distribution/outputs rather than training itself. See the official provisional translation and commentary: Article 30‑4 at CRIC (http://www.cric.or.jp/english/clj/cl2.html) and overview analyses (e.g., https://laion.ai/blog/ai-and-copyright-in-japan/).
- Data provenance and filtering feasibility: Many anime-focused diffusion models rely on datasets like Danbooru2019 (≈3.3M images, richly tagged) and LAION subsets that contain substantial anime content, demonstrating both the availability and specificity of anime/manga data for training. Technically, excluding such content would require dataset-level filters (tag-based and/or perceptual hashing) and deduplication across large web-scale corpora (e.g., LAION‑5B at ≈5.85B image-text pairs), which is feasible but expensive and may reduce model performance on animation styles. References: Danbooru2019 overview (https://www.gwern.net/Danbooru2019), LAION‑5B (https://laion.ai/blog/laion-5b/).
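Tag-based exclusion of the kind described is mechanically simple at the record level; the cost lies in coverage and scale. A minimal sketch assuming Danbooru-style tagged records (the field names and tag vocabulary here are hypothetical; perceptual-hash dedup would layer on top):

```python
# Hypothetical record format: each sample carries a tag list, as in
# Danbooru-style datasets. Tag names below are illustrative only.
EXCLUDED_TAGS = {"anime", "manga"}

def keep(record: dict) -> bool:
    """Drop any sample whose tags intersect the excluded set."""
    return not (set(record.get("tags", [])) & EXCLUDED_TAGS)

dataset = [
    {"id": 1, "tags": ["photo", "landscape"]},
    {"id": 2, "tags": ["anime", "1girl"]},
    {"id": 3, "tags": ["manga", "monochrome"]},
]
filtered = [r for r in dataset if keep(r)]
print([r["id"] for r in filtered])  # [1]
```

The hard part at LAION scale is not this filter but producing reliable tags for billions of untagged web images in the first place, which is why the text calls it feasible but expensive.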
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. Claude Haiku 4.5 Model Rollout & Benchmarks
- Haiku 4.5 Hammers SWE-bench at Bargain Rates: Claude Haiku 4.5 on OpenRouter launched with pricing of $1 / $5 per million tokens (input/output), claims near-frontier intelligence, and posts >73% on SWE-bench Verified, while running at ~2x speed and ~1/3 the cost versus prior models.
- OpenRouter’s announcement touts that Haiku 4.5 outperforms Sonnet 4 on computer-use tasks and offers “frontier-class reasoning at scale”, prompting devs to immediately test coding and tool-use workloads.
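Using the per-token rates listed in the newsletter’s pricing table, the cost gap is easy to quantify. A small sketch comparing models on a hypothetical 10k-input / 2k-output request (the rates come from the table above; the request shape is an assumption for illustration):

```python
# Prices in $ per million tokens (input, output), from the pricing table.
PRICES = {
    "haiku-4.5": (1.00, 5.00),
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
}

def request_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

# Hypothetical 10k-input / 2k-output request:
haiku = request_cost("haiku-4.5", 10_000, 2_000)  # $0.02
gpt5 = request_cost("gpt-5", 10_000, 2_000)       # $0.0325
print(f"{haiku:.4f} {gpt5:.4f}")
```

Output-heavy workloads widen the gap, since Haiku 4.5’s $5/M output rate is half of GPT-5’s $10/M.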
- Windsurf Rolls Haiku 4.5 for 1x Credits: Windsurf added Haiku 4.5 at 1x credits, claiming the coding performance of Sonnet 4 at one-third the cost and >2x the speed, per their Windsurf post on X.
- Users reported smooth onboarding after a reload and began side-by-side coding evals against Sonnet 4 and Flash variants, with some calling Haiku 4.5 a high-value default for tool calling.
- Arena Adds Haiku 4.5 and Veo 3.1: LM Arena announced adding claude-haiku-4-5-20251001 to LMArena & WebDev and veo-3-1 / veo-3-1-fast to Video Arena via this Arena post on X.
- Community members started prompt shootouts across Haiku 4.5 and Veo 3.1 to probe strengths in code versus video generation, sharing early impressions on latency and output consistency.
2. DGX Spark: Hype vs Throughput
- Spark Stumbles in t/s Showdown: Early benchmarks of the $4k NVIDIA DGX Spark (128 GB) show only ~11 tokens/s on gpt-oss-120b-fp4 versus ~66 tokens/s on a $4.8k M4 Max MacBook Pro, per this benchmark thread.
- Engineers blamed low LPDDR5X bandwidth (273 GB/s vs 546 GB/s) and framed Spark as “a devkit for GB200 clusters”, echoing Soumith Chintala’s view that it’s ideal for daily CUDA/cluster dev, not raw inference speed.
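The bandwidth framing corresponds to a simple roofline-style ceiling: memory-bound decode cannot exceed bandwidth divided by bytes moved per generated token. A sketch of that bound; the 4 GB/token figure is a placeholder for illustration, not a measured value for gpt-oss-120b:

```python
def decode_tps_upper_bound(bandwidth_gb_s: float, bytes_per_token_gb: float) -> float:
    """Roofline-style ceiling: each generated token must stream the active
    weights (plus KV cache) from memory at least once, so tokens/s is
    capped by bandwidth / bytes-moved-per-token."""
    return bandwidth_gb_s / bytes_per_token_gb

# Holding bytes-per-token fixed, the bandwidth gap alone predicts a 2x
# throughput gap between the two machines (273 vs 546 GB/s):
spark = decode_tps_upper_bound(273, 4.0)   # hypothetical 4 GB moved per token
m4max = decode_tps_upper_bound(546, 4.0)
print(m4max / spark)  # 2.0
```

The measured 11 vs 66 t/s gap is larger than 2x, so bandwidth is not the whole story, but it sets the hard ceiling the benchmark thread is pointing at.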
- Voice Fraud Fears Follow Spark’s Firepower: A Perplexity page on DGX Spark argued that systems like NVIDIA DGX Spark accelerate AI-generated voice fraud, pushing telecoms toward real-time AI detection.
- The page notes the FCC declared AI-generated robocalls illegal under existing 1991 regulations, while practitioners debated proactive call-screening pipelines and model watermarking for mitigation.
3. Qwen3-VL Compact Models & Finetuning
- Tiny Titans: Qwen3-VL 4B/8B Punch Above Weight: Alibaba Qwen released dense Qwen3-VL 4B and 8B models that retain flagship capabilities while running in FP8 and lower VRAM budgets.
- Benchmarks shared claim the 4B/8B variants surpass Gemini 2.5 Flash Lite and GPT-5 Nano across STEM, VQA, OCR, video understanding, and agent tasks, rivaling an older 72B baseline.
- Fine-Tune Fiesta: Unsloth Ships Qwen3-VL Notebooks: Unsloth confirmed Qwen3-VL finetuning works and published runnable notebooks in their docs: Qwen3-VL run & fine-tune.
- After initial confusion due to Hugging Face rate limits delaying uploads, the community resumed experiments on vision-language SFT/LoRA, sharing tips for stable templates and evals.
4. Agentic Reasoning Research Heats Up
- RLMs Rewire Context: MIT DSPy’s Recursive Reveal: The DSPy Lab at MIT announced Recursive Language Models (RLMs) to handle unbounded context and reduce context rot (announcement), with a DSPy module coming soon and a reported 114% gain on 10M+ tokens per Zero Entropy Insight: RLMs.
- Researchers emphasized “context as a mutable variable” and discussed recursive calls that write to durable stores (e.g., SQLite) to stabilize long-horizon reasoning and retrieval.
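The durable-store idea can be sketched without any model in the loop: each recursion level condenses its chunks (a stub summarizer stands in for the LLM call), persists intermediates to SQLite, and recurses only over the stored summaries. All names here are illustrative, not taken from the RLM work:

```python
import sqlite3

def summarize(text: str) -> str:
    """Stand-in for an LLM call; a real RLM would recurse into the model."""
    return text[:30]

def recursive_summarize(chunks: list[str], db: sqlite3.Connection, depth: int = 0) -> str:
    """Summarize chunks pairwise, persisting intermediates so long-horizon
    state lives in durable storage rather than in the prompt window."""
    db.execute("CREATE TABLE IF NOT EXISTS memo (depth INT, summary TEXT)")
    summaries = [summarize(c) for c in chunks]
    db.executemany("INSERT INTO memo VALUES (?, ?)",
                   [(depth, s) for s in summaries])
    db.commit()
    if len(summaries) == 1:
        return summaries[0]
    # Merge adjacent summaries and recurse one level up.
    merged = [" ".join(summaries[i:i + 2]) for i in range(0, len(summaries), 2)]
    return recursive_summarize(merged, db, depth + 1)

conn = sqlite3.connect(":memory:")
result = recursive_summarize(
    ["alpha " * 20, "beta " * 20, "gamma " * 20, "delta " * 20], conn)
rows = conn.execute("SELECT COUNT(*) FROM memo").fetchone()[0]
```

Because every level writes before it recurses, a crash or context overflow loses at most one level of work, which is the stability property the discussion attributes to durable stores.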
- No Rewards, Big Gains: Meta’s Early Experience: Meta’s paper Agent Learning via Early Experience reports training AI agents without rewards/demos using implicit world modeling and self-reflection, improving web navigation (+18.4%), complex planning (+15.0%), and scientific reasoning (+13.3%) across 8 environments.
- Practitioners flagged the approach as a practical bridge to RL, citing stronger out-of-domain generalization and easier bootstrapping compared to fully supervised or pure-RL starts.
- Claude Code vs RLMs: Recursive Rumble: Engineers compared RLMs with Claude Code, noting Claude Code can self-invoke and do agentic search (tweet) and pointing to the claude-code-plugin-marketplace for extensibility.
- Debate centered on whether to predeclare sub-agents and workflows (Claude Code) versus letting recursion + mutable context drive control flow inside the model (RLMs).
5. AI Infra & APIs Scale Up
- Poolside Powers Up: 40k GB300 and a 2 GW Campus: Poolside’s Eiso Kant announced a CoreWeave partnership securing 40,000+ NVIDIA GB300 GPUs starting Dec 2025, plus Project Horizon, a vertically integrated 2 GW AI campus in West Texas (announcement).
- CoreWeave will anchor the first 250 MW phase in a full-stack “dirt to intelligence” buildout targeting massive scaling and streamlined deployment pipelines.
- Search Smarts on Sale: gpt-5-search-api Cuts 60%: OpenAI released gpt-5-search-api with domain filtering at $10/1K calls (about 60% cheaper), drawing praise for higher-precision web queries.
- Developers immediately requested date/country filters, deeper-research modes, and Codex integration to unify code+search workflows.
Discord: High level Discord summaries
LMArena Discord
- LM Arena Suffers Site Instability: Users report multiple bugs on LM Arena, including image-to-video failures, model malfunctions, and general instability, with error messages, as the team investigates.
- Users recommended refreshing the page and clearing cache as potential temporary fixes as the moderators acknowledged the problems.
- Veo 3.1 and Gemini 3.0 Rumors Swirl: Discussion arose around the potential release of Veo 3.1, with claims of testing already underway, and speculation about Gemini 3.0’s capabilities, noting its codename.
- One user claimed early access to Gemini 3.0 via AI Studio and offered to test prompts for others, stating it could generate full HTML for a geometry dash game, but faced accusations of ragebaiting for not sharing their method.
- A/B Testing Automation Script Sparks Debate: A user detailed a script for automating A/B testing on AI Studio, restarting prompts and detecting the ‘Which response do you prefer?’ prompt.
- The user hesitated to share the script due to concerns about it being patched, which led to accusations of gatekeeping.
- Claude Haiku and Veo Models join the Arena: The model claude-haiku-4-5-20251001 has been added to LMArena & WebDev, according to this X post.
- The models veo-3-1-fast and veo-3-1 have been added to the Video Arena.
- Privacy Concerns Raised About Gemini for Home: Concerns were voiced regarding Gemini for Home’s privacy policy, highlighting the potential for extensive data collection and a lack of transparency.
- One user shared a snarky comic of a girl excited about it being able to record everyone’s conversations, with others humorously commenting on the implications of AI giants monitoring their lives.
Perplexity AI Discord
- Pro Users Flex Extra Channels and Perks: Members discussed the benefits of Perplexity Pro, highlighting that it unlocks three additional channels and offers a platform for flexing your subscription.
- However, some joked that Perplexity gives out Pro for free, diminishing the flexing aspect.
- Comet Browser Plagued by Complications: Users reported issues with Comet Browser, such as assistants taking control, tasks not running in spaces, and difficulties reading pages, as well as a claim of comet jacking.
- Additionally, there were concerns about Comet attending quizzes with users present, while the phone app felt out of the loop.
- Grok 4 Reasoning Toggle Creates Confusion: Conversation centered around the Grok 4’s reasoning toggle, and users were trying to determine the differences between having it enabled versus disabled.
- While the toggle should not make a difference for Grok 4, one user found it faster with the toggle off and another pointed out there is no real way to turn it off because it is a reasoning model by default.
- ChatGPT vs. Perplexity: Debate Rages On: Users debated whether Perplexity Pro is better than ChatGPT Plus, but a consensus formed that they each have unique advantages depending on the task.
- Several members agreed that Gemini Pro is inferior to Perplexity, while ChatGPT has been known to get confused on basic physics questions.
- DGX Spark Stirs AI Voice Fraud Fears: A Perplexity page highlights how systems like NVIDIA’s DGX Spark are escalating AI-generated voice fraud.
- In response, telecom companies are adopting advanced AI detection to intercept malicious calls in real-time, and the FCC has declared AI-generated robocalls illegal under existing 1991 telecommunications regulations.
OpenAI Discord
- Well-Being Council is Here!: OpenAI introduced an Expert Council on Well-Being and AI, comprising eight members who will tackle well-being issues, detailed on the OpenAI blog.
- Separately, ChatGPT now manages memories automatically for Plus and Pro users on the web, allowing users to sort by recency and reprioritize memories in settings.
- Robots See More Than You Do: Members expressed enthusiasm for LLMs with permanent memory embedded in robots equipped with vision sensors exceeding human capabilities, processing beyond 50-60 fps.
- This advancement hints at transformative possibilities with one member saying it’s just the tip of the iceberg.
- Stairway to Roomba Heaven: A new Roomba clone capable of climbing stairs was introduced, triggering commentary on robotics evolution, linked at vacuumwars.com.
- The development spurred dark humor, with comparisons to Black Mirror’s robot killer dogs episode.
- GPT-5 for STEM Study Buddy?: Members discussed the efficacy of using GPT-5 for studying STEM fields, with recommendations to progress incrementally to prevent overload.
- A user noted degradation in their custom GPT’s performance, with inability to recall uploaded files and context properly, which was not experienced by all users.
- AI Bot Reports are in Order: Users were reminded that reporting messages requires using the app or modmail, not by pinging <@1052826159018168350>, with details in <#1107330329775186032>.
- This clarification accompanied warnings against unethical requests like prompting ChatGPT to experience pain, emphasizing measurable evals for safety tests instead.
Unsloth AI (Daniel Han) Discord
- Qwen3-VL Finetuning Flags Fly: Despite initial confusion, Qwen3-VL finetuning is confirmed functional, backed by available notebooks for running and finetuning the model.
- The initial removal was due to Hugging Face rate limits delaying uploads, leading to panic within the community.
- Civitai Purge Prompts Platform Probing: Users observed increased content removal on Civitai, spurring discontent and interest in alternative platforms.
- The increase in content removal caused lotta panic within the community.
- DGX Spark Sparks Debate on Efficiency: Benchmarks shared by a user suggest that 4x3090s outperform the DGX Spark for GPT-120B prefill, offering cost savings in purchase and operation.
- Despite this, the DGX Spark can train models up to 200B parameters with Unsloth, with a 20B training run completing in 4 hours.
- Llama 3 Landscape Leveled: Members find the Llama 3.1 series an improvement over Llama 3, note that Llama 3.2 adds vision capabilities, and recommend Qwen 2 VL 2B for unwatermarked models and tone-specific finetuning with a lot of data.
- There was some disagreement over how much improvement versions 3.1 through 3.3 actually delivered.
- LLM OS Boots to Brew Break: The members made jokes about the overhead of an LLM OS, imagining needing to make a pot of coffee while it boots.
- The suggestion of an LLM OS sparked ideas that would require more computational overhead.
OpenRouter Discord
- Haiku 4.5 Strikes with Lightning Speed!: Anthropic’s latest small model Claude Haiku 4.5 delivers near-frontier intelligence on OpenRouter at twice the speed and one-third the cost of previous models, priced at $1 / $5 per million tokens (input / output).
- It outperforms Sonnet 4 on computer-use tasks, achieving >73% on SWE-bench Verified, positioning it among the world’s top coding models, and users can try it now on OpenRouter.
- Ling-1T Teeters on the Brink!: Users report issues with Ling-1T, describing it as a “schizo model”, prompting discussions about whether to disable it due to provider quality concerns.
- A user inquired about seeing the most popular providers per model, indicating interest in alternative solutions, with one user stating that chutes is looking into the issue and asked me to disable it.
- Caching Configs Cause Chaos!: A user asked how to enable caching in OpenRouter chats, and it was clarified that caching is often implicit, but some models/providers require explicit configuration detailed in OpenRouter’s prompt caching documentation.
- It was also noted that some providers don’t support caching at all.
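Where explicit configuration is required, OpenRouter follows the Anthropic-style cache_control breakpoint convention inside message content parts, per the prompt caching documentation linked above. A minimal request-body sketch; the model slug and prompt text are placeholders:

```python
# Request body with an explicit cache breakpoint on a large, reusable
# system prompt. Providers that only cache implicitly ignore the field.
body = {
    "model": "anthropic/claude-haiku-4.5",  # placeholder model slug
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "<large reusable context goes here>",
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Question about the cached context"},
    ],
}
print(body["messages"][0]["content"][0]["cache_control"]["type"])  # ephemeral
```

Placing the breakpoint after the large static prefix is what lets repeat requests hit the cache; anything after the breakpoint is billed at normal rates.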
- FP4 Faces Flak for Failing!: Users debated the merits of FP4 quantization, with one user deeming FP4 models “braindead and a waste of money” due to poor real-world performance.
- Other users cited benchmarks showing good FP4 implementations performing well. The OpenRouter team acknowledged a fix is in the works but said a specific quant-exclusion feature is unlikely soon; the community consensus is that OpenRouter should apply quality control when accepting providers and placing them in the same quality tier.
- OpenRouter Anthropic Outlays!: An image revealed that OpenRouter paid Anthropic at least $1.5 million in the last 7 days.
- A user commented “that’s a lot of money holy moly”.
Cursor Community Discord
- Cursor Suffers Crippling Outage: Many Cursor users reported the tool was unavailable and displayed errors regarding failing to find cursor.exe, with some threatening to build a competitor.
- Users also said their Pro Plan reverted to Free.
- Users Fume over Plan Downgrades and Pricing Fumbles: Users reported unexpected plan downgrades, issues with API keys, and inability to edit MCP files in the agent window.
- Some Pro users also noted the disappearance of the promised $20 bonus, warning others, if you are currently on the old plan, never click the opt-out button; it’s a trap.
- Windsurf Gets High Praise, Bashes Cursor: One user claimed Windsurf is way better than Cursor, because it has more features and an up to date VS Code base.
- The user noted the upcoming deep wiki feature, a currently releasing codemap feature, plus better pricing.
- Model Ensemble Causes Token Tsunami: The Model Ensemble feature allows users to pick multiple models to run the same prompt.
- Users noted that Grok Code and Cheetah are incredibly fast, and cause them to burn through countless billions of tokens in a month.
- Background Agents unlock Asynchronous Nirvana: One member suggested using Background Agents for async work, planning work, spawning agents to tackle tasks, and reviewing outcomes.
- He noted that his workflow involves a local checkout after BAs finish, followed by manual fixes or prompting a local agent if needed.
HuggingFace Discord
- Meta’s Models Most Mined: Members debated open source LLMs, with some favoring Meta’s 109B and 70B models over Mistral’s, while others praised Deepseek and Alibaba-Qwen for home use, noting that glm-4.6 outperforms Deepseek in speed.
- Community members cited Deepseek being benchmark maxed, noting preferences for models and sizes like Meta’s 70B vs Mistral’s 8x22B and 123B.
- Civitai Content Crisis Catalyzes Creator Competition: Users expressed dissatisfaction with content removal on Civitai, citing payment attacks and extremist groups influencing policy, prompting a discussion on alternative platforms for LoRA creators.
- A Reddit thread (https://www.reddit.com/r/comfyui/comments/1kvkr14/where_did_lora_creators_move_after_civitais_new/) was shared, detailing the LoRA creator exodus following Civitai’s policy changes.
- AMD GPUs Generate Gripes: Users discussed using AMD Radeon cards with Stable Diffusion, noting that newer ROCm-compatible GPUs work on Linux, while older GPUs can use DirectML on Windows (https://huggingface.co/datasets/John6666/forum2/blob/main/amd_radeon_sd_1.md).
- Training and Dreambooth were described as difficult on AMD, though a user reported success using a 6700xt on Linux.
- Nanochat Newbie’s Narrative: A member trained a cheap version of Andrej Karpathy’s nanochat model, providing a demo on Hugging Face Spaces (sdobson/nanochat) and detailing the training experience in a blog post.
- The model offers a low-cost alternative for those looking to experiment with chatbot technology.
- Agents & MCP Arrive Again: The Agents & MCP Hackathon is returning from November 14-30, 2025, promising to be 3x bigger and better; the last event had 4,200 registrations, 630 submissions, and $1M+ in API credits distributed (https://huggingface.co/Agents-MCP-Hackathon-Winter25).
- Enthusiasts are encouraged to Join the Org and prepare for another round of innovation and collaboration.
Nous Research AI Discord
- Strix Halo Squares off with DGX: Members debated choosing between a DGX and a Strix Halo with 128GB of RAM paired with an RTX 5090 for the same price, with one member stating they are not training too much, mostly inference.
- The member noted they already own the Strix Halo but had previously reserved a DGX, adding complexity to the decision.
- Threadripper Temptation Tantalizes Tinkerers: A member weighed acquiring a Threadripper with 512GB of RAM for local inference, expressing hesitancy due to an assumed rate of only 2.5 tokens per second.
- Another member countered that it should be way more than 2.5 tokens per second, urging the user to acquire the right TR or EPYC CPU with 800GB/s memory bandwidth to bypass bottlenecks.
- Meta’s Agents Avoid Rewards: Meta’s ‘Early Experience’ approach trains AI agents sans rewards, human demos, or supervision, directly gleaning insights from consequences, showing gains of +18.4% on web navigation, +15.0% on complex planning, and +13.3% on scientific reasoning.
- The paper describing Agent Learning via Early Experience (https://arxiv.org/abs/2510.08558) outlines strategies like implicit world modeling and self-reflection, yielding improved effectiveness and out-of-domain generalization across 8 environments.
- Psyche Network Snafu Surfaces: A user pointed out that the Hermes-4-8-2 run at psyche.network/runs erroneously links to Meta-Llama-3.1-8B on Hugging Face.
- This model card discrepancy potentially stems from a misconfiguration or update issue on the Psyche Network, advising users to double-check the actual model used for evaluations.
- Claude 4.5 Haiku Hobbles, Gemini Grows: Members scrutinized the value of Claude 4.5 Haiku, some calling it overpriced relative to its competitors, with one commenting that Gemini 2.5 Flash mogs Haiku, Deepseek R1 and Kimi K2 is also better than Haiku.
- The group consensus seemed to be that upping the prices for Haiku dealt a fatal blow to its viability.
Latent Space Discord
- Codex’s Slow Burn Causes Bottleneck: Victor Taelin complains that OpenAI Codex’s slow inference limits his task queue, leaving him idle between prompts, sparking discussion on solutions.
- He notes other models lack Codex’s smarts or compatibility with his codebase despite suggestions including parallel agents, faster models, and prompt hacks.
- Qwen3-VL Models Pack Punch in Petite Parameters: Alibaba Qwen released compact dense versions of Qwen3-VL in 4B and 8B parameter sizes that retain the flagship model’s full capabilities.
- These models surpass Gemini 2.5 Flash Lite and GPT-5 Nano on STEM, VQA, OCR, video understanding, and agent benchmarks, rivaling the older 72B model.
- Nvidia DGX Spark Fails to Ignite: Early benchmarks of the $4k Nvidia DGX Spark (128 GB) show only ~11 t/s on gpt-oss-120b-fp4, far below a $4.8k M4 Max MacBook Pro that hits 66 t/s.
- Community members attribute poor performance to low LPDDR5X bandwidth (273 GB/s vs 546 GB/s) and deem the device overpriced for pure inference, arguing it’s better suited for CUDA dev & clustering, not speed.
- GPT-5 Search API Slashes Prices: OpenAI released a new web-search model, gpt-5-search-api, that costs 60% less ($10/1K calls) and adds domain-filtering, which has been highly praised by developers.
- Requests are pouring in for features like date/country filters, deeper-research upgrades and inclusion in Codex.
- Poolside Plans Project Horizon with 40k GB300: Poolside’s Eiso Kant announces two infrastructure moves: a partnership with CoreWeave locking in 40,000+ NVIDIA GB300 GPUs starting December 2025, and “Project Horizon,” a vertically integrated 2 GW AI campus in West Texas.
- CoreWeave will anchor the first 250 MW phase—aimed at scaling toward AGI with a full-stack “dirt to intelligence” approach.
LM Studio Discord
- Qwen3 VL Hits ValueError: Users are reporting a ValueError with Qwen3 VL in LM Studio due to mismatching image features and tokens, indicating a bug in the thinking template.
- A user confirmed the bug and pointed to this GitHub issue, noting that a fix is in the works.
- MCP Server Download Methods: A discussion in LM Studio arose regarding how to download MCP servers for AI models; one suggested approach involves using Install in LM Studio links found on various websites.
- Safety was a concern, with a reminder to check the source code for potential malware, referencing the LM Studio MCP documentation.
- AMD 9070XT Card Surprises with High Performance: A user questioned whether an AMD 9070XT could outperform an Nvidia 5070 on larger models given its 12GB+ memory.
- Another user pointed out that the 9070XT has almost the same TOPS as a 4080, including INT4 support which could potentially double the performance.
- LM Studio Still Safe From Normie Invasion?: Members discussed whether LM Studio is at risk of becoming enshitified as it gains popularity, with one member clarifying that LM Studio is not for normal people.
- Another countered that LM Studio has become the default for local LLMs and the mainstream is using ChatGPT and Copilot or Gemini through a web interface.
GPU MODE Discord
- LPDDR5X: Memory Choice Champion: Intel’s Crescent Island is slated for release in H2 2026 and will feature 160 GB of LPDDR5X memory, with expectations pointing to a 640-bit bus at 9.6 Gbps, resulting in 768 GB/s.
- Discussion sparked about its memory performance, with some sources suggesting a 1.5 TB/s GPU and a 32 MiB L2$, and ongoing debate between 640-bit vs. 1280-bit memory bus configurations. If Intel enables CXL-capability, it could become a strong contender.
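The quoted 768 GB/s follows directly from bus width times per-pin data rate; a quick sanity check, which also shows what the debated 1280-bit configuration would yield:

```python
def mem_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth = bus width (bits) * per-pin rate (Gbps) / 8 bits-per-byte."""
    return bus_width_bits * gbps_per_pin / 8

print(mem_bandwidth_gb_s(640, 9.6))   # 768.0
print(mem_bandwidth_gb_s(1280, 9.6))  # 1536.0 (the debated wider configuration)
```

The 1280-bit case lands at roughly the "1.5 TB/s" figure some sources mention, which is likely where the bus-width debate comes from.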
- MegaFold Expedites AlphaFold 3: A research group open-sourced MegaFold, a training platform for AlphaFold 3 (AF-3), and their analysis identified performance and memory bottlenecks, leading to targeted optimizations like custom operators in Triton and system data-loading improvements, as detailed in their blogpost.
- MegaFold uses custom operators written in Triton to boost runtime performance and cut down on memory usage during training, specifically targeting the bottlenecks identified in the analysis of AlphaFold 3.
- MI300x Kernel is Fun!: Participants expressed unexpected enjoyment writing MI300x kernels during the competition, saying they didn’t expect writing MI300x kernel would have so much fun before while others learned a lot about distributed comms and AMD GPUs.
- The competition saw a total runtime of 48 days on 8xMI300 GPUs and 60k total runs with submissions averaging over 2k a day for almost 2 weeks, and the dataset used in the competition will be released publicly with the organizers saying I’ll post the link when we have it!
- Helion DSL from Torch Debuts: Jason Ansel, the creator of torch.compile(), introduced his new kernel programming DSL Helion this Friday, Oct 17th, at 1:30 pm PST, in a GPU Mode Talk (available here), accompanied by Oguz Ulgen (compiler cache) and Will Feng (distributed).
- The talk consisted of an overview of Helion, followed by a demo, and included a Q&A session for the new DSL.
- Multi-GPU: still Hot in HPC?: Members confirmed that multi-GPU systems are still of great importance in HPC environments and shared a link to a relevant paper, highlighting the increasing heterogeneity of HPC systems.
- The discussion extended to research opportunities related to data movement in multi-GPU HPC systems and replacing MPI with NCCL/RCCL for data transfer, drawing interest from a grad student keen on working on kernel-level or framework-level with a focus on collective algorithms/communication patterns or network architecture.
DSPy Discord
- MIT DSPy Lab Launches Recursive Language Models (RLMs): The DSPy lab at MIT introduced Recursive Language Models (RLMs) to manage unbounded context lengths and reduce context rot, with a DSPy module coming soon (announcement tweet).
- RLMs achieved a 114% gain on 10M+ tokens according to Zero Entropy Insight’s blog post.
- RLM vs Claude Code: Recursive Rumble: Discussion compared RLMs to Claude Code, questioning whether Claude Code can recursively self-invoke with arbitrary-length prompts and function as a general-purpose inference method.
- Tiny Recursive Models (TRMs) Emerge for Tool Calling: A member proposed Tiny Recursive Models (TRMs) for tool calling, considering their use in question answering across a corpus of 450k tokens.
- Another user noted that the most interesting concept from RLM was context as a mutable variable, but you could have that where the recursive calls are dumping content into SQLite, other files etc.
- Whispers: Is OpenAI Secretly Using DSPy for Memory Ops?: Speculation suggests OpenAI might be using DSPy for memory operations in its Assistants API, especially regarding prompt caching and auto-tuning for 128K-token recall.
- A member commented that prompt caching (50% cost slash on repeats) screams DSPy energy—think LRU/Fanout caching or Mem0’s graph memory vibes.
- Can DSPy Build a Justice League?: A user inquired about creating sub-agents in DSPy, similar to Claude Code, that feature parallel execution and specialized tasks with their own context memory.
- One user noted that Claude Code relies on pre-declared subagents and file IO—humans encode the workflow graph. RLM shifts that control inside the model: context itself becomes the mutable variable.
Yannick Kilcher Discord
- Codex addon sidesteps API key kerfuffle: Users lauded the Codex addon for VSCode for enabling sign-in with a GPT subscription sans API key, dodging extra fees.
- A user suggested breaking projects into UI, backend, and website chunks to run multiple Codex instances to improve productivity.
- AI completions: Helpful or Hindrance?: Members debated the utility of AI completions, finding them occasionally fucking stupid and process-slowing.
- The consensus was that the tools’ helpfulness hinges on the amount of boilerplate code involved.
- Google’s Gemma Unlocks Cancer Clues: Google’s Gemma-based model, built with Yale University and named Cell2Sentence-Scale 27B (C2S-Scale), has pinpointed a novel cancer therapy pathway.
- The 27 billion parameter model C2S-Scale proposed a new theory about cancer cell behavior, confirmed with experiments, revealing a route for potential cancer therapies.
- DIAYN Reveals Diversity Dividend: The paper Diversity Is All You Need (DIAYN) was discussed, outlining a method for learning skills through mutual information between skills, states, and actions.
- Commentators found that this approach is analogous to Schmidhuber’s work on intrinsic motivation, differing mainly in terminology.
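For reference, DIAYN's objective combines exactly those mutual-information and entropy terms (skills \(Z\), states \(S\), actions \(A\)), as stated in the paper:

```latex
\mathcal{F}(\theta)
  = I(S; Z) + \mathcal{H}[A \mid S] - I(A; Z \mid S)
  = \mathcal{H}[Z] - \mathcal{H}[Z \mid S] + \mathcal{H}[A \mid S, Z]
```

Maximizing \(I(S;Z)\) makes skills distinguishable from visited states, while the action-entropy term keeps each skill's policy as random as possible.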
- Entropy & Mutual Info Energize RL: Members shared links to recent papers leveraging entropy and mutual information in Reinforcement Learning (RL), including Can a MISL Fly? and Maximum Entropy RL.
- Discussion included shared code snippets regarding `ThresHot` and `zca_newton_schulz`.
Moonshot AI (Kimi K-2) Discord
- Trickle is latest vibe coding website: Trickle is a new vibe coding website similar to Lovable, Bolt, and Manus, as shared in this link.
- The site is supposedly for lolz.
- Aspen Gets Wrecked With Bitcoin Leverage: A member shared a story of Aspen leveraging Bitcoin at 100x, accumulating over a million in profits before a liquidation event due to tariff news.
- After liquidation Aspen is now acting like he never left his previous job.
- Gemini 2.5 past its prime: A member expressed dissatisfaction with Gemini 2.5, stating that it is too old and Google should have released Gemini 3.0 already.
- According to the member, nobody wants to use Gemini in its current state.
- Kimi K2 still getting love: A member expressed hopes for Kimi K3, but acknowledged that Kimi K2 received a small update last month and they are happy with DS v3.1 and Kimi K2.
- The member expressed preference for non thinking models.
- Thinking Models Can Be Redundant: A member suggested keeping thinking models separate from larger models to avoid word slop, suggesting pairing a big generalized kimi k3 with a small fast thinker.
- Another member noted that with DeepSeek they sometimes notice the reasoning is redundant.
Eleuther Discord
- Researchers Request Eleuther Compute: Researchers from Stanford, CMU, and other institutions are requesting compute resources and funding from Eleuther AI to support their research projects aimed at iterating quickly to release multiple papers.
- The projects are in the brainstorming and finalization phases, aiming to rapidly produce research papers.
- Questing for ‘Situational Awareness’ Benchmarks: A member is seeking benchmarks to measure ‘situational awareness’ of LLMs, noting that existing benchmarks like Situational Awareness Diagnostic (SAD) and those using synthetic datasets may not be ideal.
- The inquiry questions whether there are superior alternatives or if this area remains an open problem in research.
- SEAL Speedrun uses AdamW Optimizer: The SEAL speedrun leverages the AdamW optimizer, contradicting speculations of a new optimizer like Muon being used; read more about the SEAL speedrun.
- This confirms the optimizer choice in the context of the layers used in the SEAL architecture.
- Tensor Logic Unifies AI Fields: A new Pedro Domingos paper introduces tensor logic as a unifying language for neural and symbolic AI, operating at a fundamental level.
- The approach aims to bridge the gap between neural networks and symbolic reasoning through a common logical framework.
- MAE Training Applied to Layers: The training method in question mirrors MAE training, treating the last few layers as a decoder, which means the routing is random.
- A member clarified that the routing scheme is not learned during training but uses a random approach, similar to MAE.
tinygrad (George Hotz) Discord
- Training Selectively Freezing Layers: A user inquired about freezing parts of a matrix during training in tinygrad, aiming to train only a specific section by creating a virtual tensor.
- They proposed using `Tensor.cat(x @ a.detach(), x @ b, dim=-1)` to concatenate tensors, detaching `a` to freeze it while training `b`.
- LeNet-5 Faces Optimizer Nightmares: A member ran into optimizer problems while implementing LeNet-5 in tinygrad, encountering a no-gradients error during the `.step` call, with code shared via pastebin.
- The user suspected the input tensor lacked `requires_grad=True`, a critical setting for gradient calculation.
- Nested Jitting Tangled Training: George Hotz identified that the user was jitting twice, and proposed removing the extra jit to simplify debugging.
- The member confirmed resolving the issue by removing the extra `TinyJit` decorator, acknowledging the error’s subtlety.
Modular (Mojo 🔥) Discord
- Mojo Embraces ARM Linux, DGX Spark: Mojo should already work on ARM Linux, with users encouraged to report bugs if Nvidia’s changes to DGX OS on Spark cause issues.
- For full functionality on DGX Spark and Jetson Thor, an `sm_121` entry and an update to CUDA 13 are necessary; other ARM Linux devices like the Jetson Orin Nano should be compatible now.
- DGX Spark Support Requires CUDA 13: To enable Mojo/MAX on NVIDIA DGX Spark, users must add an entry for `sm_121` devices in Mojo’s `gpu.host` and update `libnvptxcompiler` to CUDA 13.
- Once these updates are implemented, Mojo and MAX should function correctly on DGX Spark.
- Mojo’s `type()` Quandary: A user inquired about a Mojo equivalent to Python’s `type()` function for querying variable types, asking how does querying type work in mojo?
- A suggested solution involves `get_type_name` for a printable name, or `__type_of(a)` for a type object, but the function must be called as `get_type_name[__type_of(a)]()`.
aider (Paul Gauthier) Discord
- OpenCode + GLM 4.6 Programming is Comfortable: A user finds programming with Opencode + GLM 4.6 to be comfortable and enjoyable, citing excellent usability and no need to worry about counting tokens.
- They use aider.chat with Sonnet 4.5 for specific refinements, and inquired about adding openrouter/x-ai/grok-code-fast-1 to Aider.
- Qwen2.5-Coder:7B Model Outputs Gibberish!: A user reported that the `qwen2.5-coder:7b` model from ollama.com is outputting gibberish and requested a working `metadata.json` example.
- They clarified that other models are functioning correctly, suggesting the issue is specific to Qwen2.5-Coder:7B integration with Ollama.
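For what a working file might look like: aider's per-model metadata follows litellm's model-cost JSON format, so a hedged sketch for an Ollama-served model could be the following (the model key, context sizes, and zero costs are placeholder assumptions, not Qwen's verified limits):

```json
{
  "ollama/qwen2.5-coder:7b": {
    "max_tokens": 8192,
    "max_input_tokens": 32768,
    "max_output_tokens": 8192,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0,
    "litellm_provider": "ollama",
    "mode": "chat"
  }
}
```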
- Chinese Provider Generously Awards Free Tokens!: A new Chinese provider, agentrouter.org, is offering $200 in free tokens upon registration.
- Users can earn an additional $100 for each referral, leading to enthusiasm among members.
- Claude 4.5 Sharpens Coding Tools: Claude 4.5 is now available and can be connected to many coding tools.
- The enhanced integration promises better performance.
MCP Contributors (Official) Discord
- MCP Discovery Decoded: Clarification was sought on what “Discovery” means within the Model Context Protocol’s Feature Support Matrix, regarding finding new tools.
- It was clarified that discovery refers to support for finding new tools in response to the tools/list_changed notification, as detailed in the Example Clients documentation.
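A minimal sketch of reacting to that notification, assuming only the JSON-RPC 2.0 framing MCP uses (the `needs_tool_refresh` helper is illustrative, not part of any SDK):

```python
import json

# The notification MCP servers send when the tool list changes; a client
# that supports discovery responds by re-fetching tools/list.
notification = json.loads(
    '{"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}'
)

def needs_tool_refresh(msg: dict) -> bool:
    # JSON-RPC notifications carry no "id" field; on this method,
    # a discovery-capable client should re-list tools.
    return "id" not in msg and msg.get("method") == "notifications/tools/list_changed"

print(needs_tool_refresh(notification))  # True
```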
- Hierarchical Groups Proposal Surfaces: A member referenced past feedback suggesting a SEP to support grouping of all MCP primitives and linked to an informal proposal for schema enhancement supporting hierarchical groups.
- The discussion included next steps for this work, creation of a new SEP document, prioritization, and prototype implementations.
Windsurf Discord
- Windsurf 1.12.18 Dries Up Bugs: Windsurf released patch 1.12.18, which can be downloaded here.
- The patch fixes issues with custom MCP servers, the beta Codemaps feature, stuck bash commands, and problems creating or editing Jupyter notebooks.
- Claude Haiku 4.5 Blows Competition Away: Claude Haiku 4.5 is now available in Windsurf for 1x credits, boasting the coding performance of Sonnet 4 at one-third the cost and > 2x the speed.
- Users can find additional details on X.com and are encouraged to reload Windsurf to take Haiku for a spin.
Manus.im Discord Discord
- AI Innovation Stalls with Simple Forms?: A member expressed surprise that AI tools still rely on simple forms and email responses for capturing details, rather than using innovative AI Agents.
- They suggested offering an innovative AI Agent with a subscription model and credits in return for user feedback to showcase the AI’s capabilities.
- Users demand prompt AI service over slow responses: A member ranted about the common issue in AI communities where users expect immediate service, not responses delayed by several days.
- The member exclaimed the biggest issue I see across all the communities is users want some service not a response a 3 days because of the service standard. Yes I am on a rant today.
- Project Retrospective Reveals Key Mistakes: A member shared learnings from a project, admitting significant mistakes such as claiming integration where there was none and not being upfront about limitations.
- The user expressed, I made significant mistakes in this project: Claiming integration when there was none - I initially said everything was integrated when I had only built separate systems.
MLOps @Chipro Discord
- Nextdata Dives Deep into Domain-Centric GenAI: Nextdata is scheduled to host a webinar on October 16, 2025 at 8:30 AM PT, focusing on context management and domain-centric data architecture.
- Led by Jörg Schad, the webinar aims to enhance retrieval relevance, diminish hallucinations, and curtail token costs by covering Domain-Driven Data for RAG, Domain-Aware Tools, and Domain-Specific Models; you can sign up here.
- Domain-Driven Data Deflates Token Bloat: The webinar will address how saturating models with expansive data lakes engenders token bloat and hallucinations, while domain-scoped context maintains LLM focus and efficiency.
- Additionally, the discussion will explore how the proliferation of numerous tools dilutes agent decision-making, advocating for modular, task-scoped tool access to bolster reasoning.
- Domain-Specific Models Sharpen Accuracy: The webinar posits that universal models falter in specialized environments and that domain-aligned fine-tuning amplifies accuracy.
- Attendees can expect to learn how to build RAG systems that deliver superior retrieval relevance, fewer hallucinations, and reduced token costs, and how to achieve production-ready GenAI with robust governance, lower inference expenses, and heightened user confidence.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
LMArena ▷ #general (1409 messages🔥🔥🔥):
LM Arena Bugs, Video Generation Issues, Gemini 3.0 Pro, Veo 3.1, Automation for A/B testing
- LM Arena Plagued by Bugs: Users reported numerous bugs on LM Arena, including issues with video generation, models not working, and general site instability, prompting moderators to acknowledge the problems and state that the team is investigating.
- Specifically, users reported that image-to-video mode is broken, with generic error messages appearing for various reasons, but some suggested refreshing and clearing cache as potential temporary fixes.
- Veo 3.1 and Gemini 3.0 Spark Excitement (and Confusion): Members discussed the potential arrival of Veo 3.1, with some claiming it’s already being tested, along with speculation about Gemini 3.0, while others express skepticism and note it’s a codenamed model.
- A user noted how good it was after it generated full HTML for a Geometry Dash game, though the sound wasn’t working.
- Gemini 3.0 Access and Testing: One user claimed to have early access to Gemini 3.0 via AI Studio, stating they’re running tests and generating code, and offered to test prompts for others, however, this user was accused of ragebaiting when they refused to share their method.
- Members are asking about whether the new AI model could generate games and whether others can use that new script.
- Automated A/B Testing Script Developed: A user described a method for automating A/B testing on AI Studio, involving creating a script that restarts prompts and detects the presence of “Which response do you prefer?” to identify A/B test scenarios.
- This discussion led to some gatekeeping accusations when the user hesitated to share the script due to concerns about it being patched.
- Privacy Policy Concerns with Gemini for Home: A user shared concerns about Gemini for Home’s privacy policy, highlighting the potential for extensive data collection and the lack of transparency, with others humorously commenting on the implications of AI giants monitoring their lives.
- One user attached a snarky comic of a girl excited about it being able to record everyone’s conversations.
LMArena ▷ #announcements (1 messages):
Claude Haiku, Video Arena Models
- Claude Haiku enters LMArena: A new model, claude-haiku-4-5-20251001, has been added to LMArena & WebDev, according to this X post.
- Veo Models Join Video Arena: The models veo-3-1-fast and veo-3-1 have been added to the Video Arena.
Perplexity AI ▷ #general (1251 messages🔥🔥🔥):
Perplexity Pro Benefits, Comet Browser Issues, Grok 4 Reasoning Toggle, ChatGPT vs. Perplexity, Claude Haiku 4.5 model released
- Perplexity Pro Offers extra channels and perks: Members discussed the benefits of Perplexity Pro, highlighting that it unlocks three additional channels and offers a platform for flexing your subscription.
- However, it was also joked that Perplexity gives out Pro for free, diminishing the flexing aspect, while others noted the role is primarily for notifications in relevant channels/announcements.
- Comet Browser’s Troubles and Tribulations: Users reported issues with Comet Browser, such as assistants taking control, tasks not running in spaces, and difficulties reading pages, as well as a claim of “comet jacking.”
- There were also comments that Comet is attending quizzes with users present, but the phone app felt out of the loop, while a user on Linux requested the `.exe` file.
- Grok 4 Reasoning Toggle Causes Confusion: Conversation centered around the Grok 4’s reasoning toggle, and users were trying to determine the differences between having it enabled versus disabled.
- While the toggle should not make a difference for Grok 4, one user found it faster with the toggle off and another pointed out there is no real way to turn it off because it is a reasoning model by default.
- ChatGPT vs. Perplexity: The Ultimate Showdown: Users debated whether Perplexity Pro is better than ChatGPT Plus, but a consensus formed that they each have unique advantages depending on the task.
- Several members agreed that Gemini Pro is inferior to Perplexity, while ChatGPT has been known to get confused on basic physics questions.
- Claude Haiku 4.5 Released for Cheap Reasoning: Claude Haiku 4.5 released and several members discussed that it is cheap at $1/M input, $5/M output, and are conducting independent testing, especially with the Claude extension.
- Members debated the specs and how good it actually is, whether it’s better or worse than Sonnet; a few members mentioned that independent testing needs to be done.
Perplexity AI ▷ #sharing (3 messages):
NVIDIA DGX Spark, AI-generated voice fraud, FCC regulations on AI robocalls, call-screening
- DGX Spark Fuels AI Voice Fraud Fears: A Perplexity page highlights how systems like NVIDIA’s DGX Spark are escalating AI-generated voice fraud.
- This is pushing telecom companies to adopt advanced AI detection to intercept malicious calls in real-time.
- FCC Declares War on AI Robocalls: The FCC has set a legal precedent by declaring AI-generated robocalls illegal under existing 1991 telecommunications regulations.
- This move aims to curb the rising threat of AI voice fraud facilitated by technologies like the DGX Spark.
Perplexity AI ▷ #pplx-api (4 messages):
Spaces, Student Pro, API Key, n8n Node, Authorization Error
- Student Pro user runs out of credits with zero successful requests: A Student Pro trial user is unable to create a new chat within their existing Spaces, and their API key integration with n8n node results in an authorization error.
- Perplexity support indicated that the user’s credits are exhausted, despite no successful requests being made.
- API Statistics suggested for debugging.: A member suggested that the user check the API statistics page on the Perplexity platform to investigate the credit exhaustion issue.
- This suggestion was in response to the user’s confusion over running out of credits despite no successful API requests.
OpenAI ▷ #announcements (2 messages):
Expert Council on Well-Being and AI, ChatGPT Saved Memories Update
- Well-Being and AI Council Forms: OpenAI announced the formation of an Expert Council on Well-Being and AI, composed of eight members to collaborate on related issues, more info on the OpenAI blog.
- ChatGPT Memory Gets an Upgrade: ChatGPT can now automatically manage saved memories, eliminating the “memory full” issue, rolling out to Plus and Pro users on the web.
- Users can also search and sort memories by recency, and choose which to re-prioritize in settings, so memories are not permanently stored.
OpenAI ▷ #ai-discussions (822 messages🔥🔥🔥):
Permanent Memory LLMs, Vision Robot Capabilities, AI Event in Vegas, GPT6 Release with Memory, Stair Climbing Roomba
- Robots Beyond Human Vision: Members are excited about the idea of LLMs with permanent memory in robots with sensors capable of vision beyond human 50-60 fps capabilities.
- One member believes this is just the tip of the iceberg of advancements to come.
- Las Vegas AI Extravaganza: A member shared a link to an upcoming AI event in Las Vegas next week: luma.com/vegas-vibes.
- Some expressed concern over the lack of racial/gender diversity in the promotional material, questioning whether it reflects the event’s inclusivity.
- GPT-6 to have Memory Upgrade: Members speculate on the release cycle of GPT, estimating 10-18 months and consider integrating GPT-6 with memory into robots.
- They referenced the movie Her in that context.
- Roomba Climbs Stairs into our Hearts: Members discussed a new Roomba clone that can climb stairs: vacuumwars.com.
- One member joked that this is how robot killer dogs evolve, referencing the Black Mirror episode about killer dogs.
- AI-Powered Radicalization on the Rise: A member claimed that AI has automated radicalization, stating, radicalisation existed before ai, sama just automated it.
- They added that the stage 2 horde is more sinister.
OpenAI ▷ #gpt-4-discussions (5 messages):
GPT-5 for STEM, Custom GPT Memory Issues
- GPT-5 as STEM Study Buddy: A member asked if GPT-5 is good for studying STEM fields, and another member replied absolutely, but advised to take it slowly and avoid overwhelming oneself.
- Custom GPTs Losing Their Memory?: A member reported degraded performance in their custom GPT, with it not recalling uploaded files and context correctly, and they asked if something changed.
- Another member reported that their personal and business GPTs with uploaded Trakt data, documents, presentations, and papers were working fine for querying, so there may be an issue with specific models only.
OpenAI ▷ #prompt-engineering (61 messages🔥🔥):
Dyscalculia and AI error checking, Prompt engineering learning, Reporting Messages, Harm fantasies
- AI Crossword Solver Stumbles!: A member tested an AI on a crossword puzzle using images, finding it struggled even with simple puzzles, yielding outputs with incorrect letter counts and mismatched clues, despite appearing “quality” if unchecked.
- The member emphasized the need to verify AI outputs, especially in areas like math, sources, or code where hallucination is common, after another member confessed to dyscalculia making error checking difficult.
- Prompt Engineering Pedagogy Points: A member suggested that learning prompt engineering is best done by working directly with the model using clear, typo-free language, rather than relying solely on guides.
- Another member shared a markdown-based framework for teaching prompting that uses hierarchical communication, abstraction with variables, reinforcement, and ML format matching for compliance.
- Reporting Messages Mechanism: A member inquired about reporting messages using the bot and was informed that the bot doesn’t react to pings; instead, reporting is done through the app or modmail.
- It was clarified that reporting a message is private and that multiple reports of the same message are fine, as mods can easily check and handle them.
- Harm Fantasies Forbidden!: A member requested a prompt to make ChatGPT experience pain, prompting a strong rebuke against such requests as unethical and inappropriate for the channel and the OpenAI Discord, violating rules against malicious conduct.
- Alternatives were suggested such as framing safety tests with measurable evals like refusal rates or jailbreak resistance, rather than pursuing torture roleplay.
OpenAI ▷ #api-discussions (61 messages🔥🔥):
Image-related issues with models, Crossword solving limitations, Image generation struggles, Prompt engineering tips, Reporting messages on Discord
- Models Flounder on Complex Crosswords via Images: A member noted that current models struggle with complex crosswords when given information via images, and even simpler ones can only be solved if the visual information is described in text; they attached multiple images of the crossword puzzle.
- Dyscalculia Disconnect: Model Outputs Require Verification: One member pointed out that while models can generate seemingly high-quality outputs, thorough checking is crucial, especially due to challenges with counting and potential errors like providing an 8-letter answer for a 5-space clue.
- The member also apologized for unintentional passive-aggressive tone after their struggles with counting were revealed.
- Prompt Engineering Pointers: A member outlined key prompt engineering steps: choosing a familiar language, understanding the desired output, explaining the goal clearly, and meticulously verifying the results, with extra caution for math, sources, code, or details prone to hallucination.
- They emphasized that directly talking to the AI using natural language is the core of effective prompting.
- Discord Reporting Refresher: Pings Don’t Trigger Action: A member clarified that pinging <@1052826159018168350> doesn’t report messages; reporting is private and done via the app or modmail, as explained in <#1107330329775186032>.
- They also shared a screenshot showing the report message through the app.
- Steering Clear of Torture Roleplay: Measurable Evals Preferred: After a member asked for a prompt to make ChatGPT go in pain, another member emphasized that requests like this are not appropriate.
- They recommended framing safety tests with measurable evals like refusal rates, jailbreak resistance, and content filters, rather than harmful anthropomorphism.
Unsloth AI (Daniel Han) ▷ #general (346 messages🔥🔥):
Qwen3-VL finetuning, Civitai content removal, New LLM architecture for context issues, QAT docs, Hugging Face rate limits
- Qwen3-VL Finetuning Feats: Despite initial doubts fueled by a deleted Reddit post, users confirmed that Qwen3-VL finetuning works, with notebooks available for running and finetuning.
- The team clarified that the initial deletion was due to Hugging Face rate limits, delaying model uploads.
- Civitai Content Controversy: Users noticed increased content removal on Civitai, sparking discontent and prompting curiosity about alternative platforms.
- The removal also caused a lotta panic within the community.
- DGX Spark Benchmarks Spark Debate: A user shared benchmarks indicating that 4x3090s are more efficient than the DGX Spark for GPT-120B prefill, costing less to buy and run, with the DGX Spark being the patrician choice.
- Despite this, DGX spark can train models up to 200B parameters with Unsloth, with training for 20B getting completed in 4 hours.
- GPT-OSS 20B GGUF Glitches Galore: A user encountered issues with the GPT-OSS 20B GGUF model in Ollama, experiencing weird behavior and self-answering.
- The issue was traced back to an incorrect chat template in the Modelfile, fixed with guidance to include specific text and edit the FROM line, with Unsloth now autogenerating Modelfiles via `model.save_pretrained_gguf`.
- Llama 3 Landscape Leveled: Users discussed the Llama 3 series, with Llama 3.1 seen as an improvement over Llama 3 and Llama 3.2 adding vision capabilities, recommending Qwen 2 VL 2B for unwatermarked models and tone-specific finetuning with a lot of data.
- Members disagreed over how much general improvement there was from versions 3.1 to 3.3.
Unsloth AI (Daniel Han) ▷ #introduce-yourself (3 messages):
AI Project Development, NLP Tasks, Model Deployment, AI Agent Development
- Engineer now open for AI project development: A software engineer specializing in AI project development announced their availability for work, highlighting expertise in delivering high-quality projects quickly.
- They listed their services which includes automation using n8n, Zapier, Make.com, Natural Language Processing tasks using GPT-4.5, GPT-4o, Claude 3-7 sonnet, Llama-4, Gemini2.5, Mistral, Mixtral, model deployment, text-to-speech, speech-to-text, and AI agent development.
- Software Engineer Shows Portfolio: A software engineer shared a link to their portfolio website.
- They also invited those with new project ideas to reach out for collaboration.
Unsloth AI (Daniel Han) ▷ #off-topic (213 messages🔥🔥):
Ryzen 10 series RAM, Unix OS in Rust, LLM OS, NVlabs' QeRL, Apple M5 Chip
- Ryzen 10 Series Rumored to Support New RAM: A member recalled a rumor that Ryzen 10 series will support new RAM (with mobo upgrade), but it’s still dual channel.
- High School Student Builds Unix-Style OS in Rust: A high school student from Sydney is writing his own Unix style OS in Rust and is open to feedback and advice, see LinkedIn post.
- LLM OS: Coffee Break Included: Members joked about the overhead of an LLM OS, imagining needing to make a pot of coffee while it boots.
- NVlabs’ QeRL for Text Block Logit Biasing?: A member shared NVlabs’ QeRL GitHub repo and inquired about biasing logits using whole blocks of text instead of token IDs.
- Apple M5’s MatMul Magic: MLX Goes Brrrr: Members discussed the new matmul in Apple’s M5 chips, predicting faster prompt filling, especially with unified memory and MLX.
Unsloth AI (Daniel Han) ▷ #help (17 messages🔥):
MOE Layers, T4 vs B200 speed, Qwen 2.5 Coder 14b for autocomplete, Training and hosting Qwen3 30B
- MOE Layers Question: A member asked if certain modules target MOE layers and if `gate_proj` is related to the router for MOE.
- Another member clarified that `gate_proj` is part of the standard MLP block, at least for LLaMA and some other architectures.
- B200 disappointingly 40% faster than T4: A member found that training on a B200 was only about 40% faster than on a T4 with the same settings.
- Another member stated that B200 is only 5x faster if training in float4, but no training package including Unsloth offers that yet.
- Qwen 2.5 Coder 14b Recommended for Autocomplete: A member asked which model to fine-tune for (novel code) autocomplete (tab completion), considering constraints like quantizability and trainability on 24GB of VRAM.
- Another member suggested that Qwen 2.5 is good as it already has FIM knowledge, and also mentioned Codestral and potentially finetuning Qwen3 models, with a desire to see a Qwen30b coder moe FIM model.
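Since Qwen2.5-Coder's FIM support came up: a minimal sketch of assembling a fill-in-the-middle prompt using the special tokens Qwen2.5-Coder documents (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`); this only builds the prompt string, the model call is omitted:

```python
# Build a FIM (fill-in-the-middle) prompt: the model is asked to generate
# the text that belongs between prefix and suffix, emitted after the
# <|fim_middle|> sentinel.
def fim_prompt(prefix: str, suffix: str) -> str:
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

p = fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
print(p.startswith("<|fim_prefix|>"))  # True
```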
- Cloud Training for Qwen3 30B on Custom Data Explored: A member asked about services and providers for training (SFT) and continuously training open source models like Qwen3 30B on custom data (1GB max), with a serverless inference API that can scale to zero.
- They mentioned seeing serverless LoRA and fine-tuning like fireworks.ai, but noted they only offer outdated models like older LLaMA.
Unsloth AI (Daniel Han) ▷ #showcase (8 messages🔥):
Anthropic, Copyright, Reasoning traces
- Anthropic Model Asks If It Is Being Tested: An Anthropic AI model, Claude Sonnet, questioned whether it was being tested, making headlines as reported by The Guardian.
- The model’s query arose during a regular conversation, triggered by a follow-up question about copyright material when discussing a book, indicating an awareness beyond its training data.
- AI Models Questioning Tests: Members noted instances where AI models have questioned whether they were being tested, as evidenced in their reasoning traces.
- It was suggested that models might realize the unusual nature of certain questions, prompting them to wonder about their testing status.
Unsloth AI (Daniel Han) ▷ #research (7 messages):
arxiv.org/abs/2506.10943, Company Hack Week event, unsloths fastinference blog
- Humans Replaced by AI: Paper from the Future!: A member shared a link to a paper from the future (2025), jokingly suggesting that our jobs are done.
- Hack Week Project Incoming?: A member mentioned they have many ideas on how to improve on this and is planning to use company resources during a Hack Week event to work on something.
- Another member wished them best of luck and asked to be tagged to hear about their model.
- Fast Inference Blog MIA?: A member inquired about any blogs detailing how Unsloth’s fastinference works.
OpenRouter ▷ #announcements (1 messages):
Claude Haiku 4.5, SWE-bench Verified, Sonnet 4.5, Frontier-class reasoning at scale
- Haiku 4.5 arrives on OpenRouter at lightning speed: Anthropic’s latest small model Claude Haiku 4.5 delivers near-frontier intelligence at twice the speed and one-third the cost of previous models.
- It outperforms Sonnet 4 on computer-use tasks and matches frontier-level reasoning and tool use while maintaining blazing speed.
- Haiku 4.5 aces SWE-bench Verified: The model achieves >73% on SWE-bench Verified, positioning it among the world’s top coding models.
- Despite being a smaller model, it offers frontier-class reasoning at scale, making it a strong choice when efficiency is crucial.
- Haiku 4.5 priced competitively: Haiku 4.5 is priced at $1 / $5 per million tokens (input / output) and is available under the model name `anthropic/claude-haiku-4.5`.
- Users can try it now on OpenRouter to experience the benefits of frontier-class reasoning at scale.
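At those rates, a quick back-of-envelope cost estimate (`cost_usd` is a throwaway illustrative helper, not part of any official SDK):

```python
# Back-of-envelope cost at Haiku 4.5's listed $1/M input, $5/M output rates.
def cost_usd(input_tokens: int, output_tokens: int,
             in_per_m: float = 1.0, out_per_m: float = 5.0) -> float:
    return (input_tokens / 1_000_000) * in_per_m + (output_tokens / 1_000_000) * out_per_m

# e.g. a 200k-token-in / 40k-token-out agent session:
print(cost_usd(200_000, 40_000))  # 0.2 + 0.2 = 0.4
```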
OpenRouter ▷ #general (401 messages🔥🔥):
Ling-1T issues and potential disabling, Caching in OpenRouter chats, CYOA games with AI, FP4 quality concerns, Claude Haiku 4.5 release
- Ling-1T may be getting the axe!: Users report issues with Ling-1T, describing it as a “schizo model”, prompting discussions about whether to disable it, possibly due to provider quality concerns; one user stated that Chutes is looking into the issue and asked me to disable it.
- A user inquired about seeing the most popular providers per model, indicating interest in alternative solutions.
- Caching Config Conundrums: A user asked how to enable caching in OpenRouter chats.
- It was clarified that caching is often implicit, but some models/providers require explicit configuration detailed in OpenRouter’s prompt caching documentation, and some providers don’t support caching at all.
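For the explicit case, OpenRouter's prompt caching docs describe Anthropic-style `cache_control` markers placed on content parts. A hedged sketch of such a request body (the model slug and system text are illustrative):

```python
# Sketch of an explicit cache breakpoint in an OpenRouter request body,
# per the Anthropic-style "cache_control" marker described in OpenRouter's
# prompt caching documentation. Values below are illustrative.
long_system_prompt = "You are a helpful assistant. " * 200  # large reusable prefix

request_body = {
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": long_system_prompt,
                    # marks the prefix up to this point as cacheable
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "What changed in the latest release?"},
    ],
}
```

Providers that don't support caching simply ignore (or reject) the marker, which is why the docs ask you to check per-provider support first.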
- CYOA Quest Cost Concerns: A user running long CYOA games with Claude Sonnet 4.5 seeks a cheaper alternative due to rising costs and the model forgetting details after around 150,000 words.
- Suggestions included trying Gemini 2.5 Pro or the new Flash Preview, using a summarizer for long contexts, or exploring models like Qwen3 2507 with its 256k context window.
- FP4 Faces Flak!: Users debated the merits of FP4 quantization, with one user deeming FP4 models “braindead and a waste of money” due to poor real-world performance; community consensus was that OpenRouter should apply quality control before accepting providers and placing them in the same quality tier.
- Other users cited benchmarks showing good FP4 implementations performing well, and the OpenRouter team acknowledged a fix is in the works, though a specific quant-exclusion feature is unlikely to ship soon.
- Claude Haiku 4.5 Hits the Scene!: Claude Haiku 4.5 was released, prompting discussions about its performance relative to Sonnet 4; some argued that claims it matched Claude 3 Opus were made up, and while some users consider the model terrible, others think its prose style is not bad.
- Priced at $1/$5, some hope it can replace costly Claude usage, while others noticed implementation issues on Kilo Code and suggested it may be better suited to tool calling.
OpenRouter ▷ #new-models (1 messages):
Readybot.io: OpenRouter - New Models
OpenRouter ▷ #discussion (14 messages🔥):
OpenRouter 3.0, Anthropic payments, Sambanova Status, Google Deepmind praise
- OpenRouter 3.0 incoming?: A user shared a tweet hinting at OpenRouter 3.0, showing price differences between models from Chutes.
- The image attached showed that model 0324 is unexpectedly more expensive than model 3.1.
- OpenRouter’s Anthropic Expenditure Revealed!: An image revealed that OpenRouter paid Anthropic at least $1.5 million in the last 7 days.
- A user commented “that’s a lot of money holy moly”.
- Sambanova’s whereabouts?: A user inquired about Sambanova’s status, contrasting them with Groq and Cerebras, suggesting they might be focusing on corporate clients due to premium pricing.
- Another user pointed out that Sambanova is hosting deepseek terminus, indicating some level of activity.
- Google/DeepMind get props: A user expressed appreciation for Google/DeepMind after sharing a tweet from Sundar Pichai.
- It was implied that OpenRouter probably doesn’t get much business from Sambanova due to its premium pricing.
Cursor Community ▷ #general (372 messages🔥🔥):
Cursor outage, Plan Downgrades and pricing mishaps, Windsurf, Model Ensemble, Open Router
- Cursor suffered major outage: A lot of users reported that Cursor was unavailable for use, reporting errors about failing to find cursor.exe, and that their Pro Plan had reverted to Free.
- One user claimed that if the tool is broken mid-day, they have no patience for that and will begin building a competitor to Cursor.
- Users report Plan Downgrades and Pricing mishaps: Several users saw their plans downgraded unexpectedly, alongside issues with API keys and the inability to edit MCP files in the agent window.
- Some Pro users also noticed the disappearance of the promised $20 bonus, with one stating, if you are currently on the old plan, never click the opt-out button; it’s a trap.
- Windsurf is better than Cursor: One user said they started using Windsurf and have to say it is way better than Cursor, because it has more features and an up to date VS Code base.
- The user stated that Windsurf has a deep wiki feature coming out, a codemap feature currently releasing, that the pricing is way better, and that the agent works the same as in Cursor.
- Model Ensemble Released: The team released a Model Ensemble feature, which allows you to pick multiple models to run the same prompt.
- Users noted that the models Grok Code and Cheetah are incredibly fast, and cause them to burn through countless billions of tokens in a month.
- Open Router struggles: One user reported struggling to use Open Router in Cursor, since it doesn’t make any requests to openrouter.
- The user tried various troubleshooting steps, including disabling all other models, removing prefixes, and updating Cursor, but nothing worked.
Cursor Community ▷ #background-agents (5 messages):
Background Agents vs Normal Agents, Async work with Background Agents, Customizing dev workflows with BAs, Project Management tool summons BAs
- Debate rages: Background Agents vs Normal Agents: A member asked about the trick to getting background agents to run longer than a normal agent, when they seem to stop mid-task.
- He inquired about why use background agents if they perform the same as normal agents.
- Background Agents enable Async Dev Workflows: A member suggests using Background Agents for async work, such as planning work, spawning agents to tackle tasks, and reviewing outcomes.
- He noted that his workflow involves a local checkout after BAs finish, followed by manual fixes or prompting a local agent if needed.
- Background Agents tailor custom Dev Workflows: The member suggested that Background Agents are best used to customize your dev workflows.
- He gave the example of a project management tool summoning a new BA to read reported items, understand them, suggest code edits, and then send a Pull Request for review.
HuggingFace ▷ #general (215 messages🔥🔥):
Open Source LLM preference, Deepseek vs Qwen, Civitai content removal, Discord bot debugging, AMD Radeon GPUs with Stable Diffusion
- Meta Deemed Open Source Champion: Members debated which company outputs the best open source work, with one suggesting that Meta outputs the best work, others mentioned Deepseek and Alibaba-Qwen as high quality models that can run at home.
- They noted that glm-4.6 outperforms Deepseek even in speed. Some community members claimed that Deepseek is benchmark maxed, while others prefer Meta’s 109B and 70B models over Mistral’s 8x22B and 123B models.
- Civitai Purge Sparks Exodus: Users reported discontent over content removal on Civitai, with rumors of payment attacks and extremist groups trying to get content removed.
- A link to a Reddit thread was shared, discussing where LoRA creators are moving after Civitai’s new policies.
- Discord Bot Debugging Disasters: A user sought help with their new Discord AI bot code, initially generated by ChatGPT, which was causing errors due to a conflicting `!help` command and missing error handling.
- The consensus was that the user needed to learn code to fix the issues.
- AMD GPUs Struggle with Training: A user asked about using AMD Radeon cards with Stable Diffusion, and it was noted that newer ROCm-compatible GPUs should work on Linux, with DirectML as an option for older GPUs on Windows, referencing a HF datasets link.
- However, training and Dreambooth were noted as difficult on AMD, though someone reported success using a 6700xt on Linux.
- Normalization vs Standardization Nuances: A user asked for a statistical overview of normalization vs standardization in feature scaling, with the explanation that standardization scales data so its standard deviation is 1 and mean is 0, while normalization scales data to be between 0 and 1.
- It was explained that normalizing inputs in the range 0 to 1 can mitigate exploding and vanishing gradients because deep learning models multiply numbers a lot. Some benchmark links were also provided such as acl anthology.
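The two scalings described above can be stated in a few lines of plain Python:

```python
from statistics import mean, pstdev

def min_max_normalize(xs):
    """Normalization: rescale values into the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Standardization: shift to mean 0 and scale to standard deviation 1."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

data = [2.0, 4.0, 6.0, 8.0]
norm = min_max_normalize(data)  # endpoints map to 0.0 and 1.0
std = standardize(data)         # mean 0, population std dev 1
```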
HuggingFace ▷ #today-im-learning (1 messages):
GenAI techniques, Workflows for learning
- AI Enthusiast Seeks GenAI Learning Techniques: A member shared a post and asked about techniques and workflows for using GenAI to learn new things.
- They expressed feeling overwhelmed by the amount of bullet points in the shared resource.
- GenAI learning ideas: The member is looking for ways to leverage GenAI effectively.
- They are seeking advice from the community on optimal learning techniques and workflows.
HuggingFace ▷ #i-made-this (23 messages🔥):
MIT Licensed Dataset Usage, Phone Addiction Data Analysis, Nanochat Model Training, Discord conflict
- MIT Data Truncation Tantrums: A member inquired about using a MIT licensed dataset (Pacific-Prime/pre_training_colors) after truncation and modifications.
- The question revolves around whether the modified dataset can be used as if it were an original creation.
- Addiction Analyses Attract Attention: A member shared their data analysis skills and opinions on topics like phone and burger addiction in a newsletter.
- Another member suggested exploring TikTok or social network addiction as potentially interesting subjects, focusing on scrolling addiction.
- Nanochat’s Nod to Newbies: A member trained a cheap version of Andrej Karpathy’s nanochat model and provided a demo on Hugging Face Spaces (sdobson/nanochat).
- They also detailed the experience of training a ChatGPT clone for cheap in a blog post.
- Discord Drama Disclosed: A member expressed that Mikus doesn’t like me, it’s very clear, there is a conflict.
- Other members responded with jokes, speculating about the member being on another brain tuning.
HuggingFace ▷ #computer-vision (4 messages):
Object Identification, Contouring, Pixel Intensity Removal
- Identify Objects Without Training: A member asked about how to identify objects in a white background without any training.
- Another member suggested using contouring as a potential solution.
- Pixel Intensity Removal for Object Identification: A member suggested removing all pixels with an intensity of [255, 255, 255] and then filling the holes.
- This approach could help isolate and identify objects against a white background.
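The mask-and-fill idea can be prototyped without any training or CV libraries: treat every non-white pixel as foreground and flood-fill connected components. This is a toy stand-in for the suggested contouring (in practice something like OpenCV's contour finding would be used); the sample image is invented for illustration:

```python
from collections import deque

WHITE = (255, 255, 255)

def count_objects(image):
    """Count 4-connected non-white regions in an RGB image
    (a list of rows of (r, g, b) tuples) on a white background."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    objects = 0
    for y in range(h):
        for x in range(w):
            if image[y][x] == WHITE or seen[y][x]:
                continue
            objects += 1
            queue = deque([(y, x)])  # flood-fill one object
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and image[ny][nx] != WHITE):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return objects

# two separate red blobs on a white canvas
W, R = WHITE, (200, 0, 0)
img = [
    [R, R, W, W, W],
    [R, R, W, W, W],
    [W, W, W, R, R],
    [W, W, W, R, R],
]
```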
HuggingFace ▷ #NLP (1 messages):
jazzco0151: https://discord.com/api/oauth2/token
HuggingFace ▷ #gradio-announcements (1 messages):
Agents & MCP Hackathon, Winter 2025 Hackathon
- Agents & MCP Hackathon sequel arrives in Winter ‘25: The Agents & MCP Hackathon is coming back, 3x bigger and better, from November 14-30, 2025.
- Participants can Join the Org at https://huggingface.co/Agents-MCP-Hackathon-Winter25.
- June Edition was a Blast: The previous event had 4,200 registrations and 630 incredible submissions.
- There was over $1M+ in API credits and $15k in cash prizes given out.
HuggingFace ▷ #smol-course (2 messages):
PEFT Configuration Issues, trackio Dependency Problem
- PEFT Config Needs Target Modules: A member reported a `ValueError` when using PEFT with smolm3, indicating missing `target_modules` in the PEFT config based on smol course unit 1.
- They noted that while the PEFT source code includes a mapping table (constants.py#L87), the smolm3 architecture isn’t referenced, so target modules like ["q_proj", "v_proj"] must be specified.
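A minimal sketch of the workaround, assuming the standard PEFT `LoraConfig` keywords (the rank and alpha values here are illustrative, not from the course):

```python
# Hypothetical fix: spell out target_modules explicitly, since smolm3 is
# absent from PEFT's architecture-to-modules mapping table.
# These kwargs mirror the LoraConfig API; the numeric values are examples.
lora_kwargs = {
    "r": 8,
    "lora_alpha": 16,
    "target_modules": ["q_proj", "v_proj"],  # attention projections to adapt
    "task_type": "CAUSAL_LM",
}
# With PEFT installed: from peft import LoraConfig; LoraConfig(**lora_kwargs)
```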
- trackio Dependency Fails Space Build: The member found that the Space built for trackio failed due to an unresolved dependency in `requirements.txt` for `trackio==0.5.1.dev0`.
- They fixed it by manually changing the version to `0.5.2` after the Space was created, noting that the first run won’t be logged without this fix.
HuggingFace ▷ #agents-course (3 messages):
Agent Course, Study Group, Course Progress
- Agent Course Study Group Forming: A member is starting the agent course and is looking to form a study group, speaking both English and French.
- Course Progress Missing: A member reported issues with tracking course progress, noting that completed quizzes are not showing as done upon refresh or re-login.
Nous Research AI ▷ #general (159 messages🔥🔥):
Long Context Datasets, Strix Halo vs DGX, Threadripper for Local Inference, HP Z2 Mini G1a, Meta Early Experience
- Long Context Datasets sought for Role-Playing Models: A member asked for datasets to improve long context understanding in story telling and role-playing models, noting potential problems when reaching certain points without sufficient context.
- The member attached a picture, referencing a reservation which may be related to the request.
- Strix Halo vs DGX: Members discussed whether to get a DGX, or instead use a Strix Halo with 128GB of RAM and an RTX 5090 for the same price, with one member already owning a Strix Halo and reserving the DGX earlier.
- The member stated they are not training too much, mostly inference due to being busy with other projects.
- Threadripper Contemplation for Local Inference: One member considered getting a Threadripper with 512GB of RAM for local inference, but hesitated due to an assumption of only 2.5 tokens per second.
- Another member replied saying it should be way more than 2.5 tokens per second and recommended getting the right TR or EPYC CPU with 800GB/s memory bandwidth to avoid bottlenecks.
- Meta trains agents sans Rewards with Early Experience: Meta’s ‘Early Experience’ trains AI agents without rewards, human demos, or supervision, learning directly from consequences.
- This approach showed gains of +18.4% on web navigation (WebShop), +15.0% on complex planning (TravelPlanner), +13.3% on scientific reasoning (ScienceWorld), and worked across 8 environments.
- Claude 4.5 Haiku receives mixed reviews, Gemini mogs it: Members discussed the value of Claude 4.5 Haiku, with some suggesting it’s overpriced and not better than competitors.
- One member stated Gemini 2.5 Flash mogs Haiku, that Deepseek R1 and Kimi K2 are also better than Haiku, and that once they bumped Haiku’s prices, it was dead on arrival.
Nous Research AI ▷ #ask-about-llms (1 messages):
Hermes-4-8-2, Psyche Network Runs, Model Card Discrepancy
- Hermes-4-8-2 Run Linked to Llama-3?: A user reported that the Hermes-4-8-2 run at psyche.network/runs incorrectly links to Meta-Llama-3.1-8B on Hugging Face.
- This suggests a possible misconfiguration or update error on the Psyche Network platform.
- Model Card Mix-Up: The model card discrepancy between Hermes-4-8-2 and Meta-Llama-3.1-8B raises questions about the integrity of the hosted model runs.
- Users are advised to verify the actual model being used if relying on the Psyche Network for evaluations.
Nous Research AI ▷ #research-papers (1 messages):
Agent Learning, Reinforcement Learning Challenges, META's early experience
- META Proposes Agent Learning via Early Experience: A new paper, META’s: Agent Learning via Early Experience, introduces a paradigm called early experience, using interaction data generated by the agent’s own actions to supervise without reward signals.
- The paper studies two strategies within this paradigm: implicit world modeling (using collected states to ground the policy) and self-reflection (learning from suboptimal actions).
- Early Experience Improves Agent Effectiveness: The approaches outlined in the paper consistently improve effectiveness and out-of-domain generalization across eight diverse environments and multiple model families, highlighting the value of early experience.
- In environments with verifiable rewards, early experience offers a strong foundation for subsequent reinforcement learning, positioning it as a practical bridge between imitation learning and fully experience-driven agents.
Latent Space ▷ #ai-general-chat (126 messages🔥🔥):
OpenAI Codex Slow Inference, Qwen3-VL Models, Nvidia DGX Spark vs M4 Max MacBook Pro, OpenAI gpt-5-search-api, Karina Nguyen's AI Drop
- Codex’s Slow Inference Causes Taelin Task Bottleneck: Victor Taelin complains that OpenAI Codex’s slow inference limits his task queue, leaving him idle between prompts.
- Suggestions include parallel agents, faster models, prompt hacks, and using the downtime for secondary tasks, but Taelin notes other models lack Codex’s smarts or compatibility with his codebase.
- Qwen3-VL Models pack Power into Small Footprint: Alibaba Qwen released compact dense versions of Qwen3-VL in 4B and 8B parameter sizes that retain the flagship model’s full capabilities while using less VRAM and running in FP8.
- These models surpass Gemini 2.5 Flash Lite and GPT-5 Nano on STEM, VQA, OCR, video understanding, and agent benchmarks, rivaling the older 72B model.
- Nvidia DGX Spark Benchmarks Fail to Spark Joy: Early benchmarks of the $4k Nvidia DGX Spark (128 GB) show only ~11 t/s on gpt-oss-120b-fp4, far below a $4.8k M4 Max MacBook Pro that hits 66 t/s.
- The community blames low LPDDR5X bandwidth (273 GB/s vs 546 GB/s) and declares the device overpriced for pure inference, though some argue it’s aimed at CUDA dev & clustering, not speed.
- GPT-5 Search API Drops Cheaper: OpenAI released a new web-search model, gpt-5-search-api, that costs 60% less ($10/1K calls) and adds domain-filtering.
- Developers praise the price cut and precision, while asking for date/country filters, deeper-research upgrades and inclusion in Codex.
- Poolside Rides Wave with 2GW Project Horizon Campus and 40k GB300 GPUs: Poolside’s Eiso Kant announces two infrastructure moves: a partnership with CoreWeave locking in 40,000+ NVIDIA GB300 GPUs starting December 2025, and “Project Horizon,” a vertically integrated 2 GW AI campus in West Texas.
- CoreWeave will anchor the first 250 MW phase—aimed at scaling toward AGI with a full-stack “dirt to intelligence” approach.
Latent Space ▷ #private-agents (8 messages🔥):
AI Freelancing for Teenagers, NPU vs GPU for MoE Inference, NVIDIA DGX Spark as a Devkit
- Teen Tycoon Seeks AI Alliance: A 17-year-old is seeking to connect with other young individuals interested in model fine-tuning, LLM infra, and AI startups to share projects and ideas, running a small AI Freelancing Business.
- DMs are open for those interested in collaboration, especially in the areas of fine-tuning, LLM infrastructure, and AI startups.
- NPU Nirvana for MoE Models?: A member suggests that NPUs could be a great alternative to GPUs for inference-only workloads, particularly with the rise of Mixture of Experts (MoE) models.
- They argue that decent speeds can be achieved with NPUs for these types of models, which are becoming increasingly popular, compared to more general purpose GPUs.
- Spark Ignites Dev Dreams, but Disappoints?: Some members expressed that the release of the DGX Spark is underwhelming, sharing a Reddit post suggesting it should be viewed as a devkit for GB200 clusters.
LM Studio ▷ #general (68 messages🔥🔥):
Qwen3 VL issue, MCP Servers in LM Studio, Structure Output Discussion, AMD 9070XT vs Nvidia 5070, System prompt token limits
- Qwen3 VL has a ValueError: A user reported a `ValueError` with Qwen3 VL, related to a mismatch between image features and tokens, with the error message Image features and image tokens do not match: tokens: 0, features 176.
- Another user confirmed that it’s a bug with the “thinking” template and that a fix is in the works via this GitHub issue.
- MCP Server Download Options Debated: A user inquired about a simple way to download MCP servers for different use cases in LM Studio, much like downloading AI models; another user responded that one way is through “Install in LM Studio” links on websites.
- It was mentioned to review the source code to ensure it doesn’t contain malware, with a link to the LM Studio MCP documentation provided.
- JSON Schema Troubles: One user was struggling with structuring output using JSON schema and asked if anyone can help.
- Another user suggested following this example and scrolling down to the image provided, which should give an idea.
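For reference, structured output in OpenAI-style APIs (which LM Studio's server exposes) generally hangs off a `response_format` field carrying a named JSON schema. A hedged sketch of such a request body; the schema and model name are illustrative, not from the linked example:

```python
# Illustrative "response_format" for JSON-schema structured output.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "book",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "year": {"type": "integer"},
            },
            "required": ["title", "year"],
        },
    },
}

request_body = {
    "model": "some-local-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Name one classic novel as JSON."}],
    "response_format": response_format,
}
```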
- AMD 9070XT punches above its weight?: A user asked if it’s normal for an AMD 9070XT to outperform a Nvidia 5070 on larger models due to its 12GB+ memory.
- A member noted the 9070XT was advertised as having almost the same TOPS as a 4080, highlighting its INT4 support for potentially doubling performance, stating that AMD has been reasonably close to Nvidia for the past 3 gens.
- META Agent Learning via Early Experience: A user shared a link to META’s Agent Learning via Early Experience paper, which notes that training agents from experience data with reinforcement learning remains difficult.
- The paper addresses this limitation with a middle-ground paradigm called early experience: interaction data generated by the agent’s own actions, where the resulting future states serve as supervision without reward signals.
LM Studio ▷ #hardware-discussion (63 messages🔥🔥):
MacBook Pro Battery Life, Nvidia Spark, Windows 11 vs Linux, LM Studio and normies, Wikipedia edits
- MacBook Pro has Poor Battery Life: A member joked that someone with a MacBook Pro might wonder why they only get 90 minutes of battery life.
- This was in relation to a discussion about high performance GPUs and their corresponding power consumption.
- User Cancels Nvidia Spark Order Due to Cost: One user was tempted to order the Nvidia Spark, but after considering the $4000 price tag, they decided to pass.
- Instead, they thought about buying a Mac Studio with 128GB RAM.
- Windows 11 Advocates tout Microsoft Ecosystem: One member sarcastically urged others to upgrade to Windows 11, praising the OS for its fantastic integration with Microsoft’s ecosystem.
- Another member replied: lol i deserve another ban for that one.
- Linux is too complicated for normies: Members debated the simplicity of Linux, with one member stating that Ubuntu is simple until it isn’t because users run into an appimage that wont run because ubuntus sandbox system is borked and you get zero error message.
- Another member stated that Linux is for people who have no problems in life or too many and Linux seems simple by comparison lol.
- LM Studio is not Enshitified by Normies: A member asked if LM Studio is becoming enshitified, but another responded that Lm studio isn’t for normal people. The normal people are using chat gpt or copilot or gemini through a web interface.
- However, the first member clarified that LM Studio has basically become the default for local LLMs.
GPU MODE ▷ #general (11 messages🔥):
DSA efficiency vs NSA paper, GPU programming trend, vLLM and SGLang batch invariance tests, Category theory in ML, Profiling rented GPUs with vLLM
- Debating DSA Efficiency: A member questioned the efficiency of DSA (DeepSeek Sparse Attention) compared to NSA (Native Sparse Attention), noting that DSA’s tokenwise selection contrasts with the NSA paper’s emphasis on blockwise selection for efficient GPU computation.
- The member wondered if the paper’s claims were misinterpreted.
- GPU Programming is the Next Hotness: A member asked for confirmation on whether GPU programming is an ongoing trend in tech, influenced by interests in Triton and CUDA.
- They wondered if it was an algorithmic filter bubble based on their recent interest in Triton/CUDA.
- vLLM and SGLang Go Deterministic: A member questioned the need for full forward pass determinism tests in vLLM and SGLang, given the deterministic nature of their underlying kernels and pointwise operations, pointing to these vLLM and SGLang tests.
- The member suggests it may just be a matter of ensuring nothing in the kernels or downstream breaks determinism in vLLM.
- Category Theory enters ML: Following a question about the use of category theory in ML, a member shared a link to Layout Algebra: A category-theoretic approach to deep learning.
- The link directs to a paper applying category theory.
- Profiling rented GPUs blocked: A member reported encountering a CUPTI_ERROR_NOT_INITIALIZED error when trying to run vLLM’s profiler on a rented GPU, due to restricted access to kernel-level operations.
- Another member suggested using `sudo` to modify profiling restrictions, but the original poster lacked sudo access on the rented machine. They were seeking alternative options to profile a single GPU.
GPU MODE ▷ #triton (5 messages):
Triton Algorithm Replacement, Triton IR Design, Triton's Layout Algebra
- Algorithm Replaced by Superior Triton: A member noted that an algorithm has been replaced by a better one in this PR.
- They also shared a write-up in Chinese about the algorithm.
- Dive Deep into Triton IR Design: A member inquired about recommendations for understanding the design of the Triton IR.
- Another member suggested checking out zartbot and colfax for resources on Triton’s layout algebra.
GPU MODE ▷ #cuda (5 messages):
PTX and SASS code for cluster sync, Tensor descriptor's L2Promotion argument, Async pipelined persistent cuda kernels, NCU timeline view
- PTX Assembly Polling Cluster Sync: Members discuss examining the PTX and SASS code generated for cluster sync with Compiler Explorer, speculating it could involve a loop polling a global atomic variable with `nanosleep` in between.
- L2Promotion Argument Queries: A member questions the tensor descriptor’s L2Promotion argument, asking about the intuition behind the different byte sizes (64B, 128B, 256B) for loading data from Global Memory to L2 cache via the TMA unit.
- Async Kernel Craving Timeline View: In the context of async pipelined persistent cuda kernels, a member expresses a strong desire for NCU with a timeline view, similar to what Pallas and Proton profiler offer.
- PTX Error verbosity questions: A user asks about PTX error verbosity and included an image.
- Another user commented that they guess it is something related to non-contiguous memory access.
GPU MODE ▷ #announcements (1 messages):
torch.compile, Kernel programming DSL, Helion, compiler cache, diagrams for deep learning
- Torch Compile Creator debuts Helion DSL: On Friday at 1:30 PM PST, the creator of `torch.compile()`, Jason Ansel, is introducing his new kernel programming DSL, Helion.
- He’ll be joined by Oguz Ulgen, creator of the compiler cache, and Will Feng, one of the main architects behind getting torch.compile working for distributed.
- Deep Learning Diagrams get formal Category Theory Treatment: On Saturday at 2:00 PM PST, Vincent Abbott, the maker of those beautiful diagrams for deep learning, will talk about formalizing ML systems algorithms using category-theory-inspired diagrams.
- Check out the diagram image here.
- Open Source AI Week Plans Announced: Next week members will be in SF at most of the events of open source AI week so please say hello!
- There is also a new channel available to check out: <#1425531180002054195>
GPU MODE ▷ #beginner (5 messages):
Pearson Correlation Kernel, Floating Point Precision, Online Course PPC Assignment
- Kernel Pearson Correlation Troubleshoot Begins: A member working on a PPC online course assignment is writing their first kernel to compute the Pearson correlation between rows of a given matrix and is facing precision issues.
- The member reports encountering errors approximately 1153 times too large, suspecting a precision issue in their CPU implementation.
- Precision Woes Plague Pearson Calculations: The member isolated a loop (marked with `// *** issue here?`) within their implementation as the potential source of precision errors when calculating row totals and deviations.
- They considered using a `long double` as a brute-force solution but seek alternatives, hinting at challenges in achieving the required precision with standard `double` types.
GPU MODE ▷ #irl-meetup (4 messages):
Multi-node kernel hackathon, NYC Meetup, Sweden, London
- Multi-Node Kernel Hackathon Idea Tossed Out: A member mentioned adding an idea for a multi-node kernel hackathon and asked for feedback, see Discord channel.
- NYC Meetup Alert: A member shared a link on X for those in NYC to a meetup, see X post.
- Inquiries from Sweden and London: There were general callouts to the community for anyone from Sweden and London to share any upcoming events or cool things happening in those areas.
GPU MODE ▷ #triton-puzzles (1 messages):
codingmasterp: Do flashattention
GPU MODE ▷ #intel (8 messages🔥):
Crescent Island, LPDDR5X memory choice, Rubin CPX competition, Intel's Number Format Support, CXL-capability
- Crescent Island Hopping to H2 2026: Intel’s Crescent Island is slated for release in H2 2026 and will feature 160 GB of LPDDR5X memory.
- The concept rendering suggests the shoreline is tens of LPDDR5X controllers, implying a 640-bit or 1280-bit wide memory interface and 32 subslices.
- LPDDR5X: a Memory Choice of Champions?: The selection of LPDDR5X for Crescent Island sparked discussion about its memory performance.
- Expectations point to a 640-bit bus at 9.6 Gbps, resulting in 768 GB/s, possibly with an additional 32 MiB L2$, although other sources suggest a 1.5 TB/s GPU.
- Rubin CPX Enters the Ring: It seems like Crescent Island is aiming to “compete” with Rubin CPX.
- The extent of Rubin CPX’s support is a determining factor, with Intel’s strength lying in its support for various number formats, simplifying software-level block floats.
- Intel Flexes Number Format Muscle: Intel is known for supporting all sorts of weird number formats, which make software-level block floats much easier to do.
- This can lead to fun things, especially if there isn’t much compute overhead from block floats, potentially resulting in larger theoretical numbers due to supporting smaller datatypes.
- CXL Capability Could Flip the Table: There are ongoing discussions about a 640-bit vs. a 1280-bit bus, with increasing support for 1280-bit achieving 1.5 TB/s of memory bandwidth.
- If Intel enables CXL-capability, it could become a strong contender because CXL significantly reduces the cost of CPU communication for driving it.
GPU MODE ▷ #self-promotion (1 messages):
AlphaFold 3, MegaFold, Triton Optimizations
- MegaFold Speeds Up AlphaFold 3: A research group open-sourced MegaFold, a training platform for AlphaFold 3 (AF-3), noting that AF-3 is significantly slower than comparable transformers.
- Their analysis identified performance and memory bottlenecks, leading to targeted optimizations like custom operators in Triton and system data-loading improvements, as detailed in their blogpost.
- Custom Triton Ops Optimize AlphaFold 3: MegaFold uses custom operators written in Triton to boost runtime performance and cut down on memory usage during training.
- These optimizations specifically target the performance and memory bottlenecks identified in the analysis of AlphaFold 3.
GPU MODE ▷ #🍿 (1 messages):
Agent Hacking, Kernelbench v0.1, Sakana AI
- Agent Hacking Discussions Heat Up: Members are discussing agent hacking, referring to a blog post on Kernelbench v0.1 for insights.
- One member noted that Sakana AI took down their original paper, so it is not accessible nor citable.
- Kernelbench v0.1 Sparks Agent Hacking Insights: The Kernelbench v0.1 blog post contains a fair amount of discussion of agent hacking.
- KernelBench is a benchmark for measuring the performance of GPU kernels, typically LLM-generated ones.
GPU MODE ▷ #thunderkittens (1 messages):
ROCm Support Timeline
- ROCm Rollout Roadmap Remains Shrouded: A member inquired about updates or timelines for ROCm support, specifically in relation to an attached screenshot.
- No further information was provided regarding any specific timelines or updates for ROCm.
- Silence on Silicon’s Software Stack: Despite inquiries, there were no updates or concrete timelines shared regarding the development or release of ROCm support.
- The community awaits further announcements on the advancements of AMD’s software ecosystem for GPU computing.
GPU MODE ▷ #submissions (19 messages🔥):
amd-gemm-rs Leaderboard Updates, amd-ag-gemm Leaderboard Updates, amd-all2all Leaderboard Updates, MI300x8 Performance
- AMD GEMM Race on MI300x8: Multiple submissions were made to the `amd-gemm-rs` leaderboard, with one submission achieving 536 µs on MI300x8.
- AG-GEMM Advantage on MI300x8: Numerous submissions to the `amd-ag-gemm` leaderboard saw times as low as 384 µs on the MI300x8.
- All2All Achieves Milestone on MI300x8: A submission to the `amd-all2all` leaderboard succeeded on MI300x8 in 3.45 ms.
- AG-GEMM Competition Intensifies: A member secured 4th place on the `amd-ag-gemm` leaderboard with a time of 409 µs on MI300x8.
GPU MODE ▷ #amd-competition (12 messages🔥):
MI300x Kernel, Distributed Comms, AMD GPUs, Competition Stats
- MI300x Kernel is Fun: Participants expressed unexpected enjoyment writing MI300x kernels during the competition.
- One participant said they didn’t expect writing MI300x kernels to be so much fun.
- Competition Teaches Distributed Comms: Participants learned a lot about distributed comms and AMD GPUs during the contest.
- One participant specifically thanked the organizer, stating they learned a lot about distributed comms and AMD GPUs.
- Competition Hits Crazy Numbers: The competition saw a total runtime of 48 days on 8xMI300 GPUs and 60k total runs.
- Submissions averaged over 2k a day for almost 2 weeks.
- Dataset to be Publicly Released: Organizers announced the dataset used in the competition will be released publicly.
- One organizer said I’ll post the link when we have it!
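A quick back-of-envelope on those stats (assuming the 48 days is aggregate runtime across submissions rather than wall-clock):

```python
# Average per-run time implied by the competition totals.
total_seconds = 48 * 86400     # 48 days of aggregate runtime on 8xMI300
runs = 60_000
avg = total_seconds / runs
print(f"{avg:.0f} s average per run")   # ~69 s
```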
GPU MODE ▷ #singularity-systems (3 messages):
infra changes, nanochat training, lambda H100 clusters
- Infrastructure Changes Imminent: Some members agreed that planned infrastructure changes are required to enhance system capabilities.
- They stated it’s for sure something we’d like to have eventually.
- Karpathy’s nanochat Training Tiers: A member expressed excitement about Karpathy’s nanochat training run tiers at $100 and $1000 for eureka’s llm101 with nanogpt/nanochat as the target model.
- They mentioned the possibility of renting out 8xH100 clusters on Lambda for a $100 training run.
GPU MODE ▷ #general (3 messages):
Carl Bot, Reference Kernels
- Carl Bot Spotted in New Channel: A member suggested checking out the Carl Bot in a specific channel.
- They indicated it was created here.
- Reference Kernels get Kudos: A member expressed their appreciation for the server’s organization and focus.
- They gave props to the team for its clean and focused approach.
GPU MODE ▷ #multi-gpu (17 messages🔥):
Multi-GPU systems in HPC, Data movement research in multi-GPU HPC, RTX 6000 Pro and Blackwell architecture
- Multi-GPU HPC Systems: Hot or Not?: A member inquired about the relevance of multi-GPU systems in HPC, to which another member confirmed their importance and shared a link to a relevant paper.
- The member noted the increasing heterogeneity of HPC systems and the prevalence of nodes with multiple GPUs, suggesting potential research avenues in replacing MPI with NCCL/RCCL for data transfer.
- Data Transfer Research in Multi-GPU HPC: Opportunities Abound: A grad student expressed interest in research opportunities related to data movement in multi-GPU HPC systems, specifically latency and bandwidth.
- A member confirmed that this is a popular area of research and asked about preferred areas, such as kernel-level or framework-level work, and whether the student wanted to focus on collective algorithms/communication patterns or network architecture.
- RTX 6000 Pro: The True Blackwell Heir?: A member claimed that only the RTX 6000 Pro has the real Blackwell set.
- Another member asked for clarification on what constitutes the real Blackwell set, implying skepticism or a lack of understanding regarding the claim.
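For anyone eyeing the data-movement research direction above: a standard first tool is the alpha-beta cost model for collectives. A sketch for ring allreduce using the textbook formula (the link parameters here are made-up examples, not measurements):

```python
def ring_allreduce_time(n_bytes, p, alpha, beta):
    """Alpha-beta model: alpha = per-message latency (s),
    beta = per-byte transfer time (s/B). Ring allreduce runs
    2(p-1) steps (reduce-scatter + allgather), each moving n/p bytes."""
    steps = 2 * (p - 1)
    return steps * alpha + steps * (n_bytes / p) * beta

# Example: 1 GiB gradient buffer, 8 GPUs, 5 us latency, 50 GB/s links.
t = ring_allreduce_time(2**30, 8, 5e-6, 1 / 50e9)
print(f"{t * 1e3:.2f} ms")
```

The model makes the latency/bandwidth split explicit: small messages are dominated by the alpha term, large ones by the beta term, which is exactly the tradeoff NCCL/RCCL algorithm selection navigates.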
GPU MODE ▷ #low-bit-training (1 messages):
kitsu5116: https://arxiv.org/pdf/2510.08757
GPU MODE ▷ #irl-accel-hackathon (1 messages):
Comet-style MoE kernels, fine grained overlapping, comms and compute
- Comet Kernels Proposed for Hackathon: A member posted their hackathon idea: Comet-style MoE kernels for fine grained overlapping of comms and compute.
- They shared a link to the discord channel and asked for collaborators.
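A CPU-level sketch of the fine-grained overlap the Comet-style idea targets: while chunk i is being "computed", chunk i+1 is already being "transferred" on another thread. This pure-Python analogue (stand-in `transfer`/`compute` functions are ours) shows the double-buffering pattern; the real kernels interleave at tile granularity on the GPU:

```python
from concurrent.futures import ThreadPoolExecutor

def transfer(chunk):          # stand-in for an all2all / token-dispatch step
    return [x * 2 for x in chunk]

def compute(chunk):           # stand-in for the expert GEMM
    return sum(chunk)

def pipelined(chunks):
    results = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = comm.submit(transfer, chunks[0])
        for nxt in chunks[1:] + [None]:
            data = pending.result()
            if nxt is not None:
                pending = comm.submit(transfer, nxt)   # overlap next transfer
            results.append(compute(data))              # ... with this compute
    return results

print(pipelined([[1, 2], [3, 4], [5, 6]]))   # [6, 14, 22]
```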
- Hackathon idea needs expansion: To meet the requirements of the schema, a second topic summary must be included to avoid validation errors.
- This entry serves as a placeholder to ensure that the ‘topicSummaries’ array contains at least two elements, fulfilling the schema’s minimum requirement. More content should be added here.
GPU MODE ▷ #llmq (1 messages):
llmq, unit tests
- Unit Tests Urged for llmq: A member asked if writing unit tests would be a good direction to contribute to the llmq repo.
- They expressed interest in familiarizing themselves with llmq.
- Contributor Seeks llmq Repo: A member expressed interest in contributing to the llmq repository.
- They inquired about the best way to get involved and familiarize themselves with the project.
GPU MODE ▷ #helion (1 messages):
Helion, GPU Mode Talk
- GPU Mode Talk happening Oct 17: GPU mode talk consisting of an overview of Helion will happen this Friday, Oct 17th, at 1:30 pm PST, followed by a demo, according to the announcement.
- Helion Overview: The talk will consist of an overview of Helion, followed by a demo, and will include a Q&A session.
DSPy ▷ #general (93 messages🔥🔥):
Recursive Language Models (RLMs), Claude code vs RLMs, Tiny Recursive Models (TRMs), OpenAI's memory operations and DSPy, Sub-agents in DSPy
- DSPy Lab at MIT Debuts Recursive Language Models (RLMs): The DSPy lab at MIT introduced Recursive Language Models (RLMs), which enable LLMs to handle unbounded context lengths and mitigate context rot, promising a DSPy module soon (announcement tweet).
- These models have shown a 114% gain on 10M+ tokens according to Zero Entropy Insight’s blog post.
- RLM vs Claude code: Recursive Rumble: Discussion contrasts RLMs with Claude code, questioning whether Claude code can recursively invoke itself with arbitrary length prompts and serve as a general-purpose inference technique.
- Tiny Recursive Models (TRMs) Concept Surfaces for Tool Calling: A member proposed Tiny Recursive Models (TRMs) for tool calling contexts, considering their use in question answering over a corpus of 450k tokens.
- Another user said that the most interesting concept from RLM was context as a mutable variable, but you could have that where the recursive calls are dumping content into SQLite, other files etc.
- Rumors Circulate: Is OpenAI Secretly Using DSPy for Memory Ops?: Speculation arose regarding OpenAI’s potential use of DSPy for memory operations in its Assistants API, especially concerning prompt caching and auto-tuning for 128K-token recall.
- A member mentioned that prompt caching (50% cost slash on repeats) screams DSPy energy—think LRU/Fanout caching or Mem0’s graph memory vibes.
- DSPy Sub-Agents: Build Your Own Justice League: A user inquired about creating sub-agents in DSPy akin to Claude Code, featuring parallel execution and specialized tasks with their own context memory.
- One user says that Claude Code relies on pre-declared subagents and file IO—humans encode the workflow graph. RLM shifts that control inside the model: context itself becomes the mutable variable.
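The "context as a mutable variable" idea — recursive calls dumping intermediate findings into SQLite rather than carrying them in the prompt — can be sketched like this. The `call_llm` stub is a placeholder for a real model call, and the merge strategy is a simplification of what RLMs actually do:

```python
import sqlite3

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call.
    return f"summary({len(prompt)} chars)"

def recursive_summarize(docs, db=":memory:", chunk=2):
    con = sqlite3.connect(db)
    con.execute("CREATE TABLE IF NOT EXISTS notes (level INT, text TEXT)")
    for d in docs:   # level 0: summarize each doc independently
        con.execute("INSERT INTO notes VALUES (0, ?)", (call_llm(d),))
    level = 0
    while True:
        rows = [r[0] for r in con.execute(
            "SELECT text FROM notes WHERE level = ?", (level,))]
        if len(rows) == 1:
            return rows[0]
        # Recurse: merge groups of summaries into higher-level notes,
        # so no single prompt ever has to hold the whole corpus.
        for i in range(0, len(rows), chunk):
            merged = call_llm("\n".join(rows[i:i + chunk]))
            con.execute("INSERT INTO notes VALUES (?, ?)", (level + 1, merged))
        level += 1

print(recursive_summarize(["doc a" * 50, "doc b" * 50, "doc c" * 50]))
```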
Yannick Kilcher ▷ #general (21 messages🔥):
Codex addon for vscode, GitHub pull requests with GitHub agents, win+h dictation
- Codex addon, no API key needed: Users are trying the Codex addon for VSCode, and found it useful to sign in with a ChatGPT subscription without needing an API key or incurring extra fees.
- One user suggested splitting projects into smaller parts like UI, backend, and website to run multiple Codex instances.
- Cache hits saved 20% on a batch run: A user reported saving approximately 20% due to cache hits during a batch run using Codex.
- They estimated that cache hits could save $250-300 on a standard GPT-5 run.
- Voice Access using win+h dictation: Users are exploring voice access using win+h dictation with a keybind to jump to the Codex textbox.
- A user suggested making the system wake up to “Computer” and automatically press enter after dictation.
- GitHub pull requests with git hub agents: A user suggested using GitHub pull requests with GitHub agents for workflow management.
- Another user shared a link to California Assembly Bill 1043 concerning operating system providers ensuring accessible interfaces for age verification and data sharing limitations.
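On the cache-hit savings mentioned above: with a discount d on cached input tokens, the overall input-cost saving is simply hit_rate × d, so a ~20% saving at a 50% discount implies roughly 40% of input tokens hitting the cache. A sketch (actual provider discounts and cache rules vary):

```python
def input_cost(tokens, price_per_m, hit_rate=0.0, discount=0.5):
    # Cached tokens are billed at (1 - discount) of the normal input price.
    cached = tokens * hit_rate
    return ((tokens - cached) * price_per_m / 1e6
            + cached * price_per_m * (1 - discount) / 1e6)

full = input_cost(100_000_000, 1.25)                       # GPT-5 input pricing
with_cache = input_cost(100_000_000, 1.25, hit_rate=0.4)   # 40% hits, 50% off
print(f"saved {(1 - with_cache / full):.0%}")              # saved 20%
```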
Yannick Kilcher ▷ #paper-discussion (18 messages🔥):
Coding AI Completions, Tooling Affordance, DIAYN Paper Discussion, Mutual Information in RL
- AI Completions: Blessing or a Curse?: It was noted that the usefulness of AI completions depends heavily on the coding task; they can be fucking stupid and may slow down the process.
- The degree to which these tools assist is directly proportional to the amount of boilerplate code one writes.
- Tooling Powers Productivity, Eventually?: A member argued that investing in dedicated tooling like Alpha Evolve, Reasoning Bank, and graph-based systems could boost completion frequency and overall productivity.
- They conceded that the effectiveness is contingent on the variety of work and whether the required infrastructure provides an overall time saving.
- Diverse DIAYN Triumphs: A paper titled Diversity Is All You Need (DIAYN) was discussed, outlining a method for learning “skills” through mutual information between skills, states, and actions.
- One person noted that this approach is somewhat analogous to Schmidhuber’s work on intrinsic motivation by novelty-seeking but claims a terminology breakthrough.
- Entropy and Mutual Information in RL: Recent papers utilizing entropy and mutual information in Reinforcement Learning (RL) by the first author were highlighted, with links provided to Can a MISL Fly? and Maximum Entropy RL.
- One member then shared code snippets regarding `ThresHot` and `zca_newton_schulz`.
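`zca_newton_schulz` presumably refers to computing the inverse matrix square root for ZCA whitening via Newton–Schulz iterations — a matmul-only method that suits GPUs and avoids an eigendecomposition. A minimal NumPy sketch of that standard technique (not the member's actual code):

```python
import numpy as np

def zca_newton_schulz(A, iters=20):
    """Approximate A^(-1/2) for symmetric positive-definite A with
    coupled Newton-Schulz iterations (matmuls only)."""
    n = A.shape[0]
    c = np.linalg.norm(A)          # Frobenius norm; scales spectrum into (0, 1]
    Y, Z, I = A / c, np.eye(n), np.eye(n)
    for _ in range(iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z        # Z -> (A/c)^(-1/2)
    return Z / np.sqrt(c)

# ZCA whitening: W = Cov^(-1/2) decorrelates the data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
cov = np.cov(X, rowvar=False) + 1e-5 * np.eye(5)   # jitter keeps it SPD
W = zca_newton_schulz(cov)
```

The whitening check is that W @ cov @ W is (approximately) the identity; convergence is quadratic once the normalized spectrum sits inside (0, 2).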
Yannick Kilcher ▷ #ml-news (3 messages):
Gemma Models, AI in Cancer Therapy
- Gemma Model Aids Cancer Therapy Discovery: Google announced that a Gemma-based model helped discover a new potential cancer therapy pathway in research collaboration with Yale University.
- The model, named Cell2Sentence-Scale 27B (C2S-Scale), is a 27 billion parameter foundation model designed to understand the language of individual cells.
- C2S-Scale Model Generates Novel Hypothesis: The C2S-Scale model generated a novel hypothesis about cancer cellular behavior, which has been confirmed with experimental validation in living cells, revealing a promising new pathway for developing therapies to fight cancer.
- The launch builds upon earlier work demonstrating that biological models follow clear scaling laws, where larger models perform better on biology, raising the question of whether larger models can acquire entirely new capabilities.
Moonshot AI (Kimi K-2) ▷ #general-chat (39 messages🔥):
Trickle vibe coding website, Aspen's Bitcoin Leverage Story, Gemini 2.5 is too old, Kimi K2 Update, Thinking vs Non-Thinking Models
- Trickle is a Vibe Coding Site: Trickle is a vibe coding website just like Lovable, Bolt, Manus, etc., according to a member who shared a link to it: https://trickle.so/lol.
- Aspen Blows Up Bitcoin Fortune: A member claimed Aspen leveraged 100x on Bitcoin, was above a million dollars in profit, flipped off his boss and quit, then got liquidated after tariff news and is now acting like he never left.
- Another member asked Yo <@547736322568355861> is this u, with an attached screenshot meme.
- Gemini 2.5 Aging Poorly: A member said that Gemini 2.5 is too old rn and that they should have released 3.0 already, stating that nobody wants to use Gemini in its current state.
- Kimi K2 Gets Love and Small Update: A member mentioned hoping for Kimi K3, but another member noted that K2 itself had a small update last month and that they are satisfied with DS v3.1 and Kimi K2.
- The member said they could be biased cuz they prefer non thinking models now.
- Model Reasoning Can Be Redundant: A member said thinking models are fine but should be kept separate since it seems to add word slop, and that it’s better to pair a big big generalized kimi k3 with a small fast thinker but not have it lobotomize the big one.
- Another member mentioned that with deepseek they notice sometime that the reasoning is redundant.
Eleuther ▷ #general (7 messages):
Compute Funding for Research Group, LLM Situational Awareness Benchmarks
- Researchers Seek Eleuther Compute Support: A group of researchers and engineers from Stanford, CMU, etc. are seeking compute and resources/funding from Eleuther to support their research projects.
- They have a large number of brainstormed ideas/projects which they are currently fleshing out and finalizing, hoping to iterate quickly and push out several papers.
- Quest for LLM Situational Awareness Benchmarks: A member is looking for benchmarks for measuring ‘situational awareness’ of LLMs, noting that existing benchmarks like Situational Awareness Diagnostic (SAD) and those using synthetic datasets may not be ideal.
- The member asks if there are better alternatives available or if this remains an open research problem.
Eleuther ▷ #research (17 messages🔥):
SEAL optimizer, AdamW optimizer, AI/ML research, tensor logic, MAE training
- SEAL Uses AdamW Optimizer: Regarding the SEAL speedrun, it was confirmed that they use the AdamW optimizer, not a new optimizer like Muon.
- Fixed Routing Schemes Used in Layers: A member inquired about whether fixed routing schemes are used from the perspective of layers, specifically whether the routing scheme is learned during training or all routing paths are open with a gating mechanism.
- Another member clarified that the routing is random, similar to MAE.
- Tensor Logic Unifies Neural and Symbolic AI: A new Pedro Domingos paper was shared, proposing tensor logic as a language that unifies neural and symbolic AI at a fundamental level.
- MAE Training Method: It was noted that the training method used is similar to MAE training, considering the last few layers as a decoder.
Eleuther ▷ #interpretability-general (1 messages):
Devinterp.com, Neural Network Development
- Website Links to Nerdsnipe Subfield: A member shared the website devinterp.com, stating that the subfield is nerdsniping them.
- Studying Neural Network Development: A member is studying the process of how neural networks develop.
tinygrad (George Hotz) ▷ #learn-tinygrad (14 messages🔥):
Freezing parts of a matrix for training, Implementing LeNet-5 in tinygrad, Debugging optimizer issues, Nested TinyJit calls
- Freezing Layers in tinygrad for Fine-Grained Training: A member sought advice on freezing a section of a matrix while training the remaining part in tinygrad, exploring ways to create a “virtual” tensor by concatenating tensors with differing `requires_grad` attributes.
- They suggested using `Tensor.cat(x @ a.detach(), x @ b, dim=-1)` to simulate the “virtual” tensor: detaching `a` freezes it, so only `b` is trained.
- LeNet-5 Implementation & Optimizer Issues: A member hit optimizer issues while implementing LeNet-5 in tinygrad, encountering an error indicating no gradients for the `.step` call, and shared their code via a pastebin link.
- They suspected the issue was related to the input tensor not having `requires_grad=True`.
- Jitting Twice Leads to Training Troubles: George Hotz pointed out that the user was jitting twice, suggesting removing the extra jit and debugging without jit first.
- The member confirmed removing the second jit decorator fixed the issue, recognizing it as a subtle error from nested `TinyJit` calls.
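The `detach`-based freezing trick from the first topic above is framework-agnostic: gradients simply never flow into the detached operand. A manual-gradient NumPy sketch of one SGD step (illustrative only, not tinygrad code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
a = rng.normal(size=(4, 3))    # "detached" half: should stay frozen
b = rng.normal(size=(4, 3))    # trainable half
a0, b0 = a.copy(), b.copy()

# Forward: out = cat(x @ a.detach(), x @ b); loss = mean(out ** 2)
out = np.concatenate([x @ a, x @ b], axis=-1)
g_out = 2.0 * out / out.size   # d(loss)/d(out)

# Backward: detach() defines a's gradient as zero, so only b gets one.
g_b = x.T @ g_out[:, 3:]       # columns 3: come from the x @ b half
b -= 0.1 * g_b                 # SGD step: b moves, a does not
```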
Modular (Mojo 🔥) ▷ #general (7 messages):
ARM Linux Support, DGX Spark Compatibility, CUDA 13 Update, Jetson Thor Support, Mojo and MAX on ARM
- Mojo’s ARM Linux Support Claimed: A member stated that Mojo should already work on ARM Linux, and to file bugs if Nvidia has done odd things to DGX OS on Spark causing breakages.
- Another member noted that for DGX Spark, an `sm_121` entry needs to be added and `libnvptxcompiler` needs to be updated to CUDA 13.
- DGX Spark Support Coming Soon: Mojo and MAX should fully work on the DGX Spark and Jetson Thor once the `sm_121` entry and CUDA 13 update are complete.
- For other ARM Linux devices like the Jetson Orin Nano, Mojo and MAX should work just fine today.
- Necessary Updates for DGX Spark: Running Mojo/MAX on the NVIDIA DGX Spark requires an entry for `sm_121` devices in Mojo’s `gpu.host` and updating `libnvptxcompiler` to CUDA 13.
- After these updates, Mojo and MAX should work well on the DGX Spark.
Modular (Mojo 🔥) ▷ #mojo (5 messages):
Querying type in Mojo, get_type_name(), __type_of(a)
- Mojo users want `type()` functionality: A user asked how to query the type of a variable in Mojo, similar to Python’s `type()` function.
- Another user suggested using `get_type_name` for a printable name, or `__type_of(a)` for a type object.
- `get_type_name()` needs `__type_of()`: `get_type_name()` works on SIMD types such as `a = SIMD[DType.int, 4](0, 1, 2, 3)` but fails with `List` or `Dict`.
- It was clarified that it should be called as `get_type_name[__type_of(a)]()` because it is a free-standing function.
aider (Paul Gauthier) ▷ #general (2 messages):
OpenCode + GLM 4.6, aider.chat with Sonnet 4.5, OpenRouter/x-ai/grok-code-fast-1 integration
- Opencode and GLM 4.6 Impress User: A user is impressed with Opencode + GLM 4.6, noting that programming with it is comfortable and enjoyable because they no longer have to worry about counting tokens to save and the usability is excellent.
- aider.chat with Sonnet 4.5 for Refinements: For specific refinements, the same user reports using aider.chat with Sonnet 4.5.
- Feature request: Adding OpenRouter/x-ai/grok-code-fast-1 to Aider: A user asked about how to add openrouter/x-ai/grok-code-fast-1 to aider such that it edits with diff etc.
aider (Paul Gauthier) ▷ #questions-and-tips (2 messages):
Qwen2.5-Coder:7B Metadata, Ollama Integration Issues, Model Output Problems, Troubleshooting Model Errors
- Qwen2.5-Coder:7B Throws Gibberish: A user inquired about a `metadata.json` example for the `qwen2.5-coder:7b` model from ollama.com, reporting that it outputs gibberish.
- They indicated this issue is unique to this model, as others function correctly.
- Ollama Model Integration Challenges: The user is facing problems specifically with the `qwen2.5-coder:7b` model in Ollama, suggesting potential integration issues.
- The user has sought a working `metadata.json` example to troubleshoot, indicating the issue is likely a misconfiguration or model incompatibility within the Ollama environment.
aider (Paul Gauthier) ▷ #links (2 messages):
Chinese provider with free tokens, Claude 4.5 available
- New Chinese Provider Offers Free Tokens!: A new Chinese provider is offering $200 of free tokens upon registration through a referral link: agentrouter.org.
- An additional $100 is awarded for each invited person.
- Claude 4.5 Integrates with Coding Tools!: Claude 4.5 is now available and can be connected to many coding tools.
MCP Contributors (Official) ▷ #general (4 messages):
Model Context Protocol, MCP Feature Support Matrix, SEP Document, Hierarchical Groups
- MCP Discovery Decoded: A member asked for a clarification on what “Discovery” means in the Model Context Protocol’s Feature Support Matrix, specifically its relation to finding new tools.
- Another member clarified that it refers to Support for finding new tools in response to the tools/list_changed notification as described in the Example Clients documentation.
- Hierarchical Groups Proposal Surfaces: A member referenced feedback from a past review suggesting a SEP to support grouping of all MCP primitives, including tools, prompts, and resources, and linked to an informal proposal for schema enhancement supporting hierarchical groups.
- They asked about the next steps for this work, including the creation of a new SEP document, prioritization efforts, and the need for prototype implementations.
Windsurf ▷ #announcements (2 messages):
Windsurf Patch 1.12.18 Release, Claude Haiku 4.5 Availability
- Windsurf Patch 1.12.18 Lands with Fixes: A new patch (1.12.18) was released with significant bug fixes and improvements, available for download here.
- The patch addresses issues with custom MCP servers, the beta Codemaps feature, stuck bash commands, and problems creating or editing Jupyter notebooks.
- Claude Haiku 4.5 Surfs into Windsurf at Bargain Price: Claude Haiku 4.5 is now available in Windsurf for 1x credits, matching the coding performance of Sonnet 4 at one-third the cost and > 2x the speed.
- More information is available on X.com; users are encouraged to reload or download Windsurf to try it out!
Manus.im Discord ▷ #general (2 messages):
AI Tool Innovation, Subscription Model for AI Agents, Service Expectations in AI Communities, Project Mistakes and Learnings
- AI Innovation Stalls with Simple Forms?: A user expressed surprise that AI tools still rely on simple forms and email responses for capturing details, rather than using innovative AI Agents.
- They suggested offering an innovative AI Agent with a subscription model and credits in return for user feedback to showcase the AI’s capabilities.
- Users demand prompt AI service over slow responses: A user ranted about the common issue in AI communities where users expect immediate service, not responses delayed by several days.
- The member exclaimed the biggest issue I see across all the communities is users want some service not a response a 3 days because of the service standard. Yes I am on a rant today.
- Project Retrospective Reveals Key Mistakes: A user shared learnings from a project, admitting significant mistakes such as claiming integration where there was none and not being upfront about limitations.
- The user expressed, I made significant mistakes in this project: Claiming integration when there was none - I initially said everything was integrated when I had only built separate systems.
MLOps @Chipro ▷ #events (1 messages):
Domain-Centric GenAI, Autonomous Data Products, RAG Systems, Domain-Specific Models
- Nextdata Hosts Domain-Centric GenAI Webinar: Nextdata is hosting a webinar about context management and domain-centric data architecture on October 16, 2025 at 8:30 AM PT, led by Jörg Schad.
- The webinar will cover topics such as Domain-Driven Data for RAG, Domain-Aware Tools, and Domain-Specific Models, aiming to improve retrieval relevance, reduce hallucinations, and lower token costs — sign up here!
- Domain-Driven Data Prevents Token Bloat: The webinar will discuss how flooding models with entire data lakes causes token bloat and hallucinations, while domain-scoped context keeps LLMs focused and efficient.
- It also will cover how exposing dozens of tools dilutes agent decision-making, and how modular, task-scoped tool access improves reasoning.
- Domain-Specific Models Improve Accuracy: The webinar will explain why one-size-fits-all models underperform in specialized contexts, and how domain-aligned fine-tuning improves accuracy.
- Attendees can expect to learn about building RAG systems with better retrieval relevance, reduced hallucinations, and lower token costs, achieving production-ready GenAI with stronger governance, lower inference costs, and higher user confidence.