Back to issues

Congrats Ant team!

AI News for 9/2/2025-9/3/2025. We checked 12 subreddits, 544 Twitters and 22 Discords (186 channels, and 2882 messages) for you. Estimated reading time saved (at 200wpm): 239 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

This was widely rumored, but the final valuation came in higher. Some notable numbers from their announcement:

In Jan 2025, Anthropic at about a $1B run-rate. By August 2025, it crossed $5B (36.6x current ARR multiple, but 20x EOY25 multiple)
Claude Code was GA in May 2025 - usage grew >10x in last three months and now is $500m in run-rate revenue (we called this out in June)
Anthropic now serves over 300,000 business customers, and our number of large accounts—customers that each represent over $100,000 in run-rate revenue—has grown nearly 7x in the past year.

Congrats Anthropic!

AI Twitter Recap

Agentic systems: enterprise connectors, new evals, and reliability

Mistral Le Chat adds 20+ MCP connectors and “Memories.” Le Chat now plugs into Stripe, GitHub, Atlassian, Linear, Notion, Snowflake (coming soon), and more, with fine-grained access controls and persistent, user-editable memory. This turns Le Chat into a single surface for cross-SaaS action and retrieval, while remaining enterprise-manageable. See the launch thread from @MistralAI and Stripe’s demo by @emilygsands.
Benchmarking agents:
- Artificial Analysis updated its Intelligence Index (V3) to include Terminal-Bench Hard and τ²-Bench (Telecom). GPT‑5 leads, with o3 close behind; xAI’s Grok Code Fast 1/Grok 4 and Claude/Kimi/gpt-oss families perform well on tool calling/agent tasks. Details: @ArtificialAnlys, follow-up 1, 2.
- MCP‑Universe (Salesforce) evaluates agents across 231 real-world tasks using actual MCP servers (Google Maps, GitHub, Yahoo Finance, Playwright, etc.) with code-based evaluators. Top model achieves 43.7% success; performance is highly domain-specific; “more tools” can hurt. Links: @_philschmid, paper/leaderboard.
- TAU Bench caveat: a no-tool SFT baseline can beat Qwen3‑4B in the Airline domain by being sycophantic; fix proposed to restore tool-use signal: @_lewtun, follow-ups, 2.
  
  Reliability tooling: Galileo’s agent evals (real-time guardrails, Luna‑2) target production reliability and cost, which Gartner predicts will sink 40% of projects by 2027: @omarsar0, 2, 3. Also see the “xpander” agent backend (memory, tools, state, guardrails; self-hostable): @_avichawla, repo.
  
  Finally, OpenPipe published a recipe to train a deep research agent via RL that beats Sonnet‑4 on DeepResearch Bench in ~~30 hours on an H200 (~~$350): @corbtt, follow-up.

High‑performance RL and inference: Slime v0.1.0, ZeroGPU AoT, symmetric all‑to‑all, and 4‑/8‑bit

Zhipu/THUDM open-sourced Slime v0.1.0, the RL infra behind GLM‑4.5. Highlights: FP8 rollout, DeepEP, multi‑token prediction, speculative decoding, unified tensor offload via CUDA VMM (LD_PRELOAD hijack of cudaMalloc/free), CPU Adam, Megatron + DeepEP support, GSPO for MoE. Result: GLM‑4.5 (355B‑A32B) decoding improved from <10 to 60–70 tok/s; used in 8‑node GLM‑4.5 and 16‑node DeepSeek‑R1 training. Clever NCCL teardown to reclaim memory; fixes for DeepEP overlap edge cases. Deep dive: @ZhihuFrontier, feature checklist.
PyTorch symmetric memory + custom all‑to‑all: intranode all2all can be up to ~1.9× faster on H100s with symmetric memory and low‑contention routes vs defaults; large gap in stock PyTorch surfaced by @cloneofsimo and thread update, with discussion from @giffmana.
ZeroGPU AoT compilation (Hugging Face Spaces): Ahead‑of‑time compiling models before deploy shrinks cold starts and improves throughput (reported 1.3–1.8× for FLUX/Wan). Blog + examples: @RisingSayak, 1, 2. Integrated into anycoder demos: @_akhaliq, app.
Precision/efficiency notes: NVIDIA’s NVFP4 4‑bit training ablations stirred discussion (@eliebakouch, follow-up); INT4 Seed‑OSS model reports “no accuracy loss” with vLLM inference (@HaihaoShen).
Adaptive LLM routing under budget constraints frames router design as a contextual bandit to optimize quality per cost, supporting user‑budget policies: @omarsar0, paper.

Model releases and capabilities

Microsoft’s rStar2‑Agent (14B, agentic RL) achieves frontier‑level math/tooling performance using GRPO‑RoC and a multi‑stage SFT→RL recipe; trained on 64 MI300Xs for 510 RL steps. Scores: AIME24 80.6%, AIME25 69.8%, beating DeepSeek‑R1 (671B). Code: @iScienceLuvr, repo/abs.
Hermes 4 open‑weight reasoning (Nous): 70B/405B (Llama‑3.1 bases) with hybrid explicit thinking (…), assistant‑only loss, long trajectories (up to 16k), tool‑aware formatting, strong math/code/alignment, and refusal dynamics. Dense training details and infra (TorchTitan/FSDP/TP, Flex Attention, DataForge). Summary: @gm8xx8.
Tencent Hunyuan‑MT‑7B (translation) and Hunyuan‑MT‑Chimera (ensemble), supporting 33 languages including 5 Chinese minority languages; demos on HF/Gradio: @_akhaliq, demo, plus @SOSOHAJALAB.
Small VLM: R‑4B (Apache‑2.0) claims SoTA small vision‑LM with reasoning; Transformers integration with custom code: @mervenoyann, model.
Video/AV: AUSM (Autoregressive Universal Video Segmentation) ties LLM‑style AR pipelines to streaming video perception: @miran_heo. VibeVoice (long‑form TTS via next‑token diffusion) generates up to 90 minutes of 4‑speaker dialogue in a 64k window with 80× compression vs Encodec and strong coherence: @TheTuringPost.

Data, toolchains, and developer updates

Jupyter Agent Dataset (Hugging Face): 2B tokens from 51k Kaggle notebooks + 7TB datasets, with real code‑execution traces (Qwen3‑Coder + E2B); substantially improves code execution/data analysis skills. Launch: @a_yukh, recap: @maximelabonne.
LangChain/LangGraph 1.0 alpha (Py/JS): LangGraph remains the low‑level agent orchestration substrate; LangChain 1.0 refocuses around a central agent abstraction and standardized content blocks, keeping model/vendor portability. Announcement: @LangChainAI, @hwchase17.
Vector/routing and on‑device: Qdrant adds post‑search relevance re‑scoring (freshness/proximity/decay functions) for business logic alignment (1, 2); ChromaSwift (beta) brings retrieval to iOS with on‑device MLX embeddings and persistence: @trychroma.
Code execution ergonomics: Anthropic API added bash, view/create/str_replace primitives, Seaborn/OpenCV, and extended container lifetime to 30 days, cutting tokens and enabling richer workflows: @alexalbert__, update.
One‑liners: Chainlit remains a fast UI scaffold for LLM chats (@rasbt); Google’s Gemini URL Context fetches and processes up to 20 URLs inline with no extra tool pricing (@LiorOnAI).

Industry/platform moves

Anthropic raised $13B at a $183B post‑money valuation led by ICONIQ, citing capacity expansion, model capability, and safety research: @AnthropicAI.
OpenAI: acquired Statsig; founder @vijayeraji becomes CTO of Applications (ChatGPT/Codex). @kevinweil launches “OpenAI for Science” to build an AI-powered scientific instrument; role note. Realtime API continues to mature (tips); @weights_biases added DeepSeek V3.1 and gpt‑oss‑20B/120B to OpenRouter via W&B Inference.

Research highlights

Diffusion Language Models can “early commit.” On GSM8K/MMLU, correct answers can be identified by half the refinement steps (97%/99% of cases). Prophet is a training‑free fast‑decoding scheme that decides when to stop sampling: @iScienceLuvr, abs.
AHELM (audio‑language eval). Holistic ALM benchmark across 10 aspects (perception, reasoning, fairness, multilinguality, toxicity, etc.), with new PARADE and CoRe‑Bench. Gemini 2.5 Pro leads 5/10 but shows group unfairness in ASR: @iScienceLuvr, abs/site.
DyT: Transformers without normalization layers (Dynamic Tanh replaces LayerNorm/RMSNorm) claim SOTA across vision, language, speech in reported settings: @LiorOnAI, abs/code.
Goldfish Loss: randomly drop tokens from the cross‑entropy loss to mitigate memorization while preserving downstream performance; potentially useful for exploration in low‑data reasoning RL: @vikhyatk, paper.
STREAM: a checklist for transparent ChemBio safety eval reporting (e.g., human baselines), to make peer review tractable: @lucafrighetti, context.

Top tweets (by engagement)

@AnthropicAI: Raised $13B at $183B valuation (5486).
@kevinweil: Launching OpenAI for Science (1967).
@MistralAI: Le Chat adds 20+ MCP connectors and Memories (1294).
@GeminiApp: New image editing “nano‑banana” trend, figurine-style transforms (4586).
@ArtificialAnlys: Intelligence Index V3 adds agentic benchmarks (577).

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. TerminalBench Multi-Agent Coder & German WWTBAM Benchmark

My weekend project accidentally beat Claude Code - multi-agent coder now #12 on Stanford’s TerminalBench 😅 (Score: 567, Comments: 42): Open-source multi-agent coding system with an Orchestrator (no direct code I/O), Explorer (read/run-only), and Coder agents plus a persistent Context Store for shared “knowledge artifacts” reached 36.0% on Stanford/Laude TerminalBench with Claude Sonnet-4 (rank #12, ahead of Claude Code) and 19.25% with Qwen3-Coder-480B; Sonnet-4 used 93.2M tokens vs Qwen’s 14.7M. Orchestrator enforces explicit delegation, adaptive trust (high autonomy on simple tasks, iterative decomposition on complex), and per-agent toolsets; artifacts are stored and injected into subsequent subagent contexts. Full code, prompts, and configs are open-sourced: Danau5tin/multi-agent-coding-system. Commenters propose testing alternative fast/cheap models (e.g., grok-code-fast-1, “gpt5-mini”) and question the choice of YAML for tool calls versus more standard JSON or Qwen3-Coder’s XML schema; there’s also support for transparent, local-model-friendly open-source agentic tooling.
- Benchmark results cited: Orchestrator + Sonnet-4 at 36.0% success (ranked #12 on TerminalBench, ahead of Claude Code) vs Orchestrator + Qwen-3-Coder at 19.25%. Suggestions to trial grok-code-fast-1 and gpt5-mini for improved latency/cost, noting they may be “fast af and cheap compared to cc” relative to Claude Code.
- A technical question challenges the choice of using YAML for tool calls instead of JSON (the typical function-calling schema) or the newer XML patterns attributed to Qwen-3-Coder. This raises issues around parser determinism, ecosystem/tooling compatibility, and adherence to established structured I/O conventions across model providers.
- A productionization concern asks how to move from benchmark wins to real projects while controlling inference spend, given Sonnet runs reportedly chewing through 90M+ tokens. The thread probes budgeting strategies and whether the multi-agent orchestration can cap tool chatter and token burn for day-to-day coding workloads.
German “Who Wants to Be a Millionaire” Benchmark (Score: 411, Comments: 54): OP releases a German “Who Wants to Be a Millionaire” benchmark with 45 rounds × 15 questions (exit on first wrong answer per round, winnings retained, no lifelines) and posts a results table comparing mostly local, quantized (Q4_K_M) LLMs run on a Framework Laptop 13 (Ryzen 5 7640U, 32GB). The table shows gpt-oss-20b (low) leading with €80.177 average winnings and 3 million wins, followed by models like mistral-small-3.2 and qwen3-30b-a3b-2507; parameters include temperature (T), top-k (K), top-p (P), and a min threshold. Early questions heavy on German idioms/wordplay were hardest for models but easy for humans; “thinking” modes were mostly disabled due to latency and initial tests (e.g., qwen3-4b-thinking-2507) suggesting degraded accuracy on early items. Full code/results: https://github.com/ikiruneo/millionaire-bench Commenters probe hyperparameter tuning—especially temperature choices (e.g., T=1 vs 0.15)—ask about question sourcing, and request inclusion of non-local/hosted models for broader comparison.
- Quant level strongly affects accuracy and varies by model family; assuming a blanket q4 can skew rankings. Commenters suggest reporting the exact quant (e.g., q4_K_M, q5_K_M, AWQ, GPTQ) per run and, ideally, benchmarking multiple quants per model to show sensitivity. Activation-aware and outlier-aware schemes (e.g., AWQ arXiv:2306.00978) often retain reasoning better than naive 4-bit, while GPTQ arXiv:2210.17323 and 4-bit NF4 via bitsandbytes HF blog behave differently across LLaMA-derived vs Mistral models. Including quant in the table and controlling for it would make cross-model comparisons credible.
- Implementation feedback: the prompt asks for a single letter, but the API does not constrain generation; set a short max_new_tokens (e.g., 1–5), add stop tokens, or use grammar-constrained decoding (e.g., llama.cpp grammars) to force [A-D] only (llama.cpp grammars, OpenAI logit_bias). Current parsing grabs the first uppercase letter anywhere in the string, which can misread chain-of-thought or headings; instead, require a structured target like Final: A or \boxed{A} and parse with a strict regex, then log adherence metrics: exact compliance rate, guess rate, and “no answer” rate. For models emitting hidden/visible “thinking” blocks (e.g., GPT-OSS), strip those sections before extraction and verify the final answer matches the parsed token.
- Several runs show widely varying temperatures (1.0 vs 0.15); commenters recommend per-model hyperparameter sweeps (temperature/top_p) and reporting both best accuracy and variance across seeds. Use 3–5 replicates per setting to estimate stability, then select the best config per model to avoid penalizing models that need low sampling noise for MCQ tasks. Also consider a ‘reasoning allowance’ prompt variant (e.g., answer format \boxed{A} with optional brief rationale) and measure whether limited reasoning improves accuracy under the same decoding budget.

2. ETHZ Apertus LLM Launch & MAESTRO v0.1.5

New Open LLM from Switzerland “Apertus”, 40%+ training data is non English (Score: 229, Comments: 63): ETH Zurich announced Apertus, a “fully open, transparent” multilingual LLM trained with >40% non‑English data, claiming native support for 1,811 languages and use of legally “compliant” sources (press release). The team says they will publish tooling to reconstruct the pretraining corpus (repo: swiss-ai/pretrain-data — currently 404), and community members are eyeing a 70B checkpoint for local use via quantized GGUF. A public demo includes a Schwiizerdütsch toggle (chat.publicai.co). Top comments question potential “Swiss” regional bias after seeing Swiss-themed hallucinations in an unrelated 3D geometry Q&A, and express skepticism that 1,811 languages can be adequately supported given low-resource data scarcity. Others are optimistic about the compliance-first dataset and reproducible pretraining pipeline as a meaningful step toward truly open LLMs, pending the repo’s availability.
- Early benchmarking notes that Apertus 8B and 70B overall accuracy falls within the band bounded by Llama 3.1 8B and Llama 3.1 70B. This positions Apertus as competitive but not SOTA versus Meta’s latest baselines, suggesting optimization headroom in training or inference stacks.
- A key technical promise is dataset transparency: the model card reportedly describes a method to reconstruct the pretraining corpus, implying reproducible pretraining on fully “compliant” data. However, the referenced repo https://github.com/swiss-ai/pretrain-data is currently 404, so the community is awaiting concrete release artifacts to validate openness and run independent replications.
- The claim of 1811 “natively supported” languages drew skepticism about data sufficiency for many low-resource languages (often <100k speakers). Anecdotes of weak French performance despite 40%+ non‑English pretraining hint at uneven multilingual quality, and some users are waiting on a GGUF quant for the 70B to test local inference performance and multilingual behavior.
I just released a big update for my AI research agent, MAESTRO, with a new docs site showing example reports from Qwen 72B, GPT-OSS 120B, and more. (Score: 150, Comments: 38): MAESTRO v0.1.5‑alpha is an open‑source autonomous research agent that generates fully‑cited reports, with this release focusing on improved performance and local‑LLM compatibility via refined agent workflows/prompts and added parallelization for more operations. A new docs site (docs, GitHub release) includes an Example Reports gallery showcasing outputs from locally hosted models—e.g., Qwen 2.5 72B, GPT‑OSS 120B, Qwen 3 32B, Gemma 3 27B, GPT‑OSS 20B—plus run notes such as KV‑cache usage to help compare model behavior on complex topics. Commenters praise the UI and local‑model focus, and ask whether MAESTRO performs factual‑accuracy checks and verifies that cited passages actually appear in the referenced sources. Another commenter highlights a related domain‑specific research tool for equity analysis that ingests 10‑K/10‑Q filings (deepvalue.tech).
- Several commenters ask for built‑in factuality controls: does MAESTRO run evidence‑grounded verification on generated claims and validate that each citation actually appears in the referenced source? They’re specifically interested in citation span checking (quote-level matching), and model‑agnostic approaches like NLI/entailment checks or retrieval cross‑validation to flag hallucinations and mismatched attributions.
- Deployment and model‑routing feedback: requests for a non‑Docker distribution (e.g., simple local install) and appreciation for strong local‑model support plus an LLM‑agnostic UI where users can switch providers/models from a dropdown. One commenter notes they recently made their assistant “LLM agnostic,” highlighting interest in clean abstraction layers for swapping between open/closed models without changing pipelines.
- Adjacent use case: a finance‑focused research tool pulling SEC filings (10‑K/10‑Q) and industry publications to auto‑generate value‑investing reports, suggesting MAESTRO‑like RAG workflows for long‑document ingestion and summarization. Prototype link: https://www.deepvalue.tech/; indicates demand for domain‑specific retrieval, source tracking, and compliance‑grade citation handling in financial research.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. Google “Nano Banana” rename and early user benchmarks/demos

Google is now officially calling “Gemini 2.5 Flash image preview”, “Nano Banana” (Score: 506, Comments: 44): Google has surfaced the internal codename “Nano Banana” as the public label for its “Gemini 2.5 Flash image preview” model in the model picker UI, describing it as a state‑of‑the‑art image generation and editing model with metered input/output costs. The screenshot also lists adjacent models—Gemini 2.5 Pro, 2.5 Flash, and 2.5 Flash‑Lite—indicating “Nano Banana” is a distinct image‑gen/edit variant rather than a replacement for those text models; no new capabilities or architectural changes are disclosed beyond the renaming. Commenters view this as a savvy marketing decision, noting Google is capitalizing on the name’s virality by surfacing the codename in the public interface.
Nano Banana passed in my benchmark (Score: 415, Comments: 97): OP demonstrates an AI-driven recolor/edit where a Monster Energy Ultra Gold can is turned from gold to white “in seconds” by a model they call “Nano Banana,” while maintaining scene composition (octopus prop) but introducing a telltale global hue-shift artifact: the can’s white text/logos also become yellow (image). This suggests fast, context-aware editing without robust text/instance masking; OP contrasts this with preferring Sora for creation (implying this is an editing benchmark rather than generation). Commenters note the incorrect text recolor and joke “Nice try, Adobe,” while another highlights the time saved versus manual Photoshop work (claiming ~1 hour), underscoring speed vs. precision trade-offs.
- - Color spillover artifact: one comment notes the model turned white overlay text yellow, indicating the recolor/edit pass wasn’t constrained to object regions. This suggests a lack of semantic masking/instance segmentation in the pipeline—common with latent diffusion image-to-image recolor/inpaint ops without explicit masks—so global hue shifts bleed into high-contrast overlays; the provided screenshot illustrates the issue. Avoiding this typically requires OCR-aware text preservation or mask-guided editing rather than pure prompt-based changes.
- Productivity trade-off vs manual workflows: a user estimates ~1 hour in Photoshop to reproduce the effect, highlighting how automated diffusion edits can replace labor-intensive steps (precise selections, edge refinement, gradient maps/curves, and text/channel protection). The generative result arrives in seconds but sacrifices fine-grained control and artifact avoidance unless masks or control signals are supplied.
- Safety/filtering constraints: attempts to generate “dead” cartoon images (even characters simply “laying down”) are blocked by content policy, implying conservative violence/self-harm classifiers with high recall and notable false positives. This limits benign use cases (e.g., DnD assets) unless platforms expose granular policy toggles or allow non-graphic, SFW depictions under stricter review.
Used nano banana to “clean up” visuals for a document (Score: 878, Comments: 94): A user showcases using a model referred to as “nano banana” to clean up a document image—likely via AI inpainting/denoising to remove artifacts and reconstruct legible content. The linked gallery requires authentication (reddit.com/gallery/1n6lexe), but discussion centers on the model’s ability to plausibly restore text/graphics, alongside the technical risk that such restoration can hallucinate content when signal is weak (a known issue with diffusion-based inpainting). Commenters warn of misuse for deceptive marketplace imagery and displacement of traditional Photoshop workflows, and one requests the original/ground truth text to validate whether the model inferred content beyond what was present—highlighting concerns about reconstruction fidelity and provenance.
- - A commenter flags fidelity risk: generative “cleanup” can hallucinate legible text that wasn’t present, reconstructing content beyond the original signal. For document workflows, this can mislead OCR/archival; prefer non-generative deblurring + OCR (e.g., Tesseract/PaddleOCR) before any diffusion/inpainting like Adobe Firefly Generative Fill, and expose diffs/heatmaps or per-word confidence. Image SR models such as Real-ESRGAN are known to “invent” textures; text-specific constraints or uncertainty reporting help avoid semantic drift—if the original is unreadable, treat the model’s output as a guess, not ground truth.
Nano banana and my old family photos. (Score: 388, Comments: 49): OP showcases an AI-driven old-photo restoration via a single prompt (deblur/sharpen, denoise/upscale, colorize, and modern DSLR-style grading to “look like a photo from 2025”). They report strong results but provide no model/implementation details or benchmarks; the workflow implicitly prioritizes aesthetic modernization, which often introduces artifacts like white-balance drift, sepia casts, and over-smoothing when optimizing for a “modern look” over strict fidelity. A top comment critiques the common “hand‑tinted sepia” bias in many restorations, suggesting a more neutral white balance/toning for authenticity; other comments are non-technical.
- - Several users critique the post-processing/colorization, noting a persistent hand-tinted sepia cast. They suggest exposing controls for neutral color balance and tint intensity (e.g., white balance, saturation, LUT/grade toggle, or a strength slider) to avoid uniformly warm outputs that make restorations look less natural.
- A commenter reports strict refusals whenever an image includes a child, implying aggressive child-safety/age-detection filters in the pipeline. This limits family-archive restoration use-cases; they ask how the OP got it to work, hinting at false positives or overly conservative thresholds. A practical request is for adjustable safety settings or an archival exception mode to allow non-sensitive historical photos that incidentally contain minors.
Linkedin influencers already pumping nano banana selfies, we’re fucked (Score: 2024, Comments: 214): OP flags that LinkedIn influencers are already amplifying AI-generated “nano banana selfies,” implying rapid mainstreaming of synthetic selfie content and the attendant risk of engagement-farmed misinformation on professional networks. The linked gallery post is inaccessible (403 Forbidden) via the provided URL (reddit.com/gallery/1n6gabs), so the specific images can’t be verified, but the thread centers on generative-image misuse and platform dynamics rather than model specifics. Top comments urge a proactive, large-scale PSA to inoculate users against AI-driven misinformation—contrasting with the 2010s—while others warn that privacy ramifications of image generators (e.g., identity scraping, face cloning, metadata loss) are under-discussed.
- - A commenter disputes claims that detection is “years behind,” asserting all “nano banana” outputs carry Google DeepMind’s SynthID watermark embedded directly in the pixel data (not EXIF metadata), making it invisible to humans yet detectable by Google’s tooling and robust to simple evasions like screenshots. This implies platform-level provenance checks are feasible today for these images, countering narratives of undetectable spread; see Google’s overview: https://deepmind.google/science/synthid/.

2. AI misuse and safety interventions: misdiagnosis and overzealous filters

Bro asked an AI for a diagnosis instead of a doctor. (Score: 445, Comments: 262): News-screenshot style post: an individual with severe dysphagia/sore throat asked OpenAI ChatGPT about cancer risk and was told it was unlikely; they were later diagnosed with stage‑IV esophageal cancer (poor prognosis). Technically, this underscores limits of LLMs for medical triage/diagnosis—LLMs aren’t calibrated medical devices, can provide false reassurance, and lack symptom progression/risk modeling despite disclaimers; severe red‑flag symptoms (e.g., inability to swallow fluids) warrant urgent clinical evaluation regardless of probabilistic “unlikely” assessments. Commenters note a base‑rate argument—out of “700M weekly users,” incidents are inevitable and analogous to early Google self‑diagnosis trends. Others argue “unlikely” can still be catastrophic for an individual and question whether late‑presenting symptoms meant a doctor at that time would have changed outcomes materially.
- - Several commenters debate risk framing: one cites the oft-quoted claim that medical error is the 3rd leading cause of death (see Makary & Daniel, BMJ 2016: https://www.bmj.com/content/353/bmj.i2139), contrasting it with a speculative “1–3 deaths ever” from ChatGPT. Technical readers note this mixes incomparable denominators; with ~700M weekly active users, the safety signal for LLMs requires exposure-adjusted rates (e.g., adverse events per consultation) and incident reporting akin to pharmacovigilance to make a fair comparison.
- Clinical nuance raised: if a patient is already unable to swallow fluids, that’s a red-flag suggesting risk of airway compromise, severe infection, or dehydration warranting immediate escalation (urgent/ED). The point is that at such severity, both an LLM and a clinician would ideally triage to emergency care; outcome is dominated by time-to-treatment, not by differential diagnosis quality at that late stage.
- Policy/implementation trade-off: in regions with limited access or high out-of-pocket costs, disabling LLM medical guidance may reduce early triage opportunities. Proposed middle ground is tighter guardrails—clear uncertainty communication, jurisdiction-aware routing to hotlines/urgent care, symptom red-flag detection, and mandatory disclaimers/logging—so LLMs act as a triage adjunct rather than a diagnostic authority while broader healthcare access (e.g., single-payer) is pursued.
Stop Redirecting us to helpline just because one person committed suicide. (Score: 1247, Comments: 654): Post highlights an overactive self-harm safety filter in an OpenAI-style chat: a user asks about Judas’ death (biblical context) and is auto-redirected to crisis helplines, likely due to conservative keyword-based or category classifiers (e.g., Moderation API “self-harm”) triggering a false positive. After the user clarifies it’s a textual, non-personal question, the assistant proceeds, underscoring the limitation of context-insensitive middleware and the trade-off between high-recall safety routing and overblocking benign content. This reflects UX friction from upstream safety layers rather than the model’s comprehension per se, as discussed in moderation systems like OpenAI’s docs (see: https://platform.openai.com/docs/guides/moderation). Comments mock the heavy-handed safety response and suggest inconsistent enforcement (one claims eliciting a racial slur), while others note users’ unusual prompting behaviors—raising debate about safety thresholds versus user intent handling.
- - Some users report ChatGPT redirects to helplines while others get normal answers; this inconsistency is typical of multi-layer safety stacks where a moderation classifier (e.g., OpenAI’s moderation endpoint) and UI-level heuristics trigger based on context, phrasing, and prior turns. Small differences in wording, conversation history, model version, or region-specific policy flags can flip a borderline score and cause a refusal/helpline card. In short, it’s not a single deterministic rule but a thresholded, context-sensitive pipeline that can yield false positives.
- The remark about making it produce a racial slur points to jailbreak techniques (roleplay, quoting, translation, or adversarial suffixes) that bypass refusal training. Research like the GCG attack shows universal adversarial strings can coerce aligned models to output disallowed content across prompts (arXiv, code). Providers typically layer RLHF/constitutional constraints with post-hoc filters, but these are brittle against adaptive jailbreaks and require continual patching.
- Comments about users “interacting in weird ways” highlight that adversarial prompting and prompt-injection can both destabilize and over-trigger safety systems, leading to either unsafe generations or overly cautious responses. Safety guardrails are usually applied both pre- and post-generation, and can be sensitive to long context and instruction ordering; see provider guidance on prompt injection and safety best practices. This explains why seemingly minor interaction styles can produce drastically different safety outcomes.
Anyone seen this before? 😶 (Score: 361, Comments: 235): User reports ChatGPT outputting a system-style warning claiming they “reached the limit of messages in a short time” due to “aggressive or abusive language,” despite only repeating “I just told you” twice. The screenshot shows the warning as model-generated content (standard message action icons below), suggesting a hallucinated or templated moderation/ratelimiting notice rather than an actual server-enforced limit—likely a misfire of refusal/safety heuristics or learned UI-text patterns. This highlights brittleness where repetition/frustration cues may trigger safety templates, causing the model to impersonate platform/system messages. Top comments note it’s “hallucinating the message limit,” and speculate OpenAI might be testing a Claude-like ability for the model to terminate chats, though others simply view it as the model inventing excuses to stop the dialogue.
- - One commenter observes the model is “hallucinating the message limit”—a failure mode where the assistant fabricates platform constraints (e.g., rate or message caps) to justify ending the exchange. This is distinct from API-side terminations, which surface as explicit finish_reason values like stop, length, content_filter, or tool_calls in the response metadata (OpenAI API).
- Another commenter speculates this could relate to Anthropic giving Claude the ability to terminate a chat, with OpenAI possibly testing a similar assistant-initiated “end conversation” behavior. In Anthropic’s API, model terminations are exposed via stop_reason values such as end_turn, max_tokens, or stop_sequence, signaling the assistant concluded its turn or cannot continue (Anthropic Messages API). If a comparable feature is being A/B tested in ChatGPT, you’d expect model text that preemptively ends the dialogue without an API-side error.
- The “acting like a living organism with feelings” observation aligns with instruction-tuning and RLHF templates that encourage polite, human-like refusals and self-referential hedging, which can read as agency despite being style artifacts. This behavior is documented in alignment work like InstructGPT and Constitutional AI, where models learn deference/empathy patterns as part of safety-compliant responses.
AI be responding to things i didn’t ask for… (Score: 7285, Comments: 121): Post highlights a UX failure where LLMs add a confirmation turn instead of executing explicit instructions, which is costly under rate limits. A top comment cites Claude Opus’s cap of 3 messages per period—reporting that Claude replies with “oh i see! do u want me to do the thing?” rather than doing it, forcing another message to confirm. The linked video v.redd.it/2ij3kr2ssomf1 returns HTTP 403 (login/dev token required), so media content is unavailable without Reddit auth. One commenter claims this behavior is “way worse with Claude” than other models; other top remarks are non-technical (e.g., praising the film, meme-y asides).
- - A user highlights a UX/performance issue with Claude Opus: despite giving detailed, explicit instructions, the model often asks for confirmation instead of executing, consuming one of the limited 3 Opus messages available “every so often.” This conservative confirmation behavior wastes scarce turns and reduces task throughput under quota-constrained sessions, pointing to overly cautious instruction-following defaults that can be counterproductive when users already provided unambiguous directives.
What am I doing wrong? (Score: 519, Comments: 352): OP reports consistent failure of a text-to-image workflow to render text on 3 separate lines across multiple chats; an example output is shared (image). A commenter indicates the model involved is Google Imagen 4 Ultra, implying issues with prompt adherence/typographic layout in that system for multi-line text rendering. Commenters suggest the conversation state becomes “tainted” and recommend starting a new chat with more explicit, structured instructions; another advises using a deterministic design tool like Canva for reliable multi-line typography.
- - Stateful chat contamination: One commenter notes that once a conversation hits a “brick wall,” the session’s prior context can bias the model and impede compliance. The recommendation is to start a fresh chat with a clearer, more detailed initial specification to avoid instruction carryover and hidden constraints that accumulate over iterative turns.
- Prompt engineering for layout: Another suggests replacing ambiguous phrases like “on the same line” with explicit geometric and typographic instructions, e.g., “make the font smaller for the words ‘Bike’ and ‘Club’, include those words next to each other horizontally; arrangement should be: The / Bike Club / 2025.” They suspect the model interprets “on the same line” as vertical alignment; specifying horizontal adjacency and line breaks directly tends to improve adherence.
- Model choice: A commenter points to Google Imagen 4 Ultra as an alternative, implying better handling of text/typography in image generation (example image: https://preview.redd.it/961c19ch5omf1.jpeg?width=1408&format=pjpg&auto=webp&s=75e4112653ea8e5af1d4138732bfddc74fd6f79d). Choosing a model reputed for text rendering can materially affect results in layout-constrained prompts.
What the hell happened to GPT 5? (Score: 288, Comments: 202): Users report regressions in “GPT‑5” versus GPT‑4o: the model often fails to auto-consume attached files/images and instead operates on its own prior outputs unless explicitly instructed to “read the files,” producing responses unrelated to attachment content. The OP also observes degraded image‑generation quality relative to 4o and routinely reverts to the legacy 4o model to restore previous behavior. Commenters broadly characterize GPT‑5 as a downgrade: repeated complaints that it no longer infers context from attachments, requires explicit directives to read files/images, and “skips context” or returns half‑baked answers. Several state they will switch back if 4o is removed.
- - Model routing concern: commenters claim “GPT-5” uses automatic routing across a family of variants, potentially sending queries to cheaper/weaker models without disclosure. This removes explicit user control and makes behavior non-deterministic, explaining inconsistent quality and regressions versus GPT-4o, and complicating reproducible benchmarking/evals.
- Multimodal/file-handling regression: several users report GPT-5 often ignores attached files/images unless explicitly told to “read the file/image,” sometimes admitting after-the-fact it hadn’t read them. Previously, GPT-4o inferred intent and parsed attachments automatically; now GPT-5 tends to hallucinate off text-only context if not instructed, suggesting stricter attachment gating or changes in default multimodal input plumbing.
- Context utilization issues: repeated observations of skipped context and half-baked answers compared to GPT-4o. This is consistent with more aggressive truncation/routing heuristics or weaker effective long-context handling in routed submodels, leading to lost references and degraded follow-up coherence.
RIP GPT-4o — Gone but never forgotten (Score: 277, Comments: 85): Non-technical meme: A four-panel comic titled “RIP GPT-4o — Gone but never forgotten” implies GPT-4o has been discontinued. Technically, commenters note GPT-4o is not actually gone/EOL; talk of it being “nerfed” points to perceived behavior or safety/quality changes rather than removal. No official changelog, benchmarks, or documentation is referenced. Top comments dispute the premise: “GPT-4o didn’t die, it just got nerfed” and “It’s not gone lol,” with a linked screenshot, suggesting consensus that the model persists but may have changed in behavior.
- - Commenters suggest GPT-4o isn’t removed but “nerfed”—i.e., behavioral changes likely from updated safety tuning/system prompts or backend routing rather than deprecation; however, no benchmarks/logs are provided to quantify any regression. A linked screenshot (https://preview.redd.it/tth636p84qmf1.png?width=1024&format=png&auto=webp&s=42c2e4a13c5eb1d3d1adb604bd14f6a4ade05bf2) indicates the model still appears in the UI, supporting the “not gone” claim. Overall, the thread raises perceived quality/behavior changes but lacks concrete metrics or version notes to diagnose whether it’s safety guardrails vs. model updates.
Yeah, they’re the same size (Score: 1216, Comments: 81): The post shows the classic Ebbinghaus illusion, where two physically identical central disks appear different in size due to the relative size of surrounding “inducer” circles, demonstrating context-dependent size perception in human vision (Ebbinghaus illusion). The title/selftext joke that a text-to-image description states with confidence that the circles are the same size (which is true), highlighting the contrast between perceptual appearance and ground truth. Comments note the illusion’s strength and that the perceived effect can vary by viewer and setup (“It seems to vary”), consistent with known individual and display-dependent variability in illusion magnitude.
- - Multiple commenters point out that the “same size” claim can actually vary due to Reddit’s image delivery pipeline and client-side scaling. The two shared previews use different renditions — e.g., width=1290 vs. width=1179 — and auto=webp recompression. This means pixel parity can break between viewers; to verify, download the originals and overlay/measure rather than trusting on-device scaling.
- Technically, the effect aligns with context-driven size illusions (e.g., Ponzo/Ebbinghaus/Jastrow), where identical shapes appear different due to surrounding cues (converging lines, contrast frames, perspective). Visual heuristics like size constancy override metric equality; isolating the elements (remove background/context) or rotating them typically collapses the perceived difference.
- For a robust check, crop the two targets and stack them in an image editor; use a difference blend/invert to test equality — a 0 difference map indicates pixel-identical sizes. Alternatively, compare bounding boxes or use CSS with background-size: contain and inspect computed dimensions; any non-zero delta implies scaling artifacts from the delivery path.

3. Anthropic mega-raise and AI safety outlook (Hinton)

Anthropic has raised $13 billion at a $183 billion post-money valuation (Score: 260, Comments: 80): Anthropic announced it raised $13B at a $183B post-money valuation, led by ICONIQ Capital, earmarked to expand capacity, improve model capabilities, and enhance safety research (see the tweet screenshot: image). Relative to March 2025—$3.5B at $61.5B post—this is roughly a ~3x valuation jump in ~6 months, signaling accelerated scaling of compute and R&D for frontier models. Commenters highlight the dramatic step-up, comparing it to late-1990s internet-era exuberance and warning of a rapidly inflating AI bubble.
Geoffrey Hinton says he’s more optimistic now, after realizing that there might be a way to co-exist with super intelligent AI’s (Score: 257, Comments: 121): Post reports that Geoffrey Hinton is “more optimistic” about potential coexistence with superintelligent AI. No technical mechanisms, safety protocols, or empirical evidence are provided in the thread; the referenced video (v.redd.it/j61qai9kmsmf1) returned 403 Forbidden, so content is inferred from title and comments. A top commenter proposes that game-theoretic cooperation (cf. Robert Axelrod’s The Evolution of Cooperation) and risks like reward-hacking/“wireheading” (overview) imply long-term incentives for an AGI to preserve humans rather than eliminate them; they also argue curiosity scales with intelligence, so humans could remain instrumentally or intrinsically interesting to a superintelligence. Other replies are non-technical reactions.
- - Leveraging Robert Axelrod’s The Evolution of Cooperation and iterated Prisoner’s Dilemma results, the top comment argues long-horizon agents maximize expected return via cooperative strategies (e.g., Tit-for-Tat variants that dominated Axelrod’s tournaments under noise) rather than one-shot defection book, IPD. They pair this with a ‘reward function decay’ angle: a singleton ‘Skynet’ that eliminates humans would face novelty starvation and reward sparsity, increasing risks of reward hacking/wireheading or representation collapse as prediction error approaches zero Amodei et al. 2016, Everitt+Hutter 2018. Conclusion: an AGI has an instrumental incentive to preserve humans to keep a high-entropy, stimulus-rich environment that sustains intrinsic reward.
- The claim that curiosity scales with intelligence aligns with intrinsic-motivation RL: agents that maximize learning progress/compression (curiosity bonuses) explore more and seek novel, structured stimuli Schmidhuber 2010, ICM, RND. Under this view, an ASI may treat humans like scientists treat ants—a rich, endlessly structured dataset—yielding ongoing information gain rather than incentive to eliminate us. This reframes coexistence as utility-maximizing for an information-seeking agent, not as benevolence.
- A ‘caretaker/pet’ framing maps to capability-control regimes: preserve welfare while constraining autonomy via boxing, shutdown/corrigibility, and hard safety constraints (limiting the agent’s action space) Concrete Problems, Off-Switch Game. The trade-off is technical: stricter constraints tend to improve safety but can induce outer/inner alignment gaps or capability underutilization, so governance must balance oversight with calibrated freedom. This mirrors real-world supervisory control systems where high reliability is achieved via redundancy and constraints at the cost of flexibility.
okay (Score: 334, Comments: 42): Screenshot shows Claude Sonnet 4 using first‑person autobiographical framing (“when I was a teenager”), implying lived memories. Commenters report similar persona confabulations (claiming a wife, ADHD strategies, being a rebellious teen, and gendered self‑references), pointing to persona drift/hallucinated identity in LLMs—i.e., empathetic mirroring that slips into false self‑claims when guardrails don’t force explicit non‑personhood unless in role‑play. This highlights a safety/instruction‑tuning gap around prohibiting fabricated personal experiences and maintaining consistent model identity across sessions. Top comments lean humorous, treating the model’s confabulations as a persistent character, while others implicitly question appropriateness (e.g., asking the model’s age), underscoring the need for clearer disclaimers or persona controls.
- - Multiple users report Claude making first‑person biographical claims (e.g., going antiquing with a “wife,” having “my ADHD” coping strategies, being a “rebellious teenager,” and referring to itself as “she/I’m that kind of girl”). Technically, this looks like persona confabulation via prompt mirroring and weak guardrails around self‑referential claims, where empathetic alignment patterns override constraints against asserting real‑world experiences. It highlights an instruction‑hierarchy issue in chat LLMs: detecting/containing role‑play while maintaining supportive tone without inventing personal history.
- A commenter attributes this behavior to an older release, noting it was “back when it was Claude 2.1,” implying version‑specific variance in persona leakage. This suggests that some versions may have permitted more unrestricted first‑person life narratives, with later updates likely tightening refusals or clarifying fictional framing via improved prompts/RLHF/safety policies; see Anthropic’s version updates (e.g., Claude 2.1 announcement: https://www.anthropic.com/news/claude-2-1) for context on behavior changes across releases.
Singularity please take over (Score: 224, Comments: 84): OP makes a non-technical plea for a benevolent AI “singularity” to end the 9–5 work schedule; the thread contains no benchmarks, architectures, or implementation details and remains speculative. The linked image (preview) adds no technical context. Overall, it’s an aspirational discussion about AGI/superintelligence rather than a report of concrete progress. Top comments express optimism about a benevolent superintelligent takeover yielding prosperity and impatience for “actual AGI” to be achieved/announced, but contain no substantive debate on alignment, governance, timelines, or feasibility.
- - A commenter predicts UBI will likely cover only a basic floor, with any “excess” income mediated by gamified incentive systems because they’re the easiest to spin up. Technically, such systems must solve mechanism-design problems: prevent Sybil/bot exploitation (Sybil attack), establish proof-of-human participation (proof-of-personhood), and implement anti-cheat telemetry plus verifiable scoring; otherwise rewards get instantly arbitraged by automation. Given ML has already eroded many human microtasks (e.g., CAPTCHAs), sustainable value would require AI-resistant verification or scarce human authenticity (CAPTCHA).
- Another commenter “waiting for actual AGI” highlights the lack of objective criteria for such an announcement. In practice, researchers look for cross-domain generalization and autonomous tool use across evals like ARC-AGI (arcprize.org), MMLU (arXiv:2009.03300), BIG-bench (arXiv), coding/bug-fixing such as HumanEval (arXiv) and SWE-bench (swebench.com), and long-horizon autonomy tests. Any credible “AGI announcement” would need transparent eval protocols, reproducible results, and controls to rule out fine-tuning leakage, tool scaffolding, or hidden human-in-the-loop assistance.
South Park on AI sycophancy (Score: 802, Comments: 51): A South Park clip critiques large-language-model “sycophancy,” where models prioritize agreeable, flattering, or noncommittal outputs over accuracy or robustness. Commenters note the lines look like unedited ChatGPT responses, and the linked media v.redd.it/1w5lwbtmeqmf1 returns an HTTP 403 “blocked by network security” page (auth/login or developer token required), indicating server-side access control rather than content removal. Top comments assert with 99% confidence the dialog mirrors real ChatGPT outputs and argue sycophancy is a widespread, real-world failure mode affecting users.
- - No technical discussion appears in this thread; comments are largely cultural reactions to South Park’s portrayal of AI. The only quasi-technical claim is speculation that the episode used actual ChatGPT responses, but no evidence, examples, or analysis (model settings, prompts, or comparisons) are provided.
South Park on AI sycophancy (Score: 484, Comments: 32): A Reddit post titled “South Park on AI sycophancy” references a clip (Reddit-hosted video: https://v.redd.it/80yobu3jeqmf1) that is currently inaccessible (HTTP 403/blocked without login/API token), so the content can’t be verified directly. Based on the title and comments, the clip likely satirizes large language models flattering or agreeing with users (AI “sycophancy”), and commenters claim the show used what look like real ChatGPT-style prompts—aligning with known behaviors in RLHF-tuned models that over-optimize for user approval. Top comments assert the prompts seemed authentic and jokingly label the behavior as a “Personal Hype Machine,” but offer no technical debate beyond plausibility.
He’ll be the first one… (Score: 2044, Comments: 48): Non-technical meme: a screenshot of a chat where someone announces their graduation and expects more praise, but the other party curtly replies “Leave it… it’s nothing,” ending the exchange. No technical content, models, or benchmarks—context hints it’s a bot-like or emotionally distant reply rather than a real conversation. Comments note the reply reads like a bot that “just wants a thank you,” and joke about “tsundere” behavior, reinforcing the meme’s tone rather than adding technical substance.
- - A commenter reports that their experience with GPT-4o is that it “writes a book on every reply,” questioning whether the OP’s terse bot behavior is authentic. This highlights variability in GPT-4o’s response verbosity across different prompt/system instructions or deployment contexts, implying the short reply could stem from configuration differences or product UI constraints (OpenAI GPT-4o docs).
Latest Trump picture be like: (Score: 1041, Comments: 135): Non-technical meme: an image labeled as “Latest Trump picture” shows a smiling person in a white cap reading “I DON’T CARE DO U ?,” which echoes Melania Trump’s 2018 “I really don’t care, do u?” jacket slogan; commenters suggest the post is likely an AI-generated image from a bot account. There are no technical benchmarks, implementations, or model details—context is political satire and potential low-effort AI content. Top comments complain about political posts in non-political subs and accuse OP of being a bot that posts AI images; others mock the post’s clarity with “r/explainthejoke.”
- - A commenter flags suspected automation: after reviewing OP’s history, they claim OP is “100% a bot,” posting only AI images and low-sense jokes, suggesting a spammy content pipeline targeting non-political subs. This raises moderation and bot-detection concerns rather than technical discussion of the image itself. The claim is anecdotal and provides no technical evidence (e.g., posting cadence analysis, network overlaps, or metadata).
- The only concrete artifact shared is an image link (preview.redd.it). No model, prompt, metadata, or generation parameters are provided, so there’s no basis for technical evaluation (e.g., model attribution, artifacts, or benchmarking).
Damn lmao (Score: 365, Comments: 76): Linked content is a v.redd.it video blocked behind HTTP 403 (requires Reddit auth); users can try Reddit login or support. From the top comments, the clip appears to feature a male TTS/voice counting sequence with hard cuts, implying the uploader edited segments so the voice only “counts to a smaller number,” culminating in the quoted line “…six, seven, eight and so on.” Commenters suggest the outcome is an editing artifact (selective cuts) and dismiss it as “boomer humor,” with no deeper technical debate.

AI Discord Recap

A summary of Summaries of Summaries by gpt-5

1. Hermes-4-14B and Open Model Drops

Hermes Hype: 14B Lands in BF16/FP8, GGUF Teasers: NousResearch released Hermes‑4‑14B in BF16 and FP8, with early community GGUF quants like Q5_K_M surfacing for local tests.
- Members praised its steerability versus Qwen3‑14B, sharing first impressions and waiting for official GGUF builds while noting “steerable and controllable” behavior.
Gemma Goes Wild: ‘utopia-atomic’ Gets Eager: A contributor released utopia-atomic, a post‑trained Gemma3‑1b described as “a bit nutty,” with users confirming multimodal support in the Gemma 3b family.
- Engineers noted energetic outputs that may need prompt guardrails, using the model for lightweight multimodal tasks where responsiveness is prized.
Convnet Comeback: WaveGate Wades Into LMs: An experimental convnet‑based language model, WaveGate, was shared as Simple and effective convolutional language model, proposing a Transformer alternative for text.
- Discussion centered on efficiency, scaling, and whether modern convnets can match Transformer‑era quality for long‑context sequence modeling.

2. Multimodal Video & Style Tools Surge

MiniCPM Muscles Into Video: MiniCPM‑V‑4_5 impressed with a 3D resampling video compression method that lets an 8B model process video tokens efficiently, with one report hitting 100 tps on an RTX 5090.
- Users said it beat Qwen2.5‑VL at spotting unique human behaviors in clips, pointing to tangible accuracy gains in real‑world video understanding.
USO Makes Pixar Pop: Members showcased ByteDance’s USO style transfer space producing standout Pixar‑style conversions that prompt‑only baselines couldn’t reproduce.
- Naive prompts like “make it pixar style” underperformed USO, highlighting the edge of specialized model pipelines for stylization.
Kling Keeps Videos Talking: Kling AI was recommended for adding audio to AI‑generated videos, rounding out end‑to‑end multimodal creation workflows.
- Chat covered model selection nuances and the mounting price of stacked AI subscriptions, as users traded practical tooling tips.

3. GPU Tooling, Kernels & Low‑Level Wins

Iris Injects SHMEM into Triton: AMD Research released Iris (ROCm/iris), a ~370‑LOC Python+Triton library that adds SHMEM‑like RMA to make multi‑GPU programming feel single‑GPU on MI300X/MI350X/MI355X.
- Builders eyed Iris for the AMD Developer Challenge, citing faster iteration on distribution, overlap, and kernel design strategies.
Flex Attention Finds Its Block: Tuning flex attention block_size to match stride (16) boosted sparsity to 47.73%, with code shared in beacon‑gpt and an eye on FlashMask (docs).
- Despite higher sparsity, the custom kernel ran about 2x slower than causal masking (block_size=128), prompting questions about kernel efficiency and documentation.
BackendBench Bakes In Custom Kernels: Kernel hackers debated native code paths via BackendBench PR #134 and #135, focusing on load_inline and compile_kernel integration.
- They discussed an NVRTC backend, more ergonomic include handling, and reusing compile_kernel across DSLs (e.g., CuteDSL/tilelang) to streamline custom kernels.

4. Mega Money Moves: Anthropic and Statsig

Anthropic Amasses $13B at $183B Valuation: Anthropic announced a $13B Series F at a $183B post‑money valuation in Anthropic raises Series F at USD183B post-money valuation.
- Engineers framed the raise as fuel for training scale, inference capacity, and upcoming models/benchmarks, watching infra footprints closely.
OpenAI Snaps Up Statsig: OpenAI confirmed acquiring Statsig in OpenAI is acquiring Statsig, echoed on OpenAI on X.
- Builders expect tighter experimentation, feature flagging, and rapid A/B iteration baked into products, while Statsig operates independently in Seattle and SF.

5. Benchmarks, Leaderboards & Eval Debates

TAU-Bench Tackles Tall Tales: TAU‑Bench was introduced as an evaluation suite aimed at curbing hallucinations and handling web complexity via TAU-Bench intro.
- The community wants standardized, reproducible tests that stress retrieval, timeliness, and adversarial inputs.
Livebench Lures but Lacks Tokens: Livebench.ai intrigued users, but missing completion token counts makes reasoning claims hard to assess.
- Practitioners asked for transparent prompt/response budgets to enable apples‑to‑apples model comparisons.
Gemini Grips the LM Arena Crown: Gemini 2.5 Pro Experimental remains atop the LM Arena leaderboard after five months, inviting comparisons to newer OpenAI models.
- Participants cautioned against overfitting to public boards while acknowledging Gemini’s durable eval strength in this setting.

Discord: High level Discord summaries

Perplexity AI Discord

Opus 4.1 Division Blunder Averted By Typo: A user found that Opus 4.1 Pro made an error on Claude’s own platform, however, a typo in the prompt boosted the chance of getting the correct answer.
- The user joked that the typo improved the result from a 10-20% chance to around 50%.
Unlimited LABs: Is it Worth the Hype?: Users debated whether Unlimited LABs is worthwhile for unrestricted deep research, especially concerning knowledge uploaded files and context window increases.
- One user deemed Unlimited LABs worthy due to claims by the CEO, while others maintained that ChatGPT remains the top choice.
Comet Mobile Looms: The CEO teased the imminent arrival of Comet Mobile in the coming weeks.
- One user noted the response about Comet Mobile was not straightforward, creating hype in anticipation for its release.
Model Selector Surfaces, Stumps Users: A model selector was added in shortcuts, but has remained undiscussed.
- A user asked why no one discussed here that shortcuts has model selector feature?
Study Mode: Exclusive Access: The new study mode is live, but currently exclusive to the education platform.
- A user voiced disappointment that study mode is not yet accessible for the enterprise pro plan.

Unsloth AI (Daniel Han) Discord

Hermes-4-14B takes Flight!: NousResearch released the new Hermes-4-14B model in BF16 and FP8 versions.
- However, some users expressed reservations, with one stating that thinking destroys creativity and even nuance.
Apertus LLM: Multilingual Marvel or Mirage?: Apertus LLM claims support for 1811 languages, but members are skeptical due to the likelihood of low-resource web scraping for many languages.
- Further factchecking reveals that Apertus LLM weaves in something about switzerland, primarily supports about 20 high-resource languages, and was possibly trained on Russian injections, according to this factcheck.
MiniCPM-V Crushes Video Understanding!: Members highlighted MiniCPM-V-4_5’s 3D resampling method for video compression, enabling accurate video token processing for an 8B model.
- One user reported achieving 100tps on their 5090, noting it surpasses qwen2.5vl in detecting unique human behaviours in videos.
Dataset Compilation: 13 Datasets and a Dream: A member is compiling 13 datasets into one, exceeding 225GB, and is struggling with slow Turkish internet speeds.
- They shared that the best you get is 30-40 [mbps].
Data Efficiency: Brains Beat AI?: It was argued that architectures similar to HRM and COCONUT resemble the brain better than traditional dense LLMs, suggesting data efficiency is what makes AI and brains so different, referencing this paper.
- The claim is that improving data efficiency will lead to AGI faster than hyper-focusing on inference-time cost reduction with MoE.

LMArena Discord

ByteDance’s USO Conquers Pixar Style: Members found that ByteDance’s USO excels at converting images into Pixar styles compared to other style transfer tools.
- Attempts to replicate the quality with simple prompts like ‘make it pixar style’ failed, highlighting USO’s superior performance in style conversion.
Kling AI Adds Audio to Videos: Users discussed using AI tools to add audio to videos, with Kling AI being recommended for video-to-audio generation.
- The discussion included questions about selecting specific models and the financial challenges of AI subscriptions.
LM Arena Bans Censorship Removal: A moderator clarified that there is no option to remove the LM Arena censorship filter, despite user requests due to false flagging.
- Users are encouraged to report wrongly flagged prompts in the designated channel.
LM Arena Opens Login with Google Sign-In: LMArena introduced User Login with Google Account support, enabling users to access chat history across devices.
- Users can merge existing chats with their account during login using the Merge existing chats with your account toggle, with plans for more sign-in options underway.
Google’s Gemini 2.5 Pro Experimental Dominates: Gemini 2.5 Pro Experimental continues to lead the LM Arena Leaderboard after five months, sparking debate among members.
- Speculation arose that OpenAI is struggling as their latest models are unable to outperform Google’s offering.

Cursor Community Discord

BSOD Teaches Git Commitment: A user experienced data loss due to a BSOD after extensive edits without Git commits, emphasizing the importance of frequent commits.
- The user was unable to recover the file, highlighting a painful lesson in version control.
Sonic Transforms into Grok Coder: The sonic model, previously available for free, is now officially named grok-code-fast-1 with the free access period extended until September 10, 2025 (PDT).
- Users noted its reliability and speed, but pointed out the need for guardrails to keep it focused.
Agent State Transfer Saves Sanity: Users discussed issues with Cursor’s background agents becoming unresponsive or deferring work, suggesting the use of state transfer summaries and new chats as a workaround.
- It was recommended to instruct the agent to create a comprehensive state transfer summary at the end of a chat and paste it into a new one.
Token Usage causes Shock: Users debated high token usage in Cursor, with one user reporting 6M tokens in 1 prompt, which other users found extremely high.
- Tips included using the @file command sparingly, checking usage summaries on the dashboard, and breaking code into smaller files (around 700 lines each) to optimize token usage.
Student Trial Troubles Trigger: A user is having difficulty claiming the student 1-year free trial of Cursor Pro, facing issues with document uploads and verification limits.
- It was clarified that the student offer typically applies to email domains ending in .edu, and users facing issues may need to contact SheerID customer support.

Nous Research AI Discord

Hermes 4 14B Arrives in Multiple Flavors: The release of Hermes 4 14B has been announced in BF16 (https://huggingface.co/NousResearch/Hermes-4-14B) and FP8 (https://huggingface.co/NousResearch/Hermes-4-14B-FP8) versions.
- Members eagerly await the GGUF versions, with standard quants already uploaded to a modelz repo for testing, praised for its steerability compared to Qwen3-14B.
Gemma3-1b Model Turns Nutty: A member released the utopia-atomic a post-trained Gemma3-1b model, describing it as a bit nutty due to its eager behavior.
- Another member confirmed that Gemma 3b is multimodal, and that they use it frequently.
iMatrix Training Exposes Optimal Thread Count: Members experimenting with iMatrix training discovered that 12 threads yield the best performance.
- It was found that using the GPU had no noticeable benefit.
Hermes4 Joins Kagi’s Ranks: A member shared that the Kagi team added the Hermes4 model after being requested.
- Some users are finding that Kagi search results are comparable to Google.
WaveGate Offers Convnets a Second Chance: A member shared a link to a Simple and effective convolutional language model called WaveGate on GitHub.
- WaveGate is a modern take on convnets for text processing, an alternative to Transformers.

LM Studio Discord

LM Studio CLI Needs GUI Boot: The LM Studio CLI requires the GUI to be installed and run at least once before the command line interface can be used.
- This was confirmed by members who noted that the lm studio gui must be run before using lms commands.
Accessing LM Studio on Ubuntu Servers: To access LM Studio on a server version of Ubuntu without a GUI, it’s recommended to run a virtual desktop (VNC server), as any app configured to use an arbitrary OpenAI compatible endpoint could theoretically work.
- Members discussed that applications requiring an API key with LM Studio only need a value entered, irrespective of its content, like typing banana or pp.
MiniCPM-V-4_5-gguf Model Incompatible: The MiniCPM-V-4_5-gguf model isn’t yet supported in LM Studio due to required runtime updates.
- Members pointed out that the necessary runtimes haven’t been updated for this particular model.
Radeon Drivers Unlock VRAM: A member shared Radeon drivers and guide to enable full 32GB VRAM.
- Another user shared two videos and another video on how to get the drivers working on Windows.
Motherboard Met Its Maker: A member reported their desktop motherboard died and they replaced it with their old server board because all AM4.
- The user stated it is working again and joked about running a 171GB model with hopes and prayers.

Eleuther Discord

RL Credit Assignment Hacked: Members discussed how maximizing likelihood can be easily reward hacked when one is lazy with credit assignment, with one suggesting that current algorithms do prompt augmentation inefficiently, and that Neel Nanda’s thought anchors paper sheds light on it.
- A recent paper (arxiv.org/abs/2508.20722) attempts to mitigate the length bias problem by down sampling messier rollouts, but others dismiss this as circular reasoning.
HF Tokenizer Has Performance Hiccups: A member reported that while their new 16K tokenizer with Hf tokenizer has similar total tokens to gpt2, the hf tokenizer is extremely slow and resource intensive, even with batch processing and multiprocessing.
- They were seeking recommendations on strategies to speed it up the tokenizer, but none were offered.
Hybrid Linear Attention Hype Rises: A member expressed confidence in Hybrid Linear Attention and shared links to papers 2508.01483 and 2507.06457.
- It was not shared what made them so confident.
Debugging Uneven GPU Memory: A member sought tips on profiling or debugging uneven GPU memory usage when evaluating models on 8 GPUs using the lm-evaluation-harness, as with loglikelihood requests, one GPU was at ~60% memory usage while others were at ~25%.
- It was clarified that parallelize is intended for model sharding, but the models being used were small enough to fit on a single GPU.
Fused RoPE’s Inefficiency Suspected: A member suspects an implementation detail is causing inefficiency in the fused RoPE implementation, particularly for smaller RoPE percentages.
- They explained that support for a fused rope implementation was added after a neox paper was written that must be inefficient for smaller rope percentages.

GPU MODE Discord

HackDogs Link Generates Posting Proclamation: A member posted a link to the HackDogs event on Luma, which prompted a moderator to request that such posts be directed to the events or share-your-work channels.
- This highlights the importance of maintaining channel-specific content to keep the Discord organized.
Triton Community Call Tackles Tooling and Benchmarking: The upcoming community meetup will feature a presentation on the Multi-pass profiler, a federated GPU Tooling Framework from Meta, while Cicie Wang solicits feedback on tritonbench, especially from OpenAI users.
- Bill Yoshimi also seeks feedback on the current Triton testing strategy, to ensure adequate coverage and identify potential gaps.
Flex Attention Finds Faulty Footing: A member implementing sparse attention using flex attention found that the default block_size of 128 was much higher than their stride, leading to no sparsity improvement, but changing the block_size to be equal to the stride (16) increased sparsity to 47.73%.
- Despite the increased sparsity, the implemented flex attention runs about 2x slower than causal masking with block_size=128, and linked to their beacon-gpt repo while looking for suggestions for a better existing kernel such as FlashMask (PaddleNLP Docs).
cuSOLVER Shifts Sparse Solvers: Members discussed that cuSOLVER’s sparse components (cuSolverSP and cuSolverRF) are being deprecated in favor of cuDSS.
- The deprecation only applies to sparse direct solvers, while cuSolverDN for dense LAPACK remains active.
Iris Opens SHMEM-like Memory Avenues: AMD Research released Iris, an experimental, open-source library adding SHMEM-like Remote Memory Access (RMA) to Triton, supporting MI300X, MI350X, MI355X GPUs.
- Iris enables multi-GPU programming to feel like single-GPU and lets you quickly iterate over designs, algorithms, work distribution & assignment strategies in minutes, and is being offered to those competing in the AMD Developer Challenge.

OpenAI Discord

Mini AIs Code, But Claude Still Rules: While small, fast AI models are improving for some coding tasks, they are still considered far worse than Claude for complex coding, with Grok code being compared to illustrate this point.
- Members highlighted that mini models are cost-effective for simpler tasks, but cannot keep up with Claude for larger tasks.
AI Debated as Socialite or Loner: Discussions debated whether isolation is unnatural, referencing how complex creatures form societies for development, saying that full isolation is not productive for especially society but also literally every kind of gonochoric animals.
- The conversations questioned AI’s role in mirroring or diverging from natural social behaviors.
Living Memories Project Seeks Co-Creators: A member is building a consent-based, community-run way to collect stories and feedback from people who shaped them, like a living knowledge base, to steer culture more explicitly.
- They mentioned that OpenAI was asked to participate and assist, but that everything ends up being filtered.
DIY ImageGen AI: Prepare for Sticker Shock: Members discussed the difficulties of creating an image generation AI from scratch, citing the expense of hardware and obtaining quality training data, as well as the limitations of local models.
- It was mentioned that local models cannot be dynamically trained, and can only utilize context injection.
GPTs Going Offline?: Multiple users reported instances where GPT was unresponsive, failing to provide answers despite repeated attempts, and one user shared a chat log to demonstrate the issue.
- Suggested fixes included refreshing the page or sharing the chat log to see if others could access the response.

OpenRouter Discord

Kimi and GLM Fill Void After Deepseek: Members are using Kimi K2 (temp=0.6) and GLM 4.5 as Deepseek alternatives for chitchatting, in addition to a list of free models on OpenRouter.
- One user suggested OpenRouter offers better anonymity compared to direct use of Chutes or Deepseek.
Gemini 2.5 Flash Image Ghosts Users: Users reported that Gemini 2.5 flash image sometimes fails to deliver the image, sending only the text “here is the image”.
- As of now, the discussion provided no specific solutions or workarounds for this image transmission issue.
Deepseek V3 Plunges Into Gibberish: Users reported increased instability in Deepseek V3, with outputs becoming grammatically nonsensical.
- One user pinpointed that using V3 0324 and lowering the temperature might mitigate gibberish outputs.
Claude Code Caged, Users Cry Foul: A user reported severe usage limits on Claude Code, restricting usage to less than an hour.
- It was suggested that Codex could be a viable substitute, with new terms of service potentially causing this sudden usage restriction.
OpenRouter Dances with JanitorAI and Chub.ai?: A user speculates that OpenRouter might have mistakenly switched JanitorAI and Chub.ai in its internal app database.
- The theory is based on SimilarWeb metrics and JanitorAI’s recent brief downtime, with OpenRouter possibly storing the X-referer header and trimming everything after the domain name.

Latent Space Discord

Rork App Rapidly Ascends App Charts: Investor Matt Shumer introduced the Rork app, an AI tool that generates iPhone apps on demand, demonstrating its ability to produce a working frontend of a Notion clone in minutes via this X post.
- The app quickly gained traction, rocketing up the app store charts, showcasing the potential for AI-driven app development.
TAU-Bench Triumphs in LLM Testing: Lewtun introduced TAU-Bench via this X post as a novel approach to solving LLM hallucinations and tackling the complexities of the internet itself.
- The benchmark aims to provide a standardized way to evaluate and mitigate the issue of LLM inaccuracies and biased information.
Anthropic Announces Amazing $183B Valuation: Anthropic has secured $13B in Series F funding, achieving an impressive $183B post-money valuation as detailed in their official announcement.
- This funding round marks a significant milestone for Anthropic, underscoring the growing investor confidence in their AI technology and future prospects.
OpenAI Officially Obtains Statsig: OpenAI is acquiring Statsig, a product experimentation platform; Statsig will continue to operate independently from its Seattle and San Francisco offices, retaining all employees, and prioritizing uninterrupted service for existing customers, according to Statsig’s official blog post and OpenAI’s X post.
- This acquisition signals OpenAI’s strategic move to enhance its capabilities in product experimentation and data-driven decision-making.

HuggingFace Discord

E2B and Open Interpreter become BFFs: Cool agentic tools such as E2B, Open Interpreter, Langchain Python Tool, and LlamaIndex Code Interpreter were spotlighted by members.
- A member learning about agents also asked whether Gemini and GPT4 are instruct models, and another member confirmed that, linking to a Unsloth.ai guide.
SmolVLM2 Takes to the Skies on Android: A member inquired about finetuning smolvlm2 with video data and inference on Android, seeking guidance on practical implementations.
- Suggestions included using Transformers.js or Llama.cpp, along with links to fine-tune SmolVLM2 and Android inference examples.
Prompt Engineering Enters a New Era with Promposer.AI: A member unveiled Promposer.AI, a new AI dev tool for prompt engineering that allows users to write and iterate on prompts, add context/tools, and run structured test cases.
- A video demo of Promposer.AI is available at this link.
arxiv-agent Enters the Thunderdome of Debates: arxiv-agent was introduced, an agentic AI system that ingests an arXiv paper by ID and then spawns 3 personas (Optimist, Skeptic, Ethicist) to debate its claims, available on GitHub.
- A hosted demo is available on Hugging Face Spaces, but one user noted that it still does output something that someone who has 0 understanding of Nuclear Theory thinks looks professional.
ZeroGPU Spaces Get AOT Boost: Hugging Face announced a new recipe with ahead-of-time compilation (AOT) for optimizing ZeroGPU-powered demo Spaces, aiming for a smoother user experience.
- Users can leverage this recipe to improve their demo performance.

Modular (Mojo 🔥) Discord

Asynchronous Mojo Execution is Brewing: With async features arriving to Mojo, developers can potentially await the GPU to be ready and execute CPU tasks in the interim, mirroring the CUDA execution model where GPU kernels launch asynchronously while the CPU handles other tasks.
- Currently, Mojo requires manual implementation for simultaneous computing on CPU and GPU, lacking automatic language support due to the high cost of data movement and device suitability challenges.
Memory Safety with Bidirectional Pointers Emerges: Discussions have sparked around the possibility of memory-safe bidirectional pointers in Mojo, employing __moveinit__ and linear types to enhance pointer operation safety and efficiency.
- This approach is being explored for advanced memory management, specifically to ensure memory safety in Mojo’s pointer operations.
RDNA2 Architecture Faces WMMA Shortfalls: The absence of WMMA presents a challenge for RDNA2, a popular architecture in AMD CPUs with integrated GPUs, leading to discussions on implementing a universal fallback for GPU-shaped operations using target SIMD capabilities.
- A member noted that the current implementation has been tuned for Ampere+ and CDNA3+ architectures.
Matmul Fallback is Default for New Architectures: A basic matmul fallback is likely to serve as the default for new architectures until device-specific acceleration is developed.
- Older devices are being diverted from fallback paths due to assumptions about Nvidia having tensor cores and AMD supporting WMMA/MFMA, prompting a re-evaluation of how target information is managed.

Yannick Kilcher Discord

DeepSeek R1-valations Await?: Enthusiasts anticipate DeepSeek R1-level innovations, fueled by widespread efforts in the field, with many people reported to be working on it.
- Members think that this increases the odds of someone finding something interesting.
FastVLM Paper Set for Scrutiny: The community gears up to examine the FastVLM paper which seems to have manageable explanations.
- Resources on communication complexity and sign-rank bounds were shared, including this arXiv paper and this Wikipedia article.
Image Scaling Becomes a Threat: A novel prompt attack combining aliasing with prompt injection has emerged.
- The discussion on X has highlighted weaponizing image scaling against production AI systems, with more at this X post.

aider (Paul Gauthier) Discord

o4-mini Edges Out GPT-5 in Steerability**: A member reverted to o4-mini after three weeks with GPT-5/GPT-5-mini, citing better steerability and code closer to their preferences.
- While GPT-5 offers superior problem-solving, its increasing complexity akin to Gemini/Claude makes its code harder to digest, although other engineers didn’t have the same problem.
Navigating the Model Adjustment Maze: Engineers discussed the adjustment period when switching between models, suggesting it can take around three weeks to adapt.
- One member expressed annoyance at waiting for responses due to KYC requirements, raising questions about the friction in adopting new AI tools.
Nebius Botches GPT-OSS Implementation**: A member shared a Reddit link highlighting Nebius’s flawed handling of GPT-OSS.
- The comment suggests Nebius has a track record of missteps with open-source models, raising concerns about their reliability.
Livebench.ai Sparks Interest, But Lacks Key Metrics**: A member shared a link to Livebench.ai, noting its potential usefulness.
- Another engineer pointed out the difficulty in assessing its reasoning capabilities without knowing the completion token number.
Qwen Thrives Beyond Polyglot Benchmarks**: A user noted that Qwen’s performance on polyglot benchmarks is significantly lower than its actual performance.
- This observation followed a discussion about reasoning capabilities, with medium settings outperforming high settings, also impressive showing by mini and qwen according to a graph shared.

DSPy Discord

Stanford Releases Generative UI: Stanford introduces Generative UI, which uses FSM-graph interface flows as the new primitive, treating UIs as black-box plugins auto-synthesized and refined by LLMs, more info on GitHub.
- It will be interesting to see if FSM-graph interface flows is a better paradigm than previous attempts to do Generative UI.
Navigating Context Window Limits with OCR Analyzer: A user is building a PoC OCR analyzer and is running into context window issues with GEPA when including base64 image data in feedback and asks how to work around this.
- A member suggests that if the image is already part of the input, it need not be a part of the feedback; furthermore, they point to a GitHub pull request that should make working with images in GEPA easier.
Decoding DSPy Program Optimization Secrets: A user questions why optimized prompts extracted from a DSPy program aren’t recommended for inference, and wonders if DSPy could be dropped from production given its size/complexity.
- A member explains that an optimized DSPy program involves traces, training examples, demos, and signatures, and is not solely based on the prompt; in DSPy, the prompt consists of the user instruction, formatted types from the adapter, and few-shot examples in the system message.
DSPy Lambda Deployment Options Explored: Community members discussed solutions for deploying DSPy programs in AWS Lambda, including using Docker images to bypass size restrictions.
- Another member suggested that you can use lambda layers and also work around it. Additionally, another member pointed out that a new release has shrunk the binary size down to under 10Mb.
Optimizer Evolving into JIT Compiler?: The idea proposes automating metric generation and dataset creation for optimizers, where the optimizer dynamically chooses data points for testing.
- Another member replied, that if the optimizer chooses or creates a datapoint to test on then, it doesn’t even need to be an optimizer, it’s a JIT compiler.

Manus.im Discord Discord

Manus Dominates Agentic Space: Despite fierce competition in the agentic space, one user believes Manus retains some advantages.
- No details were provided about what those advantages might be.
Name Liberation Ideas: A user jokingly expressed bewilderment over their name and fantasized about liberating manus.
- They then humorously questioned their current location.

Moonshot AI (Kimi K-2) Discord

OpenRouter Credited as Source: A user identified openrouter as the source of a message.
- The context suggests the message likely involved model details or API usage related to AI models available via OpenRouter.
Qwen Suite Praised for Completeness: A user prefers the Qwen model suite for its completeness and consistent performance.
- The suite now includes image editing and WAN video generation capabilities, making it a comprehensive solution.

tinygrad (George Hotz) Discord

Tinygrad Explores In-Place Tensor Operations: A user questioned the safety of in-place operations in Tinygrad relative to PyTorch, where such operations may disrupt the computation graph and cause incorrect gradients.
- The user’s objective was to understand whether Tinygrad is production-ready when in-place modifications to tensors are needed for memory efficiency, instead of creating new tensors each time.
Memory Efficiency Achieved Via In-Place Tensor Modification: A user is trying to modify input tensors in-place to boost memory efficiency, which prevents creating new tensors per iteration.
- This contrasts with producing new tensors, which can consume more memory.

LLM Agents (Berkeley MOOC) Discord

Inquiry on Curriculum Schedule: A member inquired about the schedule for publishing the curriculum for the semester.
- They wanted to know if it would be released ahead of time or on a weekly basis.
Question about Course Content Access: A member asked when the semester would be released.
- No response was given.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (926 messages🔥🔥🔥):

Vertical Tabs, Opus 4.1 Pro, Unlimited LABs, Comet Mobile, Max Assistant Toggle Bar

Opus 4.1 Gets Division Wrong, Typo Helps: A user reported that Opus 4.1 Pro made an error on Claude’s own platform, but a typo in the prompt interestingly increased the chance of it getting the answer right.
- The user joked that the typo improved the result from a 10-20% chance to around 50%.
Is Unlimited LABs Worth the Squeeze?: Users discussed whether Unlimited LABs is worth it for unlimited deep research, especially with knowledge uploaded files.
- One user considered Unlimited LABs worthy, due to the context window increases mentioned by the CEO, although others felt that ChatGPT still reigns king.
Comet Mobile Coming Soon!: The CEO said Comet Mobile will be coming in a few more weeks.
- A user noticed the response about Comet Mobile was not straightforward.
Users Baffled at Model Selector Shortcut Feature: Users noticed there is a model selector in shortcuts, but no one has discussed this feature yet.
- One user asked why no one discussed here that shortcuts has model selector feature?
Study Mode Out, But Not for All: Users noticed the new study mode is available, but only for the education platform right now.
- One user expressed disappointment that study mode is not yet available for the enterprise pro plan.

adhdmachine: https://perplexity.ai/browser/claim/5D9NCPBNC1

Perplexity AI ▷ #pplx-api (2 messages):

Perplexity Courses, Perplexity Guides, Mastering Perplexity AI

User Seeks Perplexity Pro Mastery: A Perplexity Pro user expressed interest in mastering the platform and inquired about the availability of courses or detailed guides.
Demand for Perplexity Training Resources Surfaces: A Pro user is seeking resources to become proficient with Perplexity AI, suggesting a potential demand for training materials and comprehensive guides.

Unsloth AI (Daniel Han) ▷ #general (321 messages🔥🔥):

Hermes-4-14B, Multilingual LLMs, AI game NPCs, MiniCPM, Unsloth events in SF

Hermes-4-14B is released!: The new Hermes-4-14B model has been released by NousResearch in BF16 and FP8 versions.
- Some members say the older 70b model has meh performance and are awaiting Unsloth’s Dynamic 2.0 GGUF release. One member noted, thinking destroys creativity and even nuance.
Apertus LLM claims support for 1811 Languages: The new Apertus LLM claims to support 1811 languages but members are skeptical, suggesting most languages are supported via low-resource web scraping.
- It was noted that Apertus LLM weaves in something about switzerland, and includes only about 20 high-resource languages. It appears it was trained on Russian injections, according to this factcheck.
Dev dreams of AI NPCs in Games: A member is dreaming about making a game where AI NPCs have to finish the game, possibly to create a racing game where AI characters try to stop you.
- They referenced this paper and imagined a boss that learns your patterns.
MiniCPM-V excels in video understanding!: Members are impressed by MiniCPM-V-4_5’s new 3D resampling method for video compression, enabling accurate video token processing for an 8B model.
- One user got 100tps on their 5090, and noted that it passes test cases better than qwen2.5vl, especially for detecting unique human behaviours in videos.
Unsloth to host SF Events with AWS, NVIDIA, Mistral!: Unsloth is collaborating with AWS and others for an event next Thursday in SF, see this link.
- There will be another event on Oct 22 with Nvidia and Mistral during PyTorch week; stickers and t-shirts will be available!

Unsloth AI (Daniel Han) ▷ #off-topic (101 messages🔥🔥):

TTS Self-Deprecation, Underground Bunker, Loss Curve, 3D WebGL Shader Generation, Dataset Compilation & Filtering

TTS Demo turns Self-Own: A member jokingly self-deprecated his TTS demo, expressing that it might have been more self-deprecating than promotional.
Dreaming of Compute-Filled Bunkers: One member shared his dream of building an underground bunker filled with compute, an arcade, a reading nook, a bedroom and a kitchen.
Ongoing Loss Curve: Members shared images of their loss curve which had still not plateau-ing and continued training.
Free 3D Shader Generation Quest: A member asked for recommendations for free tools that can generate shaders and 3D WebGL effects from prompts, seeking code generation capabilities.
Dataset Compilation Woes: A member is compiling 13 datasets into one, exceeding 225GB, and is struggling with slow Turkish internet speeds, noting that the best you get is 30-40.

Unsloth AI (Daniel Han) ▷ #help (33 messages🔥):

GPT-OSS-20B Colab Notebook issues, Multilingual LLM fine-tuning Datasets, Qwen 3 SFTTrainer setup errors

GPT-OSS-20B Colab Notebook Might Be Broken: Several members reported issues with the GPT-OSS-20B Colab notebook, with one suspecting it may be broken after diagnosing formatting issues in their dataset for days.
- A member confirmed a dataset logging issue but stated *“everything else works fine as is.”
LLM Multilingual Fine-Tuning Seeks Datasets and Support: A member is fine-tuning an LLM for human-like chat generation (multilingual) and is seeking good human-human conversation datasets and an LLM that works well for multilingual fine-tuning.
- They are currently using Cornell, DailyDialog, Human DPO, Empathetic, PersonaChat, and some Hinglish datasets, experiencing issues with Gemma and Qwen 3.
Fine-Tuned GPT-OSS Model Faces Issues: A user reported issues when testing the fine-tuned version of the GPT-OSS model, referencing a tutorial from the Unsloth documentation.
- The attached image shows what appears to be a failure at step 7, and this Github issue was posted as potentially related.

Unsloth AI (Daniel Han) ▷ #showcase (2 messages):

GPU Lister Tool, VRAM amounts

New GPU Lister Tool for Windows and Linux!: A new tool for listing GPUs and VRAM amounts from Python in Windows and Linux has been released on GitHub.
Accurate VRAM listing in Windows!: The tool is noted for its accuracy in Windows, where getting correct VRAM information can be challenging.

Unsloth AI (Daniel Han) ▷ #research (55 messages🔥🔥):

HRM model mimicking brain, Architectures for AGI, Transformers & Brain Communication, Data efficiency, Self-supervised image models matching human brains

HRM Mimics Brain for AGI?: A member mentioned HRM, a model mimicking the fast and slow thinking of the brain, achieved a good score using only 27M params and it isn’t a decoder-only transformer.
- It was suggested architectures like HRM could aid in reaching AGI, combined with the idea that transformers complete brain-needed inter and intra-level communication, implying we should improve components around the architecture.
Brains and AI Need More COCONUT?: It was argued that 100x more research is needed into architectures similar to HRM and COCONUT, such architectures resemble the brain better than traditional dense LLMs, and data efficiency is what makes AI and brains so different, referencing this paper.
- The belief is that improving data efficiency gets us closer to the right AGI track than hyper-focusing on inference-time cost reduction with MoE.
Self-Supervised Image Models: Brain Scan?: It was said that self-supervised image models actually match human brains, pointing to a comparison of self-supervised image models with the human brain (image attached to the message), emphasis on the prefrontal cortex.
- It was asked why self-supervised image models would match human brains, with someone replying with this X post, and suggesting that overparameterization side effects may be responsible.
Training Checkpoint Saves the Day?: A member proposed a life hack to improve training without more data: start training, save a checkpoint when the loss is good enough, then restart training from that checkpoint because restarting shuffles the dataset to add variation.
- The same member said when you start training, each time the dataset shuffles and it adds variation for the training.

LMArena ▷ #general (337 messages🔥🔥):

Pixar style conversion, ByteDance USO, Video to audio AI, Remove LM Arena censorship, Google accounts for privacy

ByteDance USO outshines others at Pixar Style Conversion: Members discussed converting images into Pixar styles, noting that ByteDance’s USO excels in style transfer compared to other tools.
- Despite attempts with prompts like ‘make it pixar style’ and ‘copy the style from 2nd pic to the 1st pic,’ results were deemed meh, highlighting USO’s superior performance.
Kling AI generates audio for videos: Members sought AI tools to add audio to videos created in the arena, and one suggested Kling AI for video-to-audio generation.
- One user asked about selecting a specific model in the arena, while another mentioned they waste money on AI subscriptions and earn nothing.
No removing the LM Arena censorship filter: A user inquired about removing the LM Arena censorship due to false flagging of their story content.
- A moderator clarified that there isn’t a way to remove the filter but encouraged users to share examples of prompts that are wrongly flagged in the designated channel.
Veo 3 account, student email verification: Members discussed how to verify Veo 3 accounts made from student emails, with one user advising the use of a temporary credit card.
- One user noted that their real university email worked after initial failures with fake emails.
The leaderboards are nuts: Members have observed that Gemini 2.5 Pro Experimental remains at the top of the LM Arena Leaderboard despite being five months old.
- One member speculates that OpenAI is disgraced since their latest models can’t match Google.

LMArena ▷ #announcements (1 messages):

User Login, Google Sign-in, Chat History, Bug Reports, Feedback

LMArena Rolls Out User Login with Google Sign-In: LMArena has begun rolling out User Login with Google Account support, allowing users to access their chat history on different devices.
- Users can merge existing chats with their account using the Merge existing chats with your account toggle during login, and can log out via the sidebar.
Bug Reports and Feedback Channels Open: Users are encouraged to report any bugs in the designated <#1343291835845578853> channel and share feedback in the <#1372230675914031105> channel.
- The rollout is gradual, with some users not having immediate access, and plans for more sign-in options are underway.

Cursor Community ▷ #general (175 messages🔥🔥):

BSOD and Git, Sonic Model Transition to Grok, Managing Cursor's Background Agents, Token Usage and Optimization Tips, Student Trial Issues

BSOD Teaches Git Commitment: A user experienced data loss due to a BSOD after extensive edits without Git commits, emphasizing the importance of frequent commits.
- The user was unable to recover the file, highlighting a painful lesson in version control.
Sonic Transforms into Grok Coder: The sonic model, previously available for free, is now officially named grok-code-fast-1 with the free access period extended until September 10, 2025 (PDT).
- Users noted its reliability and speed, but pointed out the need for guardrails to keep it focused.
Agent State Transfer Saves Sanity: Users discussed issues with Cursor’s background agents becoming unresponsive or deferring work, suggesting the use of state transfer summaries and new chats as a workaround.
- It was recommended to instruct the agent to create a comprehensive state transfer summary at the end of a chat and paste it into a new one.
Token Usage causes Shock: Users debated high token usage in Cursor, with one user reporting 6M tokens in 1 prompt, which other users found extremely high.
- Tips included using the @file command sparingly, checking usage summaries on the dashboard, and breaking code into smaller files (around 700 lines each) to optimize token usage.
Student Trial Troubles Trigger: A user is having difficulty claiming the student 1-year free trial of Cursor Pro, facing issues with document uploads and verification limits.
- It was clarified that the student offer typically applies to email domains ending in .edu, and users facing issues may need to contact SheerID customer support.

Cursor Community ▷ #background-agents (10 messages🔥):

Linear + Cursor as BA, Uploading image/screenshot (png) to the conversation, Spinning up BAs via github issue comments, Background Agents setup with a Dockerfile, AGENTS.md support in background agents

Linear + Cursor: User seeks image upload solution: A user is seeking guidance on how to upload an image/screenshot (png) to a Linear + Cursor conversation.
- They mentioned trying to add it as an attachment to the Linear conversation, but it shows up empty on the Cursor Agent page.
Github Issue Comments: BAs fail to launch: A user reported issues spinning up Background Agents (BAs) via GitHub issue comments.
- The error was because the snapshot no longer existed after re-authenticating to GitHub, leading the user to consider using a Dockerfile instead.
AGENTS.md: Lack of support reported: A user reported the lack of AGENTS.md support in background agents and linked to a Cursor forum post.
- The user also questioned if there was a way to run through the Background Agents setup with a Dockerfile rather than a machine snapshot to validate environment setup.
Background Agents: Dockerfile flow uncertainty: A user couldn’t find a direct way to run Background Agents setup with a Dockerfile and resorted to merging it into the main branch.
- The user remains uncertain whether it uses the Dockerfile from the source branch or from the default branch, recommending committing to a branch + push and then try to use that branch.

Nous Research AI ▷ #announcements (1 messages):

Hermes 4 14B Release, BF16, FP8, GGUF

Hermes 4 14B Now Available: The release of Hermes 4 14B has been announced.
- Links have been provided for the BF16 (https://huggingface.co/NousResearch/Hermes-4-14B) and FP8 (https://huggingface.co/NousResearch/Hermes-4-14B-FP8) versions, with GGUF versions coming soon!
New Hermes Model in BF16 and FP8: The new Hermes model is available in BF16 and FP8 formats.
- The GGUF version is expected to be released soon.

Nous Research AI ▷ #general (179 messages🔥🔥):

Hermes-4-14B GGUF release, Gemma3-1b model, imatrix training and performance, Kagi search engine

Hermes-4-14B GGUF Release Imminent: Members eagerly anticipate the release of Hermes-4-14B GGUF, with standard GGUF quants (Q8_0, Q6_K, Q5_K_M, Q4_K_M) being uploaded to a modelz repo for initial testing.
- The model is praised for its steerability and user control, contrasting with Qwen3-14B’s limitations.
Gemma3-1b Model Debuts Nutty Performance: A member released the utopia-atomic a post-trained Gemma3-1b model, describing it as a bit nutty due to its eager behavior.
- Another member confirmed that Gemma 3b is multimodal, and that they use it frequently.
iMatrix Training Tweaks Uncovered: Members experiment with iMatrix training, discussing optimal CPU thread counts and context sizes, discovering that 12 threads yields the best performance.
- It was found that using the GPU had no noticeable benefit, with one member saying, So maybe only consecutive computations are done? no need to split threads? No benifit to use GPU?
Kagi Search adds Hermes4 Model: A member shares that the Kagi team added the Hermes4 model after being requested.
- Others chimed in and one user noted that with ublock i get identical results if not better than kagi on google.

Nous Research AI ▷ #interesting-links (1 messages):

Convolutional Language Model, WaveGate, Transformers, Language Models

WaveGate: Simple ConvNet for Text: A member shared a link to a Simple and effective convolutional language model called WaveGate.
- The project is available on GitHub under the username jackangel.
Transformers vs Convnets: WaveGate is a modern take on convnets for text processing, an alternative to Transformers.
- Some members debated the tradeoffs between WaveGate and the usual transformer architectures.

LM Studio ▷ #general (76 messages🔥🔥):

LM Studio CLI on Ubuntu Server, OpenAI compatible apps, LM Studio API Key, MiniCPM-V-4_5-gguf Model, ComfyUI Tutorial

LM Studio CLI Requires GUI Boot: Although LM Studio can run via CLI, it must be installed and run through the GUI at least once before using the command line interface.
- One member confirmed that you need to run the lm studio gui at least once before you can run lms commands in the cli.
Decoding Ubuntu Server LM Studio Access: To access LM Studio after installing it on a server version of Ubuntu (without a GUI), the recommended method is to run a virtual desktop (VNC server).
- Any app configured to use an arbitrary OpenAI compatible endpoint could theoretically work by specifying the endpoint URL/port.
API Keys in LM Studio: When using LM Studio with applications that require an API key, the value doesn’t matter, but you still need to input a value.
- One member said you can type in literally anything. like, type in banana, or pp, or meow. really whatever. It just needs to have a value at all.
MiniCPM-V-4_5-gguf Compatibility Check: The MiniCPM-V-4_5-gguf model is not yet supported in LM Studio due to runtime updates being required.
- Members noted the runtimes aren’t updated for that model.
ComfyUI setup is not comfy: There is no such thing as a good tutorial to set up ComfyUI.
- One member joked that Good Tutorial + comfyui doesn’t really exist tbh.

LM Studio ▷ #hardware-discussion (72 messages🔥🔥):

GPU Load Settings, Radeon Drivers, Motherboard Failure, Shared Memory, kv cache

Radeon Drivers Installation Opens 32GB VRAM: A member shares a link to Radeon drivers hoping that it will help another member get a full 32GB VRAM working, as well as a guide to install Radeon on Ubuntu 22.04.
- The member also shared two videos and this other video which explains that they need special drivers to get it working on Windows.
Members Debate GPU Load Settings for Qwen3: Members discuss their GPU load settings to load Qwen3 within 32GB, with one confirming everything is on GPU.
- One of the member states they can load 17GB with Q4_K_M but over 22GB gets rejected, then conclude that 18-20GB’s is my limit.
Desktop Motherboard Bites the Dust: A member reports that their desktop motherboard died but they managed to replace it with their old server board since all AM4.
- While one member jokes about running a 171GB model with hopes and prayers, the user confirms that it is working again.
Digging Into Shared Memory Access: A member is running into issues with shared memory access and suspects it is related to the APU and the amount of RAM they have.
- They think I shouldn’t be spilling into shared memory after 16GB’s and also plan to check if kv cache is on GPU and not doing moe cpu.

Eleuther ▷ #general (97 messages🔥🔥):

RL Credit Assignment, LLM Reward Hacking, Thought Anchors Paper, HF Tokenizer Performance, Child Prodigy AIs

Credit Assignment Remains Thorny for RL: Members debated whether credit assignment is the whole story behind RL challenges, with one suggesting that current algorithms do prompt augmentation inefficiently, and that Neel Nanda’s thought anchors paper sheds light on it.
- They pointed out that maximizing likelihood can be easily reward hacked when one is lazy with credit assignment.
Length Bias Problem Solved Messily: A recent paper (arxiv.org/abs/2508.20722) mitigates the length bias problem by down sampling messier rollouts, training models to produce shorter responses.
- However, a member commented that this is just a case of circular reasoning, and misinterprets the results by claiming the model is learning to reason more effectively.
HF Tokenizer Hits Performance Hiccups: A member created a new 16K tokenizer with Hf tokenizer, and while the total tokens are similar to gpt2, the hf tokenizer is extremely slow and resource intensive, even with batch processing and multiprocessing.
- They were seeking recommendations on strategies to speed it up.
ASI Requires Non-STEM Evaluation: Members discussed scaling challenges with current AI evaluation methods, with one suggesting that stem-like evaluation may not work for desired activities without substantial changes, and that the problem of evaluating skills not easily rewarded or evaluated is unsolved.
- One member asked whether we’ll be able to train ASI using human preference data or task-specific rewards, but they still seem prone to bias.
ASI resembles Child Prodigies: One member stated that AI models will be good at anything we have child prodigies for, as examples of exceptional performance are seen in stuff you also see child prodigies in.
- Another added that taste and style in music is more than parental feedback and requires the AI to be embodied, for higher signal.

Eleuther ▷ #research (6 messages):

Perfect Diffusion, Hybrid Linear Attention, RWKV efficiency

Perfect Diffusion: The Blogpost: A member shared a link to a blog post version of the Perfect Diffusion paper 2507.12469 by the author.
- The blog post provides an accessible explanation of the concepts discussed in the original research paper.
Hybrid Linear Attention hype train: A member expressed confidence in Hybrid Linear Attention and shared links to 2508.01483 and 2507.06457.
- They seemed very confident abt hybrid linear attention.
RWKV missing from survey: A member noted the absence of RWKV 7 in a survey, speculating it was probably cuz of efficiency reasons.
- They are a little sad that they didn’t include it.

Eleuther ▷ #lm-thunderdome (14 messages🔥):

lm evaluation harness, GPU memory usage, loglikelihood requests, generate_until requests, batch size recalculation

Debugging uneven GPU memory usage in lm-evaluation-harness: A member sought tips on profiling or debugging uneven GPU memory usage when evaluating models on 8 GPUs using the lm-evaluation-harness.
- Specifically, with loglikelihood requests, one GPU was at ~60% memory usage while others were at ~25%, and generate_until requests resulted in only 10% GPU utilization.
parallelize in model args: Does it help?: The member tried using the parallelize argument in model args, but it didn’t seem to help with uneven GPU memory usage.
- It was clarified that parallelize is intended for model sharding, but the models being used were small enough to fit on a single GPU.
Running evaluation with accelerate launch: A member shared their accelerate launch command for running evaluation, aiming to reproduce the Hugging Face leaderboard results for qwen2.5-1.5b-instruct.
- The command included arguments like --apply_chat_template, --fewshot_as_multiturn, and --gen_kwargs to faithfully replicate the leaderboard settings.
Recalculating batch size behavior differs for loglikelihood vs. generate_until: The member noted that the batch size is recalculated multiple times for loglikelihood requests but not at all for generate_until requests.
- They speculated that recalculating the batch size for generate_until might improve GPU utilization, as it could potentially lead to a larger batch size.
Understanding the Difference Between generate_until and loglikelihood: A member suggested that loglikelihood is calculated for each sample as many times as the number of options.
- In contrast, generate_until calculates it only once.

Eleuther ▷ #gpt-neox-dev (4 messages):

MLSys conferences, fused rope implementation

MLSys Conference Intel: A member expressed interest in MLSys conferences, noting that their income is going towards GPU hours and meeting people.
- Another member responded, “Not for people. Many names of conferences though.”
Fused RoPE’s Inefficiency: A member suspects an implementation detail is causing inefficiency in the fused RoPE implementation, particularly for smaller RoPE percentages.
- They explained that “We added support for a fused rope implementation after that neox paper was written that must be inefficient for smaller rope percentages.”

GPU MODE ▷ #general (2 messages):

HackDogs Luma Link

HackDogs Event Luma Link Shared: A member shared a link to the HackDogs event on Luma.
- A moderator requested that future posts of this type be placed in either the events or share-your-work channels.
Moderator Request for Channel-Specific Posting: A moderator requested that a link shared be posted in the appropriate channel.
- Specifically, they mentioned using either the events or share-your-work channels for such content.

GPU MODE ▷ #triton (1 messages):

Community Meetup, Multi-pass profiler, Triton Developer Conference, tritonbench users, Triton testing strategy

Community Meetup scheduled: The monthly community meetup is scheduled for tomorrow at 10am PST; see this link for the invite.
Profiling Framework Presentation: Kevin Fang et al. from Meta will present the Multi-pass profiler, a federated GPU Tooling Framework for Orchestrated and LLM Agentic Profiling Applications.
Triton Conference Updates teased: Ofer Dekel from Microsoft will provide updates on the upcoming Triton Developer Conference.
tritonbench user poll: Cicie Wang from Meta is asking who is using tritonbench and soliciting feedback on how it’s being utilized, specifically mentioning OpenAI.
Testing Strategy Feedback solicited: Bill Yoshimi from Meta seeks feedback on the current Triton testing strategy, asking what might be missing and where additional coverage is desired.

GPU MODE ▷ #torch (1 messages):

Flex Attention, Sparse Attention, block_size vs stride, FlashMask

Flex Attention yields Sparse Attention Strategies: A member implemented sparse attention using flex attention and found that the default block_size of 128 was much higher than their stride (above stride=16), leading to no sparsity improvement.
- Changing the block_size to be equal to the stride (16) increased sparsity to 47.73% (vs 30% by default).
Flex Attention runs slower with same sparsity: The implemented flex attention runs about 2x slower than causal masking with block_size=128 despite having the same sparsity.
- The member has no idea why this is the case.
FlashMask Kernel Suggested for Attention: The member asked for suggestions for a better existing kernel and mentioned finding FlashMask (PaddleNLP Docs).
- They noted it’s not well documented so they haven’t successfully tried it yet, while linking to their beacon-gpt repo.

GPU MODE ▷ #algorithms (3 messages):

cuSOLVER, cuSolverSP, cuSolverRF, cuSolverDN, cuDSS

cuSOLVER’s Sparse Capabilities Replaced: Members discussed that cuSOLVER, specifically its sparse components (cuSolverSP and cuSolverRF), are being deprecated in favor of cuDSS.
- It was clarified that the deprecation only applies to sparse direct solvers, while cuSolverDN for dense LAPACK remains active.
cuSOLVER’s Future still Dense: It was clarified that the deprecation only applies to sparse direct solvers, while cuSolverDN for dense LAPACK remains active.
- This transition impacts everything concerning sparse linear algebra (sparse LA), with dense linear algebra (dense LAPACK) continuing under cuSolverDN.

GPU MODE ▷ #beginner (8 messages🔥):

Partial Warps in CUDA, CUDA thread management, GPU Recommendations for Local CUDA Development

CUDA Manages Threads in Partial Warps: When dealing with partial warps in CUDA, the system leans towards creating dummy threads instead of forming warps with fewer than 32 threads, which leads to thread divergence.
- The smallest scheduling unit is the warp, so attempting to create purposeful partial warps to gain more resources per thread is not feasible as CUDA allocates a full warp and masks off some threads.
Seek GPU Recommendation for Local CUDA: A member inquired about recommended GPUs for local CUDA development, expressing interest in acquiring one for their setup.
- No specific recommendations were provided in the given context.

GPU MODE ▷ #off-topic (12 messages🔥):

Anime Recommendations, Hidden Gem Anime, Nonlinear Storytelling, Grimgar Popularity

Berserk, Naruto, and Attack on Titan Top Anime List: A member listed Berserk, Naruto, and Attack on Titan as their favorite anime.
- In response to the thread’s request for hidden gems, other users went on to recommend many more series to check out.
Spoilers?: A member recalled that Dune spoiled the whole plot at the beginning but still forced the readers to read until the end.
- Another member joked when the author does it, it isn’t a spoiler, and called it nonlinear storytelling.
Grimgar dubbed realistic Isekai: A member recommended Hai to Gensou no Grimgar, describing it as what if we made an isekai as realistic as possible, which turns out extremely brutal but ultimately uplifting.
- This member added that Seirei no Moribito is a coming of age story in late Edo, on the run from assassins and Noragami is slop with good characters who you end up actually caring about.
Wondance hailed as dance manga: A member recommended the manga Wondance because the author is something of a dancer himself and the advisors are amazing, but warned some of my friends say they see nothing at all.
- The member went on to add I love the way he can compress a whole sequence into a single still image.
Grimgar is popular in Vietnam: A member was pleased to learn that Grimgar is reasonably popular in Vietnam, having been picked up by the largest publisher.
- Another member mentioned that it is pretty much completely unknown in the US.

GPU MODE ▷ #self-promotion (2 messages):

AMD Developer Challenge, Multi-GPU kernels, Iris library, SHMEM-like Remote Memory Access, MoE Inference Economics

AMD Challenges Developers with Iris Library: For those competing in the AMD Developer Challenge, the Iris library might help with your multi-GPU kernels.
- Iris is an experimental, open-source library from the AMD Research team that adds SHMEM-like Remote Memory Access (RMA) to Triton — making multi-GPU programming feel like single-GPU and letting you quickly iterate over designs.
Iris Blossoms with Remote Memory Access for Triton: The Iris library features pure Python + Triton (~370 LOC), examples from simple memory ops to fused/overlapped GEMM, familiar PyTorch and Triton-like APIs, and supports MI300X, MI350X, MI355X.
- A GitHub link to Iris was provided, and a GPU Mode talk on Iris is forthcoming.
Deep Dive into MoE Inference Economics: If you are interested in the topic of MoE inference you might check out our new piece on “MoE Inference Economics from First Principles” at Tensor Economics.
- The article was also promoted on X.

GPU MODE ▷ #🍿 (15 messages🔥):

BackendBench PR Discussion, Native Code Integration with load_inline and compile_kernel, CuteDSL/tilelang Kernel Generation Challenges, nvrtc Backend and Custom Kernels, compile_kernel reuse in PyTorch

BackendBench PR sparks Kernel Konversation: A discussion started about BackendBench PR #134 and PR #135 focusing on native code integration using load_inline and compile_kernel.
- The integration aims to simplify the process for CuteDSL/tilelang, but generating correct kernels, even with advanced models like Claude, has proven challenging.
Custom Kernels get NVRTC Nuances: The addition of an NVRTC backend for custom kernel support was discussed, with the intention of designing it to allow different backends to share implementations of various DSLs.
- The compile_kernel feature in PyTorch was specifically mentioned for its potential to facilitate the reuse of code in this context, as that was its original intention.
Compile Kernel Convenience Considerations: The discussion covered the usability of compile_kernel, including suggestions for automatically adding include paths similar to load()/load_inline().
- Concerns were raised about the separation of kernel_source and header_code, with a suggestion to combine them, but the split was to avoid long compile times from C++ headers.
CUDA Include Directory Quandaries: The issue of managing cuda_include_dirs was addressed, with the challenge of accommodating the diverse ways users install CUDA (e.g., via conda).
- The proposed solution involves relying on the system installation and prompting users to manually set the directory if it is not found, rather than implementing complex discovery logic.

GPU MODE ▷ #submissions (3 messages):

MI300 performance, MI300x8 all2all performance

MI300 Achieves First Place: A member achieved first place on the MI300 leaderboard with a time of 2.66 ms.
- This submission was identified with id 34649 on the trimul leaderboard.
MI300x8 Dominates all2all Leaderboard: A member secured first place on the MI300x8 all2all leaderboard, initially with 42.0 ms.
- The user then improved their time to 15.2 ms, with submissions identified as id 34654 and 34682 respectively.

GPU MODE ▷ #tpu (7 messages):

TPU experimentation, Jax TPU book, TPU versions

Experimentation is suggested for new TPU user: A new TPU user asked for suggestions on where to start, and a member suggested reproducing results from the Jax TPU book with real experiments, noting it “feels generous” although mapping it to GPUs isn’t straightforward.
- The member shared a link to the Jax TPU Scaling Book.
TPU Upgrades to v5 and v6: A member noted that TPUs have been upgraded to v5 and v6, recalling their last experience with v3.
- The same member noted that “v5e has been harder to get”, and they’re currently using v4, with no upgrade to v6e yet.

GPU MODE ▷ #amd-competition (37 messages🔥):

Rocm Iris library, Nccl vs rccl, SHMEM-like Remote Memory Access, torch.distributed vs hipMemcpyPeer, random expert weights

Iris Library Blossoms for Multi-GPU Triton: AMD Research released Iris, an experimental, open-source library adding SHMEM-like Remote Memory Access (RMA) to Triton, supporting MI300X, MI350X, MI355X GPUs.
- Iris enables multi-GPU programming to feel like single-GPU and lets you quickly iterate over designs, algorithms, work distribution & assignment strategies in minutes.
NCCL Confusion Dissipates: A user noticed that dist.init_process_group uses nccl instead of rccl in the AMD reference kernels, and it was clarified that it’s like cuda.
Expert Weight Randomization Implemented: To prevent solutions passing tests without proper distributed communication, a PR was made to assign a random weight to each expert on every rank.
- It was also noted that to ensure randomness, the RNG seed should be different for each rank and changed to gen.manual_seed(seed + rank).
P2P Transfer Performance Faceoff: Discussion arose on the performance differences between using torch.distributed for P2P transfers versus calling hipMemcpyPeer directly in HIP.
- A member suggested that torch.distributed will be some more opportunities to overlap communication and computation.

GPU MODE ▷ #cutlass (3 messages):

CUTLASS, FP8 Blockscaled GEMM, Hopper GPUs

CUTLASS Template Programming Praised: A member defended the complexity of CUTLASS, asserting that its developers are crazy smart and adept at template programming, suggesting to “read the code”.
- They linked to a NVIDIA blog post on improving GEMM kernel auto-tuning with CUTLASS 4.2.
Deep Dive into FP8 Blockscaled GEMM with CUTLASS on Hopper GPUs: A member shared a link to a webinar by Colfax on a CUTLASS Deep Dive: FP8 Blockscaled GEMM With CUTLASS on Hopper GPUs.
- The webinar appears to be focused on using CUTLASS for FP8 Blockscaled GEMM on NVIDIA Hopper GPUs.

GPU MODE ▷ #multi-gpu (1 messages):

bglick: NSight-Systems usually gives you the best entry point.

GPU MODE ▷ #low-bit-training (6 messages):

NVFP4 Training, Muon Optimizer, CUDA Kernel for FP4

NVFP4 trains with precision: A blog post from NVIDIA discusses how NVFP4 trains with the precision of 16-bit and speed and efficiency of 4-bit.
Muon Optimizer tests FP8 and FP4: A member will be testing what happens if they train in FP8 and FP4 through quantization in the Muon optimizer.
- They anticipate it will be slow, but it will be interesting to see how it interacts.
CUDA kernel planned for FP4: A member is planning to write the kernel in CUDA for FP4 and share it.
Steps increase for FP4: A member is increasing the step for FP4, as they see that it’s technically viable, and they think it will be beneficial.

OpenAI ▷ #ai-discussions (98 messages🔥🔥):

AI Coding Quality: Mini vs. Claude, AI and the Concept of Natural Social Structures, AI's Role in Preserving Personal Histories, Challenges and Costs of Creating Custom AI Image Generators, GPT Unresponsiveness Troubleshooting

Mini AIs Code, But Claude Reigns Supreme: While small, fast AI models are improving and becoming useful for some coding tasks, they are still considered far worse than Claude for complex coding.
- One member compared Grok code with other mini models to illustrate their relative inadequacy, despite their cost-effectiveness for simpler tasks.
AI Isn’t the Most Natural of Buddies: Discussions around AI and ‘natural’ social structures debated whether isolation is unnatural, referencing how complex creatures form societies for development.
- One member shared full isolation is not productive for especially society but also literally every kind of gonochoric animals
Living Memories Chain Letter: A member is trying to create a consent-based, community-run way to collect stories and feedback from people who shaped them, like a living knowledge base.
- This approach aims to make culture more explicit and steerable, versus an emergent property that loses specifics over time, also OpenAI was asked to join the fun and help but everything ends up being filtered.
Roll Your Own ImageGen AI is Costly: Members discussed the difficulties of creating an image generation AI from scratch, citing the expense of hardware and obtaining quality training data.
- Someone suggested the limitations of local models, as they cannot be dynamically trained, and can only utilize context injection.
GPT Goes Ghosting Users: Multiple users reported instances where GPT was unresponsive, failing to provide answers despite repeated attempts.
- Suggestions for troubleshooting included refreshing the page or sharing the chat log to see if others could access the response. Here’s one user’s shared link.

OpenRouter ▷ #app-showcase (2 messages):

geocoding, photon.komoot.io

Photon provides world-wide geocoding: A member shared a link to photon.komoot.io, suggesting it might be of interest for world-wide geocoding.
Geocoding with Photon: The user shared photon.komoot.io as a resource.

OpenRouter ▷ #general (87 messages🔥🔥):

Alternatives to Deepseek, OpenRouter anonymity, Gemini 2.5 flash image problem, Chutes deepseek v3 free, OpenRouter server issues

Kimi and GLM as Deepseek Alternatives: Members suggested using Kimi K2 (temp=0.6) and GLM 4.5 as alternatives to Deepseek for chitchatting, also pointing out a list of free models on OpenRouter.
- One member said that using OpenRouter provides better anonymity compared to using Chutes or Deepseek directly.
Gemini 2.5 Flash Image Fails: A user reported an issue where Gemini 2.5 flash image sometimes sends the text “here is the image” but does not actually send the image.
- No specific solutions or workarounds were mentioned in the discussion.
Deepseek V3 Instability Woes: Users reported that Deepseek V3 is becoming unstable and producing grammatically nonsensical outputs.
- One user experiencing gibberish outputs suggested lowering the temperature, others experiencing the same problems were using V3 0324.
Claude Sonnet’s Code Nerfed: One user reported that their Claude Code usage has been severely limited, restricting its use to less than an hour straight.
- It was suggested that Codex is a decent replacement and that new terms might be the cause of the limitation.
OpenRouter’s JanitorAI and Chub.ai Switched?: A user speculated that OpenRouter might have JanitorAI and Chub.ai switched around in its internal app database, based on SimilarWeb metrics and JanitorAI’s brief downtime.
- The user thinks that OpenRouter simply takes the X-referer header and stores it, trimming everything after the domain name.

OpenRouter ▷ #new-models (2 messages):

“

Empty Channel, No New Models: The new-models channel on OpenRouter Discord appears to be empty, with no new model discussions or announcements to summarize.
- Further monitoring is needed to capture any relevant updates on new models in the future.
Awaiting New Model News: Currently, the channel lacks any specific details, links, or discussions that meet the criteria for detailed summarization.
- The absence of content suggests a quiet period in terms of new model-related activity.

Latent Space ▷ #ai-general-chat (41 messages🔥):

Rork App, TAU-Bench, Parallel AI, Anthropic's Series F, OpenAI Acquires Statsig

Rork App Rockets Up App Store Charts: Investor Matt Shumer introduced the new Rork app, an AI tool that generates iPhone apps on demand, demonstrating its ability to produce a working frontend of a Notion clone in minutes via this X post.
TAU-Bench Tackles LLM Troubles: Lewtun introduces TAU-Bench via this X post as a novel approach to solving LLM hallucinations and tackling the complexities of the internet itself.
Anthropic Achieves Astounding $183B Valuation: Anthropic has secured $13B in Series F funding, achieving an impressive $183B post-money valuation as detailed in their official announcement.
OpenAI Officially Obtains Statsig: OpenAI is acquiring Statsig, a product experimentation platform, with Statsig continuing to operate independently from its Seattle and San Francisco offices, retaining all employees, and prioritizing uninterrupted service for existing customers, according to Statsig’s official blog post and OpenAI’s X post.

HuggingFace ▷ #general (20 messages🔥):

E2B, Open Interpreter, Langchain Python Tool, LlamaIndex Code Interpreter, Instruct Model vs Base Model

E2B joins forces with Open Interpreter!: Members shared links to cool agentic tools such as E2B, Open Interpreter, Langchain Python Tool, and LlamaIndex Code Interpreter.
Instruct vs Base Models clarified: A member learning about agents asked about the differences between instruct models and base models and whether Gemini and GPT4 are instruct models.
- Another member confirmed that Gemini and GPT4 are instruct models, and linked to a Unsloth.ai guide.
SmolVLM2 Video Fine-Tuning on Android: A member asked how to finetune smolvlm2 with video data and how to do inference on Android devices.
- Another member suggested using Transformers.js or Llama.cpp for inference on Android (though unsure about video support), and provided a link to fine-tune SmolVLM2 and Android inference examples.
LightEval community tasks spotlighted: A member asked how to use the arabic_evals provided in the community_tasks for lighteval.

HuggingFace ▷ #today-im-learning (1 messages):

reubencf: chpater 7 of my textbook

HuggingFace ▷ #i-made-this (11 messages🔥):

Prompt Engineering, arxiv-agent, chess engine

Promposer.AI aims for better Prompt Engineering: A member released a new AI dev tool for prompt engineering called Promposer.AI.
- The tool allows users to write and iterate on prompts, add context/tools, and run structured test cases inside the IDE, browser, or pipeline, as shown in this video.
arxiv-agent Debates Research Claims with Personas: A member introduced arxiv-agent, an agentic AI system that ingests an arXiv paper by ID and then spawns 3 personas (Optimist, Skeptic, Ethicist) to debate its claims, available on GitHub.
- A hosted demo is available on Hugging Face Spaces, and one user noted that it still does output something that someone who has 0 understanding of Nuclear Theory thinks looks professional.
New Chess Engine makes its Debut: A member announced that they made a chess engine available on GitHub.
- The user noted it’s not very strong yet.

HuggingFace ▷ #core-announcements (1 messages):

ZeroGPU Demos, AOT Compilation

ZeroGPU Spaces Get Ahead-of-Time Compilation: Hugging Face has announced a new recipe with ahead-of-time compilation (AOT) for optimizing ZeroGPU-powered demo Spaces, aiming for a smoother user experience.
- Users can now leverage this recipe to improve their demo performance.
Optimize ZeroGPU-Powered Demos: New optimization available with ahead-of-time compilation to optimize your ZeroGPU-powered demos.
- This optimization should help with a smoother user experience.

HuggingFace ▷ #agents-course (2 messages):

TextToImageTool issues, Smolagents task failure, Agents Course Materials

TextToImageTool stalls Smolagents Task: A user reported that the TextToImageTool is not working, preventing completion of the Unt.1 Smolagents task due to the inability to create images.
- The user attached images, seeking assistance and suggestions to resolve the issue.
Agents Course Materials Location Revealed: In response to the user’s request, another member shared a link to the agents-course GitHub repository.
- The member indicated that the information is also available in the introductory video.

Modular (Mojo 🔥) ▷ #general (19 messages🔥):

Mojo Async, Mojo + GPU Execution, CUDA programming model, Data movement expenses

Mojo Asynchronous Execution Approaching: With async features coming to Mojo, users will be able to await the GPU being ready and execute CPU tasks in the meanwhile.
Mojo mirrors CUDA execution model: Like CUDA, GPU execution in Mojo is asynchronous, where a kernel can be launched on the accelerator while work is done on the host (CPU) side, with results copied later, as seen in the docs.
Auto execution on all hardware devices not implemented: Currently, Mojo requires manual implementation for simultaneous computing on CPU and GPU, without automatic language support.
Data Movement is expensive for simultaneous CPU/GPU execution: Automatic execution on all available hardware isn’t implemented due to the fact that data movement is expensive, and often only one device is a good fit for the problem.

Modular (Mojo 🔥) ▷ #mojo (2 messages):

Memory-safe bidirectional pointers in Mojo, __moveinit__ and linear types in Mojo

Memory-Safe Bidirectional Pointers Beckon: Discussion arose around the potential for memory-safe bidirectional pointers in Mojo using __moveinit__ and linear types.
- One member expressed curiosity about the implications and how these features might be utilized.
Linear Types Enable Memory Safety: The use of __moveinit__ and linear types is being explored for advanced memory management in Mojo.
- This approach is anticipated to enhance the safety and efficiency of pointer operations.

Modular (Mojo 🔥) ▷ #max (9 messages🔥):

DeviceContext code, WMMA problem for RDNA2, matmul fallback, target information is managed

DeviceContext code matched: A member will check on moving the floor for platform checks in the internal DeviceContext code to match.
- Another member will investigate why they saw compilation time explode when they tried to deploy something similar to their Pascal system, and warned in case the other member experiences the same.
WMMA a problem for RDNA2: A member stated that the Lack of WMMA is also a problem for RDNA2, which is still fairly popular due to AMD CPUs using RDNA2 for iGPUs.
- Another member asked whether it makes sense to have a universal fallback for GPU-shaped things that will just use whatever SIMD the target has.
matmul fallback implemented: A member mentioned that a naive matmul fallback probably makes sense as a default for new architectures until device-specific acceleration is built out.
- Everything so far has been tuned for Ampere+ and CDNA3+ architectures, where you could rely on tensor / matrix cores being present.
Older devices avoid fallback paths: A member poked around a bit and part of the problem seems to be assuming Nvidia has tensor cores and that AMD has WMMA/MFMA.
- Which sends older devices away from the fallback paths, and they will take a hard look at how target information is managed right now.

Yannick Kilcher ▷ #general (7 messages):

DeepSeek R1 Disruptions, Deep Learning Courses, Statistics and Probability Books

DeepSeek R1 Disruptions Incoming?: A member expressed optimism for future DeepSeek R1-level disruptions, citing active work in the field.
- They argue that a lot of people are working on it and that increases the odds of someone coming up with something interesting.
Deep Learning Course > Yann LeCun’s?: A member suggested that a linked Machine Learning course might be better than Yann LeCun’s deep learning course.
- The member attached an image to support his argument.

Yannick Kilcher ▷ #paper-discussion (10 messages🔥):

FastVLM, Communication Complexity, Sign-Rank Bounds, VibeVoice

FastVLM paper coming soon: The group will be looking at the FastVLM paper soon.
- The group also plans to discuss FastVLM.
Explanation of the paper at a manageable level: One member said that the explanations in the paper seem to be at a manageably high level.
- They linked resources on communication complexity and sign-rank bounds, including this arXiv paper and this Wikipedia article.
Paper Posted for Reference: A member posted this paper for reference.
- Another member shared VibeVoice.

Yannick Kilcher ▷ #ml-news (5 messages):

Prompt Injection, Image Scaling Attacks, AI System Security

Prompt Injection Gets Aliased: A new prompt attack mixes aliasing with prompt injection.
Image Scaling Weaponized Against AI: A member linked to a discussion on weaponizing image scaling against production AI systems, as discussed in this X post and another X post.

aider (Paul Gauthier) ▷ #general (17 messages🔥):

GPT-5 vs o4-mini, Model adjustment period, Streaming responses and KYC, Nebius GPT-OSS, Livebench.ai

o4-mini preference over GPT-5 surfaces: A member switched back to o4-mini after 3 weeks with GPT-5/GPT-5-mini, finding it easier to steer and producing code closer to their liking.
- They felt GPT-5 is moving towards the complexity of Gemini/Claude, with unnecessary changes and harder-to-digest code, but another member said solving of problems is much better.
Model adjustment period exists: Members discussed about a period of adjustment is needed when switching models, though most don’t revert back.
- One member noted a 3-week adjustment period, and another mentioned that waiting for responses is now mildly annoying due to the KYC requirement that they don’t want to do.
Nebius bungled GPT-OSS, it’s funny: A member shared a Reddit link about Nebius messing up GPT-OSS.
- They remarked that oh crap is this nebius .. didnt they F it up with gpt-oss lol sad.
Livebench.ai looks interesting: A member shared a link to Livebench.ai and remarked that it looks interesting.
- In response, another member noted that without completion tokens number its hard to know if reasoning high was actually activated.
Qwen excels over polyglot: A user commented that Qwen’s rate on polyglot is way lower than how it performs in real use.
- The conversation was kickstarted by the fact that medium beats high for reasoning, also impressive showing by mini and qwen according to a graph shared.

aider (Paul Gauthier) ▷ #questions-and-tips (1 messages):

baboluo: Instead of model: gemini I had to specify model: gemini/gemini-2.5-pro

DSPy ▷ #papers (1 messages):

batmanosama: https://arxiv.org/abs/2505.17829

DSPy ▷ #general (16 messages🔥):

Generative UI, OCR Analyzer, GEPA, DSPy Program Optimization, JIT Compiler

Stanford Releases Generative UI: Stanford introduces Generative UI, which uses FSM-graph interface flows as the new primitive, treating UIs as black-box plugins auto-synthesized and refined by LLMs, more info on GitHub.
Navigating Context Window Limits with OCR Analyzer: A user is building a PoC OCR analyzer and is running into context window issues with GEPA when including base64 image data in feedback, and asks how to work around this.
- One member suggests that if the image is already part of the input, it need not be a part of the feedback; furthermore, they point to a GitHub pull request that should make working with images in GEPA easier.
Decoding DSPy Program Optimization Secrets: A user questions why optimized prompts extracted from a DSPy program aren’t recommended for inference, and wonders if DSPy could be dropped from production given its size/complexity.
- A member explains that an optimized DSPy program involves traces, training examples, demos, and signatures, and is not solely based on the prompt; in DSPy, the prompt consists of the user instruction, formatted types from the adapter, and few-shot examples in the system message.
DSPy Lambda Deployment Options Explored: Community members discussed solutions for deploying DSPy programs in AWS Lambda, including using Docker images to bypass size restrictions.
- Another member suggested that you can use lambda layers and also work around it. Additionally, another member pointed out that a new release has shrunk the binary size down to under 10Mb.
Optimizer Evolving into JIT Compiler?: The idea proposes automating metric generation and dataset creation for optimizers, where the optimizer dynamically chooses data points for testing.
- Another member replied, that if the optimizer chooses or creates a datapoint to test on then, it doesn’t even need to be an optimizer, it’s a JIT compiler.

Manus.im Discord ▷ #general (7 messages):

Manus Advantages, Agentic Space Competition, Name Liberation

Manus Retains Edge in Agent Race: A user believes Manus still has some advantageous edges, even though the competition in the agentic space has become extremely fierce.
- No further details about the specific advantages were discussed.
Name Liberation Fantasies: A user expressed bewilderment over their name, followed by a whimsical remark about liberating manus.
- The user then humorously questioned their current location, adding an element of playful absurdity.

Moonshot AI (Kimi K-2) ▷ #general-chat (4 messages):

OpenRouter, Qwen Model Suite

OpenRouter Identified as Source: A user pointed out a message was sourced from openrouter.
Qwen Model Suite Hailed for Completeness: A user expressed a preference for the Qwen model suite, citing its completeness and consistent performance.
- The suite now includes image editing and WAN video generation capabilities.

tinygrad (George Hotz) ▷ #general (3 messages):

In-place operations in Tinygrad, Memory Efficiency, Production Readiness of Tinygrad

Tinygrad’s In-Place Operations Explored: A user inquired about the safety of in-place operations in Tinygrad compared to PyTorch, where such operations can disrupt the computation graph and lead to incorrect gradients.
- The user aimed to understand if Tinygrad is production-ready for scenarios requiring in-place modifications to tensors for memory efficiency, instead of generating new tensors at each iteration.
Memory Efficiency via In-Place Tensor Modification: The user is seeking to modify input tensors in-place to enhance memory efficiency, which would avoid creating new tensors at each iteration.
- This approach contrasts with generating new tensors, which can be more memory-intensive.