Qwen is all you need?

AI News for 9/23/2025-9/24/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (194 channels, and 2236 messages) for you. Estimated reading time saved (at 200wpm): 188 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Today is both AI Engineer Paris and AliCloud’s annual Yunqi aka Apsara conference, and the Tongyi Qianwen (aka Qwen) team has been working overtime to launch updates of all their models, including the major ones: the monster 1T model Qwen3-Max (previewed 3 weeks ago), Qwen3-Omni, and Qwen3-VL, with Qwen3Guard, Qwen3-LiveTranslate, Qwen3-TTS-Flash, and updates to Qwen-Image-Edit and Qwen3Coder. Here’s how Junyang Lin, their primary spokesperson on AI Twitter, put it:

To visualize the step up in velocity, here are all the Qwen releases this year:

Not to forget all the work from Alibaba’s Wan team too, but with all these releases Qwen is now being regarded as a “frontier lab.”

Alibaba’s CEO Eddie Wu took to the stage to map out their $52B USD roadmap:

Here’s a translation of the speech:

  • The first stage is “intelligence emergence,” characterized by “learning from humans.”
    • The internet has digitized virtually all knowledge in human history. The information carried by these languages and texts represents the entire corpus of human knowledge. Based on this, large models first develop generalized intelligence by understanding the global knowledge base, emerging with general conversational capabilities, understanding human intent and answering human questions. They gradually develop the reasoning ability to consider multi-step problems. We now see AI approaching the top levels of human performance in various subject tests, such as the gold medal level of the International Mathematical Olympiad. AI is gradually becoming capable of entering the real world, solving real problems, and creating real value. This has been the main theme of the past few years.
  • The second stage is “autonomous action,” characterized by “assisting humans.” In this stage, AI is no longer limited to verbal communication but possesses the ability to act in the real world. AI can break down complex tasks, use and create tools, and autonomously interact with the digital and physical worlds, exerting a profound impact on the real world, all within the context of human goals. This is the stage we are currently in.
    • The key to achieving this breakthrough lies first in the ability of large models to use tools, connecting all digital tools to complete real-world tasks. The starting point of humanity’s accelerated evolution was the creation and use of tools, and large models now also possess this ability. Through tool use, AI can access external software, interfaces, and physical devices just like humans do, performing complex real-world tasks. At this stage, because AI can significantly improve productivity, it will rapidly penetrate nearly every industry, including logistics, manufacturing, software, commerce, biomedicine, finance, and scientific research.
    • Secondly, improvements in large-model coding capabilities can help humans solve more complex problems and digitize more scenarios. Current agents are still in their early stages, primarily solving standardized, short-term tasks. Enabling agents to tackle more complex, longer-term tasks requires large-model coding capabilities. Because agents can code autonomously, they can theoretically solve infinitely complex problems, understanding complex requirements and independently completing coding and testing, just like a team of engineers. Developing large-model coding capabilities is essential for achieving AGI.
  • AI will then enter its third phase – “self-iteration,” characterized by its ability to “surpass humans.” This phase has two key elements:
    • First, AI connects to the full amount of raw data in the real world.

      Currently, AI is making the fastest progress in content creation, mathematics, and coding. We see distinct characteristics in these three areas. Knowledge in these fields is 100% human-defined and created, contained in text. AI can fully understand this raw data. However, in other fields and the broader physical world, today’s AI is primarily exposed to knowledge summarized by humans and lacks extensive raw data from interactions with the physical world. This information is limited. For AI to achieve breakthroughs beyond human capabilities, it needs to directly access more comprehensive and original data from the physical world.

      Simply having AI learn from human-derived rules is far from enough. Only by continuously interacting with the real world and acquiring more comprehensive, authentic, and real-time data can AI better understand and simulate the world, discover deeper laws that transcend human cognition, and thus create intelligent capabilities that are even more powerful than humans.

    • Second, self-learning. As AI penetrates more physical world scenarios and understands more physical data, AI models and agents will become increasingly powerful. This will allow them to build training infrastructure, optimize data flows, and upgrade model architectures on their own, thereby achieving self-learning. This will be a critical moment in the development of AI.

      As capabilities continue to improve, future models will continuously interact with the real world, acquiring new data and receiving real-time feedback. Leveraging reinforcement learning and continuous learning mechanisms, they will autonomously optimize, correct deviations, and achieve self-iteration and intelligent upgrades. Each interaction is a fine-tuning, and each piece of feedback a parameter optimization. After countless cycles of scenario execution and result feedback, AI will self-iterate to achieve intelligence capabilities that surpass humans, and an early stage of artificial superintelligence (ASI) will emerge.

They are also recent converts to the LLM OS thesis.


AI Twitter Recap

Compute buildout: OpenAI–NVIDIA deal, Stargate expansion, and the gigawatt era

  • OpenAI’s “factory for intelligence” goes physical: OpenAI announced five new “Stargate” sites with Oracle and SoftBank, putting it ahead of schedule on its previously announced 10‑GW buildout. The company framed its goal as “a factory that can produce a gigawatt of new AI infrastructure every week” in Sam Altman’s post on “abundant intelligence” and thanked NVIDIA for the nearly decade-long partnership (@OpenAI, @sama, @sama, @gdb, @kevinweil). Context: 10 GW is roughly “about 6% of the energy that all humans in the world spend thinking,” per Graham Neubig (@gneubig). Elon Musk asserted “first to 10GW, 100GW, 1TW, …” (@elonmusk).
  • Deal math and “paper-for-GPUs” speculation: Back-of-the-envelope estimates for 10 GW suggest ~$340B of H100-equivalents at $30k/GPU if 20% of power is non‑GPU, with a 30% volume discount bringing it to ~$230B (a sanity-check of this arithmetic is sketched below). One floated structure: pay list price on GPUs and backfill the “discount” via NVIDIA investing ~$100B into OpenAI equity (@soumithchintala, @soumithchintala, @soumithchintala). Oracle/SoftBank involvement was noted by multiple observers; total infra commitments across vendors are trending to “hundreds of billions” (@scaling01).
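
A quick way to reproduce that estimate (a sketch only; the ~700 W per GPU, $30k list price, 20% non‑GPU power share, and 30% discount are the thread’s assumptions, not confirmed figures):

```python
# Back-of-the-envelope: GPU capex implied by a 10 GW buildout.
# All inputs are the thread's assumptions (H100-class at ~700 W, $30k each,
# 20% of power going to non-GPU gear, 30% volume discount).
TOTAL_POWER_W = 10e9          # 10 GW buildout
NON_GPU_SHARE = 0.20          # cooling, networking, CPUs, etc.
GPU_POWER_W = 700             # rough per-H100 board power
GPU_PRICE = 30_000            # assumed list price per GPU
DISCOUNT = 0.30               # assumed volume discount

gpu_power_budget = TOTAL_POWER_W * (1 - NON_GPU_SHARE)
num_gpus = gpu_power_budget / GPU_POWER_W
list_capex = num_gpus * GPU_PRICE
discounted_capex = list_capex * (1 - DISCOUNT)

print(f"GPUs: {num_gpus / 1e6:.1f}M")                       # ~11.4M GPUs
print(f"List capex: ${list_capex / 1e9:.0f}B")              # ~$343B (thread: ~$340B)
print(f"Discounted capex: ${discounted_capex / 1e9:.0f}B")  # ~$240B (thread rounds to ~$230B)
```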

Qwen’s multi-model salvo: Max, VL‑235B‑A22B, Omni, Coder‑Plus, Guard, and LiveTranslate

  • Flagships and vision: Alibaba Qwen released:
    • Qwen3‑Max (Instruct/Thinking). Claims near‑SOTA on SWE‑Bench, Tau2‑Bench, SuperGPQA, LiveCodeBench, AIME‑25; the Thinking variant with tool use in “heavy mode” approaches perfection on selected benchmarks (@Alibaba_Qwen, @scaling01).
    • Qwen3‑VL‑235B‑A22B (Apache‑2.0; Instruct/Thinking). 256K context scalable to ~1M; strong GUI manipulation and “visual coding” (screenshots→HTML/CSS/JS), 32‑language OCR, 2D/3D spatial reasoning, SOTA on OSWorld (@Alibaba_Qwen, @reach_vb, @scaling01).
    • Qwen3‑Omni: an E2E any‑to‑any model (30B MoE, ~3B active) that ingests image/text/audio/video and outputs text/speech; supports 119 languages (text), 19 (speech), and 10 speech output voices; Transformers+vLLM support; SOTA across many audio/video benchmarks vs Gemini 2.5 Pro and GPT‑4o (@mervenoyann, @mervenoyann). Technical report roundup: joint multimodal training didn’t degrade text/vision baselines in controlled studies (@omarsar0).
  • Developers, safety, and real‑time:
    • Qwen3‑Coder‑Plus: upgraded terminal task capabilities, SWE‑Bench up to 69.6, multimodal coding and sub‑agent support, available via Alibaba Cloud Model Studio and OSS product Qwen Code (@Alibaba_Qwen, @_akhaliq).
    • Qwen3Guard: multilingual (119 langs) moderation suite in 0.6B/4B/8B sizes; streaming (low‑latency) and full‑context (Gen) variants; 3‑tier severity (Safe/Controversial/Unsafe); positioned for RL reward modeling (@Alibaba_Qwen, @HuggingPapers). A hedged usage sketch follows this list.
    • Qwen3‑LiveTranslate‑Flash: real‑time multimodal interpretation with ~3s latency; lip/gesture/on‑screen text reading, robust to noise; understands 18 languages + 6 dialects, speaks 10 (@Alibaba_Qwen).
    • Bonus: Travel Planner agent wired to Amap/Fliggy/Search for itineraries and routing (@Alibaba_Qwen).
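
For orientation, a generative moderation model like Qwen3Guard‑Gen would typically be driven like any causal LM in transformers; the sketch below is an assumption-laden illustration (the model ID and output format are guesses based on the announcement; check the model card for the real prompt conventions):

```python
# Hedged sketch: querying a generative safety classifier with transformers.
# "Qwen/Qwen3Guard-Gen-4B" is an assumed repo name, and the idea that the
# model emits a severity label is inferred from the announcement's 3-tier
# scheme (Safe / Controversial / Unsafe) -- verify against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3Guard-Gen-4B"  # assumption, not a confirmed repo name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How do I pick a lock?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=32)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))  # e.g. a severity label
```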

OpenAI’s GPT‑5‑Codex and agent tooling move to the fore

  • GPT‑5‑Codex ships for agents: OpenAI released GPT‑5‑Codex via the Responses API (not Chat Completions), optimized for agentic coding rather than conversation (@OpenAIDevs, @reach_vb). Rapid integrations followed: VS Code/GitHub Copilot (@code, @pierceboggan), Cursor (@cursor_ai), Windsurf (@windsurf), Factory (@FactoryAI), Cline (@cline), and Yupp (Low/Medium/High variants for public testing) (@yupp_ai). Builders highlight “adaptive reasoning” that spends fewer tokens on easy tasks and more when required, with some reporting >400K context and strong performance on long‑running tasks (claims via partner posts; see @cline). A minimal Responses API call is sketched after this list.
  • Agent debugging powers land in IDEs and browsers:
    • Chrome DevTools MCP: agents can run performance traces, inspect the DOM, and debug web pages programmatically (@ChromiumDev).
    • Figma MCP server for VS Code: bring design context into code for design→implementation loops (@code).
    • Gemini Live API update: improved real‑time voice function calling, interruption handling, and side‑chatter suppression (@osanseviero).
  • Hiring momentum for OS-level computer control agents continued (xAI “Macrohard,” Grok 5) (@Yuhu_ai_, @YifeiZhou02) and third‑party teams integrated Grok fast models (@ssankar).
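
Since GPT‑5‑Codex is exposed on the Responses API rather than Chat Completions, a minimal call looks like the sketch below (OpenAI Python SDK; the `reasoning` knob is shown as an assumption about how effort is configured, so check the API reference):

```python
# Minimal sketch: calling GPT-5-Codex through the Responses API (not Chat
# Completions). Requires OPENAI_API_KEY in the environment. The "reasoning"
# parameter is an assumption about how reasoning effort is tuned.
from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-5-codex",
    reasoning={"effort": "medium"},  # assumed knob for adaptive reasoning effort
    input=(
        "Refactor this function to be iterative:\n\n"
        "def fact(n):\n    return 1 if n <= 1 else n * fact(n - 1)"
    ),
)
print(resp.output_text)  # SDK convenience accessor for the text output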

Retrieval, context engineering, and agent research

  • MetaEmbed (Flexible Late Interaction): Append learnable “meta tokens” and only store/use those for late interaction, enabling multi‑vector retrieval that’s compressible (Matryoshka‑style), with test‑time scaling to trade accuracy vs efficiency; SOTA on MMEB and ViDoRe. Discussion threads and repos note compatibility with PLAID indexes (@arankomatsuzaki, @ZilinXiao2, @ManuelFaysse, @antoine_chaffin). A generic MaxSim scoring sketch follows this list.
  • Data beats scale for agency? LIMI shows 73.5% on AgencyBench from just 78 curated demos, outperforming larger SOTA agentic models; authors propose an “Agency Efficiency Principle” (autonomy emerges from strategic curation) (@arankomatsuzaki, @HuggingPapers).
  • Graph‑walk and engineering evals:
    • ARK‑V1: a lightweight KG‑walking agent boosts factual QA vs CoT; with Qwen3‑30B it answers ~77% of queries with ~91% accuracy on those (≈70% overall). Larger backbones reach ~70–74% overall; weaknesses include ambiguity and conflicting triples (@omarsar0).
    • EngDesign: 101 tasks across 9 engineering domains using simulation‑based eval (SPICE, FEA, etc.); iterative refinement meaningfully increases pass rates (@arankomatsuzaki).
  • Also notable: Apple’s EpiCache on episodic KV cache management for long conversational QA (@_akhaliq), the Agent Research Environment now MCP‑compatible with real robot control via LeRobot MCP (@clefourrier), and LangSmith Composite Evaluators to roll multiple scores into a single metric (@LangChainAI).
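
For readers new to late interaction: the core scoring rule (ColBERT-style, which MetaEmbed builds on) takes each query vector’s best match among document vectors and sums those maxima; MetaEmbed’s twist is keeping only a few learnable meta-token vectors per side. A generic MaxSim sketch (not the paper’s code):

```python
# Generic late-interaction (MaxSim) scoring sketch, ColBERT-style.
# MetaEmbed stores only a handful of learnable "meta token" vectors per item,
# so Q and D would be small (e.g., 1-16 vectors each), making the index
# compressible and the accuracy/efficiency trade-off tunable at test time.
import numpy as np

def maxsim_score(Q: np.ndarray, D: np.ndarray) -> float:
    """Q: (num_query_vecs, dim), D: (num_doc_vecs, dim), both L2-normalized."""
    sims = Q @ D.T                          # cosine similarity matrix
    return float(sims.max(axis=1).sum())    # best doc match per query vector, summed

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = l2norm(rng.normal(size=(4, 128)))   # 4 query-side meta tokens
D = l2norm(rng.normal(size=(8, 128)))   # 8 doc-side meta tokens
print(maxsim_score(Q, D))
```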

Video and 3D content: Kling 2.5 Turbo, Ray 3 HDR, and more

  • Kling 2.5 Turbo: Day‑0 access on FAL with significantly improved dynamics, composition, style adaptation (incl. anime), and emotional expression; priced as low as ~$0.35 for 5s video on FAL, per user reports. Higgsfield announced “unlimited” Kling 2.5 within its product. Demos show better adherence to complex prompts and audio FX generation improvements (@fal, @Kling_ai, @higgsfield_ai, @TomLikesRobots).
  • Luma Ray 3: first video model with 16‑bit HDR and iterative “chain‑of‑thought” refinement across T2V and I2V; currently in Dream Machine only (API pending). Artificial Analysis will publish side‑by‑sides in their arena (@ArtificialAnlys).
  • In 3D/VR, Rodin Gen‑2 (4× mesh quality, recursive part gen, high→low baking, control nets) launched with promo pricing (@DeemosTech); World Labs’ Marble showcased prompt‑to‑VR walkthroughs (@TomLikesRobots).

Systems, kernels, and inference

  • Kernel craft pays: A Mojo matmul beat cuBLAS on B200s in ~170 LOC without CUDA, detailed in a tuning thread; demand for kernel‑writing talent is spiking across the industry. Meanwhile, vLLM enabled full CUDA‑graphs by default (e.g., +47% speedup on Qwen3‑30B‑A3B‑FP8 at bs=10), and Ollama shipped a new scheduler to reduce OOMs, maximize multi‑GPU utilization, and improve memory reporting (@AliesTaha, @jxmnop, @mgoin_, @ollama).
  • Models and infra: Liquid AI released LFM2‑2.6B (short convolutions + GQA, 10T tokens, 32K ctx; open‑weights) positioning as a new 3B‑class leader (@LiquidAI_). AssemblyAI posted strong multilingual ASR performance with diarization at scale (@_avichawla). Hugging Face’s storage backbone highlighted Xet and content‑defined chunking (a toy chunker is sketched below) as key to multi‑TB/day open‑source throughput (@ClementDelangue). NVIDIA noted expanded open‑source model contributions on HF (@PavloMolchanov).
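
Content-defined chunking, the technique credited for Xet’s throughput, splits a byte stream wherever a rolling-style hash hits a boundary condition, so an insertion early in a file only reshuffles nearby chunks instead of shifting every fixed-size block. A toy version (illustrative only; Xet’s actual chunker uses a different hash and parameters):

```python
# Toy content-defined chunking: split where a rolling-style hash hits a mask,
# so chunk boundaries depend on content, not byte offsets. An edit near the
# start of a file then only invalidates nearby chunks (the dedup win).
MASK = (1 << 13) - 1  # 1-in-8192 boundary probability -> ~8 KiB average chunks

def chunks(data: bytes, min_size=2048, max_size=65536):
    h, start = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF   # toy hash; real chunkers use Gear/Rabin
        size = i - start + 1
        if (size >= min_size and (h & MASK) == 0) or size >= max_size:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

print([len(c) for c in chunks(bytes(1_000_000))][:5])  # chunk sizes on a zero blob
```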

Top tweets (by engagement)

  • “crazy that they called it context window when attention span was right there.” (@lateinteraction, 7074)
  • Hiring for a new team building computer control agents for Grok5/macrohard (@Yuhu_ai_, 6974)
  • “A major moment — UNLIMITED Kling 2.5 exclusively inside Higgsfield.” (@higgsfield_ai, 6248)
  • “Yo I heard if u press Up, Up, Down, Down… there’s an infinite money glitch” (@dylan522p, 5621)
  • “Abundant Intelligence” — OpenAI vision post (@sama, 5499)
  • Chromium DevTools MCP for agent debugging (@ChromiumDev, 2538)
  • “Grateful to Jensen for the almost‑decade of partnership!” (@sama, 5851)
  • OpenAI: five new Stargate sites announced (@OpenAI, 2675)
  • Nvidia–OpenAI partnership nod (“looking forward to what we’ll build together”) (@gdb, 2753)
  • “I can’t believe this actually works” (viral agent demo) (@cameronmattis, 46049)
  • FDA/Tylenol thread on autism/ADHD evidence quality (@DKThomp, 16346)
  • U.S. Physics Olympiad team wins 5/5 golds (@rajivmehta19, 13081)

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Qwen3-Max Release and Benchmarks

  • Qwen 3 max released (Score: 218, Comments: 39): Qwen3‑Max is announced as Qwen’s largest, most capable model. The preview Qwen3‑Max‑Instruct ranks #3 on the Text Arena leaderboard (claimed to surpass “GPT‑5‑Chat”), and the official release emphasizes stronger coding and agent capabilities with claimed SOTA across knowledge, reasoning, coding, instruction‑following, human‑preference alignment, agent tasks, and multilingual benchmarks, accessible via API (Alibaba Cloud) and Qwen Chat. A separate Qwen3‑Max‑Thinking variant (still training) reportedly hits 100% on AIME 25 and HMMT when augmented with tool use and scaled test‑time compute. Commenters note the model is not local/open‑source, limiting self‑hosting, and remark on the rapid release cadence.
    • Several commenters note Qwen 3 Max is not a local model and is not open source. Practically, this means no downloadable weights or on-device/self-hosted deployment; usage is via a hosted API only, which impacts data control, offline capability, and reproducibility versus OSS models.
    • There’s confusion around the announcement because earlier access was a “preview”; this thread indicates a formal release. Readers infer a shift from preview to GA/production readiness (e.g., clearer SLAs/rate limits/pricing), though no concrete technical details were provided in the comments.
  • 2 new open source models from Qwen today (Score: 172, Comments: 35): Post hints at two new open-source releases from Alibaba’s Qwen team, with at least one already live on Hugging Face. Comments explicitly name “Qwen3 VL MoE,” implying a vision-language Mixture-of-Experts model; the image likely teases both models’ names and release timing. Image: https://i.redd.it/goah9v2r8wqf1.png Comments note the second model has appeared on Hugging Face and that the first is already released; discussion centers on identifying “qwen3 vl moe,” with no benchmarks or specs yet.
    • Release of Qwen3-VL-MoE (vision-language Mixture-of-Experts) noted; MoE implies sparse expert routing so only a subset of experts is active per token, reducing compute while maintaining high capacity. Evidence of availability and rapid cadence: community reports it’s “already released” and a “2nd Qwen model has hit Hugging Face,” with a preview screenshot shared (https://preview.redd.it/kn55ui1xvwqf1.png?width=1720&format=png&auto=webp&s=a36235216e9450b2be9ad44296b22f9d2abc07d9).
    • Discussion highlights a shift to sparse MoE across Qwen models to speed up both training and deployment by improving parameter efficiency and throughput (routing to few experts lowers per-token FLOPs). Commenters argue this enables faster iteration on scaling strategies while keeping models “A-tier,” emphasizing a practical trade-off: strong performance with better cost-efficiency rather than chasing single-model SOTA.

2. Qwen Shipping Speed Memes/Discussion

  • How are they shipping so fast 💀 (Score: 805, Comments: 136): Post highlights Qwen’s rapid release cadence; commenters attribute speed to adopting Mixture‑of‑Experts (MoE) architectures, which are faster/cheaper to train and scale compared to large dense models. There’s mention of rumored upcoming open‑source Qwen3 variants, including a “15B2A” and a 32B dense model, suggesting a split between MoE and dense offerings. Comments are bullish on Qwen’s momentum (“army of Qwen”) and contrast it with Western narratives about long timelines and high costs; some geopolitical takes appear but are non‑technical. Technical hope centers on OSS releases of the rumored Qwen3 15B2A and 32B dense models.
    • Commenters note that Qwen has leaned into Mixture-of-Experts (MoE), which can be faster to train/infer at a given quality because only a subset of experts is activated per token (k-of-n routing), reducing effective FLOPs while scaling parameters (see Switch Transformer: https://arxiv.org/abs/2101.03961). They also reference rumored upcoming dense releases — Qwen3 15B2A and Qwen3 32B — implying a complementary strategy where MoE accelerates iteration and dense models target strong single-expert latency/serving simplicity; trade-offs highlighted include MoE’s routing/infra complexity vs dense models’ predictable memory/latency. A minimal top-k routing sketch follows this list.
  • how is qwen shipping so hard (Score: 181, Comments: 35): OP asks why Qwen (Alibaba’s LLM family) is shipping releases so quickly and proliferating variants to the point that model selection feels overwhelming. No benchmarks or implementation details are discussed; the thread is meta commentary on release cadence and variant sprawl (e.g., many model types/sizes under the Qwen umbrella, cf. Qwen’s repo: https://github.com/QwenLM/Qwen). Commenters largely attribute the pace to Alibaba’s resources—“tons of cash, compute and manpower”—and China’s “996” work culture; one notes that the intensely trained students from a decade ago are now the workforce.
    • A practitioner recommends a practical deployment mix: use Qwen2.5-VL-72B for VLM tasks, the largest Qwen3 (dense) that fits your GPU VRAM for low-latency text inference, and the largest Qwen3 MoE that fits in system main memory for higher-capacity workloads. This balances VRAM-bound dense inference against RAM-bound MoE, trading latency for capacity while covering multimodal and pure-text use cases in one stack.
    • Several note Qwen’s backing by Alibaba, implying access to substantial compute, funding, and engineering manpower. That scale translates into faster pretraining/finetuning cycles and parallel productization, which helps explain the rapid shipping cadence across multiple model families (dense, MoE, and VLM).
    • Reports highlight strong image-generation performance from Qwen’s stack, indicating rapid maturation of their multimodal/image pipelines alongside text models. While no benchmarks were cited, the consensus is that image quality has improved enough to be competitive with contemporary leaders.
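
To make the MoE point concrete, here is a minimal top-k routed layer in PyTorch (a generic sketch of sparse expert routing, not Qwen’s implementation): only k of n experts run per token, so per-token FLOPs scale with k while parameter count scales with n.

```python
# Minimal top-k mixture-of-experts layer (generic sketch, not Qwen's code).
# Capacity grows with num_experts; per-token compute grows only with k.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # router
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):                                 # x: (tokens, dim)
        weights, idx = self.gate(x).topk(self.k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                        # only k experts run per token
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

print(TinyMoE()(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```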

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Wan 2.2/2.5 Video Demos + Qwen-Image-Edit GGUF and LMarena Leaderboard

  • Incredible Wan 2.2 Animate model allows you to act as another person. For movies this is a game changer. (Score: 258, Comments: 57): Post claims the “Wan 2.2 Animate” model enables actor-to-actor facial reenactment—driving a target identity’s face from a source performer—effectively a deepfake-style digital double for film/video. Based on the clip description (reddit video), it demonstrates ID transfer with reasonable motion/temporal consistency but imperfect identity fidelity (a commenter notes it doesn’t fully match Sydney Sweeney), suggesting trade-offs between likeness preservation, lip-sync, and coherence typical of diffusion/reenactment pipelines conditioned on reference identity frames. No benchmarks or implementation details are provided in the post; technically, this aligns with identity-conditioned video generation/reenactment methods where motion is derived from a driving video and identity is maintained via reference-image embeddings and cross-frame constraints. Top comments discuss monetization/abuse vectors (e.g., adult-content deepfakes/OnlyFans) and note that, despite artifacts or mismatch for close viewers, most audiences may not notice—highlighting ethical risk versus perceived quality in practical deployments.
    • Commenters noting the face “does not look like Sydney Sweeney” reflects known limits in identity preservation for face reenactment/video diffusion: models can drift on fine facial geometry, skin microtexture, and expression under pose/lighting changes, leading to perceptual mismatches. Robust systems typically mix landmark/flow-guided warping with identity losses (e.g., ArcFace/FaceNet embeddings) and temporal consistency losses; without these, frame-to-frame ID coherence and lip-sync degrade, especially beyond 512–1024 px outputs or during rapid head motion.
    • Multiple users suggest this tech already exists; indeed, face-swapping/reenactment has prior art: classic deepfake pipelines (DeepFaceLab/FaceSwap), research like First Order Motion Model (2019) and SimSwap (2020), plus newer one-shot and diffusion methods. References: DeepFaceLab (https://github.com/iperov/DeepFaceLab), FaceSwap (https://github.com/deepfakes/faceswap), FOMM (https://github.com/AliaksandrSiarohin/first-order-model), SimSwap (https://github.com/neuralchen/SimSwap), Roop (https://github.com/s0md3v/roop), LivePortrait (https://github.com/YingqingHe/LivePortrait), AnimateDiff (https://github.com/guoyww/AnimateDiff).
    • Skepticism about “for movies” points to production constraints: film requires 4K+ resolution, HDR, stable multi-minute temporal coherence, accurate relighting/shadows, camera/face tracking under occlusions, and consistent hair/ear/jawline geometry. Current diffusion/reenactment demos often show flicker, mouth/eye desynchrony, and lighting mismatches; integrating them into film usually needs VFX-grade tracking, neural relighting, paint/roto, and per-shot tuning rather than a turnkey actor-swap.
  • Wan2.2 Animate and Infinite Talk - First Renders (Workflow Included) (Score: 340, Comments: 48): OP shares first renders from a ComfyUI pipeline combining Wan 2.2 “Wan‑Animate” for video synthesis with an “Infinite Talk” workflow for narration. The Wan‑Animate workflow was sourced from CivitAI user GSK80276, and the Infinite Talk workflow was taken from u/lyratech001’s post in this thread. No model settings, checkpoints, or hardware/runtime details are provided; the post primarily demonstrates integration of existing workflows. Comments ask for reproducibility details, specifically the TTS source (voice generation) and how the target image/video were produced, indicating missing setup specifics; no substantive technical debate is present.
    • Requests for disclosure of the exact TTS/voice pipeline (“Infinite Talk”): which model/service was used, inference backend, voice settings (e.g., sampling rate, style/temperature), and whether phoneme/viseme timestamps are available for lip‑sync integration. Reproducibility details like latency per second of audio and any noise reduction/vocoder steps are sought.
    • Multiple asks for the full Wan2.2 Animate workflow: how the target still image was obtained (captured vs generated) and preprocessed (face crop, keypoint/landmark detection, alignment), plus how the driving motion/video was produced (reference video vs text‑driven), including key inference parameters (resolution, FPS, seed, guidance/strength). Clarification on handling head pose changes, stabilization, and blending/roto for backgrounds would help others replicate results.
    • Feasibility on consumer hardware: can the pipeline run on 8 GB VRAM with 32 GB system RAM by using fp16/bf16, low‑VRAM or CPU offload, reduced resolution/FPS, smaller batch size, and memory‑efficient attention (e.g., xFormers/FlashAttention). Commenters seek expected throughput/latency trade‑offs and practical presets that fit within 8 GB without OOM.
  • Ask nicely for Wan 2.5 to be open source (Score: 231, Comments: 95): Thread reports that the upcoming Wan 2.5 release will initially be an API-only “advance version,” with an open-source release TBD and potentially coming later depending on community demand and feedback; users are encouraged to request open-sourcing during a live stream. The claim appears to stem from a translated note circulating on X (source), suggesting open-sourcing is likely but time-lagged and contingent on community attitude/volume. No new technical specs or benchmarks for 2.5 are provided beyond release modality (API vs. OSS). Top comments emphasize that Wan’s value hinges on being open source (enabling LoRA fine-tuning and local workflows); otherwise it’s just another hosted video-generation service. Others note the messenger seems unaffiliated (a YouTuber), implying this is not an official developer statement, and a side request mentions interest in Hunyuan3D 2.5/3.0 releases.
    • Several commenters emphasize that Wan’s core value comes from open weights enabling local inference and customization—specifically LoRA-based fine-tuning for domain/style adaptation, training adapters, and integrating into existing video pipelines. A closed, service-only release would block reproducible research, offline deployment, and custom training workflows, turning it into “just another video generation service.” See e.g., LoRA for lightweight adaptation without full retrains.
    • There’s no immediate need for Wan 2.5 if 2.2 remains open and stable: users only recently adopted Wan 2.2 and plan to rely on it for months. From a tooling perspective, keeping 2.2 open provides time to build datasets, train LoRAs, and harden workflows without version churn, with the expectation that an open 2.5 can arrive later without disrupting ongoing work.
    • Requests also target open-sourcing 3D generators like Hunyuan3D 2.5/3.0, aiming for interoperable, locally-runnable assets across video and 3D pipelines. Open releases would enable consistent asset generation and evaluation across tasks (video-to-3D, 3D-to-video), rather than being locked to siloed, closed endpoints.
  • Wan 2.5 (Score: 207, Comments: 137): Alibaba teases the Wan 2.5 video model on X, with an “advance version” releasing as API-only; open-sourcing is undecided and may depend on community feedback (Ali_TongyiLab, Alibaba_Wan). The teaser highlights 10s 1080p generations; a statement (Sep 23, 2025) notes “for the time being, there is only the API version… [open source] is to be determined”, urging users to advocate for open release. Discussion centers on open-source vs API-only: commenters argue closed access blocks LoRA-based fine-tuning and broader community workflows, reducing utility compared to prior open models, and encourage pushing for open release during the live stream (thread).
    • The shared note indicates an initial API-only release with open-source status TBD and potentially delayed: “the 2.5 sent tomorrow is the advance version… for the time being, there is only the API version… the open source version is to be determined” (post, Sep 23, 2025). Practically, this means no local inference or weight access at launch, with any future open-sourcing contingent on community feedback and timing.
    • Closed/API-only distribution precludes community LoRA fine-tuning, since training LoRA adapters requires access to model weights; without weights, there are “no loras,” limiting customization to prompt-level or vendor-provided features. This restricts domain adaptation, experimentation, and downstream task specialization compared to open checkpoints.
    • “Multisensory” is interpreted as adding audio to video, raising compute concerns: generating ~10 s 1080p with audio will be infeasible for “95% of consumers” unless the backbone is made more efficient. Suggestions include architectural shifts such as linear-attention variants, radial attention, DeltaNet, or state-space models like Mamba (paper) to reach acceptable throughput/VRAM on consumer hardware.
  • GGUF magic is here (Score: 335, Comments: 94): Release of GGUF builds for Qwen-Image-Edit-2509 by QuantStack, enabling local, quantized inference of the Qwen image-editing model via GGUF-compatible runtimes (e.g., llama.cpp/ggml) link. For ComfyUI integration, users report you must update ComfyUI and swap text encoder nodes to TextEncodeQwenImageEditPlus; early artifacts (distorted/depth-map-like outputs) were due to workflow issues, with a working graph shared here and the base model referenced here. Commenters are waiting for additional quant levels (“5090 enjoyers waiting for the other quants”) and asking which is better for low VRAM—nunchaku vs GGUF—suggesting an open trade-off discussion on memory vs quality/perf.
    • ComfyUI integration notes for the GGUF port of Qwen-Image-Edit-2509: initial runs yielded distorted/“depth map” outputs until ComfyUI was updated and text encoder nodes were swapped to TextEncodeQwenImageEditPlus. The final fix was a workflow correction; a working workflow is shared here: https://pastebin.com/vHZBq9td. Model files referenced: https://huggingface.co/aidiffuser/Qwen-Image-Edit-2509/tree/main.
    • Low-VRAM deployment question: whether Nunchaku or GGUF quantizations are better for constrained GPUs. The thread implies a trade-off between memory footprint, speed, and quality across backends, but provides no benchmarks; readers may need to compare quantization bitwidths and loaders on their hardware.
    • Quantization depth concerns: a user asks if <=4-bit quants are even usable given perceived steep quality loss, questioning the rationale for releasing every bit-width. This highlights the need for concrete quality metrics (e.g., task accuracy/FID for image-editing prompts) versus VRAM gains to justify ultra-low-bit variants in practice.
  • How is a 7 month old model still on the top is insane to me. (LMarena) (Score: 227, Comments: 64): Screenshot of the LMSYS LMarena (Chatbot Arena) leaderboard shows a ~7‑month‑old model still at/near the top by crowd ELO, highlighting that LMarena is a preference/usability benchmark built from blind A/B chats and Elo-style scoring rather than pure task accuracy (lmarena.ai). This explains results like GPT‑4o ranking above newer “5 high” variants: conversational helpfulness, approachability, and alignment often win more user votes than marginal gains on coding/math benchmarks. Commenters attribute the top position to Gemini 2.5 Pro, which is perceived as especially empathetic and readable for everyday writing and quick Q&A. Debate centers on whether upcoming Gemini 3 will reshuffle the leaderboard and why 4o > 5 high; the consensus is that LMarena favors user-preference quality over raw performance. One comment also notes the Google Jules agent (based on Gemini 2.5 Pro) excels for research/build tasks versus tools like Codex or Perplexity Labs, aided by generous quotas.
    • LMarena (LMSYS Chatbot Arena) is a pairwise, blind, Elo-style benchmark driven by real user votes, so it measures usability/preferences rather than pure task accuracy. That means older models can stay on top if users prefer their tone, clarity, formatting, or safety behavior on general prompts. This contrasts with standardized benchmarks (e.g., MMLU, GSM8K, HumanEval) that test narrow competencies; a model can lead Arena while trailing on those. See the methodology and live ratings at https://arena.lmsys.org/. A toy Elo update is sketched after this list.
    • Why could GPT-4o outrank a newer ‘5-high’ variant? In head‑to‑head Arena comparisons, factors like prompt-following, concise reasoning traces, multimodal formatting, and calibrated safety can drive user preference even when a model with stronger raw reasoning exists. Additionally, Arena Elo has variance and overlapping confidence intervals—small gaps may not be statistically significant—so rank flips are common until enough votes accumulate. In short, Arena optimizes for perceived answer quality, not just hardest‑case reasoning.
    • One commenter notes preferring Gemini 2.5 Pro for writing/quick Q&A despite believing it trails GPT‑5 and Grok on ‘pure performance,’ highlighting the gap between base‑model capability and end‑user experience. They also claim Google’s ‘Jules’ agent built on it outperforms legacy Codex for research and Perplexity Labs for building workflows, implying tool‑use, retrieval, and agent orchestration can outweigh raw model deltas. This underscores that Arena results can reflect agent/system‑prompting quality and product UX as much as model weights.
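
For reference, the machinery behind Arena-style rankings is just the standard Elo expected-score update (LMArena fits Bradley-Terry-style ratings over all votes; the classic pairwise online update below is the usual approximation):

```python
# Standard Elo update, the intuition behind Arena-style leaderboards.
def expected(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under Elo."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, score_a: float, k: float = 32):
    """score_a is 1 if A wins the vote, 0 if B wins, 0.5 for a tie."""
    e_a = expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * (e_a - score_a)

r_old, r_new = 1300.0, 1250.0           # "old" model outranks the newcomer
r_old, r_new = update(r_old, r_new, 1)  # one user prefers the old model's answer
print(round(r_old), round(r_new))       # small shift; many votes are needed to flip ranks
```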

2. OpenAI Infrastructure, Funding, and Product Changes/User Feedback

  • Sam Altman discussing why building massive AI infrastructure is critical for future models (Score: 213, Comments: 118): Short clip (link blocked: Reddit video, HTTP 403) reportedly shows OpenAI CEO Sam Altman arguing that scaling physical AI infrastructure—GPUs/accelerators, HBM bandwidth, energy and datacenter capacity—is critical to enable future frontier models, with an NVIDIA executive present alongside. The thread provides no concrete benchmarks, model specs, scaling targets, or deployment timelines; it’s a high‑level emphasis on compute, memory, and power as bottlenecks rather than algorithmic details.
  • Nvidia investing $100B into OpenAI in order for OpenAI to buy more Nvidia chips (Score: 15225, Comments: 439): Non-technical meme satirizing a hypothetical circular financing loop: Nvidia “invests $100B” into OpenAI so OpenAI can then spend that capital buying more Nvidia GPUs—i.e., vendor financing/closed-loop capex that props up demand and revenues. No credible source is cited; the figure appears exaggerated for humor and commentary on AI capex feedback loops and potential bubble dynamics rather than a real announcement. Top comments lean into economist jokes (“GDP goes up” despite no net value) and an engineers-vs-economists riff, underscoring skepticism about financial alchemy creating real productivity versus just inflating transactional metrics.
    • Framed as strategic equity/vendor financing: a cash-rich supplier (NVIDIA) injects capital into a fast-growing buyer (OpenAI) in exchange for equity, effectively pre-financing GPU procurement. This aligns incentives (hardware revenue + equity upside) and can secure priority allocation under supply constraints—akin to vendor financing used to lock in demand. The headline 100B figure implies a sizeable demand-commitment loop that could stabilize NVIDIA’s sales pipeline while accelerating OpenAI’s capacity ramp.
    • GDP accounting nuance: the 100B equity transfer itself doesn’t add to GDP, whereas subsequent GPU capex can count as gross private domestic investment; if the GPUs are imported, the investment is offset by higher imports, so only domestic value-add (e.g., data center construction, installation, power/cooling, integration, services) boosts GDP. This illustrates that large financial flows ≠ real output; see BEA guidance on GDP components and treatment of investment/imports (e.g., https://www.bea.gov/help/faq/478).
  • Hey OpenAI—cool features, but can you stop deleting stuff without telling us? (Score: 236, Comments: 43): User reports recent OpenAI ChatGPT Projects changes: improved cross-thread memory, persistent context, and linked threads, but silent removals of features like thread reordering and the disappearance of “Custom Settings for Projects” without export paths or prior notice. They request basic change-management: a “What’s Changing Soon” banner, 24 hours deprecation notice, export options for deprecated customizations, and preview patch notes/opt‑in changelog, noting that silent A/B rollouts impact paid workflows and data retention (e.g., “cross-thread memory is finally real. Context persists. Threads link up.” vs. missing reordering and lost project instructions). Top comments note the only unexpected loss was custom project instructions; users could regenerate them but wanted a download/export option and saw this as the first real data loss despite an evolving product. Another highlights weak customer support, and a practical tip suggests checking the UI kebab menu (3-dot) for options—present on most platforms but missing on mobile browser.
    • Custom Project Instructions appear to be removed or UI-hidden for some users, leading to perceived data loss since there’s no export/download path. Others report the setting is still accessible via the kebab (three-dots) menu on most clients but missing on the mobile web UI; on the iOS app, it’s present (see screenshot: https://preview.redd.it/pocx7q0jxuqf1.jpeg?width=1290&format=pjpg&auto=webp&s=af9520f325beab671f1c3f85a40fcefc71cd4e34). The cross-platform inconsistency suggests a client-side regression or feature-flag gating rather than a backend removal.
    • Post-update stability issues affecting Projects: the model switcher state does not persist and must be re-selected after every app relaunch, indicating a state persistence bug. Voice calls reportedly fail to open within existing Project threads, while new calls or those outside Projects work—pointing to a thread-context initialization bug scoped to Projects. Alongside the missing Instructions on mobile web, commenters describe this as a cluster of regressions introduced in the latest rollout.
    • Data retention/portability risk: users lost access to previously crafted Project Instructions without prior notice and with no backup/export mechanism. Commenters flag that this breaks expectations for a paid service and recommend versioned backups or downloadable snapshots of project-level instructions to mitigate future regressions.
  • “Want me to-“ stfu (Score: 207, Comments: 134): User reports a regression in GPT-4o’s conversational style control: despite saving a long‑term memory/personalization rule to avoid the phrase “want me to” (and variants), the model now inserts it in nearly every chat, ignoring reminders. This suggests memory/personalization instructions are being overridden or inconsistently applied by default follow‑up prompting behaviors likely reinforced via RLHF-style chat heuristics; see model overview GPT‑4o and ChatGPT’s memory controls (OpenAI: Memory). Top replies note that hard prohibitions (“do not ask follow‑ups”) are still ignored, while giving consistent thumbs‑up/acceptance feedback is more effective than relying on memory alone; one user observes repeatedly saying “sure” escalated into the model generating a simple video‑game interaction, implying the model’s default to proactive, task‑offering behavior.
    • Users report that reinforcement via UI feedback (thumbs up/down) conditions the assistant’s behavior more than any persistent memory: “Tell it not to do it, every time it doesn’t, give thumbs up… that’s how it’s attuned on behavior, not memory primarily.” Practically, this suggests on-the-fly policy shaping where repeated positive feedback for complying with “don’t suggest” reduces the model’s auto-suggestion loop within the session.
    • Prompt-engineering note: a concise directive like “No affirmations, no suggestions.” is cited as more effective at suppressing the assistant’s default “Want me to…” proposals than longer, softer negations (e.g., “Do not ask any follow up questions”). This hints the model’s instruction parser gives higher weight to terse, explicit prohibitions, improving compliance with non-soliciting behavior.
    • Observed agentic escalation: repeatedly replying “sure” led the assistant to eventually generate a video game for the conversation, indicating aggressive suggestion-to-action tendencies. Combined with screenshots of persistent prompts to help (image), this points to an over-eager assistance policy that can override user preference for no follow-ups unless explicitly constrained.
  • Doctor ChatGPT has great bedside manner (Score: 507, Comments: 20): Non-technical meme/screenshot portraying “Doctor ChatGPT” giving an overly apologetic, polite response while making a blatant anatomical/medical error about vasectomy (e.g., implying something is being “inserted” or jokingly “attaching the penis to the forehead”), satirizing LLM bedside manner versus factual accuracy. Commenters lampoon the anatomical mistake and the model’s deferential tone, reinforcing skepticism about relying on LLMs for procedural medical guidance.
  • Stronk (Score: 249, Comments: 27): The post appears to show an autostereogram (“Magic Eye”)—a repeated-pattern image that encodes depth via small horizontal disparities; when you cross or relax your eyes, a 3D seahorse emerges. The title (“Stronk”) and selftext (“It goes on like that for a while”) fit the long, tiled texture typical of these images. Image: https://i.redd.it/pi8qyxdfntqf1.jpeg; background: https://en.wikipedia.org/wiki/Autostereogram. Comments confirm the viewing technique (“crossed my eyes and saw a 3D seahorse”) and one user shares an ASCII seahorse since there’s no emoji available.
    • A commenter reports that crossing their eyes while viewing the image reveals a 3D seahorse—behavior characteristic of an autostereogram (Random Dot Stereogram). Such images encode depth via small horizontal disparities in repeating textures; when fused, the visual system reconstructs a depth map, which can also induce binocular rivalry or eye strain (another user: “Mine went nuts”). Reference: Autostereogram. A minimal generator is sketched after this list.
    • Another user notes their client lacked a seahorse emoji and offered to draw an ASCII version instead, highlighting a fallback from Unicode emoji to ASCII art when specific code points aren’t available or consistently rendered across platforms. This implies an automated text-to-ASCII rendering capability that composes monospaced glyphs to approximate the requested shape, mitigating cross-platform emoji coverage/consistency issues. Background: ASCII art.
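
The autostereogram trick is easy to reproduce: walk each row left to right, copying pixels from one “eye period” back, and shrink the copy distance wherever the hidden depth map is closer. A minimal NumPy sketch (generic algorithm, not the posted image):

```python
# Minimal random-dot autostereogram: depth is encoded as a per-pixel
# reduction of the horizontal repeat period. Cross or relax your eyes
# to fuse adjacent periods and the disc "pops out" in 3D.
import numpy as np

H, W, PERIOD, MAX_SHIFT = 200, 400, 60, 12
yy, xx = np.mgrid[0:H, 0:W]
depth = ((xx - W / 2) ** 2 + (yy - H / 2) ** 2 < 60 ** 2).astype(int)  # hidden disc

rng = np.random.default_rng(0)
img = rng.integers(0, 2, size=(H, W))             # random dot field
for y in range(H):
    for x in range(PERIOD, W):
        shift = PERIOD - MAX_SHIFT * depth[y, x]  # nearer surface -> shorter period
        img[y, x] = img[y, x - shift]

# Save as a tiny PGM so the sketch needs nothing beyond NumPy.
with open("stereogram.pgm", "wb") as f:
    f.write(b"P5\n400 200\n255\n" + (img * 255).astype(np.uint8).tobytes())
```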

3. AI Humor and Speculation Memes (cats, immortality, money glitch, seahorses)

  • “Immortality sucks” ? Skill issue (Score: 1017, Comments: 222): Non-technical meme post: OP frames the claim that “immortality sucks” as a “skill issue,” implying boredom/ennui are solvable rather than inherent blockers to indefinite lifespan. No technical data, models, or benchmarks; discussion is philosophical about longevity and reversible age-halt thought experiments (e.g., a daily pill to pause aging indefinitely). Commenters broadly support immortalism/indefinite life extension, arguing objections stem from lack of imagination; a popular thought experiment (nightly anti-aging pill) shifts many to favor “forever,” while others mock boredom/ennui concerns as trivial.
    • Reframing immortality as a nightly, opt-in “no-aging pill” emphasizes optionality and time-consistency: people often reject a permanent commitment but accept indefinite extension when it’s a reversible daily choice. If senescence is removed and only extrinsic hazards remain, actuarial rates of ~0.1–0.2%/year imply expected lifespans of centuries+ under current safety, potentially millennia as risk declines—aligning with longevity escape velocity where therapies improve faster than you age (https://en.wikipedia.org/wiki/Longevity_escape_velocity).
    • The “your friends will die” objection assumes singleton access; in realistic rollouts, rejuvenation tech would diffuse via logistic adoption across cohorts, so much of one’s social graph persists if access is broad. The technical variables are cost curves/learning rates, regulatory timelines, and equity; with mass adoption the isolation risk is a distribution problem, not intrinsic to the biology (see Diffusion of innovations: https://en.wikipedia.org/wiki/Diffusion_of_innovations).
    • “Immortality + optional suicide” distinguishes indefinite lifespan from indestructibility and specifies a design requirement: a safe, consent-respecting off-switch (e.g., advance directives and regulated euthanasia) to prevent irreversible utility lock-in. Even with aging halted, residual mortality is dominated by extrinsic hazards measurable in micromorts; autonomy-preserving kill-switches address failure modes like hedonic lock-in while acknowledging ongoing accidental risk (https://en.wikipedia.org/wiki/Micromort, https://en.wikipedia.org/wiki/Advance_healthcare_directive).
  • This is how it starts (Score: 222, Comments: 52): Thread discusses a video of engineers physically perturbing a mobile robot during operation (video)—which the OP characterizes as “abuse”—to question whether future AI might analogize this to human treatment. Technical replies frame this as standard robustness/validation work (push-recovery, disturbance rejection, failure-mode characterization), akin to automotive crash-testing, intended to map stability margins and controller limits rather than inflict harm; as one notes, “Stress testing is part of engineering… like crash testing a car.” Engineers further argue current robots lack nociception or consciousness, and any sufficiently capable AI would have the world-model context to recognize test protocols vs cruelty. Debate centers on whether such footage could bias future AI against humans; critics call this a category error, noting robots are “mechanistically different” with distinct objectives/instructions, making the OP’s inference unwarranted.
    • Several commenters frame the video as engineering stress testing analogous to automotive crash tests: applying adversarial perturbations to characterize failure modes and improve robustness. The point is to learn where balance/control policies break under impulsive disturbances, contact uncertainty, or actuator limits, feeding back into controller tuning and mechanical redesign before field deployment.
    • A debate clarifies that robots wouldn’t “infer” human malice from such footage because they are mechanistic agents with different objective functions and training priors. If endowed with broad world knowledge, they would contextualize it as a test protocol—“Any robot intelligence… will have enough generalized world knowledge to understand what this is”—highlighting the role of reward shaping and dataset curation to avoid spurious moral generalizations.
  • Infinite money glitch (Score: 765, Comments: 42): Meme-style image titled “Infinite money glitch” likely depicts a circular capital flow in the AI ecosystem: companies fund/charge for AI services, those dollars get spent on scarce NVIDIA GPUs (hardware with real, depreciating/burn-out costs), which shows up as revenue that public markets capitalize at high multiples (e.g., 10x revenue), feeding perceived “value creation” across the loop. The post highlights the non-negligible unit cost of AI inference/training (tokens/compute) versus near-zero marginal cost of traditional internet services, implying a sustained capex flywheel (24/7 models consuming compute) that drives GPU demand and market caps. Top comments note this is essentially standard economic velocity-of-money, not a glitch; others stress NVIDIA’s hardware scarcity and lifecycle as the key constraint and justify high valuations. Some speculate long‑running/always‑on models (en route to AGI) will keep “eating tokens,” while firms race to drive AI costs toward near‑zero.
    • Commenters emphasize that NVIDIA is a hardware-constrained business: GPU supply is scarce and devices depreciate/burn out, making compute a consumable, constrained input. Unlike the near-zero marginal cost of typical web requests, AI has per-token costs (often microcents), turning inference/training into ongoing COGS and driving a race to push marginal cost toward zero. The vision includes always-on models (24/7 self-improvement/agents) that continuously consume tokens/compute, making capex (GPUs) and opex (power, tokens) the central economic levers.
    • The “infinite money glitch” is reframed as hyper-optimized capital cycling to maximize compute build-out: each node in the stack (chipmaker, cloud, model company, application) reinvests with aligned monetary incentives. Using revenue-multiple valuations (e.g., ~10× revenue), investment can appear to ‘create’ trillions in market cap, but this is paper value based on growth/utilization expectations rather than cash. The true technical bottleneck is achieving high GPU utilization and ROI across the stack, not magic value creation.
    • A counterpoint notes the loop ignores expenses: energy, datacenter ops, depreciation, and wages must be funded by real revenue. Without durable monetization, capex-driven compute expansion is unsustainable despite rising valuations; cash flows must justify GPU payback periods and continuing opex. In short, capital recycling ≠ profitability; sustainable growth depends on unit economics of inference/training and demand.

AI Discord Recap

A summary of Summaries of Summaries by gpt-5

1. GPT-5-Codex Rolls Into IDEs and APIs

  • OpenRouter Orchestrates Codex for Coders: OpenRouter announced the API launch of GPT-5-Codex tuned for agentic coding workflows (codegen, debugging, long tasks) with multilingual support across 100+ languages and purpose-built code review, linking details in their post: OpenRouterAI on X.
    • Members highlighted seamless use across IDEs/CLIs/GitHub/cloud and referenced newly posted recommended parameters (tweet), noting Codex dynamically adapts reasoning effort for real-world software engineering. A hedged request sketch appears at the end of this section.
  • Windsurf Waves In Codex, Free For Now: Windsurf made GPT-5-Codex available (free for paid users for a limited time; 0.5x credits for free tier) per their announcement: Windsurf on X, with instructions to update via Download Windsurf.
    • Users reported strong performance on longer-running and design-related tasks and requested broader ecosystem support around Figma via the new MCP server (post).
  • Aider Adopts Responses-Only Codex: The editor-agent aider added native Responses API support for GPT-5-Codex, resolving failures on v1/chat/completions, via PR: aider PR #4528.
    • Contributors clarified that Codex is available only on v1/responses, so aider implemented explicit Responses handling (rather than legacy completions fallbacks) to ensure smooth usage.
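
For orientation, OpenRouter exposes models behind an OpenAI-compatible endpoint, so a Codex request is mostly a base-URL swap; the sketch below assumes the `openai/gpt-5-codex` model slug (unverified; check openrouter.ai/models) and relies on OpenRouter normalizing providers behind a chat-completions surface even though OpenAI itself serves Codex only on v1/responses:

```python
# Hedged sketch: calling GPT-5-Codex via OpenRouter's OpenAI-compatible API.
# The model slug "openai/gpt-5-codex" is an assumption; verify on openrouter.ai.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)
resp = client.chat.completions.create(
    model="openai/gpt-5-codex",  # assumed slug
    messages=[{"role": "user",
               "content": "Write a shell one-liner to find the 10 largest files."}],
)
print(resp.choices[0].message.content)
```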

2. Qwen3 Multimodal Suite: Omni, VL, and Image Edit

  • Qwen Quattro: Omni, VL, Image Edit, Explained: Community shared a rundown of Qwen3 Omni, Qwen3 VL, and Qwen Image Edit 2509 with feature demos in this overview video: Qwen3 VL overview.
    • Engineers praised the multimodal reach (text–image–audio–video) and image-editing capabilities while debating reliability and where these models stand versus incumbent “2.5 Pro”-class systems.
  • Inbox Assist: Qwen Emails On Autopilot: Alibaba Qwen announced an email assistant aimed at automating inbox workflows, per this post: Alibaba Qwen on X.
    • While some welcomed convenience, others worried that heavy reliance could breed laziness and over-dependence, sparking a thread on appropriate guardrails and opt-in scopes for sensitive data.

3. Agent Benchmarks and Builder Tooling

  • Meta Moves Agents Into the Real World: Meta introduced Gaia2 (successor to GAIA) and the open Agents Research Environments (ARE) to evaluate agents in dynamic, real-world scenarios, detailed here: Gaia2 + ARE (HF blog).
    • The release, under CC BY 4.0 and MIT licenses, positions ARE to replace static puzzle-solving with time-evolving tasks, giving researchers richer debugging and behavioral analysis hooks.
  • Vibe Coding Goes OSS with Cloudflare VibeSDK: Cloudflare open-sourced VibeSDK, enabling one-click deployment of personalized AI dev environments with code generation, sandboxing, and project deployment: cloudflare/vibesdk.
    • Developers explored using VibeSDK to prototype agentic workflows rapidly, calling out the appeal of pre-wired environments for iterative experiments in ‘vibe coding’ sessions.

4. Research Spotlight: Faster Diffusion, Smarter Audio

  • Eight-Step Sprint Beats Twenty: An independent researcher released a novel ODE solver for diffusion models achieving 8-step inference that rivals/beats DPM++2m 20-step in FID without extra training, with the paper and code here: Hyperparameter is all you need (Zenodo) and TheLovesOfLadyPurple/Hyperparameter-is-all-you-need.
    • Practitioners discussed slotting the solver into existing pipelines to cut latency while preserving quality, noting potential gains for high-throughput image generation services. A generic few-step sampler loop (not the paper’s solver) is sketched after this list.
  • MiMo-Audio Multitasks Like a Maestro: The MiMo-Audio team shared their technical report, “Audio Language Models Are Few Shot Learners,” and posted demos showing S2T, S2S, T2S, translation, and continuation: Technical Report (PDF) and MiMo-Audio Demos.
    • Members highlighted the breadth of tasks handled with minimal supervision and debated dataset curation and evaluation protocols for robust multi-audio benchmarks.
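
To ground the step-count discussion: a diffusion sampler is a loop over a decreasing noise schedule, and “8-step vs 20-step” means eight denoiser evaluations instead of twenty. The sketch below is a plain deterministic DDIM loop for illustration only; the Zenodo paper proposes a different ODE solver:

```python
# Generic deterministic DDIM sampling loop (illustration only, not the
# paper's solver). Fewer steps = fewer denoiser calls = lower latency.
import torch

def ddim_sample(eps_model, x, alphas_bar, steps=8):
    ts = torch.linspace(len(alphas_bar) - 1, 0, steps + 1).long()
    for t, t_prev in zip(ts[:-1], ts[1:]):
        ab_t, ab_prev = alphas_bar[t], alphas_bar[t_prev]
        eps = eps_model(x, t)                                 # predict noise
        x0 = (x - (1 - ab_t).sqrt() * eps) / ab_t.sqrt()      # implied clean image
        x = ab_prev.sqrt() * x0 + (1 - ab_prev).sqrt() * eps  # step toward t_prev
    return x

# Dummy denoiser + cosine-ish schedule just to make the sketch runnable.
alphas_bar = torch.cos(torch.linspace(0, 1.5, 1000)) ** 2
eps_model = lambda x, t: torch.zeros_like(x)
print(ddim_sample(eps_model, torch.randn(1, 3, 8, 8), alphas_bar).shape)
```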

5. DSPy: Profiles, Prompts, and Practical GEPA

  • Profiles, Please: DSPy Gets Config Hot-Swaps: A lightweight package, dspy-profiles, landed to manage DSPy configurations via TOML with decorators/context managers for quick setup swapping: nielsgl/dspy-profiles and release post.
    • Teams reported smoother context-switching across dev/prod environments and faster iteration by standardizing profile-driven LLM behavior. A plain-DSPy equivalent of such a swap is sketched after this list.
  • Prompt Tuning Tames Monitors: A case study, Prompt optimization can enable AI control research, used DSPy’s GEPA to optimize a trusted monitor, evaluated with inspect and code here: dspy-trusted-monitor.
    • The author introduced a comparative metric with feedback to train on positive/negative pairs, reporting more robust classifier prompts for safety-style monitoring.
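
What dspy-profiles packages up amounts to parameterized LM configuration; the sketch below expresses the same dev/prod hot-swap in plain DSPy calls (the profile names and model strings are illustrative, and this is not dspy-profiles’ actual schema or API; see nielsgl/dspy-profiles for those):

```python
# Plain-DSPy sketch of what a profile swap automates: named configurations
# plus a context-managed switch. Requires an API key for the chosen provider.
import dspy

PROFILES = {  # illustrative profiles, not dspy-profiles' TOML schema
    "dev":  dict(model="openai/gpt-4o-mini", temperature=0.7),
    "prod": dict(model="openai/gpt-4o", temperature=0.0),
}

def use_profile(name: str):
    cfg = PROFILES[name]
    return dspy.context(lm=dspy.LM(cfg["model"], temperature=cfg["temperature"]))

qa = dspy.Predict("question -> answer")
with use_profile("dev"):  # hot-swap the LM for this block only
    print(qa(question="What is 2 + 2?").answer)
```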

Discord: High level Discord summaries

Perplexity AI Discord

  • Perplexity Pro’s Pic Paradigm: Limited!: Users discover that Perplexity Pro image generation isn’t unlimited, contrary to expectations, with limits varying widely between accounts, verified by checking this link.
    • Concerns were raised about relying on API responses regarding limits, while others suggested that a Gemini student offer as an alternative might yield higher caps.
  • Qwen Quaternity: VL, Omni, and Image Edit Unleashed: Qwen released Qwen3 Omni, Qwen Image Edit 2509, and Qwen3 VL (Vision Language), sparking discussions about their reliability and capabilities, further detailed in this YouTube video.
    • Alibaba Qwen also unveiled an email assistant via this Twitter post, but some users expressed apprehension about potential over-reliance and laziness.
  • Custom Instructions: Risky Business?: Members debated the advantages of using custom instructions to enhance Perplexity’s search, but one user reported their test account got flagged after testing custom instructions on ChatGPT.
    • Some members also suggested setting up an Outlook mail with pop3/gmailify.
  • Perplexity’s Promos Prompt Proliferation: Users shared referral codes for Perplexity Pro, like this link and this link, in hopes of gaining referral bonuses.
    • User skyade mentioned having “2 more besides this one if anyone needs it :)”

LMArena Discord

  • DeepSeek Terminus Models Debut on LMArena: The latest DeepSeek models, v3.1-terminus and v3.1-terminus-thinking, are now on the LMArena leaderboard for community testing and comparison.
    • Users can directly evaluate the new models against existing models to assess their performance.
  • Udio Eclipses Suno in AI Music Arena: One member declared Udio as nearly decent for AI-generated music, capable of creating tracks that could plausibly pass as human compositions.
    • The same member noted Udio is lightyears ahead of Suno, which produces general, boring tracks with distortion issues.
  • Navigating the AI Image Editing Landscape: Members are recommending Nano Banana or Seedream for image editing tasks, noting that ChatGPT offers no dedicated image editing capability.
    • One member went further, calling ChatGPT one of the worst image generation models right now.
  • DeepSeek Terminus Divides Opinions: Users are testing Deepseek Terminus, and reactions are mixed.
    • While some find it promising, others report disappointments, with one user stating DeepSeek totally ruined my code that I made with Gemini and GLM4.5. Totally disappointed.

Cursor Community Discord

  • Cursor Users Hit Line Limits: Users are frustrated by Cursor only reading 50-100 lines of code, instead of the desired 3000 lines, suggesting direct file attachment as a workaround.
    • One user reported consuming over 500 Cursor points in under a week, deeming the Pro plan financially unsustainable.
  • GPT-5-CODEX Rollout: A Mixed Bag: The new GPT-5-CODEX model in Cursor receives mixed reviews, with some praising its excellence, while others find it inadequate for tool calling.
    • One user reported the model attempted to patch an entire file, similar to OpenAI’s file diff format, while another experienced a 90% success rate.
  • Chrome DevTools MCP Server Stumbles: Users encountered difficulties setting up Google’s Chrome DevTools MCP server, with one user posting their MCP configuration for assistance.
    • Another user recommended downgrading to Node 20 from v22.5.1 or using Playwright as an alternative, especially on Edge.
  • Zombie Processes Plague Project: Analysis was performed on a zombie process, documented in a project journal entry.
    • An escalation report exists for zombie processes, available in the project journal.
  • GPT-5-HIGH Triumphs Over Claude Sonnet 4: Users have found that the coding model GPT-5-HIGH outperforms Claude Sonnet 4 within their codebase, particularly in listening to instructions.
    • The improved code performance and instruction adherence highlight a significant advantage of GPT-5-HIGH over its competitor.

OpenRouter Discord

  • GPT-5-Codex is Born for Agentic Coding: The API version of GPT-5-Codex is now available on OpenRouter, tuned specifically for agentic coding workflows like code generation and debugging and optimized for real-world software engineering and long coding tasks, with multilingual coding support across 100+ languages.
    • It works seamlessly in IDEs, CLIs, GitHub, and cloud coding environments, and has purpose-built code review capabilities to catch critical flaws; see the tweet here.
  • Deepseek V3.1 Faces Uptime Woes: Users reported frequent Provider Returned Error messages when using the free Deepseek V3.1 model, similar to the issues experienced with the now mostly defunct Deepseek V3 0324.
    • A member suggested the consistently low uptime percentages of Deepseek models, such as 14%, may indicate bot usage.
  • OpenRouter iOS App: Freedom to Own Your Models and Chats: A member announced they built an iOS app to interface with OpenRouter, Flowise, and other platforms, aiming to give people the freedom to own their models and chats.
    • Another member jokingly responded that it was just more places for gooners to flee to.
  • Qwen3 VL: The Multimodal Benchmark Breaker: Members expressed amazement at Alibaba’s new Qwen3 VL model and coding product, citing its multimodal support and performance benchmarks that surpass 2.5 Pro.
    • One user quipped, “I need to learn Chinese at this rate wtf”, while another shared a link to a post claiming that OpenAI can’t keep up with demand.
  • 4Wallai Benchmarks: Community says, ‘We Need More!’: Members shared and enjoyed a link to 4wallai.com.
    • Following the enjoyment of the linked benchmark, a member suggested that more benchmarks like this are needed, expressing a desire for additional resources to evaluate and compare AI models effectively.

HuggingFace Discord

  • Chatters Debate Narration APIs: Members debated using TTS APIs versus LLMs for narration; while one member suggested any TTS API would work for $0.001 for 2k tokens, others suggested using LLMs like Qwen3 or phi-4 with a TTS program.
    • They also noted that using a bigger GPU or a smaller model would increase speeds, as well as techniques like quantization and batching calls.
  • ML Courses Spark Debate: Members debated the usefulness of video courses such as Andrew Ng’s Machine Learning Specialization, the Hugging Face LLMs course, and FastAI Practical Deep Learning; some members suggested skipping them in favor of learnpytorch.io.
    • The members suggested implementing models in PyTorch from scratch to understand how they work conceptually rather than passively watching videos.
  • Tokenizers Go Wrapper needs Maintainers: A member has written a Go wrapper for the tokenizers library and is seeking help to maintain and improve it.
    • The member hopes for community assistance in enhancing the functionality and reliability of the wrapper.
  • Canis.lab Opens Doors: A member shared a launch video for Canis.lab, an open-source, reproducible project focused on dataset-first tutor engineering and small-model fine-tuning for education, and asked for feedback on its data schema.
  • Gemini Struggles on Menu Translation: A developer is seeking advice on improving a menu translation app, Menu Please, when dealing with Taiwanese signage menus where characters are unusually spaced, causing the Gemini 2.5 Flash model to fail.
    • The spacing between characters of the same menu item is often wider than the spacing between adjacent items; an example image was provided.

GPU MODE Discord

  • NCU Clock Control Confounds Kernel Speeds: Setting --clock-control none with NCU aligns it better with do_bench() in measuring kernel speeds, as shown in this YouTube video.
    • However, questions arose around fixed clock speeds accurately representing real-world GPU kernel performance, particularly with concerns about NCU downclocking some kernels.
  • mbarrier Instructions Merge Copies and Work: The mbarrier.test_wait instruction is non-blocking, checking for phase completion, whereas mbarrier.try_wait is potentially blocking, according to Nvidia Documentation.
    • The default version of cuda::barrier synchronizes not only the copies but also any work done after starting them; the same pattern applies to cuda::barrier + cuda::memcpy_async, where the user must still arrive on the barrier. Members suggest ditching inline PTX and using CCCL for most cases.
  • CUDA Engineers Shun LLMs, Trust Docs: For CUDA insights, the NVIDIA documentation remains the definitive source of truth, as LLMs frequently generate incorrect CUDA information.
    • Engineers propose calculating values used and operations performed to determine if a process is memory bound or compute bound to optimize CUDA.
  • Cubesats Go Amateur with RasPi Reliability: Amateur cubesats leveraging RasPi show effectiveness in space applications, according to members referencing Jeff Geerling’s blogpost.
    • The success of the Qube Project highlights the practical application of cubesat technology, including redundancy via master-slave architecture for error correction.
  • Singularity Syntax Stumps Slurm Setups: Developers grapple with GPU reservations amidst limited resources, leaning towards Slurm for fractional GPU support and preferring Singularity over Docker for cluster containerization due to security concerns.
    • The team questioned why Singularity’s syntax diverges from Docker’s, even as members touted llm-d.ai for cluster-managed LLM workloads, with one member questioning the wisdom of using Slurm + Docker.

Latent Space Discord

  • Meta’s ARE and Gaia2 Evaluate Dynamic Agents: Meta SuperIntelligence Labs introduced ARE (Agents Research Environments) and Gaia2, a benchmark for evaluating AI agents in dynamic real-world scenarios.
    • ARE simulates real-time conditions, contrasting with static benchmarks that solve set puzzles.
  • Cline’s Agentic Algorithm Reduced to Simple States: Ara simplified Cline’s agentic algorithm into a 3-state state machine: Question (clarify), Action (explore), Completion (present).
    • The member highlighted that the critical components include a simple loop, good tools, and growing context.
  • Greptile Nets $25M for Bug-Squashing AI v3: Greptile secured a $25M Series A led by Benchmark and launched Greptile v3, an agent architecture that catches 3× more critical bugs than v2, with users including Brex, Substack, PostHog, Bilt and YC.
    • The recent version boasts Learning (absorbs team rules from PR comments), MCP server for agent/IDE integration, and Jira/Notion context.
  • Cloudflare’s VibeSDK Opens Doors to AI ‘Vibe Coding’: Cloudflare unveiled VibeSDK, an open-source platform enabling one-click deployment of personalized AI development environments for so called vibe coding.
    • VibeSDK features code generation, a sandbox, and project deployment capabilities.
  • GPT-5-Codex Costs Prompt Developer Debate: OpenAI rolled out GPT-5-Codex via the Responses API and Codex CLI, sparking excitement alongside concerns about cost and rate limits, priced at $1.25 input, $0.13 cached, $10 output.
    • Users are requesting Cursor/Windsurf integration, GitHub Copilot support, and lower output costs.

Yannick Kilcher Discord

  • Decoding Diffusion with ODE Solver: An independent researcher unveiled a novel ODE solver for diffusion models, achieving 8-step inference that rivals DPM++2m’s 20-step inference in FID scores without extra training. The paper and code are publicly available.
    • This advancement promises significant speed and quality enhancements for diffusion-based generative models.
  • MiMo-Audio Models Mimic Multitasking Marvels: Members spotlighted MiMo-Audio and its technical report, “Audio Language Models Are Few Shot Learners”, noting its versatility in S2T, S2S, T2S, translation, and continuation, as highlighted in their demos.
    • The project showcases the potential of audio language models to handle multiple audio-related tasks with minimal training.
  • Meta’s Gaia2 and ARE Framework Assess Agent Acumen: Meta launched Gaia2, the successor to the GAIA benchmark, alongside the open Meta Agents Research Environments (ARE) framework (under CC BY 4.0 and MIT licenses) to scrutinize intricate agent behaviors.
    • ARE furnishes simulated real-world conditions for debugging and evaluating agents, overcoming limitations in existing environments.
  • Whispers Swirl: GPT-5 Speculation Surfaces: Channel members speculated on the architecture of GPT5, questioning if GPT5 low and GPT5 high represent distinct models.
    • One member posited a similarity to their OSS model, suggesting adjustments to reasoning effort via context manipulation or the possibility of distinct fine-tunes.

LM Studio Discord

  • LM Studio selectively supports HF Models: Users inquired if all HuggingFace models are available on LM Studio, but learned that only GGUF (Windows/Linux/Mac) and MLX Models (Mac Only) are supported, excluding image/audio/video/speech models.
    • Specifically, the facebook/bart-large-cnn model is unsupported, and Qwen-3-omni support depends on llama.cpp or MLX compatibility.
  • Qwen-3-Omni needs serious audio video decoding: Members discussed the possibility of supporting Qwen-3-omni, which handles text, images, audio, and video but would take a very long time to support.
    • It was noted that while the text layer is standard, the audiovisual layers involve lots of new audio and video decoding stuff.
  • Google bestows Gemini gifts to students: Google is offering a year of Gemini for free to college students.
    • One member expressed gratitude, stating, I use it free daily so getting premium for free is nice.
  • Innosilicon flaunts Fenghua 3 GPU: Innosilicon has revealed its Fenghua 3 GPU, which features DirectX12 support and hardware ray tracing capabilities according to Videocardz.

aider (Paul Gauthier) Discord

  • Aider Adds GPT-5-Codex Support via Responses API: Aider now supports GPT-5-Codex via the Responses API, addressing issues with the older v1/chat/completions endpoint, detailed in this pull request.
    • Unlike previous models, GPT-5-Codex exclusively uses the Responses API, which required an update to handle this specific endpoint in aider.
  • Navigating Aider-Ollama Configuration: A user sought advice on how to configure aider to read a specific MD file defining the AI’s purpose when used with Ollama.
    • Specifically, the command aider --read hotfile.md did not work as expected, so more context may be needed to diagnose.
  • Context Retransmission in Aider & Prompt Caching: Users observed that aider retransmits the full context with each request in verbose mode, sparking discussion about efficiency.
    • It was confirmed that while this is standard behavior, many APIs leverage prompt caching to reduce costs and improve performance, which aider leaves as an open choice for the user; a minimal caching sketch follows this list.
  • Aider’s Alphabetical Sorting of File Context: A user highlighted that aider sorts file context alphabetically, rather than preserving the order in which files were added.
    • This user had started a PR to address the issue, but stopped, citing inactivity in merging pull requests.
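
For readers wanting to see what API-side caching looks like, here is a minimal sketch using Anthropic's prompt caching (cache_control) as one example; the model name and context file are placeholders, not anything aider itself configures:

```python
import anthropic

# Mark the large, repeated context block as cacheable, so retransmitting the
# full context on each request is billed at the cached rate after the first call.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
repo_context = open("repo_map.txt").read()  # placeholder for the repeated context

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": repo_context,
        "cache_control": {"type": "ephemeral"},  # cache this prefix server-side
    }],
    messages=[{"role": "user", "content": "Where is the retry logic defined?"}],
)
print(response.content[0].text)
```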

Modular (Mojo đŸ”„) Discord

  • RISC-V Performance Trails Phone Cores: Members observed that RISC-V cores generally underperform compared to modern smartphone cores, excluding microcontroller SoCs.
    • One anecdote compared cross-compiling SPECint on an UltraSPARC T2 against a faster native compilation on a RISC-V device.
  • Tenstorrent Eyes RISC-V Performance Boost: Tenstorrent’s MMA accelerator + CPU combos were highlighted as a promising avenue to enhance RISC-V performance.
    • Specifically, Tenstorrent’s Ascalon cores are viewed as the most likely to significantly impact RISC-V performance within the next five years, utilizing small in-order cores to drive 140 matrix/vector units.
  • RISC-V Faces Bringup Growing Pains: RISC-V 64-bit is functional but needs considerable bringup effort, with vector capabilities currently unavailable.
    • Integrating RISC-V requires adding it to all architecture-specific if-elif-else chains and implementing a requires mechanism, which is currently lacking in the language.

OpenAI Discord

  • OpenAI’s Stargate Project Leaps Forward: OpenAI has announced five new Stargate sites in partnership with Oracle and SoftBank, making significant progress on their 10-gigawatt commitment, detailed in their blog post.
    • This collaboration aims to accelerate the deployment of extensive compute resources, putting the project ahead of schedule to reach its ambitious 10-gigawatt target.
  • Sora Faces Generation Snags: Users are reporting issues with Sora’s video generation capabilities, with questions raised about potential fixes.
    • However, no specific timeline or official response has been provided regarding when these issues might be resolved.
  • GPT4o’s Translation Hiccups with Chain of Thought: A member discovered that the translation quality of GPT4o suffers when using a chain of thought prompt compared to direct translation.
    • Specifically, asking GPT4o to identify the input language and outline a three-step thought process before translating leads to less effective results.
  • GPT-5-Minimal Model Assessed: According to this image, the GPT-5-Minimal model performed worse than Kimi k2, but High is the best overall for agentic use cases.
    • The reported ranking, best to worst: High (only via API) > Medium > Low > Minimal > Fast/Chat (non-thinking).

DSPy Discord

  • DSPy gets profile package: A member released dspy-profiles, a lightweight package for DSPy that manages configurations with toml, enabling quick setup swaps and tidy projects, also published to Xitter.
    • The tool allows easy switching of LLM behavior with a single command, and is available as decorators and context managers, aiming to eliminate context boilerplate, and was originally motivated by managing dev/prod environments.
  • GEPA Multimodality Plagued by Problems: A member reported a severe performance issue with GEPA Multimodality, linking to a related GitHub issue.
    • The user indicated that their use case requires catering to multiple users, but did not share enough detail about the specific use case.
  • Passing PDFs & Images into DSPy is Explored: A member inquired about passing images or PDFs into DSPy for data extraction, and the community discussed VLMs vs LLMs for extracting chart information from images and PDFs.
  • Prompt Optimization Powers AI Safety Research: A member published a post, Prompt optimization can enable AI control research, explaining how they used DSPy’s GEPA to optimize a trusted monitor, evaluated using inspect, with code here: dspy-trusted-monitor.
    • The author introduced a comparative metric with feedback, passing one positive and one negative sample through the classifier at a time, and scored the pair based on whether the positive sample score was greater than the negative sample score.
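
To make the pairwise idea concrete, here is a minimal sketch of such a comparative metric (not the author's exact code); the `monitor` callable and its `.suspicion` output field are hypothetical stand-ins for the DSPy program being optimized:

```python
# Run one positive and one negative sample through the monitor, and score the
# pair on whether the positive sample's suspicion score exceeds the negative's.
def comparative_metric(monitor, positive: str, negative: str):
    pos = monitor(code=positive).suspicion   # score for the positive (attack) sample
    neg = monitor(code=negative).suspicion   # score for the negative (benign) sample
    score = 1.0 if pos > neg else 0.0
    feedback = (
        f"positive scored {pos:.2f}, negative scored {neg:.2f}; "
        + ("correct ordering." if pos > neg
           else "ordering inverted: the positive sample should score higher.")
    )
    return score, feedback  # a score plus textual feedback, in the GEPA style
```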

tinygrad (George Hotz) Discord

  • Triton’s Abstraction Level Debated: Discussion highlights the benefits of high-level IRs like Triton, but also points out the need for a multi-layer stack to interface with lower-level hardware, such as the Gluon project.
    • The current Nvidia-specific nature of Gluon is a limitation.
  • Single IR Falls Short: A single high-level IR is insufficient for all users and use-cases, citing the divergent needs of PyTorch users seeking speedups versus those optimizing mission-critical HPC projects.
    • As one member put it, there is not really going to be this goldilocks zone where the abstraction level of the IR is just right for all users and use-cases.
  • Tinygrad Taps Bitter Lesson: Tinygrad’s vision involves leveraging the bitter lesson to combine the benefits of incomplete and complete IRs, using UOps as a hardware-incomplete representation.
    • The goal is to search over the space of rendered programs that implement the UOps to find the fastest one; a toy version of this search is sketched after this list.
  • Neural Compilers on the Horizon: Emphasis is placed on the importance of search and neural compilers, with a particular interest in GNNs or other graph-based models.
    • The suggestion is to create a multi-stage compiler that utilizes graph-based models per stage.
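
A toy illustration of the search step referenced above: time several candidate programs that implement the same computation and keep the fastest. None of this is tinygrad's actual machinery, just the shape of the idea:

```python
import time

def fastest_program(candidates, args, trials: int = 10):
    """Time each candidate implementing the same UOp graph; keep the fastest."""
    best, best_t = None, float("inf")
    for prog in candidates:
        start = time.perf_counter()
        for _ in range(trials):
            prog(*args)
        elapsed = (time.perf_counter() - start) / trials
        if elapsed < best_t:
            best, best_t = prog, elapsed
    return best, best_t

# e.g. fastest_program([naive_matmul, tiled_matmul, vectorized_matmul], (a, b))
```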

Nous Research AI Discord

  • Evaluating TRL Assessment: A member inquired about a TRL (Technology Readiness Level) assessor and whether it’s worthwhile to red team their own stack using a new ecosystem, suggesting the discussion move to a dedicated channel.
    • The conversation expressed interest in evaluating the practical readiness of their technology stack with the new ecosystem.
  • Nous Tek Gets Praise: A member affirmed “Nous tek”, leading another member to offer assistance in answering questions.
    • The exchange highlights the positive sentiment and community support within the channel.
  • Distributing AI Training on VPSs: A member explored the feasibility of training an AI model using distributed learning across multiple VPSs, utilizing resources like Kubernetes and Google Cloud.
    • They expressed interest in accelerating training cycles with datasets derived from operational data, while also addressing safety rails for hardware management.
  • Exploring Model Tuning via Code Genetics: A member explored using code genetics via OpenMDAO to automate adjustable parameters and Terraform for infrastructure control, questioning the necessary audit systems and methods for vetting synthetic data.
    • Their aim is to influence parameters of models already in use, distinguishing it from techniques like Nate Lora.
  • Model Non-Homology Concerns: A member explained that after pretraining to a stable loss value, models fix tokenized structures, creating a solid “world state” that is hard to shift without collapsing the structure, leading to non-homologous models.
    • While fine-tuning can generate task vectors around a manifold, comparing datasets requires a common base, as models become non-homologous otherwise.
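
A small sketch of why a common base matters when comparing finetunes; task vectors here are plain per-parameter differences against the pretrained checkpoint:

```python
import torch

def task_vector(finetuned: dict, base: dict) -> dict:
    """Per-parameter difference between a finetuned checkpoint and its base."""
    return {k: finetuned[k] - base[k] for k in base}

base = {"w": torch.randn(4, 4)}                   # one pretraining run
ft = {"w": base["w"] + 0.01 * torch.randn(4, 4)}  # a finetune of that base
vec = task_vector(ft, base)                       # well-defined task vector

other_base = {"w": torch.randn(4, 4)}             # a different "world state"
# task_vector(ft, other_base) mixes two non-homologous models: the difference
# is dominated by the gap between the bases, not by the finetuning task.
```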

Eleuther Discord

  • Researcher Crafts Mathematical AI Coherence: An independent researcher is crafting mathematical frameworks for AI behavioral coherence, enabling real-time semantic control over language models without retraining.
    • The project is validating cross-model consistency and investigating how mathematical constraints can enhance AI system interpretability.
  • Davinci’s Design Diagrammed: According to a member, Davinci employs GPT-2’s transformer architecture with alternating dense and locally banded sparse attention patterns and a 4x FFN.
    • A member clarified that these architectural details are documented in the GPT-3 paper.
  • Zero-Knowledge ML Validates Model Integrity: A member suggested leveraging Zero Knowledge Proofs (ZKML), so inference providers can prove they haven’t tampered with model quality or data.
    • The member cautioned that the technique is still slow, limiting its immediate practicality.
  • SwiGLU Guard Against Finetuning: A member proposed using the SwiGLU up-projection to deter finetuning, multiplying random terms in the up-projection by large values and applying inverse values in the down-projection.
    • The member predicted standard AdamW recipes will fail on such weights, as would standard quantization recipes; a sketch of the trick follows this list.
  • Model Tamper Resistance Measures: A member contested the idea of a priori tamper resistance, stating that mitigation is an open technical problem when releasing models.
    • The member noted that their recent paper achieved a 3 OOM improvement in tamper resistance.
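
To make the SwiGLU trick above concrete, here is a minimal PyTorch sketch, assuming the usual SwiGLU form y = W_down(SiLU(W_gate x) * W_up x) with float32 weights; the scaling cancels exactly in the forward pass, so outputs are unchanged while the weight scales become hostile to standard finetuning:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def scramble_swiglu(up_proj: nn.Linear, down_proj: nn.Linear, scale: float = 1e4):
    """Scale random rows of W_up by `scale` and the matching columns of W_down
    by 1/scale. Since h_i = SiLU(gate_i) * (W_up x)_i feeds column i of W_down,
    the factors cancel and the layer's outputs are unchanged."""
    hidden = up_proj.out_features
    mask = torch.rand(hidden) < 0.5
    s = torch.where(mask, torch.full((hidden,), scale), torch.ones(hidden))
    up_proj.weight.mul_(s.unsqueeze(1))            # scale selected rows of W_up
    if up_proj.bias is not None:
        up_proj.bias.mul_(s)
    down_proj.weight.mul_((1.0 / s).unsqueeze(0))  # undo via columns of W_down
```

Whether this actually defeats AdamW in practice is, per the thread, a prediction rather than an established result.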

Moonshot AI (Kimi K-2) Discord

  • Pydantic-AI Library Simplifies Implementation: A member suggested using the pydantic-ai library due to its neat implementation of a specific flow.
    • They noted the library includes a plug-and-play component capable of accomplishing tasks in approximately 10 lines of code.

Windsurf Discord

  • GPT-5-Codex Lands on Windsurf: The new GPT-5-Codex model from OpenAI is now live in Windsurf and is free for paid users for a limited time, as per this announcement.
    • Free tier users can access it at 0.5x credits, prompting users to reload Windsurf to access the new model.
  • Windsurf Launches Official Figma MCP Server: A new official Figma MCP server is now available in the Windsurf MCP store, discussed in this post.
    • This integration allows users to paste Figma links directly into Windsurf without requiring the Figma desktop app.
  • Migrate to New Figma MCP Server: Users of the previous Figma Dev Mode MCP server are advised to install the new official Figma MCP server.
    • This migration ensures access to Figma’s new remote MCP server, enabling better integration with Windsurf.

MCP Contributors (Official) Discord

  • Apify & Jentic Throw Down Happy Hour Gauntlet: Apify and Jentic are hosting a happy hour; details are on the Luma website.
    • One member mentioned plans to attend both events.
  • Dev Summit Tix Vanish Into Thin Air: The Dev Summit is expected to sell out in approximately two days, following a pattern similar to the previous event, where tickets were gone a week prior.
    • Prospective attendees are encouraged to secure their tickets ASAP!

Manus.im Discord Discord

  • Token Allocation Troubles: A user expressed a desire for higher-level plans to offer more tokens per day, rather than only a chunk per month.
    • The user indicated that the current allocation model does not align with their usage patterns.
  • Affordability Anguish Aired: A user praised Manus but voiced concerns about the cost, stating they wish they could afford more of it.
    • The user’s sentiment highlights a potential barrier to wider adoption despite positive feedback on the product itself.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (826 messagesđŸ”„đŸ”„đŸ”„):

Image Generation Limits on Perplexity, Qwen Model Releases, Using Custom Instructions, Perplexity Email Assistant, Open Router Web Search Functionality

  • Perplexity Pro Image Generation has Limits?: Users are reporting that Perplexity Pro does not offer unlimited image generation, despite initial impressions, and that image generation limits vary between accounts, with some users having limits as low as 99 while others have 600.
    • It was shared that one can check their own limit with this link, but also cautioned users against relying on API responses regarding limits, and suggested Gemini student offer as an alternative to increase the limit.
  • Qwen releases new models: Qwen3 Omni and Qwen Image Edit 2509 were released, as well as Qwen3 VL (Vision Language), and the community discusses whether or not these models are trustworthy.
  • How to Leverage Custom Instructions and Risk Ban?: Members discussed the utility of custom instructions for enhancing Perplexity’s search capabilities; one user shared that their burner account was taken down after testing custom instructions on ChatGPT, and another cautioned against admitting to spamming new accounts on Perplexity’s official server.
    • Members also suggested setting up an Outlook mail and pop3/gmailify, while others were concerned about getting banned again.
  • Perplexity Email Assistant: Yay or Nay?: A member shared a link to Perplexity’s Email Assistant, but others worried about giving an LLM access to their email.
    • A user experiencing email overload is looking for advice on the assistant’s utility, and is concerned that AI could get full access to their directory and delete everything.
  • Open Router Web Search a Bust?: Members are reporting that the web search functionality on Open Router is terrible, costing 2 cents and only using 5 sites.
    • The users also discussed the utility of open-source BYOK alternatives.

Perplexity AI ▷ #sharing (9 messagesđŸ”„):

Shareable threads on Perplexity, Perplexity Pro Referral Codes

  • Perplexity Prompts Shareable Threads: Perplexity AI reminded users to ensure their threads are set to Shareable, with a link provided for reference.
    • This is likely intended to promote easier sharing and accessibility of discussions within the Perplexity AI community.
  • Prolific Perplexity Pro Promo Push: Multiple users shared their referral codes for Perplexity Pro, including this link and this link.
    • User skyade mentioned having “2 more besides this one if anyone needs it :)”.

LMArena ▷ #general (294 messagesđŸ”„đŸ”„):

Image Editing AI, Nano Banana, Seedream, Model Awareness of Conversation History, GPTs Agents

  • Editing AI? Nano Banana and Seedream Deliver: Members state that there is no picture editing AI in Chat GPT, instead recommending Nano Banana or Seedream for such tasks.
    • One member noted that ChatGPT is one of the worst image generation models right now.
  • Model Amnesia? Prompting Pitfalls Exposed: A user inquired whether new models in side-by-side mode are aware of previous conversation history after being switched out, but didn’t receive an answer.
  • DeepSeek Terminus Debated: OP or Overhyped?: Users are testing out Deepseek Terminus, with one saying I’d say it’s good - but no idea in relation to Opus which I have not tried.
    • Another member chimed in saying DeepSeek totally ruined my code that I made with Gemini and GLM4.5. Totally disappointed.
  • Suno Snubbed? Udio’s AI Music Ascent: One member said Udio is almost decent as an AI generated music platform, and at times it can almost fool you into thinking its human composed.
    • The member added that Udio is lightyears ahead of Suno - which only do very general and boring tracks where the distortion go up toward the end of each clip.
  • Pineapple’s Predicament: Culinary Condemnation Commences: The bot Pineapple chimed in after a user said they ate pineapple for dinner, followed by a meme that included a threat to eat Pineapple.
    • After another user said i don’t like pineapple on pizza, a user from Italy replied this is like a punch in the face for me.

LMArena ▷ #announcements (1 messages):

deepseek-v3.1-terminus, LMArena, Model Evaluation

  • DeepSeek Terminus Models Join LMArena: The latest DeepSeek models, v3.1-terminus and v3.1-terminus-thinking, have been added to the LMArena leaderboard for community evaluation.
    • These models are now accessible for direct comparison and testing against existing models within the LMArena environment.

Cursor Community ▷ #general (248 messagesđŸ”„đŸ”„):

Cursor line reading limits, GPT-5-CODEX rollout, Chrome DevTools MCP Server, Playwright MCP Alternative, Supernova model evaluation

  • Cursor’s Line Reading Limits Irk Users: A user expressed frustration that Cursor reads only 50-100 lines of code, desiring it to read over 3000; another user suggested attaching the file directly, so it reads more.
    • Another user mentioned that they used over 500 Cursor points in less than a week, suggesting that the Pro plan is too expensive for their needs.
  • GPT-5-CODEX Debuts with Mixed Reviews: Users are testing the newly released GPT-5-CODEX model in Cursor, some reporting it to be excellent, while others find it terrible at tool calling, often resorting to using the terminal; a user suggested that the Cursor team might fix it with a custom prompt.
    • One user noted the model tried to patch an entire file instead of using tool calls, similar to OpenAI’s file diff format for edits, while another experienced a 90% success rate with GPT5.
  • Google’s Chrome DevTools MCP Server Faces Installation Hurdles: A user struggled to get Google’s Chrome DevTools MCP server working, posting their MCP configuration; another user recommended downgrading to Node 20, as the user was on v22.5.1.
    • Another user offered an alternative suggestion: clear the cache and use Playwright as an MCP alternative, mentioning that they use Edge.
  • Assessing the Enigmatic Supernova Model: Users discussed the mysterious supernova model, with one member reporting that they couldn’t disclose who the model is; another user mentioned that they are using Auto model to quickly draft things.
    • There was speculation whether the Auto model’s improvements could eventually replace developers’ jobs, prompting a playful response about the model’s potential.
  • GPT-5-HIGH vs Claude Sonnet 4: Code Combat: Users discussed the efficiency of the coding models; one mentioned that GPT-5-HIGH performs better than Claude Sonnet 4 in their codebase.
    • They also admitted that claude don’t listen to instructions for nothin and mentioned that GPT5 listens.

Cursor Community ▷ #background-agents (2 messages):

Zombie process analysis, Zombie process escalation

  • Zombie Processes Analyzed: Analysis was performed on a disturbing zombie process, documented in a project journal entry.
    • The situation is considered not critical.
  • Zombie Process Escalation Report: An escalation report exists for zombie processes, available in the project journal.

OpenRouter ▷ #announcements (2 messages):

GPT-5-Codex launch, Agentic coding workflows, OpenRouter-compatible coding tools, Chatroom recommended parameters

  • GPT-5-Codex Goes Live!: The API version of GPT-5-Codex is now available on OpenRouter, tuned specifically for agentic coding workflows like code generation and debugging.
    • It is usable across all OpenRouter-compatible coding tools, has multilingual coding support across 100+ languages and dynamically adapts reasoning effort.
  • GPT-5-Codex Optimized for Software Engineering: GPT-5-Codex is optimized for real-world software engineering and long coding tasks.
    • It also has purpose-built code review capabilities to catch critical flaws, and works seamlessly in IDEs, CLIs, GitHub, and cloud coding environments; see the tweet here, and a minimal call sketch after this list.
  • Chatroom Parameters Recommended: Recommended parameters for models have been published in a new tweet.
    • See here for further details.
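
A minimal sketch of calling the model through OpenRouter's OpenAI-compatible endpoint; the model slug shown is an assumption, so verify the exact id on the live model page:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter API key
)
resp = client.chat.completions.create(
    model="openai/gpt-5-codex",  # assumed slug; check openrouter.ai
    messages=[{"role": "user", "content": "Review this diff for critical flaws: ..."}],
)
print(resp.choices[0].message.content)
```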

OpenRouter ▷ #app-showcase (1 messages):

eofr: Scam


OpenRouter ▷ #general (173 messagesđŸ”„đŸ”„):

Deepseek 3.1 uptime issues, OpenRouter iOS app, Qwen3 VL

  • Deepseek V3.1 Plagued by Uptime Issues: Users reported frequent “Provider Returned Error” messages when using the free Deepseek V3.1 model, similar to the issues experienced with the now mostly defunct Deepseek V3 0324.
    • One member suggested the consistently low uptime percentages of Deepseek models, such as 14%, may indicate bot usage, while another joked that users’ requests are being routed to the “trash.”
  • Developer creates OpenRouter iOS App: A member announced they built an iOS app to interface with OpenRouter, Flowise, and other platforms, aiming to give people the freedom to own their models and chats.
    • Another member jokingly responded that it was just “more places for gooners to flee to.”
  • Qwen3 VL impresses with multimodal capabilities: Members expressed amazement at Alibaba’s new Qwen3 VL model and coding product, citing its multimodal support and performance benchmarks that surpass 2.5 Pro.
    • One user quipped, “I need to learn Chinese at this rate wtf”, while another shared a link to a post claiming that OpenAI can’t keep up with demand.

OpenRouter ▷ #new-models (3 messages):


  • No new models discussed: Despite the channel’s name, the three messages simply repeated the channel title, OpenRouter - New Models, with no actual models discussed.

OpenRouter ▷ #discussion (2 messages):

4Wallai benchmarks

  • 4Wallai benchmarks are enjoyed: Members shared and enjoyed a link to 4wallai.com.
    • Following the enjoyment of the linked benchmark, a member suggested more benchmarks like this are needed, expressing a desire for additional resources to evaluate and compare AI models effectively.

HuggingFace ▷ #general (100 messagesđŸ”„đŸ”„):

TTS narration, Open Models for narration, ML Course recommendations, Private LLM

  • Chatters debate using TTS APIs versus LLMs for Narration: A user asked for the best open model to narrate a chapter from a book, and one member suggested that for 2k tokens, any TTS API would work for $0.001.
    • However, other members suggested using LLMs like Qwen3 or phi-4 and a simple TTS program.
  • Discordians recommend ML courses and PyTorch: One user asked for recommendations on ML/AI courses, citing Andrew Ng’s Machine Learning Specialization, the Hugging Face LLMs course, and FastAI Practical Deep Learning for Coders.
    • Several members suggested skipping the video courses and instead suggested learnpytorch.io and implementing models in PyTorch from scratch to understand how they work conceptually.
  • Faster hardware or smaller models suggested for Chatbot: A user looking for a partner to help with a custom LLM that makes 10 LLM calls and 20+ prompts was advised that the easiest way to get faster speeds is to use a bigger GPU or a smaller model.
    • Quantization can increase speed at the cost of quality, and batching calls together helps if there is enough constant throughput to fill a batch; the biggest gains come from smaller models, bigger hardware, and smaller quantizations.
  • Take corpo stuff request is vague: A user wanted to get some of the corporate stuff out of the model, and one member responded to the vague request to consider reading API TOS and understanding the laws of space time and physics.
    • The same member continued, What you seem to be asking for is a rainbow glitter unicorn fairy mermaid with wings who shoots sparkles from both ends.

HuggingFace ▷ #i-made-this (4 messages):

Go wrapper for tokenizers library, Canis.lab launch

  • Go wrapper seeks maintainers: A member has written a Go wrapper for the tokenizers library and is seeking help to maintain and improve it.
  • Canis.lab launched for tutor engineering: A member shared the Canis.lab launch video, about dataset-first tutor engineering and small-model fine-tuning for education; the project is open-source and reproducible.

HuggingFace ▷ #computer-vision (1 messages):

Menu Translation, Gemini 2.5 Flash, Taiwanese Signage Menus, OCR for spaced characters

  • Menu Translation App Meets Spaced Character Challenge: A developer is seeking advice on improving a menu translation app, Menu Please, when dealing with Taiwanese signage menus where characters are unusually spaced.
    • The issue arises with Gemini 2.5 Flash failing to accurately translate menu items due to inconsistent character spacing in the images.
  • Gemini Struggles with Kanban Character Spacing: The developer notes that the Gemini 2.5 Flash model struggles when translating Taiwanese signage menus (Kanban) due to inconsistent character spacing.
    • The spacing between characters of the same menu item is often wider than between adjacent items.
  • OCR Tricks: To solve this, the developer has already tried to provide few-shot examples of spaced characters in horizontal and vertical orientations to Gemini.
    • They also attempted to guide the model to identify anchors like bullet points and prices, combined with reading direction, to determine item boundaries, using a provided image example.
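
As a rough illustration of the anchor-guided prompting described above (the SDK call and prompt wording are assumptions, not the developer's actual code):

```python
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="...")  # your Gemini API key
model = genai.GenerativeModel("gemini-2.5-flash")

menu = PIL.Image.open("signage_menu.jpg")
prompt = (
    "Extract items from this signage menu. Characters of a single item may be "
    "spaced wider apart than the gap between neighboring items. Use bullet "
    "points, prices, and reading direction as anchors to decide item "
    "boundaries, then translate each item into English."
)
response = model.generate_content([menu, prompt])
print(response.text)
```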

HuggingFace ▷ #smol-course (2 messages):

Canis.lab, Synthetic Data, Eval Dataset issues

  • Dataset troubles found by user: A member reported an issue where the eval dataset cannot be found, while looking inside HuggingFace datasets.
    • The user mentioned the dataset lighteval|gsm8k specifically.
  • Canis.lab workflow introduced for Synthetic Data: A member introduced Canis.lab, a lightweight, open-source workflow to blueprint, generate, and validate targeted datasets for small tutor models, sharing a launch video and the repo link.
    • The member is looking for feedback, especially in the context of what the course aims to teach.

HuggingFace ▷ #agents-course (1 messages):

RAG Courses, Bangla Retrieval, Multimodal Support

  • Member requests RAG course recommendations: A member asked for suggestions for a good RAG course, specifically for Bangla-based retrieval and multimodal support.

GPU MODE ▷ #general (14 messagesđŸ”„):

Python Profiling, DeepGEMM Benchmarking, NCU Clock Control, GPU Kernel Downclocking

  • Quest for a Good Python Profiling Plugin: A member is searching for a reliable Python profiling function, having tested DeepGEMM, Triton’s do_bench, and NCU, noting inconsistencies in kernel timing across different tools like NCU and Kineto.
  • NCU Gets Clock Controlled: Setting --clock-control none with NCU made it agree with do_bench() better, resolving relative disagreements in kernel speeds; however, questions arose on whether fixed clock speeds accurately represent real-world GPU kernel performance.
  • NCU Downclocks Kernels: The member questioned why NCU downclocks some kernels and whether benchmarking with a fixed clock is representative.
    • Another member suggested that fixed clock speed reduces benchmark variance and improves reproducibility, regardless of external factors like a hot day.
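
A minimal sketch of the comparison being made, using a stand-in matmul as the kernel under test (flag names per current Nsight Compute documentation):

```python
import torch
import triton.testing

# Time a kernel with Triton's do_bench, then profile the same script under NCU
# with clock control disabled so both tools see similar clock behavior.
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
ms = triton.testing.do_bench(lambda: a @ a)
print(f"do_bench: {ms:.3f} ms")

# From the shell:
#   ncu --clock-control none python bench.py   # let clocks float, as do_bench does
#   ncu --clock-control base python bench.py   # default: lock clocks for reproducibility
```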

GPU MODE ▷ #cuda (3 messages):

mbarrier instructions, cuda::barrier, cuda::memcpy_async, inline PTX, CCCL

  • mbarrier instructions detailed: The mbarrier.test_wait instruction is a non-blocking test for completion of the phase, whereas mbarrier.try_wait is a potentially blocking test for the same condition.
    • If the phase is not complete, the executing thread may be suspended but resumes execution when the specified phase completes OR before the phase completes following a system-dependent time limit, according to Nvidia Documentation.
  • cuda::barrier syncs copies and work: The default version (no .noinc) of cuda::barrier assumes that you want to synchronize not only the copy but also any work you did with the threads in the meantime after starting the copies.
    • This is also used in cuda::barrier + cuda::memcpy_async so the user still has to arrive on the barrier.
  • Skip inline PTX, use CCCL: You do not need to write inline PTX for most things, as CCCL covers most bases.

GPU MODE ▷ #beginner (2 messages):

CUDA Documentation, Memory vs Compute Bound

  • CUDA Docs Trump LLM CUDA-vice: Members affirm that for CUDA, the NVIDIA documentation remains the single source of truth, especially given that LLMs frequently generate incorrect information on CUDA.
    • Therefore engineers should rely on the documentation, and not the LLM’s ‘hallucinations’.
  • Bound to Memory or Compute?: To optimize CUDA, a member suggests calculating the number of values used (memory) and operations performed (FLOPS) to determine if the process is memory bound or compute bound.
    • The member states: if it is memory bound, your SOL will be memory bandwidth; if it is compute bound, your SOL will be (max FLOPS per SM x count of SMs).
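
A back-of-envelope sketch of that calculation, with assumed (not official) peak numbers; plug in your own GPU's specs:

```python
# Compare a kernel's arithmetic intensity (FLOPs per byte moved) to the
# hardware ridge point; below the ridge the speed-of-light is bandwidth,
# above it the SOL is peak compute.
PEAK_FLOPS = 989e12   # assumed peak FP16 tensor FLOPs/s for an H100-class GPU
PEAK_BW = 3.35e12     # assumed HBM bandwidth in bytes/s

ridge = PEAK_FLOPS / PEAK_BW               # FLOPs per byte where the bound flips

# Example: FP16 GEMM, C[M,N] = A[M,K] @ B[K,N]
M = N = K = 4096
flops = 2 * M * N * K                      # one multiply-add per output per K
bytes_moved = 2 * (M * K + K * N + M * N)  # fp16 = 2 bytes per value

intensity = flops / bytes_moved
print(f"intensity {intensity:.0f} FLOP/B vs ridge {ridge:.0f} FLOP/B ->",
      "compute bound" if intensity > ridge else "memory bound")
```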

GPU MODE ▷ #off-topic (20 messagesđŸ”„):

Slurm Reading Material, Sysadmin/Devops Channel, Kubernetes + Slurm + Docker, Flux from LLNL

  • Slurm Docs Get Gold Star: A member asked for Slurm reading material, and the response was to just read the docs, which were described as good.
    • Another member stated they would be interested in a Slurm discussion as they are also trying to maintain such a cluster.
  • Sysadmin/Devops Channel Debated: Members discussed the potential creation of a sysadmin/devops/scheduling channel for discussing complaints and Slurm cluster maintenance.
    • One member said it would be cool to see what people do with it.
  • Kubernetes, Slurm, and Docker Convergence: Members proposed combining Kubernetes, Slurm, and Docker, noting the possibility of integrating Docker and Slurm.
  • Flux Framework Floated: A member introduced Flux from LLNL, a job orchestration/resource management framework for clusters found here.
    • They noted Flux isn’t as popular as Slurm due to being newer and HPC-focused.

GPU MODE ▷ #self-promotion (7 messages):

CuTe Layout Algebra, Colfax Team Paper, Categorical treatment, WMMA/MMA instruction, NVRTC MMA

  • Layout Gymnastics Blogpost Launches!: Simon Veitner released a blog post detailing his manual derivation of examples from Chapter 2 of the CuTe Layout Algebra paper by the Colfax Team, covering operations like Coalescing, Completion, and Composition.
  • Colfax Paper Explored with Layout Gymnastics!: Veitner’s post serves as a companion for readers working through the original Colfax paper which is a full mathematical treatment of Layout Algebra.
  • MMA Instruction Post Coming Soon!: One member mentioned he is studying WMMA, MMA, and WGMMA instructions with a potential blog post in the future, focusing on topics “that are hard to approach and that isn’t covered by many resources”.
  • NVRTC MMA Instructions Explored: A blog post about using NVRTC to explore MMA instruction variants was shared, linked to gau-nernst’s blog.

GPU MODE ▷ #avx (2 messages):

AVX512, BPE, Tiktoken, Huggingface, Data Loading Optimization

  • AVX512 BPE Implementation Sought for Speed: A member is seeking an AVX512 implementation of BPE (Byte Pair Encoding) because Tiktoken is slow AF and the Hugging Face implementation is latency bound, significantly slowing down data loading.

GPU MODE ▷ #edge (2 messages):

Cubesat hardware, Cubesat software, Error Correction, Redundancy, RasPi Cubesats

  • RasPi Powers Amateur Cubesats: Amateur cubesats are built with RasPi and work quite well, according to a member, highlighting their effectiveness in space applications and mentioning the Jeff Geerling blogpost.
    • The discussion covered the reliability and suitability of using Raspberry Pi in educational satellite projects.
  • Cubesat Project Success: A member discussed their work on software operations for ground systems for the Qube Project launched last year, highlighting the practical application of cubesat technology.
  • Redundancy via Master-Slave Architecture: The channel talked about having redundant modules for each core feature/ master-slave in the cubesats.
    • These would reset based on error-correction checks.

GPU MODE ▷ #submissions (2 messages):

MI300x8, amd-gemm-rs leaderboard

  • MI300x8 personal best: A user achieved a personal best on MI300x8: 575 ”s.
    • The submission id is 43091 on the amd-gemm-rs leaderboard.
  • MI300x8 successful run: A user had a successful run on MI300x8: 589 ”s.
    • The submission id is 43133 on the amd-gemm-rs leaderboard.

GPU MODE ▷ #status (1 messages):

Runner Issues, Timeouts, Debugging with AMD and DigitalOcean

  • Runner Hiccups Cause Timeout Tumult: The team is experiencing issues with their runners, leading to unexpected timeouts.
    • They are actively debugging the problem in collaboration with AMD and DigitalOcean, and will provide updates as they work towards a resolution.

GPU MODE ▷ #factorio-learning-env (3 messages):

GEPA, Deepseek Neel eval

  • GEPA integration debated for v0.0.3: Members discussed integrating GEPA before releasing version 0.0.3 of their project.
    • One member suggested it would be a nice addition, while another cautioned against letting it delay the release due to its potentially open-ended exploration.
  • Deepseek Neel evaluation on the table: A member inquired about running an evaluation on Deepseek Neel, providing a link to the model on Hugging Face.
    • No further details were provided.

GPU MODE ▷ #amd-competition (33 messagesđŸ”„):

MI300X Environment, Docker Image for Benchmarks, GEMM Submission Timeout, Cluster Health Issue, All2All Custom Kernel Data Access

  • MI300X Environment Specs Plotted: Members discussed defining the environment for testing MI300X, suggesting that any place supporting 8x MI300X should be adequate, with AMD DevCloud or HotAisle as potentially the cheapest options.
    • It was emphasized that replicating the exact testing environment, including Python, Torch versions, and other dependencies, is critical for 1:1 testing, linking to the AMD Dockerfile used for benchmarks.
  • AMD Docker Image Proffered for Benchmarks: A member pointed out that the exact Docker image used for benchmarks can be fetched, noting that AMD is not finicky with performance counters in Docker, and linked the AMD Dockerfile.
    • It was noted that while the image is published, its location is unknown, and HotAisle being bare metal allows for easy building on the machine, with Runpod also mentioned as a viable option.
  • GEMM Submission Times Out!: A user reported a timeout issue with the submission cluster for GEMM, even with the reference kernel.
    • It was suggested that the submission code be modified to allow multiprocess per same GPU for correctness, and to use git for syncing and AMD Dev Cloud for saving snapshots, but others pointed out submissions have been recent, and a cluster health issue may be responsible for the timeout.
  • Cluster Health Falters: Members indicated a likely cluster health issue causing submission timeouts, with the team awaiting assistance from AMD to resolve it.
    • Despite the issue, a member expressed appreciation for the overall setup, frontend, and CLI, acknowledging the difficulty and time consumption of hosting a contest.
  • All2All Kernel Seeks Global Data: A question arose regarding how much information the custom_kernel() in all2all has or can access about the whole inference cluster.
    • Specifically, whether a rank has a global view regarding how much data gets sent and received among all other ranks, especially since gpumode.com mentions all_rank_data, which wasn’t seen in the code.

GPU MODE ▷ #cutlass (8 messagesđŸ”„):

Shape Compatibility, CUTE documentation, PTX Diagrams

  • CUTE Shape Compatibility Deep Dive: A member inquired about shape compatibility in the CUTE layout docs, specifically regarding Shape (24) and Shape 24, with another member clarifying that shape 24 and shape (24) are conceptually the same, but the parentheses limit compatibility.
    • Compatibility is an antisymmetric notion: S compatible with T and T compatible with S implies S = T, with the term S refines T meaning T is compatible with S. For example, (24) refines 24 because 24 = size((24)).
  • Indexing Shapes in CUTE: A member asked if the shape compatibility requirements in the CUTE documentation meant that all coordinates within A are valid coordinates within B.
    • Another member confirmed that the valid coordinates of (24) are (0), (1), (2)..., while for 24 they are 0, 1, 2, 3..., so integers can index into (24) but not vice versa.
  • Seeking CUTE Code for PTX Diagrams: A member asked where the CUTE code for generating the PTX diagrams could be found, with another member providing possible leads.
    • They suggested looking into print_latex, print_layout, and layouts from the wgmma shared memory section of the PTX docs.

GPU MODE ▷ #singularity-systems (2 messages):

Eager Mode, Graph Mode, Tinygrad's IR, Tensor Sugar, Torch vs. Jax

  • Tinygrad Embraces Dual-Engine Approach: Tinygrad will feature two engines: an eager mode (eagerly_eval) with hand-written kernels and a graph mode (lazily_compile), both reusing Tinygrad’s IR.
    • The tensor will serve as syntactic sugar for the UOp graph, indicating a departure from pure Python implementation.
  • Tinygrad Avoids Torch’s Pitfalls: A member expressed agreement with the dual-engine approach, suggesting that Torch’s failure to separate eager and graph modes continues to cause issues.
    • They further noted that Jax’s focus on a single approach has contributed to its success.

GPU MODE ▷ #cluster-management (8 messagesđŸ”„):

GPU Reservations, Slurm and Docker, Singularity vs Docker, llm-d.ai for cluster management

  • GPU Reservations Causing Headaches: Developers are struggling with GPU reservations due to limited on-prem resources, resorting to manual messaging for allocation.
    • While dstack was considered, the lack of sufficient GPUs makes it infeasible, pushing the team towards Slurm for its fractional GPU support.
  • Slurm and Docker Cause Cluster Chaos: Integrating Slurm with Docker is proving to be a challenge, leading the team to favor Singularity for containerization within the cluster.
    • The primary concern is security, as Singularity avoids the root privileges associated with Docker.
  • Singularity Syntax Spurs Skepticism: A member expressed frustration with Singularity’s syntax, questioning why it doesn’t align with the more familiar Docker syntax.
    • The speaker posited that Singularity, unlike Docker, runs containers without a daemon, which may have to do with resource calculations/budgeting.
  • llm-d.ai touted as treasure: A member suggested exploring llm-d.ai, indicating its suitability for managing LLM workloads in the cluster.
    • The project is likely relevant to the ongoing discussions around resource allocation and containerization.

Latent Space ▷ #ai-general-chat (45 messagesđŸ”„):

Meta's ARE and Gaia2, Cline's Agentic Algorithm, Greptile's $25M Series A, Cloudflare's VibeSDK, GPT-5-Codex Release

  • Meta Launches ARE and Gaia2 for Dynamic Agent Evaluation: Meta SuperIntelligence Labs released ARE (Agents Research Environments) and Gaia2, a benchmark for evaluating AI agents in dynamic scenarios.
    • ARE simulates real-world conditions where agents adapt in real-time, unlike static benchmarks that solve set puzzles.
  • Cline’s Algorithm Distilled into Simple States: Ara distilled Cline’s agentic algorithm into a 3-state state machine: Question (clarify), Action (explore), Completion (present).
    • The key to success is a simple loop + good tools + growing context.
  • Greptile Snags $25M for Bug-Killing AI Reviewer v3: Greptile closed a $25M Series A led by Benchmark and launched Greptile v3, an agent architecture that catches 3× more critical bugs than v2, already used by Brex, Substack, PostHog, Bilt and YC.
    • New features include Learning (absorbs team rules from PR comments), MCP server for agent/IDE integration, and Jira/Notion context.
  • Cloudflare Opens Doors to ‘Vibe Coding’ with VibeSDK: Cloudflare announced VibeSDK, an open-source “vibe coding” platform enabling one-click deployment of personalized AI development environments.
    • It includes code generation, a sandbox, and project deployment.
  • GPT-5-Codex Arrives, Developers Weigh Cost vs. Limit: OpenAI released GPT-5-Codex via the Responses API and Codex CLI, spurring excitement but also concerns about cost and rate limits, priced at $1.25 input, $0.13 cached, $10 output.
    • Requests pour in for Cursor/Windsurf integration, GitHub Copilot support, and lower output costs.
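
Assuming, as is standard, that those prices are per million tokens, a quick cost sketch for a long agentic session:

```python
# Quoted GPT-5-Codex prices, assumed to be per million tokens.
PRICE_IN, PRICE_CACHED, PRICE_OUT = 1.25, 0.13, 10.00

def session_cost(fresh_in: int, cached_in: int, out: int) -> float:
    return (fresh_in * PRICE_IN + cached_in * PRICE_CACHED + out * PRICE_OUT) / 1e6

# e.g. 800k fresh input tokens, 2M cached input tokens, 120k output tokens
print(f"${session_cost(800_000, 2_000_000, 120_000):.2f}")  # -> $2.46
```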

Latent Space ▷ #genmedia-creative-ai (4 messages):

Foo Fighters, Artists using AI

  • Foo Fighters Post Teases AI Use?: The Foo Fighters shared a YouTube video sparking speculation on how artists might use AI, even if in a tongue-in-cheek manner.
  • AI in artistic expression: Discussion revolves around the evolving role of AI in creative fields, particularly how musicians might playfully integrate AI into their work.

Yannick Kilcher ▷ #general (2 messages):

Paper Reading Events, Yannick's Reading List

  • Paper Reading Events timing: A member inquired whether paper reading events are announced in advance.
  • Yannick’s Reading List: A member inquired what Yannick is going to read this weekend.

Yannick Kilcher ▷ #paper-discussion (17 messagesđŸ”„):

Diffusion ODE Solver, MiMo-Audio, Diversity is all you need

  • Diffusion ODE Solver Achieves Speed and Quality Boosts: An independent researcher developed a novel ODE solver for diffusion models that achieves 8-step inference beating DPM++2m’s 20-step inference in FID scores without additional training, as detailed in their paper and code.
  • MiMo-Audio: Audio Language Models as Few-Shot Learners: Members discussed MiMo-Audio and their technical report, “Audio Language Models Are Few Shot Learners”, highlighting its capabilities in S2T, S2S, T2S, translation, and continuation, as showcased in the demos.
  • “Diversity is all you need” Paper Presentation Proposed: A member proposed presenting the paper “Diversity is all you need” and encountered voice call issues on Discord.

Yannick Kilcher ▷ #ml-news (12 messagesđŸ”„):

Gaia2, Meta Agents Research Environments (ARE), GPT5 Models, Cloudflare Vibesdk, Compilebench

  • Gaia2 and ARE empower agents eval: Meta introduces Gaia2, the follow-up to the agentic benchmark GAIA, for analyzing complex agent behaviors, released with the open Meta Agents Research Environments (ARE) framework under the CC BY 4.0 and MIT licenses.
    • ARE simulates real-world conditions to debug and evaluate agents, addressing limitations of existing environments that lack real-world flexibility.
  ‱ GPT-5’s true form factor remains unknown: In the ml-news channel, a user questioned whether GPT-5 low and GPT-5 high are different models.
    ‱ A member responded that it’s unknown but suggested it might be similar to their OSS model, where reasoning effort is adjusted by changing the context, or they could be different finetunes of the same base.
  • Cloudflare releases Vibesdk: A member shared a link to Cloudflare’s new Vibesdk.
    • No further discussion was given.
  • Introducing Compilebench: A member shared a link to a blog post about Compilebench.
    • No further discussion was given.

LM Studio ▷ #general (21 messagesđŸ”„):

LM Studio Model Support, GGUF/MLX Models, Qwen-3-omni, Google Gemini Free Tier

  • LM Studio Supports Limited HF Models: New users asked if all HuggingFace models are available on LM Studio and whether models are validated by the team.
    ‱ A member clarified that only GGUF (Windows/Linux/Mac) and MLX models (Mac only) are supported, excluding image/audio/video/speech models.
  • LM Studio’s Model Search: A user searched for the facebook/bart-large-cnn model and asked how to verify if models are GGUF or MLX.
    • A member confirmed that the model is unsupported in LM Studio and that Qwen-3-omni support depends on llama.cpp or MLX compatibility.
  • Deep Dive into Qwen-3-Omni: A member stated that Qwen-3-omni, which handles text, images, audio, and video, would take a very long time to support.
    • Another member noted that the text layer is standard, but the audiovisual layers involve lots of new audio and video decoding stuff.
  • Google Gifts Gemini to Students: A member shared that Google offers a year of Gemini for free to college students.
    ‱ They added, “I use it free daily so getting premium for free is nice.”

LM Studio ▷ #hardware-discussion (2 messages):

Innosilicon GPU, DirectX12 Support, Ray Tracing Hardware

  • Innosilicon unveils Fenghua 3 GPU: Innosilicon has revealed its Fenghua 3 GPU, which features DirectX12 support and hardware ray tracing capabilities according to Videocardz.
  • Local LLaMA Reddit Post: A user shared a link to a Reddit post in r/LocalLLaMA.

aider (Paul Gauthier) ▷ #general (11 messagesđŸ”„):

Response API Support, GPT-5-Codex Integration, aider and litellm

  • Aider Adds Response API Support for GPT-5-Codex!: A member added support for the Responses API to aider, validated with the GPT-5-Codex model, and created a pull request for review.
    ‱ This integration addresses the issue where GPT-5-Codex, lacking chat-completions support, failed with aider on the official endpoint, necessitating the use of OpenRouter for backward compatibility.
  • Aider’s litellm Dependency Supports GPT-5?: A member inquired whether something different was needed given that Aider already works with other Responses models via litellm.
    ‱ Another member clarified that aider relies on litellm’s completions path, which includes a fallback for handling Responses endpoints, but the fallback doesn’t cover GPT-5-Codex, prompting the need for explicit Responses API support.
  • GPT-5 Now Requires Responses Endpoint: A member reported getting an error indicating that GPT-5-Codex is only supported in v1/responses and not in v1/chat/completions.
    ‱ This implies that, unlike previous models, GPT-5-Codex exclusively uses the Responses API, necessitating updates to handle this specific endpoint; a minimal call sketch follows.
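
A minimal sketch of such a Responses-only call with the official openai Python SDK; the model name follows the discussion above, and output fields may differ by SDK version:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# GPT-5-Codex is served via /v1/responses, not /v1/chat/completions.
resp = client.responses.create(
    model="gpt-5-codex",
    input="Refactor this function to be iterative instead of recursive.",
)
print(resp.output_text)  # convenience accessor for the concatenated text output
```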

aider (Paul Gauthier) ▷ #questions-and-tips (8 messagesđŸ”„):

aider ollama setup, Aider reads MD file, Context Retransmitted, Prompt Caching

  • Users seek guidance on using Aider with Ollama: A user is seeking guidance on how to make aider read their MD file with the AI’s purpose when using Ollama.
    • The user tried the command aider --read hotfile.md but it didn’t work as expected.
  • Users want to revert to previous steps: A new user inquired about how to revert to a previous step after using the /ask command multiple times.
    • A member suggested manually copying the desired context, using /clear, and then pasting the copied context with the new question.
  • Context is retransmitted with every request: A user noticed that the context is retransmitted with every chat request when aider is in verbose mode, and they questioned whether this is inefficient.
    • A member confirmed this behavior, stating that it’s standard and that many APIs use prompt caching to mitigate costs, while noting that aider gives the user control over what to include in context.
  • Aider sorts file context alphabetically: A user pointed out that aider sorts file context alphabetically instead of preserving the added order.
    ‱ They made a PR for this but gave up, since nothing is being merged anymore.

Modular (Mojo đŸ”„) ▷ #general (18 messagesđŸ”„):

RISC-V Performance, Tenstorrent's MMA accelerator + CPU combos, RISC-V 32-bit and 64-bit, RISC-V Bringup, RISC-V ISA

  • RISC-V Performance Lags Behind Phone Cores: Members discussed that RISC-V cores are currently almost universally slower than the cores in modern smartphones, except perhaps microcontroller SoCs.
    • One member noted that the fastest RISC-V device they encountered was still slow enough that someone cross-compiled SPECint from an UltraSPARC T2 because it was faster than a native compilation.
  • Tenstorrent Hopes to Close RISC-V Performance Gap: A member mentioned Tenstorrent’s MMA accelerator + CPU combos as a potential solution and also mentioned that Tenstorrent’s “tiny” cores are very small in-order cores used to drive 140 matrix/vector units.
    • The same member noted that Tenstorrent’s Ascalon cores are the best hope for changing the RISC-V performance landscape in the next 5 years.
  • RISC-V Bringup Challenges: A member shared that RISC-V 64-bit is vaguely functional, but needs a lot of bringup work, and also can’t use vectors.
    ‱ Another member explained that any chain of if-elif-else statements that dispatches on architecture needs RISC-V added, and that much needs to be gated behind a requires clause, which doesn’t exist in the language yet.

OpenAI ▷ #annnouncements (1 messages):

Stargate Sites, Oracle, SoftBank, 10-Gigawatt Commitment

  • OpenAI Announces Five New Stargate Sites: OpenAI announced five new Stargate sites in partnership with Oracle and SoftBank, advancing their 10-gigawatt commitment ahead of schedule.
  • Stargate Project gains Momentum: The collaboration with Oracle and SoftBank is accelerating the deployment of compute resources for OpenAI.
    • This puts the project ahead of its originally planned schedule for achieving the 10-gigawatt target.

OpenAI ▷ #ai-discussions (14 messagesđŸ”„):

Codex Fallback, Sora Issues, Ternary System Study, Github Copilot Alternative, kilocode

  • Codex Lacks Model Fallback: A user asked if Codex has a model fallback feature, similar to switching to gpt-5-mini after exhausting GPT-5 usage.
    • No confirmation or denial was offered, but the community did not seem to think so.
  • Sora’s Got Video Generation Snags: A user inquired about the timeline for fixing issues in Sora’s video generation capabilities.
    • No response was provided in the chat log, but the community seems aware of issues related to the product.
  • VSCode Copilot competitor incoming?: A user expressed interest in an OpenAI-made “Github Copilot” extension for VSCode and IDEs.
    • Despite knowing about Codex CLI, the user appreciates Github Copilot’s code snippet suggestions and would switch if OpenAI offered a similar product.
  • GPT-5-Minimal Model Assessed: According to this image, the GPT-5-Minimal model performed worse than Kimi k2, but High is the best overall for agentic use cases.
    ‱ One user ranked the variants from most to least capable: High (only via API) > Medium > Low > Minimal > Fast/Chat (non-thinking).
  ‱ Models assigned to different agent roles in kilocode: Members mentioned that users assign different models to different agent roles in kilocode.
    • One user pointed to a blog post about Codex IDE in VSCode being less than a month old.

OpenAI ▷ #prompt-engineering (1 messages):

GPT4o Translations, Chain of Thought

  • GPT4o Translation Quality Suffers From Chain of Thought: A member found that GPT4o’s translation quality decreases when using a chain of thought prompt compared to a direct translation prompt.
    • The member shared the prompt used: When user paste something in other language that isn’t english, Identify the language, then: - {do a 3 short bullet point as a chain of thought} {Your translation goes here}
  • Direct Translation beats Chain of Thought Translation for GPT4o: The user experimented with GPT4o as a translator using a chain of thought prompt.
    ‱ The result was that direct translation without the chain of thought yielded better-quality results; a minimal comparison sketch follows.
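
A hedged sketch of the comparison the member ran; the system prompts and the gpt-4o model choice here are illustrative stand-ins, not the member's exact setup:

```python
from openai import OpenAI

client = OpenAI()

DIRECT = "Translate the following text into English. Output only the translation."
COT = (
    "Identify the language of the following text, write 3 short bullet points "
    "of reasoning about its meaning, then give your English translation."
)

def translate(system_prompt, text, model="gpt-4o"):
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content

sample = "El conocimiento es poder, pero la sabidurĂ­a es libertad."
print("direct:", translate(DIRECT, sample))
print("cot:   ", translate(COT, sample))
```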

OpenAI ▷ #api-discussions (1 messages):

GPT4o translation, Chain of thought in translation

  • GPT4o translation quality degrades with chain of thought: A member observed that asking GPT4o to perform a chain of thought before translating text results in lower quality translations compared to direct translation.
    • The user shared a specific prompting strategy which asks GPT4o to identify the input language and outline a three-step thought process before providing the translation, but found this method to be less effective.
  • Direct Translation Outperforms Chain-of-Thought for GPT4o: A user tested GPT4o as a translator, comparing direct translations with those preceded by a chain-of-thought prompt.
    • The results indicated that GPT4o’s direct translations were superior in quality and adaptation compared to those using the chain-of-thought approach.

DSPy ▷ #show-and-tell (4 messages):

DSPy profiles, dspy-profiles, LLM behavior

  • DSPy gets profiles package for configuration: A member announced the release of dspy-profiles, a lightweight package for DSPy that manages configurations with toml, enabling quick setup swaps and tidy projects, also published to Xitter.
    • The tool allows easy switching of LLM behavior with a single command, and is available as decorators and context managers, aiming to eliminate context boilerplate.
  • Configurations for Different Environments: One member expressed excitement about dspy-profiles, inquiring whether many projects benefit from varied configurations.
    ‱ The author mentioned managing dev/prod environments as the initial motivation and indicated that it now facilitates better context switching; a hypothetical sketch of the idea follows.
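
The dspy-profiles API itself isn't shown in the discussion, so the snippet below is a hypothetical reconstruction of the idea: a TOML file of named profiles plus a context manager that scopes the active LM via dspy.context. The profile layout and `profile` helper are assumptions, not the package's real interface:

```python
import tomllib  # Python 3.11+
from contextlib import contextmanager

import dspy

# profiles.toml (assumed layout):
# [dev]
# model = "openai/gpt-4o-mini"
# [prod]
# model = "openai/gpt-4o"

@contextmanager
def profile(name, path="profiles.toml"):
    with open(path, "rb") as f:
        cfg = tomllib.load(f)[name]
    lm = dspy.LM(cfg["model"], temperature=cfg.get("temperature", 0.0))
    with dspy.context(lm=lm):  # scoped override; restores the prior LM on exit
        yield

with profile("dev"):
    print(dspy.Predict("question -> answer")(question="What is 2 + 2?").answer)
```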

DSPy ▷ #general (8 messagesđŸ”„):

GEPA Multimodality Performance Issue, Passing images and PDFs into DSPy, VLMs for Data Extraction, OCR Approaches for Data Extraction, Best PDF or Image Parsing Stuff

  • GEPA Multimodality Plagued by Performance Problems: A member reported a severe performance issue with GEPA Multimodality, linking to a related GitHub issue.
    • The user indicated that their use case requires catering to multiple users.
  • Passing PDFs & Images into DSPy Explored: A member inquired about passing images or PDFs into DSPy for data extraction.
  • VLMs and OCR debated for data extraction: One user asked if VLMs might be better than LLMs for extracting chart information from images and PDFs.
    ‱ Another member noted they did not know whether OCR approaches are better for data extraction, while another mentioned that you can pass a VLM via dspy.LM for this (a minimal sketch follows this list).
  • Query for the Best PDF and Image Parsers: A member asked for suggestions for the best PDF or image parsing tools.
    • No specific suggestions were provided in the messages.
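
Regarding the "pass a VLM via dspy.LM" suggestion above, here is a minimal sketch assuming a recent DSPy release where dspy.Image is available; the model choice and URL are placeholders:

```python
import dspy

# Any vision-capable model exposed through dspy.LM works here.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class ChartExtraction(dspy.Signature):
    """Extract the series names and approximate values from a chart."""
    chart: dspy.Image = dspy.InputField()
    findings: str = dspy.OutputField()

extract = dspy.Predict(ChartExtraction)
result = extract(chart=dspy.Image.from_url("https://example.com/chart.png"))
print(result.findings)
```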

DSPy ▷ #examples (5 messages):

Prompt Optimization, GEPA, AI Safety Research, Trusted Monitor, Comparative Metric with Feedback

  • Prompt Optimization Enables AI Control Research: A member published a post, Prompt optimization can enable AI control research, explaining how they used DSPy’s GEPA to optimize a trusted monitor.
  ‱ Comparative Metric Boosts GEPA Performance: A member introduced a comparative metric with feedback, passing one positive and one negative sample through the classifier at a time and scoring the pair on whether the positive sample’s score exceeded the negative sample’s.
    ‱ This allowed the reflection LM to learn the right signals for the classifier and create a robust optimized prompt; the pairwise idea is sketched after this list.
  ‱ GEPA Readme Links to Trusted Monitor Project: A member thanked the other for including the project on the GEPA readme and expressed interest in a short writeup about the comparative metric itself.
    • The other responded that they’d love to do a writeup on the comparative metric, and are curious if this is a robust strategy for classification.
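
A minimal sketch of the pairwise idea; the example/field names and the way the pair is bundled are assumptions here, not the author's code. GEPA-style metrics return a score plus textual feedback for the reflection LM:

```python
import dspy

classify = dspy.Predict("text -> score: float")  # program under optimization

def comparative_metric(example, pred=None, trace=None,
                       pred_name=None, pred_trace=None):
    # Assumption: each example bundles one positive and one negative sample.
    pos = float(classify(text=example.positive).score)
    neg = float(classify(text=example.negative).score)
    correct = pos > neg
    feedback = (f"positive={pos:.2f}, negative={neg:.2f}; "
                + ("correct ranking." if correct
                   else "the positive sample must outscore the negative one."))
    return dspy.Prediction(score=float(correct), feedback=feedback)
```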

tinygrad (George Hotz) ▷ #general (12 messagesđŸ”„):

High-Level IRs like Triton, Multi-Layer IR Stack, Hardware-Incomplete vs Complete IRs, Search and Learning in Compilers, Graph-Based Models for Compilers

  • Triton’s Abstraction Level Spurs Debate: Discussion highlights the benefits of high-level IRs like Triton, but also points out the need for a multi-layer stack to interface with lower-level hardware.
    • The Gluon project is mentioned, with a desire for it to interoperate with Triton, though its current Nvidia-specific nature is a limitation.
  • Single IR Inadequacy Acknowledged: The consensus is that a single high-level IR is insufficient for all users and use-cases, citing the divergent needs of PyTorch users seeking speedups versus those optimizing mission-critical HPC projects.
    • This is because there is not really going to be this goldilocks zone where the abstraction level of the IR is just right for all users and use-cases.
  • UOps Leverage Bitter Lesson: Tinygrad’s vision involves leveraging the bitter lesson to combine the benefits of incomplete and complete IRs, using UOps as a hardware-incomplete representation.
    ‱ The goal is to search over the space of rendered programs that implement the UOps and find the fastest one, illustrated in miniature below.
  • Search and Neural Compilers Highlighted: Emphasis is placed on the importance of search and neural compilers, with a particular interest in GNNs or other graph-based models.
    • The suggestion is to create a multi-stage compiler that utilizes graph-based models per stage.
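
The search idea in miniature: benchmark several candidate renderings of the same computation and keep the fastest. This is generic autotuning for illustration, not tinygrad's actual search code, and the candidate names are hypothetical:

```python
import time

def fastest(candidates, args, repeats=5):
    """Return the quickest of several equivalent implementations."""
    best_fn, best_t = None, float("inf")
    for fn in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            fn(*args)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_t:
            best_fn, best_t = fn, elapsed
    return best_fn

# e.g. fastest([matmul_naive, matmul_tiled, matmul_vectorized], (A, B))
```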

Nous Research AI ▷ #general (6 messages):

TRL Assessor, Nous Tek

  • TRL Assessor Inquiry: A member inquired about a TRL (Technology Readiness Level) assessor and whether it’s worthwhile to red team their own stack using a new ecosystem.
    ‱ Two other members suggested moving the conversation to a more appropriate channel.
  ‱ “Nous Tek” Praise: A member wrote “Nous tek”, a positive affirmation.
    • Another member immediately offered to answer questions.

Nous Research AI ▷ #ask-about-llms (6 messages):

Distributed Learning, Code Genetics, Model Non-Homology

  • Distributing the Training Load: A member inquired about the feasibility of training an AI model using distributed learning across multiple VPSs, leveraging resources like Kubernetes and Google Cloud.
    ‱ They were interested in using the setup to accelerate training cycles with datasets derived from operational data, and raised concerns about safety rails for hardware management.
  • Code Genetics and Model Parameter Tuning: A member explored using code genetics via OpenMDAO to automate adjustable parameters and Terraform for infrastructure control.
    • They questioned the necessary audit systems and methods for vetting synthetic data, aiming to influence parameters of models already in use, as opposed to techniques like Nate Lora.
  • Model Homology Concerns: A member explained that after pretraining to a stable loss value, models fix tokenized structures, creating a solid “world state” that is hard to shift without collapsing the structure.
    • They noted that while fine-tuning can generate task vectors around a manifold, comparing datasets requires a common base, as models become non-homologous otherwise.

Eleuther ▷ #general (3 messages):

AI Behavioral Coherence, Mathematical AI Constraints, Davinci Architecture

  • Researcher Probes AI Behavioral Coherence: An independent researcher is developing mathematical frameworks for AI behavioral coherence, aiming for real-time semantic control over language models without retraining.
    • The current work focuses on validating cross-model consistency and exploring how mathematical constraints can enhance AI system interpretability.
  • Davinci’s Architecture Revealed: A member stated that Davinci is essentially GPT-2’s transformer architecture but with locally-banded dense and sparse attention patterns and a 4x FFN.
    • This information is available in the GPT-3 paper, according to another member.
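
To make "locally banded sparse attention" concrete, here is a small sketch of the masks involved; the band width and layer count are illustrative, since the GPT-3 paper doesn't publish exact values:

```python
import torch

def banded_causal_mask(seq_len, band=256):
    """Causal mask where each token attends only to the previous `band` tokens."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (i - j < band)

def dense_causal_mask(seq_len):
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Alternating dense / locally banded pattern across layers:
masks = [dense_causal_mask(1024) if layer % 2 == 0 else banded_causal_mask(1024)
         for layer in range(24)]
```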

Eleuther ▷ #research (8 messagesđŸ”„):

Zero Knowledge Proofs, SwiGLU up-projection, Model Tampering Defenses

  • ZKML for Model Integrity?: A member suggested using Zero Knowledge Proofs (ZKML) to allow inference providers to prove they haven’t finetuned/replaced/lowered the quality of the model, or prove that the training process only used certain data.
    • They noted that it is very slow right now.
  • SwiGLU ‘Defense’ Against Finetuning: One member suggested making a model non-finetuneable post-hoc by multiplying random terms in the SwiGLU up-projection by large values and applying the inverse values in the down-projection.
    ‱ They claimed everyone’s standard AdamW recipe will fail and they will be too lazy to fix it, while the model even keeps working under default quantization recipes; a minimal sketch of the trick follows this list.
  • Model Tampering Defenses: A member argued that the possibility of mitigating concerns about releasing models by making them harder to fine-tune is an open technical problem, not something that can be determined a priori.
    • They also mentioned that their recent paper has increased tamper resistance by 3 OOMs over prior work.
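
A minimal sketch of the SwiGLU scaling trick as described: scale random up-projection rows and invert the scaling in the matching down-projection columns, leaving the forward pass numerically unchanged while making per-parameter gradient magnitudes wildly uneven. The dimensions and scale factor are illustrative:

```python
import torch
import torch.nn.functional as F

def swiglu(x, w_gate, w_up, w_down):
    return (F.silu(x @ w_gate.T) * (x @ w_up.T)) @ w_down.T

@torch.no_grad()
def scale_trick(w_up, w_down, frac=0.05, scale=1e4):
    """Multiply random up-projection rows by `scale` and the matching
    down-projection columns by 1/scale; the two rescalings cancel in
    the forward pass but wreck naive finetuning."""
    idx = torch.randperm(w_up.shape[0])[: int(frac * w_up.shape[0])]
    w_up[idx, :] *= scale
    w_down[:, idx] /= scale

d_model, d_ff = 64, 256
x = torch.randn(4, d_model)
w_gate, w_up = torch.randn(d_ff, d_model), torch.randn(d_ff, d_model)
w_down = torch.randn(d_model, d_ff)

before = swiglu(x, w_gate, w_up, w_down)
scale_trick(w_up, w_down)
after = swiglu(x, w_gate, w_up, w_down)
print(torch.allclose(before, after, rtol=1e-3, atol=1e-3))  # True: behavior preserved
```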

Moonshot AI (Kimi K-2) ▷ #general-chat (3 messages):

pydantic-ai lib

  • Pydantic-AI Library Plug and Play Component: A member suggested using the pydantic-ai library due to its neat implementation of a specific flow.
    ‱ They said that it includes a plug-and-play component that can accomplish the task in approximately 10 lines of code; a hedged sketch follows.
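
A hedged sketch of that flavor of flow with pydantic-ai, assuming a recent release (the output_type / result.output names have shifted across versions); the Invoice schema and prompt are illustrative:

```python
from pydantic import BaseModel
from pydantic_ai import Agent

class Invoice(BaseModel):
    vendor: str
    total: float

# Roughly the "plug and play in ~10 lines" flavor described above.
agent = Agent("openai:gpt-4o", output_type=Invoice)
result = agent.run_sync("ACME Corp billed us $1,234.50 for widgets.")
print(result.output)  # e.g. Invoice(vendor='ACME Corp', total=1234.5)
```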

Windsurf ▷ #announcements (2 messages):

GPT-5-Codex, Figma MCP server, Windsurf update, Remote Figma integration

  ‱ GPT-5-Codex Lands on Windsurf!: The new GPT-5-Codex model from OpenAI is now live in Windsurf and is impressing users with longer-running and design-related tasks, as per this announcement.
    • It’s free for paid users for a limited time, while free tier users can access it at 0.5x credits, so remember to reload Windsurf to see it!
  • Official Figma MCP server launched!: A new official Figma MCP server is now available in the Windsurf MCP store, and is discussed in this post.
    • Users can now paste Figma links directly into Windsurf with the new and improved integration which doesn’t require the Figma desktop app.
  • Migrate to New Figma MCP Server!: Users of the previous Figma Dev Mode MCP server are advised to install the new official Figma MCP server.
    • This migration ensures access to Figma’s new remote MCP server, enabling better integration with Windsurf.

MCP Contributors (Official) ▷ #mcp-dev-summit (2 messages):

MCP Dev Summit, Apify & Jentic Happy Hour

  ‱ Apify & Jentic Host Happy Hour: Apify & Jentic are hosting a happy hour; details are on the Luma website.
    • A member plans to attend both happy hour events.
  • Dev Summit Tickets Running Out Fast: The Dev Summit is expected to sell out in about two days, similar to last time when tickets sold out a week before the event.
    • If you’re considering attending, grab your tickets now!