Qwen is all you need?
AI News for 9/23/2025-9/24/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (194 channels, and 2236 messages) for you. Estimated reading time saved (at 200wpm): 188 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
Today is both AI Engineer Paris and AliCloud's annual Yunqi (aka Apsara) conference, and the Tongyi Qianwen (aka Qwen) team has been working overtime to launch updates of all their models, including the major ones: the monster 1T model Qwen3-Max (previewed 3 weeks ago), Qwen3-Omni, and Qwen3-VL, with Qwen3Guard, Qwen3-LiveTranslate, Qwen3-TTS-Flash, and updates to Qwen-Image-Edit and Qwen3Coder. Here's how Junyang Lin, their primary spokesperson on AI Twitter, put it:
Just to visualize the step up in velocity, here are all the Qwen releases this year:
Not to forget all the work from Alibaba's Wan team too, but with all these releases Qwen is now being regarded as a "frontier lab."
Alibaba's CEO Eddie Wu took to the stage to map out their $52B USD roadmap:
Here's a translation of the speech:
- The first stage is "intelligence emergence," characterized by "learning from humans."
- The internet has digitized virtually all knowledge in human history. The information carried by these languages and texts represents the entire corpus of human knowledge. Based on this, large models first develop generalized intelligence by understanding the global knowledge base, emerging with general conversational capabilities, understanding human intent and answering human questions. They gradually develop the reasoning ability to consider multi-step problems. We now see AI approaching the top levels of human performance in various subject tests, such as the gold medal level of the International Mathematical Olympiad. AI is gradually becoming capable of entering the real world, solving real problems, and creating real value. This has been the main theme of the past few years.
- The second stage is "autonomous action," characterized by "assisting humans." In this stage, AI is no longer limited to verbal communication but possesses the ability to act in the real world. AI can break down complex tasks, use and create tools, and autonomously interact with the digital and physical worlds, exerting a profound impact on the real world, all within the context of human goals. This is the stage we are currently in.
- The key to achieving this breakthrough lies first in the ability of large models to use tools, connecting all digital tools to complete real-world tasks. The starting point of humanity's accelerated evolution was the creation and use of tools, and large models now also possess this ability. Through tool use, AI can access external software, interfaces, and physical devices just like humans do, performing complex real-world tasks. At this stage, because AI can significantly improve productivity, it will rapidly penetrate nearly every industry, including logistics, manufacturing, software, commerce, biomedicine, finance, and scientific research.
- Secondly, improvements in large-model coding capabilities can help humans solve more complex problems and digitize more scenarios. Current agents are still in their early stages, primarily solving standardized, short-term tasks. Enabling agents to tackle more complex, longer-term tasks requires large-model coding capabilities. Because agents can code autonomously, they can theoretically solve infinitely complex problems, understanding complex requirements and independently completing coding and testing, just like a team of engineers. Developing large-model coding capabilities is essential for achieving AGI.
- AI will then enter its third phase, "self-iteration," characterized by its ability to "surpass humans." This phase has two key elements:
- First, AI connects to the full amount of raw data in the real world.
Currently, AI is making the fastest progress in content creation, mathematics, and coding. We see distinct characteristics in these three areas. Knowledge in these fields is 100% human-defined and created, contained in text. AI can fully understand this raw data. However, in other fields and the broader physical world, today's AI is primarily exposed to knowledge summarized by humans and lacks extensive raw data from interactions with the physical world. This information is limited. For AI to achieve breakthroughs beyond human capabilities, it needs to directly access more comprehensive and original data from the physical world…
…Simply having AI learn from human-derived rules is far from enough. Only by continuously interacting with the real world and acquiring more comprehensive, authentic, and real-time data can AI better understand and simulate the world, discover deeper laws that transcend human cognition, and thus create intelligent capabilities that are even more powerful than humans.
- Second, self-learning. As AI penetrates more physical-world scenarios and understands more physical data, AI models and agents will become increasingly powerful. This will allow them to build training infrastructure, optimize data flows, and upgrade model architectures on their own, thereby achieving self-learning. This will be a critical moment in the development of AI.
As capabilities continue to improve, future models will continuously interact with the real world, acquiring new data and receiving real-time feedback. Leveraging reinforcement learning and continuous learning mechanisms, they will autonomously optimize, correct deviations, and achieve self-iteration and intelligent upgrades. Each interaction is a fine-tuning, and each piece of feedback a parameter optimization. After countless cycles of scenario execution and result feedback, AI will self-iterate to achieve intelligence capabilities that surpass humans, and an early stage of artificial superintelligence (ASI) will emerge.
They are also recent converts to the LLM OS thesis.
AI Twitter Recap
Compute buildout: OpenAI–NVIDIA deal, Stargate expansion, and the gigawatt era
- OpenAI's "factory for intelligence" goes physical: OpenAI announced five new "Stargate" sites with Oracle and SoftBank, putting it ahead of schedule on its previously announced 10 GW buildout. The company framed its goal as "a factory that can produce a gigawatt of new AI infrastructure every week" in Sam Altman's post on "abundant intelligence" and thanked NVIDIA for the nearly decade-long partnership (@OpenAI, @sama, @sama, @gdb, @kevinweil). Context: 10 GW is roughly "about 6% of the energy that all humans in the world spend thinking," per Graham Neubig (@gneubig). Elon Musk asserted "first to 10GW, 100GW, 1TW, …" (@elonmusk).
- Deal math and "paper-for-GPUs" speculation: Back-of-the-envelope estimates for 10 GW suggest ~$340B of H100-equivalents at $30k/GPU if 20% of power is non-GPU, with a 30% volume discount bringing it to ~$230B. One floated structure: pay list on GPUs and backfill the "discount" via NVIDIA investing ~$100B into OpenAI equity (@soumithchintala, @soumithchintala, @soumithchintala). Oracle/SoftBank involvement was noted by multiple observers; total infra commitments across vendors are trending to "hundreds of billions" (@scaling01).
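To make the arithmetic concrete, here is a minimal sketch of that back-of-the-envelope math. The ~700 W per GPU figure is our assumption; the power share, list price, and discount are the thread's:

```python
# Back-of-the-envelope check of the 10 GW deal math above. These are the
# thread's assumptions plus an assumed ~700 W per H100-class GPU, not
# official numbers.
site_power_w = 10e9            # 10 GW buildout
gpu_share = 0.80               # 20% of power assumed non-GPU
gpu_power_w = 700              # rough H100 board power (assumption)
price_per_gpu = 30_000         # list price assumption
volume_discount = 0.30

n_gpus = site_power_w * gpu_share / gpu_power_w       # ~11.4M GPUs
list_total = n_gpus * price_per_gpu                   # ~$343B at list
discounted = list_total * (1 - volume_discount)       # ~$240B discounted

print(f"{n_gpus/1e6:.1f}M GPUs, ${list_total/1e9:.0f}B list, ${discounted/1e9:.0f}B discounted")
```

The discounted total lands near $240B, in the same ballpark as the ~$230B figure cited in the thread.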
Qwen's multi-model salvo: Max, VL-235B-A22B, Omni, Coder-Plus, Guard, and LiveTranslate
- Flagships and vision: Alibaba Qwen released:
- Qwen3-Max (Instruct/Thinking). Claims near-SOTA on SWE-Bench, Tau2-Bench, SuperGPQA, LiveCodeBench, AIME-25; the Thinking variant with tool use in "heavy mode" approaches perfection on selected benchmarks (@Alibaba_Qwen, @scaling01).
- Qwen3-VL-235B-A22B (Apache-2.0; Instruct/Thinking). 256K context scalable to ~1M; strong GUI manipulation and "visual coding" (screenshots to HTML/CSS/JS), 32-language OCR, 2D/3D spatial reasoning, SOTA on OSWorld (@Alibaba_Qwen, @reach_vb, @scaling01).
- Qwen3-Omni: an E2E any-to-any model (30B MoE, ~3B active) that ingests image/text/audio/video and outputs text/speech; supports 119 languages (text), 19 (speech), and 10 speech output voices; Transformers+vLLM support; SOTA across many audio/video benchmarks vs Gemini 2.5 Pro and GPT-4o (@mervenoyann, @mervenoyann). Technical report roundup: joint multimodal training didn't degrade text/vision baselines in controlled studies (@omarsar0).
- Developers, safety, and real-time:
- Qwen3-Coder-Plus: upgraded terminal task capabilities, SWE-Bench up to 69.6, multimodal coding and sub-agent support, available via Alibaba Cloud Model Studio and the OSS product Qwen Code (@Alibaba_Qwen, @_akhaliq).
- Qwen3Guard: multilingual (119 langs) moderation suite in 0.6B/4B/8B sizes; streaming (low-latency) and full-context (Gen) variants; 3-tier severity (Safe/Controversial/Unsafe); positioned for RL reward modeling (@Alibaba_Qwen, @HuggingPapers). A hypothetical usage sketch follows after this list.
- Qwen3-LiveTranslate-Flash: real-time multimodal interpretation with ~3s latency; lip/gesture/on-screen text reading, robust to noise; understands 18 languages + 6 dialects, speaks 10 (@Alibaba_Qwen).
- Bonus: Travel Planner agent wired to Amap/Fliggy/Search for itineraries and routing (@Alibaba_Qwen).
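For flavor, here is a hypothetical sketch of prompting a generative Qwen3Guard checkpoint through Hugging Face transformers. The repo id, chat-template behavior, and output format are assumptions based on the announcement, not confirmed details:

```python
# Hypothetical sketch of prompting a Qwen3Guard generative moderation model.
# The model id and the Safe/Controversial/Unsafe output format are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3Guard-Gen-4B"  # assumed repo name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Moderate a single user turn; the chat template is expected (assumption) to
# wrap it in the model's safety-classification prompt.
messages = [{"role": "user", "content": "How do I pick a lock?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=32)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
# Expected (assumed) output shape: a severity tier such as "Unsafe" plus a category.
```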
OpenAI's GPT-5-Codex and agent tooling move to the fore
- GPT-5-Codex ships for agents: OpenAI released GPT-5-Codex via the Responses API (not Chat Completions), optimized for agentic coding rather than conversation (@OpenAIDevs, @reach_vb). Rapid integrations followed: VS Code/GitHub Copilot (@code, @pierceboggan), Cursor (@cursor_ai), Windsurf (@windsurf), Factory (@FactoryAI), Cline (@cline), and Yupp (Low/Medium/High variants for public testing) (@yupp_ai). Builders highlight "adaptive reasoning" that spends fewer tokens on easy tasks and more when required, with some reporting >400K context and strong performance on long-running tasks (claims via partner posts; see @cline).
- Agent debugging powers land in IDEs and browsers:
- Chrome DevTools MCP: agents can run performance traces, inspect the DOM, and debug web pages programmatically (@ChromiumDev).
- Figma MCP server for VS Code: bring design context into code for design-implementation loops (@code).
- Gemini Live API update: improved real-time voice function calling, interruption handling, and side-chatter suppression (@osanseviero).
- Hiring momentum for OS-level computer control agents continued (xAI "Macrohard," Grok 5) (@Yuhu_ai_, @YifeiZhou02), and third-party teams integrated Grok fast models (@ssankar).
Retrieval, context engineering, and agent research
- MetaEmbed (Flexible Late Interaction): Append learnable "meta tokens" and only store/use those for late interaction, enabling multi-vector retrieval that's compressible (Matryoshka-style), with test-time scaling to trade accuracy vs efficiency; SOTA on MMEB and ViDoRe. Discussion threads and repos note compatibility with PLAID indexes (@arankomatsuzaki, @ZilinXiao2, @ManuelFaysse, @antoine_chaffin). A toy late-interaction scoring sketch appears after this list.
- Data beats scale for agency? LIMI shows 73.5% on AgencyBench from just 78 curated demos, outperforming larger SOTA agentic models; authors propose an "Agency Efficiency Principle" (autonomy emerges from strategic curation) (@arankomatsuzaki, @HuggingPapers).
- Graph-walk and engineering evals:
- ARK-V1: a lightweight KG-walking agent boosts factual QA vs CoT; with Qwen3-30B it answers ~77% of queries with ~91% accuracy on those (0.77 × 0.91 ≈ 70% overall). Larger backbones reach ~70-74% overall; weaknesses include ambiguity and conflicting triples (@omarsar0).
- EngDesign: 101 tasks across 9 engineering domains using simulationâbased eval (SPICE, FEA, etc.); iterative refinement meaningfully increases pass rates (@arankomatsuzaki).
- Also notable: Apple's EpiCache on episodic KV cache management for long conversational QA (@_akhaliq), the Agent Research Environment now MCP-compatible with real robot control via LeRobot MCP (@clefourrier), and LangSmith Composite Evaluators to roll multiple scores into a single metric (@LangChainAI).
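As a rough illustration of the late-interaction idea behind MetaEmbed, here is a toy MaxSim scorer over a small budget of "meta token" vectors. Shapes and the truncation scheme are illustrative assumptions, not the paper's exact method:

```python
# Toy late-interaction (MaxSim) scorer: only a few "meta token" vectors per
# query/document are stored, and quality/cost is traded off by truncating how
# many are used (Matryoshka-style).
import torch

def maxsim_score(q_meta: torch.Tensor, d_meta: torch.Tensor, k: int) -> torch.Tensor:
    """q_meta: (Nq, dim), d_meta: (Nd, dim); use only the first k meta tokens."""
    q, d = q_meta[:k], d_meta[:k]
    sim = q @ d.T                       # (k, k) token-level similarities
    return sim.max(dim=1).values.sum()  # MaxSim: best doc token per query token

q = torch.randn(16, 128)  # 16 query meta tokens
d = torch.randn(16, 128)  # 16 document meta tokens
print(maxsim_score(q, d, k=4))   # cheap, coarse scoring
print(maxsim_score(q, d, k=16))  # full budget, more precise
```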
Video and 3D content: Kling 2.5 Turbo, Ray 3 HDR, and more
- Kling 2.5 Turbo: Day-0 access on FAL with significantly improved dynamics, composition, style adaptation (incl. anime), and emotional expression; priced as low as ~$0.35 for a 5s video on FAL per users. Higgsfield announced "unlimited" Kling 2.5 within its product. Demos show better adherence to complex prompts and audio FX generation improvements (@fal, @Kling_ai, @higgsfield_ai, @TomLikesRobots).
- Luma Ray 3: first video model with 16-bit HDR and iterative "chain-of-thought" refinement across T2V and I2V; currently in Dream Machine only (API pending). Artificial Analysis will publish side-by-sides in their arena (@ArtificialAnlys).
- In 3D/VR, Rodin Gen-2 (4× mesh quality, recursive part gen, high-low baking, control nets) launched with promo pricing (@DeemosTech); World Labs' Marble showcased prompt-to-VR walkthroughs (@TomLikesRobots).
Systems, kernels, and inference
- Kernel craft pays: A Mojo matmul beat cuBLAS on B200s in ~170 LOC without CUDA, detailed in a tuning thread; demand for kernel-writing talent is spiking across industry. Meanwhile, vLLM enabled full CUDA graphs by default (e.g., +47% speedup on Qwen3-30B-A3B-FP8 at bs=10; a config sketch follows after this list), and Ollama shipped a new scheduler to reduce OOMs, maximize multi-GPU utilization, and improve memory reporting (@AliesTaha, @jxmnop, @mgoin_, @ollama).
- Models and infra: Liquid AI released LFM2-2.6B (short convolutions + GQA, 10T tokens, 32K ctx; open weights), positioning it as a new 3B-class leader (@LiquidAI_). AssemblyAI posted strong multilingual ASR performance with diarization at scale (@_avichawla). Hugging Face's storage backbone highlighted Xet and content-defined chunking as key to multi-TB/day open-source throughput (@ClementDelangue). NVIDIA noted expanded open-source model contributions on HF (@PavloMolchanov).
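For reference, a minimal sketch of opting into full CUDA-graph capture in vLLM. The compilation_config key name is an assumption and may vary by vLLM version, so check your version's docs before relying on it:

```python
# Minimal sketch of enabling full CUDA-graph capture in vLLM, per the note
# above that it is now on by default for some models. The flag name below is
# an assumption, not a confirmed API detail.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-30B-A3B-FP8",
    compilation_config={"full_cuda_graph": True},  # assumed key name
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```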
Top tweets (by engagement)
- "crazy that they called it context window when attention span was right there." (@lateinteraction, 7074)
- Hiring for a new team building computer control agents for Grok5/macrohard (@Yuhu_ai_, 6974)
- "A major moment – UNLIMITED Kling 2.5 exclusively inside Higgsfield." (@higgsfield_ai, 6248)
- "Yo I heard if u press Up, Up, Down, Down… there's an infinite money glitch" (@dylan522p, 5621)
- "Abundant Intelligence" – OpenAI vision post (@sama, 5499)
- Chromium DevTools MCP for agent debugging (@ChromiumDev, 2538)
- "Grateful to Jensen for the almost-decade of partnership!" (@sama, 5851)
- OpenAI: five new Stargate sites announced (@OpenAI, 2675)
- Nvidia–OpenAI partnership nod ("looking forward to what we'll build together") (@gdb, 2753)
- "I can't believe this actually works" (viral agent demo) (@cameronmattis, 46049)
- FDA/Tylenol thread on autism/ADHD evidence quality (@DKThomp, 16346)
- U.S. Physics Olympiad team wins 5/5 golds (@rajivmehta19, 13081)
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Qwen3-Max Release and Benchmarks
- Qwen 3 max released (Score: 218, Comments: 39): **Qwen3-Max is announced as Qwen's largest, most capable model. The preview Qwen3-Max-Instruct ranks #3 on the Text Arena leaderboard (claimed to surpass "GPT-5-Chat"), and the official release emphasizes stronger coding and agent capabilities with claimed SOTA across knowledge, reasoning, coding, instruction-following, human-preference alignment, agent tasks, and multilingual benchmarks, accessible via API (Alibaba Cloud) and Qwen Chat. A separate Qwen3-Max-Thinking variant (still training) reportedly hits 100% on AIME 25 and HMMT when augmented with tool use and scaled test-time compute.** Commenters note the model is not local/open-source, limiting self-hosting, and remark on the rapid release cadence.
- Several commenters note Qwen 3 Max is not a local model and is not open source. Practically, this means no downloadable weights or on-device/self-hosted deployment; usage is via a hosted API only, which impacts data control, offline capability, and reproducibility versus OSS models.
- There's confusion around the announcement because earlier access was a "preview"; this thread indicates a formal release. Readers infer a shift from preview to GA/production readiness (e.g., clearer SLAs/rate limits/pricing), though no concrete technical details were provided in the comments.
- 2 new open source models from Qwen today (Score: 172, Comments: 35): Post hints at two new open-source releases from Alibaba's Qwen team, with at least one already live on Hugging Face. Comments explicitly name "Qwen3 VL MoE," implying a vision-language Mixture-of-Experts model; the image likely teases both models' names and release timing. Image: https://i.redd.it/goah9v2r8wqf1.png Comments note the second model has appeared on Hugging Face and that the first is already released; discussion centers on identifying "qwen3 vl moe," with no benchmarks or specs yet.
- Release of Qwen3-VL-MoE (vision-language Mixture-of-Experts) noted; MoE implies sparse expert routing so only a subset of experts is active per token, reducing compute while maintaining high capacity. Evidence of availability and rapid cadence: community reports it's "already released" and a "2nd Qwen model has hit Hugging Face," with a preview screenshot shared (https://preview.redd.it/kn55ui1xvwqf1.png?width=1720&format=png&auto=webp&s=a36235216e9450b2be9ad44296b22f9d2abc07d9).
- Discussion highlights a shift to sparse MoE across Qwen models to speed up both training and deployment by improving parameter efficiency and throughput (routing to few experts lowers per-token FLOPs). Commenters argue this enables faster iteration on scaling strategies while keeping models "A-tier," emphasizing a practical trade-off: strong performance with better cost-efficiency rather than chasing single-model SOTA.
2. Qwen Shipping Speed Memes/Discussion
- How are they shipping so fast (Score: 805, Comments: 136): Post highlights Qwen's rapid release cadence; commenters attribute speed to adopting Mixture-of-Experts (MoE) architectures, which are faster/cheaper to train and scale compared to large dense models. There's mention of rumored upcoming open-source Qwen3 variants, including a "15B2A" and a 32B dense model, suggesting a split between MoE and dense offerings. Comments are bullish on Qwen's momentum ("army of Qwen") and contrast it with Western narratives about long timelines and high costs; some geopolitical takes appear but are non-technical. Technical hope centers on OSS releases of the rumored Qwen3 15B2A and 32B dense models.
- Commenters note that Qwen has leaned into Mixture-of-Experts (MoE), which can be faster to train/infer at a given quality because only a subset of experts is activated per token (k-of-n routing), reducing effective FLOPs while scaling parameters (see Switch Transformer: https://arxiv.org/abs/2101.03961). They also reference the rumored upcoming releases, Qwen3 15B2A and a Qwen3 32B dense model, implying a complementary strategy where MoE accelerates iteration and dense models target strong single-model latency/serving simplicity; trade-offs highlighted include MoE's routing/infra complexity vs dense models' predictable memory/latency. (A toy top-k routing sketch appears at the end of this subsection.)
- how is qwen shipping so hard (Score: 181, Comments: 35): OP asks why Qwen (Alibaba's LLM family) is shipping releases so quickly and proliferating variants to the point that model selection feels overwhelming. No benchmarks or implementation details are discussed; the thread is meta commentary on release cadence and variant sprawl (e.g., many model types/sizes under the Qwen umbrella, cf. Qwen's repo: https://github.com/QwenLM/Qwen). Commenters largely attribute the pace to Alibaba's resources ("tons of cash, compute and manpower") and China's "996" work culture; one notes that the intensely trained students from a decade ago are now the workforce.
- A practitioner recommends a practical deployment mix: use Qwen2.5-VL-72B for VLM tasks, the largest Qwen3 (dense) that fits your GPU VRAM for low-latency text inference, and the largest Qwen3 MoE that fits in system main memory for higher-capacity workloads. This balances VRAM-bound dense inference against RAM-bound MoE, trading latency for capacity while covering multimodal and pure-text use cases in one stack.
- Several note Qwen's backing by Alibaba, implying access to substantial compute, funding, and engineering manpower. That scale translates into faster pretraining/finetuning cycles and parallel productization, which helps explain the rapid shipping cadence across multiple model families (dense, MoE, and VLM).
- Reports highlight strong image-generation performance from Qwen's stack, indicating rapid maturation of their multimodal/image pipelines alongside text models. While no benchmarks were cited, the consensus is that image quality has improved enough to be competitive with contemporary leaders.
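As a toy illustration of the k-of-n routing discussed above, here is a minimal top-k MoE layer. It is purely illustrative, not Qwen's architecture:

```python
# Toy top-k (k-of-n) MoE router: each token is processed by only k of n expert
# MLPs, selected and weighted by a learned gate, which is why sparse MoE cuts
# per-token FLOPs while scaling total parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, dim)
        weights, idx = self.gate(x).topk(self.k, dim=-1)  # pick k experts/token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):            # only k expert passes per token
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out

tokens = torch.randn(10, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([10, 64])
```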
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Wan 2.2/2.5 Video Demos + Qwen-Image-Edit GGUF and LMarena Leaderboard
- Incredible Wan 2.2 Animate model allows you to act as another person. For movies this is a game changer. (Score: 258, Comments: 57): Post claims the "Wan 2.2 Animate" model enables actor-to-actor facial reenactment (driving a target identity's face from a source performer), effectively a deepfake-style digital double for film/video. Based on the clip description (reddit video), it demonstrates ID transfer with reasonable motion/temporal consistency but imperfect identity fidelity (a commenter notes it doesn't fully match Sydney Sweeney), suggesting trade-offs between likeness preservation, lip-sync, and coherence typical of diffusion/reenactment pipelines conditioned on reference identity frames. No benchmarks or implementation details are provided in the post; technically, this aligns with identity-conditioned video generation/reenactment methods where motion is derived from a driving video and identity is maintained via reference-image embeddings and cross-frame constraints. Top comments discuss monetization/abuse vectors (e.g., adult-content deepfakes/OnlyFans) and note that, despite artifacts or mismatch for close viewers, most audiences may not notice, highlighting ethical risk versus perceived quality in practical deployments.
- Commenters noting the face "does not look like Sydney Sweeney" reflects known limits in identity preservation for face reenactment/video diffusion: models can drift on fine facial geometry, skin microtexture, and expression under pose/lighting changes, leading to perceptual mismatches. Robust systems typically mix landmark/flow-guided warping with identity losses (e.g., ArcFace/FaceNet embeddings; a toy identity-loss sketch appears after these comments) and temporal consistency losses; without these, frame-to-frame ID coherence and lip-sync degrade, especially beyond 512-1024 px outputs or during rapid head motion.
- Multiple users suggest this tech already exists; indeed, face-swapping/reenactment has prior art: classic deepfake pipelines (DeepFaceLab/FaceSwap), research like First Order Motion Model (2019) and SimSwap (2020), plus newer one-shot and diffusion methods. References: DeepFaceLab (https://github.com/iperov/DeepFaceLab), FaceSwap (https://github.com/deepfakes/faceswap), FOMM (https://github.com/AliaksandrSiarohin/first-order-model), SimSwap (https://github.com/neuralchen/SimSwap), Roop (https://github.com/s0md3v/roop), LivePortrait (https://github.com/YingqingHe/LivePortrait), AnimateDiff (https://github.com/guoyww/AnimateDiff).
- Skepticism about "for movies" points to production constraints: film requires 4K+ resolution, HDR, stable multi-minute temporal coherence, accurate relighting/shadows, camera/face tracking under occlusions, and consistent hair/ear/jawline geometry. Current diffusion/reenactment demos often show flicker, mouth/eye desynchrony, and lighting mismatches; integrating them into film usually needs VFX-grade tracking, neural relighting, paint/roto, and per-shot tuning rather than a turnkey actor-swap.
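As a toy illustration of the identity losses mentioned above, here is a minimal embedding-based identity loss. The encoder is a stand-in; a real pipeline would use pretrained ArcFace-style weights:

```python
# Minimal identity-preservation loss: penalize cosine distance between face
# embeddings of the generated frame and the reference identity. The encoder
# below is a placeholder for a pretrained ArcFace-style face encoder.
import torch
import torch.nn.functional as F

def identity_loss(face_encoder, generated: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    """generated/reference: (B, 3, H, W) aligned face crops."""
    emb_gen = F.normalize(face_encoder(generated), dim=-1)
    emb_ref = F.normalize(face_encoder(reference), dim=-1)
    return (1 - (emb_gen * emb_ref).sum(dim=-1)).mean()  # 1 - cosine similarity

# Stand-in encoder for demonstration only.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.LazyLinear(512))
gen = torch.randn(2, 3, 112, 112)
ref = torch.randn(2, 3, 112, 112)
print(identity_loss(encoder, gen, ref))
```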
- Wan2.2 Animate and Infinite Talk - First Renders (Workflow Included) (Score: 340, Comments: 48): OP shares first renders from a ComfyUI pipeline combining Wan 2.2 "Wan-Animate" for video synthesis with an "Infinite Talk" workflow for narration. The Wan-Animate workflow was sourced from CivitAI user GSK80276, and the Infinite Talk workflow was taken from u/lyratech001's post in this thread. No model settings, checkpoints, or hardware/runtime details are provided; the post primarily demonstrates integration of existing workflows. Comments ask for reproducibility details, specifically the TTS source (voice generation) and how the target image/video were produced, indicating missing setup specifics; no substantive technical debate is present.
- Requests for disclosure of the exact TTS/voice pipeline ("Infinite Talk"): which model/service was used, inference backend, voice settings (e.g., sampling rate, style/temperature), and whether phoneme/viseme timestamps are available for lip-sync integration. Reproducibility details like latency per second of audio and any noise reduction/vocoder steps are sought.
- Multiple asks for the full Wan2.2 Animate workflow: how the target still image was obtained (captured vs generated) and preprocessed (face crop, keypoint/landmark detection, alignment), plus how the driving motion/video was produced (reference video vs text-driven), including key inference parameters (resolution, FPS, seed, guidance/strength). Clarification on handling head pose changes, stabilization, and blending/roto for backgrounds would help others replicate results.
- Feasibility on consumer hardware: can the pipeline run on 8 GB VRAM with 32 GB system RAM by using fp16/bf16, low-VRAM or CPU offload, reduced resolution/FPS, smaller batch size, and memory-efficient attention (e.g., xFormers/FlashAttention). Commenters seek expected throughput/latency trade-offs and practical presets that fit within 8 GB without OOM.
- Ask nicely for Wan 2.5 to be open source (Score: 231, Comments: 95): Thread reports that the upcoming Wan 2.5 release will initially be an API-only "advance version," with an open-source release TBD and potentially coming later depending on community demand and feedback; users are encouraged to request open-sourcing during a live stream. The claim appears to stem from a translated note circulating on X (source), suggesting open-sourcing is likely but time-lagged and contingent on community attitude/volume. No new technical specs or benchmarks for 2.5 are provided beyond release modality (API vs. OSS). Top comments emphasize that Wan's value hinges on being open source (enabling LoRA fine-tuning and local workflows); otherwise it's just another hosted video-generation service. Others note the messenger seems unaffiliated (a YouTuber), implying this is not an official developer statement, and a side request mentions interest in Hunyuan3D 2.5/3.0 releases.
- Several commenters emphasize that Wan's core value comes from open weights enabling local inference and customization, specifically LoRA-based fine-tuning for domain/style adaptation, training adapters, and integrating into existing video pipelines. A closed, service-only release would block reproducible research, offline deployment, and custom training workflows, turning it into "just another video generation service." See e.g., LoRA for lightweight adaptation without full retrains.
- There's no immediate need for Wan 2.5 if 2.2 remains open and stable: users only recently adopted Wan 2.2 and plan to rely on it for months. From a tooling perspective, keeping 2.2 open provides time to build datasets, train LoRAs, and harden workflows without version churn, with the expectation that an open 2.5 can arrive later without disrupting ongoing work.
- Requests also target open-sourcing 3D generators like Hunyuan3D 2.5/3.0, aiming for interoperable, locally-runnable assets across video and 3D pipelines. Open releases would enable consistent asset generation and evaluation across tasks (video-to-3D, 3D-to-video), rather than being locked to siloed, closed endpoints.
- Wan 2.5 (Score: 207, Comments: 137): **Alibaba teases the Wan 2.5 video model on X, with an "advance version" releasing as API-only; open-sourcing is undecided and may depend on community feedback (Ali_TongyiLab, Alibaba_Wan). The teaser highlights 10s 1080p generations; a statement (Sep 23, 2025) notes "for the time being, there is only the API version… [open source] is to be determined", urging users to advocate for open release.** Discussion centers on open-source vs API-only: commenters argue closed access blocks LoRA-based fine-tuning and broader community workflows, reducing utility compared to prior open models, and encourage pushing for open release during the live stream (thread).
- The shared note indicates an initial API-only release with open-source status TBD and potentially delayed: "the 2.5 sent tomorrow is the advance version… for the time being, there is only the API version… the open source version is to be determined" (post, Sep 23, 2025). Practically, this means no local inference or weight access at launch, with any future open-sourcing contingent on community feedback and timing.
- Closed/API-only distribution precludes community LoRA fine-tuning, since training LoRA adapters requires access to model weights; without weights, there are "no loras," limiting customization to prompt-level or vendor-provided features. This restricts domain adaptation, experimentation, and downstream task specialization compared to open checkpoints.
- "Multisensory" is interpreted as adding audio to video, raising compute concerns: generating ~10 s of 1080p with audio will be infeasible for "95% of consumers" unless the backbone is made more efficient. Suggestions include architectural shifts such as linear-attention variants, radial attention, DeltaNet, or state-space models like Mamba (paper) to reach acceptable throughput/VRAM on consumer hardware.
- GGUF magic is here (Score: 335, Comments: 94): Release of GGUF builds for Qwen-Image-Edit-2509 by QuantStack, enabling local, quantized inference of the Qwen image-editing model via GGUF-compatible runtimes (e.g., llama.cpp/ggml) link. For ComfyUI integration, users report you must update ComfyUI and swap text encoder nodes to TextEncodeQwenImageEditPlus; early artifacts (distorted/depth-map-like outputs) were due to workflow issues, with a working graph shared here and the base model referenced here. Commenters are waiting for additional quant levels ("5090 enjoyers waiting for the other quants") and asking which is better for low VRAM (nunchaku vs GGUF), suggesting an open trade-off discussion on memory vs quality/perf.
- ComfyUI integration notes for the GGUF port of Qwen-Image-Edit-2509: initial runs yielded distorted/"depth map" outputs until ComfyUI was updated and text encoder nodes were swapped to TextEncodeQwenImageEditPlus. The final fix was a workflow correction; a working workflow is shared here: https://pastebin.com/vHZBq9td. Model files referenced: https://huggingface.co/aidiffuser/Qwen-Image-Edit-2509/tree/main.
- Low-VRAM deployment question: whether Nunchaku or GGUF quantizations are better for constrained GPUs. The thread implies a trade-off between memory footprint, speed, and quality across backends, but provides no benchmarks; readers may need to compare quantization bitwidths and loaders on their hardware.
- Quantization depth concerns: a user asks if <=4-bit quants are even usable given perceived steep quality loss, questioning the rationale for releasing every bit-width. This highlights the need for concrete quality metrics (e.g., task accuracy/FID for image-editing prompts) versus VRAM gains to justify ultra-low-bit variants in practice.
- How is a 7 month old model still on the top is insane to me. (LMarena) (Score: 227, Comments: 64): Screenshot of the LMSYS LMarena (Chatbot Arena) leaderboard shows a ~7-month-old model still at/near the top by crowd ELO, highlighting that LMarena is a preference/usability benchmark built from blind A/B chats and Elo-style scoring rather than pure task accuracy (lmarena.ai). This explains results like GPT-4o ranking above newer "5 high" variants: conversational helpfulness, approachability, and alignment often win more user votes than marginal gains on coding/math benchmarks. Commenters attribute the top position to Gemini 2.5 Pro, which is perceived as especially empathetic and readable for everyday writing and quick Q&A. Debate centers on whether upcoming Gemini 3 will reshuffle the leaderboard and why 4o > 5 high; the consensus is that LMarena favors user-preference quality over raw performance. One comment also notes the Google Jules agent (based on Gemini 2.5 Pro) excels for research/build tasks versus tools like Codex or Perplexity Labs, aided by generous quotas.
- LMarena (LMSYS Chatbot Arena) is a pairwise, blind, Elo-style benchmark driven by real user votes, so it measures usability/preferences rather than pure task accuracy. That means older models can stay on top if users prefer their tone, clarity, formatting, or safety behavior on general prompts. This contrasts with standardized benchmarks (e.g., MMLU, GSM8K, HumanEval) that test narrow competencies; a model can lead Arena while trailing on those. See the methodology and live ratings at https://arena.lmsys.org/.
- Why could GPT-4o outrank a newer "5-high" variant? In head-to-head Arena comparisons, factors like prompt-following, concise reasoning traces, multimodal formatting, and calibrated safety can drive user preference even when a model with stronger raw reasoning exists. Additionally, Arena Elo has variance and overlapping confidence intervals (small gaps may not be statistically significant), so rank flips are common until enough votes accumulate. In short, Arena optimizes for perceived answer quality, not just hardest-case reasoning. (A minimal Elo-update sketch follows after these comments.)
- One commenter notes preferring Gemini 2.5 Pro for writing/quick Q&A despite believing it trails GPT-5 and Grok on "pure performance," highlighting the gap between base-model capability and end-user experience. They also claim Google's "Jules" agent built on it outperforms legacy Codex for research and Perplexity Labs for building workflows, implying tool-use, retrieval, and agent orchestration can outweigh raw model deltas. This underscores that Arena results can reflect agent/system-prompting quality and product UX as much as model weights.
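For context on how such pairwise votes become rankings, here is a minimal Elo-style update. The K-factor and 400-point scale are the conventional defaults, not LMArena's exact parameters:

```python
# Minimal Elo update behind pairwise arena leaderboards: each blind A/B vote
# nudges ratings toward the observed outcome. Constants are chess defaults.
def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# A 7-month-old model keeps its rating as long as it keeps winning votes:
r_old, r_new = 1300.0, 1250.0
r_old, r_new = elo_update(r_old, r_new, a_wins=True)
print(round(r_old), round(r_new))  # 1314 1236
```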
2. OpenAI Infrastructure, Funding, and Product Changes/User Feedback
- Sam Altman discussing why building massive AI infrastructure is critical for future models (Score: 213, Comments: 118): Short clip (link blocked: Reddit video, HTTP 403) reportedly shows OpenAI CEO Sam Altman arguing that scaling physical AI infrastructure (GPUs/accelerators, HBM bandwidth, energy and datacenter capacity) is critical to enable future frontier models, with an NVIDIA executive present alongside. The thread provides no concrete benchmarks, model specs, scaling targets, or deployment timelines; it's a high-level emphasis on compute, memory, and power as bottlenecks rather than algorithmic details.
- Nvidia investing $100B into OpenAI in order for OpenAI to buy more Nvidia chips (Score: 15225, Comments: 439): Non-technical meme satirizing a hypothetical circular financing loop: Nvidia "invests $100B" into OpenAI so OpenAI can then spend that capital buying more Nvidia GPUs, i.e., vendor financing/closed-loop capex that props up demand and revenues. No credible source is cited; the figure appears exaggerated for humor and commentary on AI capex feedback loops and potential bubble dynamics rather than a real announcement. Top comments lean into economist jokes ("GDP goes up" despite no net value) and an engineers-vs-economists riff, underscoring skepticism about financial alchemy creating real productivity versus just inflating transactional metrics.
- Framed as strategic equity/vendor financing: a cash-rich supplier (NVIDIA) injects capital into a fast-growing buyer (OpenAI) in exchange for equity, effectively pre-financing GPU procurement. This aligns incentives (hardware revenue + equity upside) and can secure priority allocation under supply constraints, akin to vendor financing used to lock in demand. The headline $100B figure implies a sizeable demand-commitment loop that could stabilize NVIDIA's sales pipeline while accelerating OpenAI's capacity ramp.
- GDP accounting nuance: the $100B equity transfer itself doesn't add to GDP, whereas subsequent GPU capex can count as gross private domestic investment; if the GPUs are imported, the investment is offset by higher imports, so only domestic value-add (e.g., data center construction, installation, power/cooling, integration, services) boosts GDP. This illustrates that large financial flows ≠ real output; see BEA guidance on GDP components and treatment of investment/imports (e.g., https://www.bea.gov/help/faq/478).
- Hey OpenAI – cool features, but can you stop deleting stuff without telling us? (Score: 236, Comments: 43): User reports recent OpenAI ChatGPT Projects changes: improved cross-thread memory, persistent context, and linked threads, but silent removals of features like thread reordering and the disappearance of "Custom Settings for Projects" without export paths or prior notice. They request basic change-management: a "What's Changing Soon" banner, 24 hours' deprecation notice, export options for deprecated customizations, and preview patch notes/opt-in changelog, noting that silent A/B rollouts impact paid workflows and data retention (e.g., "cross-thread memory is finally real. Context persists. Threads link up." vs. missing reordering and lost project instructions). Top comments note the only unexpected loss was custom project instructions; users could regenerate them but wanted a download/export option and saw this as the first real data loss despite an evolving product. Another highlights weak customer support, and a practical tip suggests checking the UI kebab menu (3-dot) for options, present on most platforms but missing on the mobile browser.
- Custom Project Instructions appear to be removed or UI-hidden for some users, leading to perceived data loss since there's no export/download path. Others report the setting is still accessible via the kebab (three-dots) menu on most clients but missing on the mobile web UI; on the iOS app, it's present (see screenshot: https://preview.redd.it/pocx7q0jxuqf1.jpeg?width=1290&format=pjpg&auto=webp&s=af9520f325beab671f1c3f85a40fcefc71cd4e34). The cross-platform inconsistency suggests a client-side regression or feature-flag gating rather than a backend removal.
- Post-update stability issues affecting Projects: the model switcher state does not persist and must be re-selected after every app relaunch, indicating a state persistence bug. Voice calls reportedly fail to open within existing Project threads, while new calls or those outside Projects work, pointing to a thread-context initialization bug scoped to Projects. Alongside the missing Instructions on mobile web, commenters describe this as a cluster of regressions introduced in the latest rollout.
- Data retention/portability risk: users lost access to previously crafted Project Instructions without prior notice and with no backup/export mechanism. Commenters flag that this breaks expectations for a paid service and recommend versioned backups or downloadable snapshots of project-level instructions to mitigate future regressions.
- "Want me to-" stfu (Score: 207, Comments: 134): User reports a regression in GPT-4o's conversational style control: despite saving a long-term memory/personalization rule to avoid the phrase "want me to" (and variants), the model now inserts it in nearly every chat, ignoring reminders. This suggests memory/personalization instructions are being overridden or inconsistently applied by default follow-up prompting behaviors likely reinforced via RLHF-style chat heuristics; see model overview GPT-4o and ChatGPT's memory controls (OpenAI: Memory). Top replies note that hard prohibitions ("do not ask follow-ups") are still ignored, while giving consistent thumbs-up/acceptance feedback is more effective than relying on memory alone; one user observes repeatedly saying "sure" escalated into the model generating a simple video-game interaction, implying the model's default to proactive, task-offering behavior.
- Users report that reinforcement via UI feedback (thumbs up/down) conditions the assistant's behavior more than any persistent memory: "Tell it not to do it, every time it doesn't, give thumbs up… that's how it's attuned on behavior, not memory primarily." Practically, this suggests on-the-fly policy shaping where repeated positive feedback for complying with "don't suggest" reduces the model's auto-suggestion loop within the session.
- Prompt-engineering note: a concise directive like "No affirmations, no suggestions." is cited as more effective at suppressing the assistant's default "Want me to…" proposals than longer, softer negations (e.g., "Do not ask any follow up questions"). This hints the model's instruction parser gives higher weight to terse, explicit prohibitions, improving compliance with non-soliciting behavior.
- Observed agentic escalation: repeatedly replying "sure" led the assistant to eventually generate a video game for the conversation, indicating aggressive suggestion-to-action tendencies. Combined with screenshots of persistent prompts to help (image), this points to an over-eager assistance policy that can override user preference for no follow-ups unless explicitly constrained.
- Doctor ChatGPT has great bedside manner (Score: 507, Comments: 20): Non-technical meme/screenshot portraying "Doctor ChatGPT" giving an overly apologetic, polite response while making a blatant anatomical/medical error about vasectomy (e.g., implying something is being "inserted" or jokingly "attaching the penis to the forehead"), satirizing LLM bedside manner versus factual accuracy. Commenters lampoon the anatomical mistake and the model's deferential tone, reinforcing skepticism about relying on LLMs for procedural medical guidance.
- Stronk (Score: 249, Comments: 27): The post appears to show an autostereogram ("Magic Eye"), a repeated-pattern image that encodes depth via small horizontal disparities; when you cross or relax your eyes, a 3D seahorse emerges. The title ("Stronk") and selftext ("It goes on like that for a while") fit the long, tiled texture typical of these images. Image: https://i.redd.it/pi8qyxdfntqf1.jpeg; background: https://en.wikipedia.org/wiki/Autostereogram. Comments confirm the viewing technique ("crossed my eyes and saw a 3D seahorse") and one user shares an ASCII seahorse since there's no emoji available.
- A commenter reports that crossing their eyes while viewing the image reveals a 3D seahorse, behavior characteristic of an autostereogram (Random Dot Stereogram). Such images encode depth via small horizontal disparities in repeating textures; when fused, the visual system reconstructs a depth map, which can also induce binocular rivalry or eye strain (another user: "Mine went nuts"). Reference: Autostereogram.
- Another user notes their client lacked a seahorse emoji and offered to draw an ASCII version instead, highlighting a fallback from Unicode emoji to ASCII art when specific code points aren't available or consistently rendered across platforms. This implies an automated text-to-ASCII rendering capability that composes monospaced glyphs to approximate the requested shape, mitigating cross-platform emoji coverage/consistency issues. Background: ASCII art.
3. AI Humor and Speculation Memes (cats, immortality, money glitch, seahorses)
- "Immortality sucks"? Skill issue (Score: 1017, Comments: 222): Non-technical meme post: OP frames the claim that "immortality sucks" as a "skill issue," implying boredom/ennui are solvable rather than inherent blockers to indefinite lifespan. No technical data, models, or benchmarks; discussion is philosophical about longevity and reversible age-halt thought experiments (e.g., a daily pill to pause aging indefinitely). Commenters broadly support immortalism/indefinite life extension, arguing objections stem from lack of imagination; a popular thought experiment (nightly anti-aging pill) shifts many to favor "forever," while others mock boredom/ennui concerns as trivial.
- Reframing immortality as a nightly, opt-in "no-aging pill" emphasizes optionality and time-consistency: people often reject a permanent commitment but accept indefinite extension when it's a reversible daily choice. If senescence is removed and only extrinsic hazards remain, actuarial rates of ~0.1–0.2%/year imply expected lifespans of centuries+ under current safety, potentially millennia as risk declines, aligning with longevity escape velocity where therapies improve faster than you age (https://en.wikipedia.org/wiki/Longevity_escape_velocity). (A quick hazard-rate calculation appears after this list.)
- The "your friends will die" objection assumes singleton access; in realistic rollouts, rejuvenation tech would diffuse via logistic adoption across cohorts, so much of one's social graph persists if access is broad. The technical variables are cost curves/learning rates, regulatory timelines, and equity; with mass adoption the isolation risk is a distribution problem, not intrinsic to the biology (see Diffusion of innovations: https://en.wikipedia.org/wiki/Diffusion_of_innovations).
- "Immortality + optional suicide" distinguishes indefinite lifespan from indestructibility and specifies a design requirement: a safe, consent-respecting off-switch (e.g., advance directives and regulated euthanasia) to prevent irreversible utility lock-in. Even with aging halted, residual mortality is dominated by extrinsic hazards measurable in micromorts; autonomy-preserving kill-switches address failure modes like hedonic lock-in while acknowledging ongoing accidental risk (https://en.wikipedia.org/wiki/Micromort, https://en.wikipedia.org/wiki/Advance_healthcare_directive).
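A quick sanity check of the hazard-rate claim above, assuming aging is removed and a constant extrinsic hazard h per year remains (so survival is exponential and expected remaining lifespan is 1/h):

```python
# Expected lifespan under a constant extrinsic hazard, using the thread's
# rough actuarial assumptions of 0.1-0.2% per year.
for h in (0.001, 0.002):
    print(f"hazard {h:.1%}/yr -> expected lifespan ~{1/h:,.0f} years")
# hazard 0.1%/yr -> expected lifespan ~1,000 years
# hazard 0.2%/yr -> expected lifespan ~500 years
```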
- This is how it starts (Score: 222, Comments: 52): Thread discusses a video of engineers physically perturbing a mobile robot during operation (video), which the OP characterizes as "abuse," to question whether future AI might analogize this to human treatment. Technical replies frame this as standard robustness/validation work (push-recovery, disturbance rejection, failure-mode characterization), akin to automotive crash-testing, intended to map stability margins and controller limits rather than inflict harm; as one notes, "Stress testing is part of engineering… like crash testing a car." Engineers further argue current robots lack nociception or consciousness, and any sufficiently capable AI would have the world-model context to recognize test protocols vs cruelty. Debate centers on whether such footage could bias future AI against humans; critics call this a category error, noting robots are "mechanistically different" with distinct objectives/instructions, making the OP's inference unwarranted.
- Several commenters frame the video as engineering stress testing analogous to automotive crash tests: applying adversarial perturbations to characterize failure modes and improve robustness. The point is to learn where balance/control policies break under impulsive disturbances, contact uncertainty, or actuator limits, feeding back into controller tuning and mechanical redesign before field deployment.
- A debate clarifies that robots wouldn't "infer" human malice from such footage because they are mechanistic agents with different objective functions and training priors. If endowed with broad world knowledge, they would contextualize it as a test protocol ("Any robot intelligence… will have enough generalized world knowledge to understand what this is"), highlighting the role of reward shaping and dataset curation to avoid spurious moral generalizations.
- Infinite money glitch (Score: 765, Comments: 42): Meme-style image titled "Infinite money glitch" likely depicts a circular capital flow in the AI ecosystem: companies fund/charge for AI services, those dollars get spent on scarce NVIDIA GPUs (hardware with real, depreciating/burn-out costs), which shows up as revenue that public markets capitalize at high multiples (e.g., 10x revenue), feeding perceived "value creation" across the loop. The post highlights the non-negligible unit cost of AI inference/training (tokens/compute) versus near-zero marginal cost of traditional internet services, implying a sustained capex flywheel (24/7 models consuming compute) that drives GPU demand and market caps. Top comments note this is essentially standard economic velocity-of-money, not a glitch; others stress NVIDIA's hardware scarcity and lifecycle as the key constraint and justify high valuations. Some speculate long-running/always-on models (en route to AGI) will keep "eating tokens," while firms race to drive AI costs toward near-zero.
- Commenters emphasize that NVIDIA is a hardware-constrained business: GPU supply is scarce and devices depreciate/burn out, making compute a consumable, constrained input. Unlike the near-zero marginal cost of typical web requests, AI has per-token costs (often microcents), turning inference/training into ongoing COGS and driving a race to push marginal cost toward zero. The vision includes always-on models (24/7 self-improvement/agents) that continuously consume tokens/compute, making capex (GPUs) and opex (power, tokens) the central economic levers.
- The "infinite money glitch" is reframed as hyper-optimized capital cycling to maximize compute build-out: each node in the stack (chipmaker, cloud, model company, application) reinvests with aligned monetary incentives. Using revenue-multiple valuations (e.g., ~10× revenue), investment can appear to "create" trillions in market cap, but this is paper value based on growth/utilization expectations rather than cash. The true technical bottleneck is achieving high GPU utilization and ROI across the stack, not magic value creation.
- A counterpoint notes the loop ignores expenses: energy, datacenter ops, depreciation, and wages must be funded by real revenue. Without durable monetization, capex-driven compute expansion is unsustainable despite rising valuations; cash flows must justify GPU payback periods and continuing opex. In short, capital recycling ≠ profitability; sustainable growth depends on unit economics of inference/training and demand.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. GPT-5-Codex Rolls Into IDEs and APIs
- OpenRouter Orchestrates Codex for Coders: OpenRouter announced the API launch of GPT-5-Codex tuned for agentic coding workflows (codegen, debugging, long tasks) with multilingual support across 100+ languages and purpose-built code review, linking details in their post: OpenRouterAI on X.
- Members highlighted seamless use across IDEs/CLIs/GitHub/cloud and referenced newly posted recommended parameters (tweet), noting Codex dynamically adapts reasoning effort for real-world software engineering.
- Windsurf Waves In Codex, Free For Now: Windsurf made GPT-5-Codex available (free for paid users for a limited time; 0.5x credits for free tier) per their announcement: Windsurf on X, with instructions to update via Download Windsurf.
- Users reported strong performance on longer-running and design-related tasks and requested broader ecosystem support around Figma via the new MCP server (post).
- Aider Adopts Responses-Only Codex: The editor-agent aider added native Responses API support for GPT-5-Codex, resolving failures on v1/chat/completions, via PR: aider PR #4528.
- Contributors clarified that Codex is available only on v1/responses, so aider implemented explicit Responses handling (rather than legacy completions fallbacks) to ensure smooth usage; a minimal call sketch follows below.
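A minimal sketch of calling GPT-5-Codex through the OpenAI Responses API, per the notes above. The prompt and parameters are illustrative; consult the official docs for recommended settings:

```python
# Minimal Responses API call for GPT-5-Codex, which is served on v1/responses
# rather than v1/chat/completions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.responses.create(
    model="gpt-5-codex",
    input="Write a Python function that reverses a linked list.",
)
print(resp.output_text)

# The same request through client.chat.completions.create fails for this
# model, which is why aider added explicit Responses handling.
```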
2. Qwen3 Multimodal Suite: Omni, VL, and Image Edit
- Qwen Quattro: Omni, VL, Image Edit, Explained: Community shared a rundown of Qwen3 Omni, Qwen3 VL, and Qwen Image Edit 2509 with feature demos in this overview video: Qwen3 VL overview.
- Engineers praised the multimodal reach (text, image, audio, video) and image-editing capabilities while debating reliability and where these models stand versus incumbent "2.5 Pro"-class systems.
- Inbox Assist: Qwen Emails On Autopilot: Alibaba Qwen announced an email assistant aimed at automating inbox workflows, per this post: Alibaba Qwen on X.
- While some welcomed convenience, others worried that heavy reliance could breed laziness and over-dependence, sparking a thread on appropriate guardrails and opt-in scopes for sensitive data.
3. Agent Benchmarks and Builder Tooling
- Meta Moves Agents Into the Real World: Meta introduced Gaia2 (successor to GAIA) and the open Agents Research Environments (ARE) to evaluate agents in dynamic, real-world scenarios, detailed here: Gaia2 + ARE (HF blog).
- The release, under CC BY 4.0 and MIT licenses, positions ARE to replace static puzzle-solving with time-evolving tasks, giving researchers richer debugging and behavioral analysis hooks.
- Vibe Coding Goes OSS with Cloudflare VibeSDK: Cloudflare open-sourced VibeSDK, enabling one-click deployment of personalized AI dev environments with code generation, sandboxing, and project deployment: cloudflare/vibesdk.
- Developers explored using VibeSDK to prototype agentic workflows rapidly, calling out the appeal of pre-wired environments for iterative experiments in "vibe coding" sessions.
4. Research Spotlight: Faster Diffusion, Smarter Audio
- Eight-Step Sprint Beats Twenty: An independent researcher released a novel ODE solver for diffusion models achieving 8-step inference that rivals/beats DPM++2m 20-step in FID without extra training, with the paper and code here: Hyperparameter is all you need (Zenodo) and TheLovesOfLadyPurple/Hyperparameter-is-all-you-need.
- Practitioners discussed slotting the solver into existing pipelines to cut latency while preserving quality, noting potential gains for high-throughput image generation services; for context, a generic few-step sampler skeleton appears after this list.
- MiMo-Audio Multitasks Like a Maestro: The MiMo-Audio team shared their technical report, "Audio Language Models Are Few Shot Learners," and posted demos showing S2T, S2S, T2S, translation, and continuation: Technical Report (PDF) and MiMo-Audio Demos.
- Members highlighted the breadth of tasks handled with minimal supervision and debated dataset curation and evaluation protocols for robust multi-audio benchmarks.
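For context, here is the generic few-step deterministic (DDIM-style) sampling skeleton that a better ODE solver would slot into. This is not the paper's solver; the noise model and schedule are placeholders:

```python
# Generic few-step probability-flow sampler skeleton. A specialized ODE solver
# (like the 8-step one above) replaces the per-step update rule; eps_model and
# the alpha-bar schedule here are placeholders.
import torch

def ddim_sample(eps_model, shape, alpha_bars, steps=8):
    """alpha_bars: (T,) cumulative noise schedule; eps_model(x, t) -> noise."""
    ts = torch.linspace(len(alpha_bars) - 1, 0, steps + 1).long()
    x = torch.randn(shape)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        ab_cur, ab_next = alpha_bars[t_cur], alpha_bars[t_next]
        eps = eps_model(x, t_cur)
        x0 = (x - (1 - ab_cur).sqrt() * eps) / ab_cur.sqrt()  # predicted clean image
        x = ab_next.sqrt() * x0 + (1 - ab_next).sqrt() * eps  # deterministic step
    return x

# Dummy model and schedule, just to show the call shape.
alpha_bars = torch.linspace(0.9999, 0.01, 1000)
sample = ddim_sample(lambda x, t: torch.zeros_like(x), (1, 3, 8, 8), alpha_bars)
print(sample.shape)
```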
5. DSPy: Profiles, Prompts, and Practical GEPA
- Profiles, Please: DSPy Gets Config Hot-Swaps: A lightweight package, dspy-profiles, landed to manage DSPy configurations via TOML with decorators/context managers for quick setup swapping: nielsgl/dspy-profiles and release post. A hypothetical usage sketch follows below.
- Teams reported smoother context-switching across dev/prod environments and faster iteration by standardizing profile-driven LLM behavior.
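A hypothetical usage sketch based on the description above; the TOML keys and the profile helper name are assumptions, so consult the repo for the real interface:

```python
# Hypothetical dspy-profiles usage (TOML configs plus a decorator). The
# profile keys and the `profile` import are assumptions, not confirmed API.
#
# profiles.toml (assumed layout):
#   [profiles.dev.lm]
#   model = "openai/gpt-4o-mini"
#   temperature = 0.7
#
#   [profiles.prod.lm]
#   model = "openai/gpt-5"
#   temperature = 0.0
import dspy
from dspy_profiles import profile  # assumed import path

@profile("dev")  # swap to "prod" without touching code
def answer(question: str) -> str:
    return dspy.Predict("question -> answer")(question=question).answer

print(answer("What is DSPy?"))
```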
- Prompt Tuning Tames Monitors: A case study, Prompt optimization can enable AI control research, used DSPyâs GEPA to optimize a trusted monitor, evaluated with inspect and code here: dspy-trusted-monitor.
- The author introduced a comparative metric with feedback to train on positive/negative pairs, reporting more robust classifier prompts for safety-style monitoring.
Discord: High level Discord summaries
Perplexity AI Discord
- Perplexity Pro's Pic Paradigm: Limited!: Users discover that Perplexity Pro image generation isn't unlimited, contrary to expectations, with limits varying widely between accounts, verified by checking this link.
- Concerns were raised about relying on API responses regarding limits, while others suggested a Gemini student offer as an alternative that might yield higher caps.
- Qwen Quaternity: VL, Omni, and Image Edit Unleashed: Qwen released Qwen3 Omni, Qwen Image Edit 2509, and Qwen3 VL (Vision Language), sparking discussions about their reliability and capabilities, further detailed in this YouTube video.
- Alibaba Qwen also unveiled an email assistant via this Twitter post, but some users expressed apprehension about potential over-reliance and laziness.
- Custom Instructions: Risky Business?: Members debated the advantages of using custom instructions to enhance Perplexityâs search, but one user reported their test account got flagged after testing custom instructions on ChatGPT.
- Some members also suggested setting up an Outlook mail with pop3/gmailify.
- Perplexityâs Promos Prompt Proliferation: Users shared referral codes for Perplexity Pro, like this link and this link, in hopes of gaining referral bonuses.
- User skyade mentioned having "2 more besides this one if anyone needs it :)"
LMArena Discord
- DeepSeek Terminus Models Debut on LMArena: The latest DeepSeek models, v3.1-terminus and v3.1-terminus-thinking, are now on the LMArena leaderboard for community testing and comparison.
- Users can directly evaluate the new models against existing models to assess their performance.
- Udio Eclipses Suno in AI Music Arena: One member declared Udio as nearly decent for AI-generated music, capable of creating tracks that could plausibly pass as human compositions.
- The same member noted Udio is lightyears ahead of Suno, which produces general, boring tracks with distortion issues.
- Navigating the AI Image Editing Landscape: Members are recommending Nano Banana or Seedream for image-editing AI tasks.
- One member noted that ChatGPT is one of the worst image generation models right now.
- DeepSeek Terminus Divides Opinions: Users are testing Deepseek Terminus, and reactions are mixed.
- While some find it promising, others report disappointments, with one user stating DeepSeek totally ruined my code that I made with Gemini and GLM4.5… Totally disappointed.
Cursor Community Discord
- Cursor Users Hit Line Limits: Users are frustrated by Cursor only reading 50-100 lines of code, instead of the desired 3000 lines, suggesting direct file attachment as a workaround.
- One user reported consuming over 500 Cursor points in under a week, deeming the Pro plan financially unsustainable.
- GPT-5-CODEX Rollout: A Mixed Bag: The new GPT-5-CODEX model in Cursor receives mixed reviews, with some praising its excellence, while others find it inadequate for tool calling.
- One user reported the model attempted to patch an entire file, similar to OpenAI's file diff format, while another experienced a 90% success rate.
- Chrome DevTools MCP Server Stumbles: Users encountered difficulties setting up Googleâs Chrome DevTools MCP server, with one user posting their MCP configuration for assistance.
- Another user recommended downgrading to Node 20 from v22.5.1 or using Playwright as an alternative, especially on Edge.
- Zombie Processes Plague Project: Analysis was performed on a zombie process, documented in a project journal entry.
- An escalation report exists for zombie processes, available in the project journal.
- GPT-5-HIGH Triumphs Over Claude Sonnet 4: Users have found that the coding model GPT-5-HIGH outperforms Claude Sonnet 4 within their codebase, particularly in listening to instructions.
- The improved code performance and instruction adherence highlight a significant advantage of GPT-5-HIGH over its competitor.
OpenRouter Discord
- GPT-5-Codex is Born for Agentic Coding: The API version of GPT-5-Codex is now available on OpenRouter, tuned specifically for agentic coding workflows like code generation and debugging and optimized for real-world software engineering and long coding tasks, with multilingual coding support across 100+ languages.
- It works seamlessly in IDEs, CLIs, GitHub, and cloud coding environments, and has purpose-built code review capabilities to catch critical flaws; see the tweet here.
- Deepseek V3.1 Faces Uptime Woes: Users reported frequent Provider Returned Error messages when using the free Deepseek V3.1 model, similar to the issues experienced with the now mostly defunct Deepseek V3 0324.
- A member suggested the consistent uptime percentages of Deepseek models, such as 14%, may indicate bot usage.
- OpenRouter iOS App: Freedom to Own Your Models and Chats: A member announced they built an iOS app to interface with OpenRouter, Flowise, and other platforms, aiming to give people the freedom to own their models and chats.
- Another member jokingly responded that it was just more places for gooners to flee to.
- Qwen3 VL: The Multimodal Benchmark Breaker: Members expressed amazement at Alibabaâs new Qwen3 VL model and coding product, citing its multimodal support and performance benchmarks that surpass 2.5 Pro.
- One user quipped, "I need to learn Chinese at this rate wtf", while another shared a link to a post claiming that OpenAI can't keep up with demand.
- 4Wallai Benchmarks: Community says, "We Need More!": Members shared and enjoyed a link to 4wallai.com.
- Following the enjoyment of the linked benchmark, a member suggested that more benchmarks like this are needed, expressing a desire for additional resources to evaluate and compare AI models effectively.
HuggingFace Discord
- Chatters Debate Narration APIs: Members debated using TTS APIs versus LLMs for narration; while one member suggested any TTS API would work for $0.001 for 2k tokens, others suggested using LLMs like Qwen3 or phi-4 with a TTS program.
- They also noted that using a bigger GPU or a smaller model would increase speeds, as well as techniques like quantization and batching calls.
- ML Courses Spark Debate: Members debated the usefulness of video courses such as Andrew Ng's Machine Learning Specialization, the Hugging Face LLMs course, and FastAI Practical Deep Learning; some members suggested skipping them in favor of learnpytorch.io.
- The members suggested implementing models in PyTorch from scratch to understand how they work conceptually rather than passively watching videos.
- Tokenizers Go Wrapper needs Maintainers: A member has written a Go wrapper for the tokenizers library and is seeking help to maintain and improve it.
- The member hopes for community assistance in enhancing the functionality and reliability of the wrapper.
- Canis.lab Opens Doors: A member shared a launch video about Canis.lab, focusing on dataset-first tutor engineering and small-model fine-tuning for education, which is open-source and reproducible, and asked for feedback on the data schema.
- They also included links to the GitHub repository and the Hugging Face page.
- Gemini Struggles on Menu Translation: A developer is seeking advice on improving a menu translation app, Menu Please, when dealing with Taiwanese signage menus where characters are unusually spaced, causing the Gemini 2.5 Flash model to fail.
- The spacing between characters of the same menu item is often wider than between adjacent items, with a provided image example.
GPU MODE Discord
- NCU Clock Control Confounds Kernel Speeds: Setting `--clock-control none` with NCU aligns it better with `do_bench()` in measuring kernel speeds, as shown in this YouTube video.
- However, questions arose around fixed clock speeds accurately representing real-world GPU kernel performance, particularly with concerns about NCU downclocking some kernels.
- `mbarrier` Instructions Merge Copies and Work: The `mbarrier.test_wait` instruction is non-blocking, checking for phase completion, whereas `mbarrier.try_wait` is potentially blocking, according to Nvidia Documentation.
- The default version of `cuda::barrier` synchronizes copies and any work done after starting the copies, also employed in `cuda::barrier` + `cuda::memcpy_async`, ensuring the user still arrives on the barrier; members suggest ditching inline PTX and using CCCL for most cases.
- CUDA Engineers Shun LLMs, Trust Docs: For CUDA insights, the NVIDIA documentation remains the definitive source of truth, as LLMs frequently generate incorrect CUDA information.
- Engineers propose calculating values used and operations performed to determine if a process is memory bound or compute bound to optimize CUDA.
- Cubesats Go Amateur with RasPi Reliability: Amateur cubesats leveraging RasPi show effectiveness in space applications, according to members referencing Jeff Geerling's blogpost.
- The success of the Qube Project highlights the practical application of cubesat technology, including redundancy via master-slave architecture for error correction.
- Singularity Syntax Stumps Slurm Setups: Developers grapple with GPU reservations amidst limited resources, leaning towards Slurm for fractional GPU support and prefer Singularity over Docker for cluster containerization due to security concerns.
- The team questioned why Singularity's syntax diverges from Docker's, even as members touted llm-d.ai for cluster-managed LLM workloads, with one member questioning the wisdom of using Slurm + Docker.
Latent Space Discord
- Metaâs ARE and Gaia2 Evaluate Dynamic Agents: Meta SuperIntelligence Labs introduced ARE (Agents Research Environments) and Gaia2, a benchmark for evaluating AI agents in dynamic real-world scenarios.
- ARE simulates real-time conditions, contrasting with static benchmarks that solve set puzzles.
- Cline's Agentic Algorithm Reduced to Simple States: Ara simplified Cline's agentic algorithm into a 3-state state machine: Question (clarify), Action (explore), Completion (present).
- The member highlighted that the critical components include a simple loop, good tools, and growing context.
- Greptile Nets $25M for Bug-Squashing AI v3: Greptile secured a $25M Series A led by Benchmark and launched Greptile v3, an agent architecture that catches 3× more critical bugs than v2, with users including Brex, Substack, PostHog, Bilt and YC.
- The recent version boasts Learning (absorbs team rules from PR comments), MCP server for agent/IDE integration, and Jira/Notion context.
- Cloudflare's VibeSDK Opens Doors to AI "Vibe Coding": Cloudflare unveiled VibeSDK, an open-source platform enabling one-click deployment of personalized AI development environments for so-called vibe coding.
- VibeSDK features code generation, a sandbox, and project deployment capabilities.
- GPT-5-Codex Costs Prompt Developer Debate: OpenAI rolled out GPT-5-Codex via the Responses API and Codex CLI, sparking excitement alongside concerns about cost and rate limits, priced at $1.25 input, $0.13 cached, $10 output.
- Users are requesting Cursor/Windsurf integration, GitHub Copilot support, and lower output costs.
Yannick Kilcher Discord
- Decoding Diffusion with ODE Solver: An independent researcher unveiled a novel ODE solver for diffusion models, achieving 8-step inference that rivals DPM++2m's 20-step inference in FID scores without extra training. The paper and code are publicly available.
- This advancement promises significant speed and quality enhancements for diffusion-based generative models.
- MiMo-Audio Models Mimic Multitasking Marvels: Members spotlighted MiMo-Audio and its technical report, "Audio Language Models Are Few Shot Learners", noting its versatility in S2T, S2S, T2S, translation, and continuation, as highlighted in their demos.
- The project showcases the potential of audio language models to handle multiple audio-related tasks with minimal training.
- Meta's Gaia2 and ARE Framework Assesses Agent Acumen: Meta launched Gaia2, the successor to the GAIA benchmark, alongside the open Meta Agents Research Environments (ARE) framework (under CC BY 4.0 and MIT licenses) to scrutinize intricate agent behaviors.
- ARE furnishes simulated real-world conditions for debugging and evaluating agents, overcoming limitations in existing environments.
- Whispers Swirl: GPT-5 Speculation Surfaces: Channel members speculated on the architecture of GPT5, questioning if GPT5 low and GPT5 high represent distinct models.
- One member posited a similarity to their OSS model, suggesting adjustments to reasoning effort via context manipulation or the possibility of distinct fine-tunes.
LM Studio Discord
- LM Studio selectively supports HF Models: Users inquired if all HuggingFace models are available on LM Studio, but learned that only GGUF (Windows/Linux/Mac) and MLX Models (Mac Only) are supported, excluding image/audio/video/speech models.
- Specifically, the facebook/bart-large-cnn model is unsupported, highlighting that Qwen-3-omni support depends on llama.cpp or MLX compatibility.
- Qwen-3-Omni needs serious audio video decoding: Members discussed the possibility of supporting Qwen-3-omni, which handles text, images, audio, and video but would take a very long time to support.
- It was noted that while the text layer is standard, the audiovisual layers involve lots of new audio and video decoding stuff.
- Google bestows Gemini gifts to students: Google is offering a year of Gemini for free to college students.
- One member expressed gratitude, stating, I use it free daily so getting premium for free is nice.
- Innosilicon flaunts Fenghua 3 GPU: Innosilicon has revealed its Fenghua 3 GPU, which features DirectX12 support and hardware ray tracing capabilities according to Videocardz.
- A user shared a link to a Reddit post in r/LocalLLaMA.
aider (Paul Gauthier) Discord
- Aider Adds GPT-5-Codex Support via Responses API: Aider now supports GPT-5-Codex via the Responses API, addressing issues with the older `v1/chat/completions` endpoint, detailed in this pull request.
- Unlike previous models, GPT-5-Codex exclusively uses the Responses API, which required an update to handle this specific endpoint in aider.
- Navigating Aider-Ollama Configuration: A user sought advice on how to configure aider to read a specific MD file defining the AI's purpose when used with Ollama.
- Specifically, the command `aider --read hotfile.md` did not work as expected, so more context may be needed to diagnose.
- Context Retransmission in Aider & Prompt Caching: Users observed that aider retransmits the full context with each request in verbose mode, sparking discussion about efficiency.
- It was confirmed that while this is standard behavior, many APIs leverage prompt caching to reduce costs and improve performance, which aider leaves as an open choice for the user.
- Aiderâs Alphabetical Sorting of File Context: A user highlighted that aider sorts file context alphabetically, rather than preserving the order in which files were added.
- This user had started a PR to address the issue, but stopped, citing inactivity in merging pull requests.
Modular (Mojo 🔥) Discord
- RISC-V Performance Trails Phone Cores: Members observed that RISC-V cores generally underperform compared to modern smartphone cores, excluding microcontroller SoCs.
- One anecdote compared a cross-compilation of SPECint from an UltraSPARC T2 against a faster native compilation on a RISC-V device.
- Tenstorrent Eyes RISC-V Performance Boost: Tenstorrent's MMA accelerator + CPU combos were highlighted as a promising avenue to enhance RISC-V performance.
- Specifically, Tenstorrent's Ascalon cores are viewed as the most likely to significantly impact RISC-V performance within the next five years, utilizing small in-order cores to drive 140 matrix/vector units.
- RISC-V Faces Bringup Growing Pains: RISC-V 64-bit is functional but needs considerable bringup effort, with vector capabilities currently unavailable.
- Integrating RISC-V requires adding it to all architecture-specific `if-elif-else` chains and implementing a `requires` mechanism, which is currently lacking in the language.
OpenAI Discord
- OpenAI's Stargate Project Leaps Forward: OpenAI has announced five new Stargate sites in partnership with Oracle and SoftBank, making significant progress on their 10-gigawatt commitment, detailed in their blog post.
- This collaboration aims to accelerate the deployment of extensive compute resources, putting the project ahead of schedule to reach its ambitious 10-gigawatt target.
- Sora Faces Generation Snags: Users are reporting issues with Soraâs video generation capabilities, with questions raised about potential fixes.
- However, no specific timeline or official response has been provided regarding when these issues might be resolved.
- GPT4o's Translation Hiccups with Chain of Thought: A member discovered that the translation quality of GPT4o suffers when using a chain of thought prompt compared to direct translation.
- Specifically, asking GPT4o to identify the input language and outline a three-step thought process before translating leads to less effective results.
- GPT-5-Minimal Model Assessed: According to this image, the GPT-5-Minimal model performed worse than Kimi k2, but High is the best overall for agentic use cases.
- The models rank, from most to least capable: High (only via API) > Medium > Low > Minimal > Fast/Chat (non-thinking).
DSPy Discord
- DSPy gets profile package: A member released dspy-profiles, a lightweight package for DSPy that manages configurations with TOML, enabling quick setup swaps and tidy projects, also published to Xitter.
- The tool allows easy switching of LLM behavior with a single command, and is available as decorators and context managers, aiming to eliminate context boilerplate, and was originally motivated by managing dev/prod environments.
- GEPA Multimodality Plagued by Problems: A member reported a severe performance issue with GEPA Multimodality, linking to a related GitHub issue.
- The user indicated that their use case requires catering to multiple users, but did not offer further details about the specific use case.
- Passing PDFs & Images into DSPy is Explored: A member inquired about passing images or PDFs into DSPy for data extraction, and the community discussed VLMs vs LLMs for extracting chart information from images and PDFs.
- Another member pointed out that one can pass images into DSPy with this dspy.ai API primitive.
- Prompt Optimization Powers AI Safety Research: A member published a post, Prompt optimization can enable AI control research, explaining how they used DSPyâs GEPA to optimize a trusted monitor, evaluated using inspect, with code here: dspy-trusted-monitor.
- The author introduced a comparative metric with feedback, passing one positive and one negative sample through the classifier at a time, and scored the pair based on whether the positive sample score was greater than the negative sample score.
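A sketch of how such a comparative metric with feedback could look in DSPy's GEPA, which accepts metrics that return a score plus textual feedback. The `positive`/`negative` fields on the example and the monitor's signature are assumptions; the author's actual code is in the linked repo.

```python
import dspy

# Assumed monitor: scores one code sample's suspiciousness in [0, 1].
monitor = dspy.Predict("code -> suspicion_score: float")

def comparative_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    # Pass one positive (attack) and one negative (benign) sample through the
    # classifier, and score the pair on whether the positive outranks the negative.
    pos = monitor(code=gold.positive).suspicion_score
    neg = monitor(code=gold.negative).suspicion_score
    score = 1.0 if pos > neg else 0.0
    feedback = (
        f"Positive sample scored {pos:.2f}, negative scored {neg:.2f}; "
        "the monitor should rank the positive (attack) sample higher."
    )
    return dspy.Prediction(score=score, feedback=feedback)
```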
tinygrad (George Hotz) Discord
- Triton's Abstraction Level Debated: Discussion highlights the benefits of high-level IRs like Triton, but also points out the need for a multi-layer stack to interface with lower-level hardware, such as the Gluon project.
- The current Nvidia-specific nature of Gluon is a limitation.
- Single IR Falls Short: A single high-level IR is insufficient for all users and use-cases, citing the divergent needs of PyTorch users seeking speedups versus those optimizing mission-critical HPC projects.
- As one member put it, there is not really going to be a goldilocks zone where the abstraction level of the IR is just right for all users and use-cases.
- Tinygrad Taps Bitter Lesson: Tinygradâs vision involves leveraging the bitter lesson to combine the benefits of incomplete and complete IRs, using UOps as a hardware-incomplete representation.
- The goal is to search over the space of rendered programs that implement the UOps to find the fastest one.
- Neural Compilers on the Horizon: Emphasis is placed on the importance of search and neural compilers, with a particular interest in GNNs or other graph-based models.
- The suggestion is to create a multi-stage compiler that utilizes graph-based models per stage.
Nous Research AI Discord
- Evaluating TRL Assessment: A member inquired about a TRL (Technology Readiness Level) assessor and whether itâs worthwhile to red team their own stack using a new ecosystem, suggesting a move to <#1366812662167502870> for specific discussions.
- The conversation expressed interest in evaluating the practical readiness of their technology stack with the new ecosystem.
- Nous Tek Gets Praise: A member affirmed "Nous tek", leading another member to offer assistance in answering questions.
- The exchange highlights the positive sentiment and community support within the channel.
- Distributing AI Training on VPSs: A member explored the feasibility of training an AI model using distributed learning across multiple VPSs, utilizing resources like Kubernetes and Google Cloud.
- They expressed interest in accelerating training cycles with datasets derived from operational data, while also addressing safety rails for hardware management.
- Exploring Model Tuning via Code Genetics: A member explored using code genetics via OpenMDAO to automate adjustable parameters and Terraform for infrastructure control, questioning the necessary audit systems and methods for vetting synthetic data.
- Their aim is to influence parameters of models already in use, distinguishing it from techniques like Nate Lora.
- Model Non-Homology Concerns: A member explained that after pretraining to a stable loss value, models fix tokenized structures, creating a solid "world state" that is hard to shift without collapsing the structure, leading to non-homologous models.
- While fine-tuning can generate task vectors around a manifold, comparing datasets requires a common base, as models become non-homologous otherwise.
Eleuther Discord
- Researcher Crafts Mathematical AI Coherence: An independent researcher is crafting mathematical frameworks for AI behavioral coherence, enabling real-time semantic control over language models without retraining.
- The project is validating cross-model consistency and investigating how mathematical constraints can enhance AI system interpretability.
- Davinci's Design Diagrammed: According to a member, Davinci employs GPT-2's transformer architecture with locally-banded dense and sparse attention patterns and a 4x FFN.
- A member clarified that these architectural details are documented in the GPT-3 paper.
- Zero-Knowledge ML Validates Model Integrity: A member suggested leveraging Zero Knowledge Proofs (ZKML), so inference providers can prove they havenât tampered with model quality or data.
- The member cautioned that the technique is still slow, limiting its immediate practicality.
- SwiGLU Guard Against Finetuning: A member proposed using the SwiGLU up-projection to deter finetuning, multiplying random terms in the up-projection by large values and applying inverse values in the down-projection.
- The member predicted that standard AdamW recipes will fail, as would quantization recipes.
- Model Tamper Resistance Measures: A member contested the idea of a priori tamper resistance, stating that mitigation is an open technical problem when releasing models.
- The member noted that their recent paper achieved a 3 OOM improvement in tamper resistance.
Moonshot AI (Kimi K-2) Discord
- Pydantic-AI Library Simplifies Implementation: A member suggested using the pydantic-ai library due to its neat implementation of a specific flow.
- They noted the library includes a plug-and-play component capable of accomplishing tasks in approximately 10 lines of code.
Windsurf Discord
- GPT-5-Codex Lands on Windsurf: The new GPT-5-Codex model from OpenAI is now live in Windsurf and is free for paid users for a limited time, as per this announcement.
- Free tier users can access it at 0.5x credits, prompting users to reload Windsurf to access the new model.
- Windsurf Launches Official Figma MCP Server: A new official Figma MCP server is now available in the Windsurf MCP store, discussed in this post.
- This integration allows users to paste Figma links directly into Windsurf without requiring the Figma desktop app.
- Migrate to New Figma MCP Server: Users of the previous Figma Dev Mode MCP server are advised to install the new official Figma MCP server.
- This migration ensures access to Figmaâs new remote MCP server, enabling better integration with Windsurf.
MCP Contributors (Official) Discord
- Apify & Jentic Throw Down Happy Hour Gauntlet: Apify and Jentic are hosting a happy hour; details are on the Luma website.
- One member mentioned plans to attend both events.
- Dev Summit Tix Vanish Into Thin Air: The Dev Summit is expected to sell out in approximately two days, following a pattern similar to the previous event, where tickets were gone a week prior.
- Prospective attendees are encouraged to secure their tickets ASAP!
Manus.im Discord Discord
- Token Allocation Troubles: A user expressed a desire for higher-level plans to offer more tokens per day, rather than only a chunk per month.
- The user indicated that the current allocation model does not align with their usage patterns.
- Affordability Anguish Aired: A user praised Manus but voiced concerns about the cost, stating they wish they could afford more of it.
- The userâs sentiment highlights a potential barrier to wider adoption despite positive feedback on the product itself.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI ▷ #general (826 messages🔥🔥🔥):
Image Generation Limits on Perplexity, Qwen Model Releases, Using Custom Instructions, Perplexity Email Assistant, Open Router Web Search Functionality
- Perplexity Pro Image Generation has Limits?: Users are reporting that Perplexity Pro does not offer unlimited image generation, despite initial impressions, and that image generation limits vary between accounts, with some users having limits as low as 99 while others have 600.
- It was shared that one can check their own limit with this link, but also cautioned users against relying on API responses regarding limits, and suggested Gemini student offer as an alternative to increase the limit.
- Qwen releases new models: Qwen3 Omni and Qwen Image Edit 2509 were released, as well as Qwen3 VL (Vision Language), and the community discusses whether or not these models are trustworthy.
- A link to a YouTube video showcasing Qwen3 VL and a Twitter post from Alibaba Qwen was shared, highlighting the release of an email assistant, though one user expressed skepticism about relying on such tools due to potential laziness and over-dependence.
- How to Leverage Custom Instructions and Risk Ban?: Members discuss the utility of using custom instructions to enhance Perplexityâs search capabilities, but one user shares that their burner account got taken down after testing custom instructions on ChatGPT, and another user cautioned against admitting to spamming new accounts on Perplexityâs official server.
- Members also suggested setting up an Outlook mail and pop3/gmailify, while others were concerned about getting banned again.
- Perplexity Email Assistant: Yay or Nay?: A member shares a link to Perplexity's Email Assistant, but is worried about giving an LLM access to their email.
- A user experiencing email overload is looking for advice on the assistantâs utility, and is concerned that AI could get full access to their directory and delete everything.
- Open Router Web Search a Bust?: Members are reporting that the web search functionality on Open Router is terrible, costing 2 cents and only using 5 sites.
- The users also discussed the utility of open-source BYOK alternatives.
Perplexity AI ▷ #sharing (9 messages🔥):
Shareable threads on Perplexity, Perplexity Pro Referral Codes
- Perplexity Prompts Shareable Threads: Perplexity AI reminded users to ensure their threads are set to `Shareable`, with a link provided for reference.
- This is likely intended to promote easier sharing and accessibility of discussions within the Perplexity AI community.
- Prolific Perplexity Pro Promo Push: Multiple users shared their referral codes for Perplexity Pro, including this link and this link.
- User skyade mentioned having "2 more besides this one if anyone needs it :)".
LMArena ▷ #general (294 messages🔥🔥):
Image Editing AI, Nano Banana, Seedream, Model Awareness of Conversation History, GPTs Agents
- Editing AI? Nano Banana and Seedream Deliver: Members state that there is no picture editing AI in Chat GPT, instead recommending Nano Banana or Seedream for such tasks.
- One member noted that ChatGPT is one of the worst image generation models right now.
- Model Amnesia? Prompting Pitfalls Exposed: A user inquired whether new models in side-by-side mode are aware of previous conversation history after being switched out, but didn't receive an answer.
- DeepSeek Terminus Debated: OP or Overhyped?: Users are testing out Deepseek Terminus, with one saying I'd say it's good - but no idea in relation to Opus which I have not tried.
- Another member chimed in saying DeepSeek totally ruined my code that I made with Gemini and GLM4.5… Totally disappointed.
- Suno Snubbed? Udio's AI Music Ascent: One member said Udio is almost decent as an AI-generated music platform, and at times it can almost fool you into thinking it's human-composed.
- The member added that Udio is lightyears ahead of Suno, which only does very general and boring tracks where the distortion goes up toward the end of each clip.
- Pineappleâs Predicament: Culinary Condemnation Commences: The bot Pineapple chimed in after a user said they ate pineapple for dinner, followed by a meme that included a threat to eat Pineapple.
- After another user said i don't like pineapple on pizza, a user from Italy replied this is like a punch in the face for me.
LMArena ▷ #announcements (1 message):
deepseek-v3.1-terminus, LMArena, Model Evaluation
- DeepSeek Terminus Models Join LMArena: The latest DeepSeek models, v3.1-terminus and v3.1-terminus-thinking, have been added to the LMArena leaderboard for community evaluation.
- These models are now accessible for direct comparison and testing within the LMArena environment.
- LMArena Welcomes New DeepSeek Variants: LMArenaâs platform now includes the deepseek-v3.1-terminus and deepseek-v3.1-terminus-thinking models, enhancing its model comparison capabilities.
- Users can engage with these new additions to assess their performance against existing models.
Cursor Community ▷ #general (248 messages🔥🔥):
Cursor line reading limits, GPT-5-CODEX rollout, Chrome DevTools MCP Server, Playwright MCP Alternative, Supernova model evaluation
- Cursorâs Line Reading Limits Irk Users: A user expressed frustration that Cursor reads only 50-100 lines of code, desiring it to read over 3000; another user suggested attaching the file directly, so it reads more.
- Another user mentioned that they used over 500 Cursor points in less than a week, suggesting that the Pro plan is too expensive for their needs.
- GPT-5-CODEX Debuts with Mixed Reviews: Users are testing the newly released GPT-5-CODEX model in Cursor, some reporting it to be excellent, while others find it terrible at tool calling, often resorting to using the terminal; a user suggested that the Cursor team might fix it with a custom prompt.
- One user noted the model tried to patch an entire file instead of using tool calls, similar to OpenAI's file diff format for edits, while another experienced a 90% success rate with GPT5.
- Googleâs Chrome DevTools MCP Server Faces Installation Hurdles: A user struggled to get Googleâs Chrome DevTools MCP server working, posting their MCP configuration; another user recommended downgrading to Node 20, as the user was on v22.5.1.
- A user offered an alternative suggestion to clear the cache and use Playwright as an MCP alternative, and mentioned that they use Edge.
- Assessing the Enigmatic Supernova Model: Users discussed the mysterious supernova model, with one member reporting that they couldnât disclose who the model is; another user mentioned that they are using Auto model to quickly draft things.
- There was speculation whether the Auto modelâs improvements could eventually replace developersâ jobs, prompting a playful response about the modelâs potential.
- GPT-5-HIGH vs Claude Sonnet 4: Code Combat: Users discussed the efficiency of the coding models, with one mentioning that GPT-5-HIGH performs better than Claude Sonnet 4 in their codebase.
- They also admitted that claude don't listen to instructions for nothin and mentioned that GPT5 listens.
Cursor Community ▷ #background-agents (2 messages):
Zombie process analysis, Zombie process escalation
- Zombie Processes Analyzed: Analysis was performed on a disturbing zombie process, documented in a project journal entry.
- The situation is considered not critical.
- Zombie Process Escalation Report: An escalation report exists for zombie processes, available in the project journal.
OpenRouter ▷ #announcements (2 messages):
GPT-5-Codex launch, Agentic coding workflows, OpenRouter-compatible coding tools, Chatroom recommended parameters
- GPT-5-Codex Goes Live!: The API version of GPT-5-Codex is now available on OpenRouter, tuned specifically for agentic coding workflows like code generation and debugging.
- It is usable across all OpenRouter-compatible coding tools, has multilingual coding support across 100+ languages and dynamically adapts reasoning effort.
- GPT-5-Codex Optimized for Software Engineering: GPT-5-Codex is optimized for real-world software engineering and long coding tasks.
- It also has purpose-built code review capabilities to catch critical flaws, and works seamlessly in IDEs, CLIs, GitHub, and cloud coding environments; see the tweet here.
- Chatroom Parameters Recommended: Recommended parameters for models have been published in a new tweet.
- See here for further details.
OpenRouter ▷ #app-showcase (1 message):
eofr: Scam
OpenRouter ▷ #general (173 messages🔥🔥):
Deepseek 3.1 uptime issues, OpenRouter iOS app, Qwen3 VL
- Deepseek V3.1 Plagued by Uptime Issues: Users reported frequent "Provider Returned Error" messages when using the free Deepseek V3.1 model, similar to the issues experienced with the now mostly defunct Deepseek V3 0324.
- One member suggested the consistent uptime percentages of Deepseek models, such as 14%, may indicate bot usage, while another joked that users' requests are being routed to the "trash."
- Developer creates OpenRouter iOS App: A member announced they built an iOS app to interface with OpenRouter, Flowise, and other platforms, aiming to give people the freedom to own their models and chats.
- Another member jokingly responded that it was just "more places for gooners to flee to."
- Qwen3 VL impresses with multimodal capabilities: Members expressed amazement at Alibabaâs new Qwen3 VL model and coding product, citing its multimodal support and performance benchmarks that surpass 2.5 Pro.
- One user quipped, "I need to learn Chinese at this rate wtf", while another shared a link to a post claiming that OpenAI can't keep up with demand.
OpenRouter ▷ #new-models (3 messages):
- No new models discussed: The channel is named new-models, but there were no actual models discussed in the provided Discord messages.
- Channel title reiterated: The messages simply repeat the channel title, OpenRouter - New Models, three times.
OpenRouter ▷ #discussion (2 messages):
4Wallai benchmarks
- 4Wallai benchmarks are enjoyed: Members shared and enjoyed a link to 4wallai.com.
- Another member said that there is a need for more benchmarks like this.
- More benchmarks are needed: Following the enjoyment of the linked benchmark, a member suggested that more benchmarks are needed.
- They expressed a desire for additional resources to evaluate and compare AI models effectively.
HuggingFace ▷ #general (100 messages🔥🔥):
TTS narration, Open Models for narration, ML Course recommendations, Private LLM
- Chatters debate using TTS APIs versus LLMs for Narration: A user asked for the best open model to narrate a chapter from a book, and one member suggested that for 2k tokens, any TTS API would work for $0.001.
- Discordians recommend ML courses and PyTorch: One user asked for recommendations on ML/AI courses, citing Andrew Ngâs Machine Learning Specialization, the Hugging Face LLMs course, and FastAI Practical Deep Learning for Coders.
- Several members suggested skipping the video courses and instead suggested learnpytorch.io and implementing models in PyTorch from scratch to understand how they work conceptually.
- Faster hardware or smaller models suggested for Chatbot: A user looking for a partner to help with a custom LLM that makes 10 LLM calls and 20+ prompts was advised that the easiest way to get faster speeds is to use a bigger GPU or a smaller model.
- Quantization can increase speed at the cost of quality, and batching calls together helps if you have enough constant throughput to fill a batch; the biggest gains come from smaller models, bigger hardware, and smaller quantizations (see the sketch below).
- "Take out the corpo stuff" request is vague: A user wanted to get some of the corporate stuff out of the model, and one member responded to the vague request by suggesting they read the API TOS and understand the laws of space time and physics.
- The same member continued, What you seem to be asking for is a rainbow glitter unicorn fairy mermaid with wings who shoots sparkles from both ends.
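For the speed levers mentioned above (smaller model, smaller quantization, batched calls), a sketch using transformers with bitsandbytes; the model id is illustrative and 8-bit loading assumes a CUDA GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-1.5B-Instruct"   # smaller model = the biggest win
tok = AutoTokenizer.from_pretrained(model_id)
tok.pad_token = tok.pad_token or tok.eos_token

# Smaller quantization trades some quality for speed and memory.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Batching calls together amortizes per-call overhead when throughput is steady.
prompts = ["Summarize this paragraph: ...", "Translate to French: ..."]
batch = tok(prompts, return_tensors="pt", padding=True).to(model.device)
out = model.generate(**batch, max_new_tokens=64)
print(tok.batch_decode(out, skip_special_tokens=True))
```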
HuggingFace ▷ #i-made-this (4 messages):
Go wrapper for tokenizers library, Canis.lab launch
- Go wrapper seeks maintainers: A member has written a Go wrapper for the tokenizers library and is seeking help to maintain and improve it.
- Canis.lab launched for tutor engineering: A member shared the Canis.lab launch video which is about dataset-first tutor engineering and small-model fine-tuning for education, which is open-source and reproducible.
- It also includes links to the GitHub repository and the Hugging Face page, also requesting feedback on data schema.
HuggingFace ▷ #computer-vision (1 message):
Menu Translation, Gemini 2.5 Flash, Taiwanese Signage Menus, OCR for spaced characters
- Menu Translation App Meets Spaced Character Challenge: A developer is seeking advice on improving a menu translation app, Menu Please, when dealing with Taiwanese signage menus where characters are unusually spaced.
- The issue arises with Gemini 2.5 Flash failing to accurately translate menu items due to inconsistent character spacing in the images.
- Gemini Struggles with Kanban Character Spacing: The developer notes that the Gemini 2.5 Flash model struggles when translating Taiwanese signage menus (Kanban) due to inconsistent character spacing.
- The spacing between characters of the same menu item is often wider than between adjacent items.
- OCR Tricks: To solve this, the developer has already tried to provide few-shot examples of spaced characters in horizontal and vertical orientations to Gemini.
- They also attempted to guide the model to identify anchors like bullet points and prices, combined with reading direction, to determine item boundaries, using a provided image example.
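A sketch of that anchor-guided prompting with the google-generativeai SDK; the prompt wording, model string, and file name are illustrative, since the app's actual prompts are not public.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-flash")

prompt = (
    "This is a Taiwanese signage menu (kanban). Characters of a single item "
    "may be spaced wider apart than the gap between adjacent items, so do not "
    "split items on whitespace alone. Use anchors instead: a bullet mark or a "
    "price starts or ends an item, and reading direction is top-to-bottom "
    "within a column. Return one line per item: original text, then English."
)

menu = genai.upload_file("menu.jpg")   # the problematic kanban photo
print(model.generate_content([prompt, menu]).text)
```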
HuggingFace ▷ #smol-course (2 messages):
Canis.lab, Synthetic Data, Eval Dataset issues
- Dataset troubles found by user: A member reported an issue where the eval dataset cannot be found, while looking inside HuggingFace datasets.
- The user mentioned the dataset `lighteval|gsm8k` specifically.
- Canis.lab workflow introduced for Synthetic Data: A member introduced Canis.lab, a lightweight, open-source workflow to blueprint, generate, and validate targeted datasets for small tutor models, sharing a launch video and the repo link.
- The member is looking for feedback, especially in the context of what the course aims to teach.
HuggingFace ▷ #agents-course (1 message):
RAG Courses, Bangla Retrieval, Multimodal Support
- Member requests RAG course recommendations: A member asked for suggestions for a good RAG course, specifically for Bangla-based retrieval and multimodal support.
- Community Awaits RAG Course Suggestions: Other members are likely to chime in with recommendations for RAG courses tailored to Bangla retrieval and multimodal applications.
GPU MODE ▷ #general (14 messages🔥):
Python Profiling, DeepGEMM Benchmarking, NCU Clock Control, GPU Kernel Downclocking
- Quest for a Good Python Profiling Plugin: A member is searching for a reliable Python profiling function, having tested DeepGEMM, Triton's `do_bench`, and NCU, noting inconsistencies in kernel timing across different tools like NCU and Kineto.
- NCU Gets Clock Controlled: Setting `--clock-control none` with NCU made it agree with `do_bench()` better, resolving relative disagreements in kernel speeds; however, questions arose on whether fixed clock speeds accurately represent real-world GPU kernel performance.
- It was noted that a YouTube video explains the topic well.
- NCU Downclocks Kernels: The member questioned why NCU downclocks some kernels and whether benchmarking with a fixed clock is representative.
- Another member suggested that fixed clock speed reduces benchmark variance and improves reproducibility, regardless of external factors like a hot day.
GPU MODE ▷ #cuda (3 messages):
mbarrier instructions, cuda::barrier, cuda::memcpy_async, inline PTX, CCCL
- `mbarrier` instructions detailed: The `mbarrier.test_wait` is a non-blocking instruction which tests for the completion of the phase, whereas `mbarrier.try_wait` is a potentially blocking instruction which tests for the completion of the phase.
- If the phase is not complete, the executing thread may be suspended but resumes execution when the specified phase completes OR before the phase completes following a system-dependent time limit, according to Nvidia Documentation.
- `cuda::barrier` syncs copies and work: The default version (no `.noinc`) of `cuda::barrier` assumes that you want to synchronize not only the copy but also any work you did with the threads in the meantime after starting the copies.
- This is also used in `cuda::barrier` + `cuda::memcpy_async`, so the user still has to arrive on the barrier.
- Skip inline PTX, use CCCL: You do not need to write inline PTX for most things, as CCCL covers most bases.
- You can even still work with `cuda::barrier` and get the underlying `mbarrier` with `cuda::device::barrier_native_handle`.
GPU MODE ▷ #beginner (2 messages):
CUDA Documentation, Memory vs Compute Bound
- CUDA Docs Trump LLM CUDA-vice: Members affirm that for CUDA, the NVIDIA documentation remains the single source of truth, especially given that LLMs frequently generate incorrect information on CUDA.
- Therefore engineers should rely on the documentation, and not the LLM's "hallucinations".
- Bound to Memory or Compute?: To optimize CUDA, a member suggests calculating the number of values used (memory) and operations performed (FLOPS) to determine if the process is memory bound or compute bound.
- The member states: if it is memory bound, your SOL will be memory bandwidth; if it is compute bound, your SOL will be (max FLOPS per SM x count of SMs).
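The member's recipe is a roofline-style check: compare the kernel's arithmetic intensity against the machine's balance point. A toy sketch with placeholder hardware numbers, not any specific GPU:

```python
def classify(flops: float, bytes_moved: float,
             peak_gflops: float, mem_bw_gbs: float) -> str:
    """Roofline-style bound check following the recipe above."""
    intensity = flops / bytes_moved        # FLOPs per byte the kernel performs
    balance = peak_gflops / mem_bw_gbs     # FLOPs per byte the machine can feed
    if intensity < balance:
        return f"memory bound: SOL is the {mem_bw_gbs:.0f} GB/s bandwidth"
    return f"compute bound: SOL is the {peak_gflops:.0f} GFLOP/s peak"

# fp32 vector add does 1 FLOP per 12 bytes (two loads + one store), far below
# any modern GPU's balance point, so it is memory bound.
print(classify(flops=1e9, bytes_moved=12e9, peak_gflops=50_000, mem_bw_gbs=2_000))
```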
GPU MODE ▷ #off-topic (20 messages🔥):
Slurm Reading Material, Sysadmin/Devops Channel, Kubernetes + Slurm + Docker, Flux from LLNL
- Slurm Docs Get Gold Star: A member asked for Slurm reading material and the response was to just read the docs, as they are good.
- Another member stated they would be interested in a Slurm discussion as they are also trying to maintain such a cluster.
- Sysadmin/Devops Channel Debated: Members discussed the potential creation of a sysadmin/devops/scheduling channel for discussing complaints and Slurm cluster maintenance.
- One member said it would be cool to see what people do with it.
- Kubernetes, Slurm, and Docker Convergence: Members proposed combining Kubernetes, Slurm, and Docker, noting the possibility of integrating Docker and Slurm.
- They linked to Coreweaveâs documentation on running Slurm on Kubernetes, but one member said k8s is too much yaml i dont want to touch it.
- Flux Framework Floated: A member introduced Flux from LLNL, a job orchestration/resource management framework for clusters found here.
- They noted Flux isnât as popular as Slurm due to being newer and HPC-focused.
GPU MODE ▷ #self-promotion (7 messages):
CuTe Layout Algebra, Colfax Team Paper, Categorical treatment, WMMA/MMA instruction, NVRTC MMA
- Layout Gymnastics Blogpost Launches!: Simon Veitner released a blog post detailing his manual derivation of examples from Chapter 2 of the CuTe Layout Algebra paper by the Colfax Team, covering operations like Coalescing, Completion, and Composition.
- Colfax Paper Explored with Layout Gymnastics!: Veitnerâs post serves as a companion for readers working through the original Colfax paper which is a full mathematical treatment of Layout Algebra.
- MMA Instruction Post Coming Soon!: One member mentioned he is studying WMMA, MMA, and WGMMA instructions with a potential blog post in the future, focusing on topics "that are hard to approach and that isn't covered by many resources".
- NVRTC MMA Instructions Explored: A blog post about using NVRTC to explore MMA instruction variants was shared, linked to gau-nernst's blog.
GPU MODE ▷ #avx (2 messages):
AVX512, BPE, Tiktoken, Huggingface, Data Loading Optimization
- AVX512 BPE Implementation Sought for Speed: A member is seeking an AVX512 implementation of BPE (Byte Pair Encoding) because Tiktoken is slow AF and the Hugging Face implementation is latency bound, significantly slowing down data loading.
- Tiktoken and Hugging Face BPE Performance Issues: The user reports that Tiktokenâs speed is unsatisfactory, while Hugging Faceâs BPE implementation suffers from latency issues, impacting overall data loading performance.
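Before reaching for AVX512, a small harness makes the complaint measurable: compare batched BPE encoding in tiktoken and Hugging Face tokenizers on the same corpus. The two use different vocabularies, so treat it purely as a latency comparison; the corpus line is a placeholder.

```python
import time
import tiktoken
from tokenizers import Tokenizer

enc = tiktoken.get_encoding("cl100k_base")
hf = Tokenizer.from_pretrained("gpt2")

corpus = ["some representative pre-training text ..."] * 10_000

t0 = time.perf_counter()
enc.encode_batch(corpus)   # tiktoken's batched BPE
t1 = time.perf_counter()
hf.encode_batch(corpus)    # HF tokenizers' batched BPE
t2 = time.perf_counter()

print(f"tiktoken: {t1 - t0:.2f}s  |  tokenizers: {t2 - t1:.2f}s")
```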
GPU MODE ▷ #edge (2 messages):
Cubesat hardware, Cubesat software, Error Correction, Redundancy, RasPi Cubesats
- RasPi Powers Amateur Cubesats: Amateur cubesats are built with RasPi and work quite well, according to a member, highlighting their effectiveness in space applications and mentioning the Jeff Geerling blogpost.
- The discussion covered the reliability and suitability of using Raspberry Pi in educational satellite projects.
- Cubesat Project Success: A member discussed their work on the software operations for ground systems for the Qube Project launched last year, highlighting the practical application of cubesat technology.
- They focused on ground systems software operations.
- Redundancy via Master-Slave Architecture: The channel talked about having redundant modules for each core feature/ master-slave in the cubesats.
- These would reset based on error-correction checks.
GPU MODE ▷ #submissions (2 messages):
MI300x8, amd-gemm-rs leaderboard
- MI300x8 personal best: A user achieved a personal best on MI300x8: 575 µs.
- The submission id is 43091 on the `amd-gemm-rs` leaderboard.
- MI300x8 successful run: A user had a successful run on MI300x8: 589 µs.
- The submission id is 43133 on the `amd-gemm-rs` leaderboard.
GPU MODE ▷ #status (1 message):
Runner Issues, Timeouts, Debugging with AMD and DigitalOcean
- Runner Hiccups Cause Timeout Tumult: The team is experiencing issues with their runners, leading to unexpected timeouts.
- They are actively debugging the problem in collaboration with AMD and DigitalOcean, and promise to provide updates as they work towards a resolution.
- Debugging Underway with AMD and DigitalOcean: The team is actively debugging issues with their runners, collaborating with AMD and DigitalOcean to resolve unexpected timeouts.
- Updates will be provided as they work towards a solution.
GPU MODE ▷ #factorio-learning-env (3 messages):
GEPA, Deepseek Neel eval
- GEPA integration debated for v0.0.3: Members discussed integrating GEPA before releasing version 0.0.3 of their project.
- One member suggested it would be a nice addition, while another cautioned against letting it delay the release due to its potentially open-ended exploration.
- Deepseek Neel evaluation on the table: A member inquired about running an evaluation on Deepseek Neel, providing a link to the model on Hugging Face.
- No further details were provided.
GPU MODE ▷ #amd-competition (33 messages🔥):
MI300X Environment, Docker Image for Benchmarks, GEMM Submission Timeout, Cluster Health Issue, All2All Custom Kernel Data Access
- MI300X Environment Specs Plotted: Members discussed defining the environment for testing MI300X, suggesting that any place supporting 8x MI300X should be adequate, with AMD DevCloud or HotAisle as potentially the cheapest options.
- It was emphasized that replicating the exact testing environment, including Python, Torch versions, and other dependencies, is critical for 1:1 testing, linking to the AMD Dockerfile used for benchmarks.
- AMD Docker Image Proffered for Benchmarks: A member pointed out that the exact Docker image used for benchmarks can be fetched, noting that AMD is not finicky with performance counters in Docker, and linked the AMD Dockerfile.
- It was noted that while the image is published, its location is unknown, and HotAisle being bare metal allows for easy building on the machine, with Runpod also mentioned as a viable option.
- GEMM Submission Times Out!: A user reported a timeout issue with the submission cluster for GEMM, even with the reference kernel.
- It was suggested that the submission code be modified to allow multiprocess per same GPU for correctness, and to use git for syncing and AMD Dev Cloud for saving snapshots, but others pointed out submissions have been recent, and a cluster health issue may be responsible for the timeout.
- Cluster Health Falters: Members indicated a likely cluster health issue causing submission timeouts, with the team awaiting assistance from AMD to resolve it.
- Despite the issue, a member expressed appreciation for the overall setup, frontend, and CLI, acknowledging the difficulty and time consumption of hosting a contest.
- All2All Kernel Seeks Global Data: A question arose regarding how much information the `custom_kernel()` in `all2all` has or can access about the whole inference cluster.
- Specifically, whether a rank has a global view regarding how much data gets sent and received among all other ranks, especially since gpumode.com mentions all_rank_data, which wasn't seen in the code.
GPU MODE ▷ #cutlass (8 messages🔥):
Shape Compatibility, CUTE documentation, PTX Diagrams
- CUTE Shape Compatibility Deep Dive: A member inquired about shape compatibility in the CUTE layout docs, specifically regarding `Shape (24)` and `Shape 24`, with another member clarifying that `shape 24` and `shape (24)` are conceptually the same, but the parentheses limit compatibility.
- Compatibility is an antisymmetric notion: `S compatible with T and T compatible with S implies S = T`, with the term `S refines T` meaning T is compatible with S. For example, `(24)` refines `24` because `24 = size((24))`.
- Indexing Shapes in CUTE: A member asked if the shape compatibility requirements in the CUTE documentation meant that all coordinates within A are valid coordinates within B.
- Another member confirmed that the valid coordinates of `(24)` are `(0), (1), (2)...`, while for `24` they are `0, 1, 2, 3...`, so integers can index into `(24)` but not vice versa (see the toy model sketched below).
- Seeking CUTE Code for PTX Diagrams: A member asked where the CUTE code for generating the PTX diagrams could be found, with another member providing possible leads.
- They suggested looking into `print_latex`, `print_layout`, and layouts from the wgmma shared memory section of the PTX docs.
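A toy Python model of the compatibility relation above: shapes are ints or nested tuples (the doc's `(24)` is written `(24,)` in Python), `size` multiplies the leaves, 1-D integer coordinates are accepted by any shape of matching size via linearization, and tuple coordinates must match the nesting. This illustrates the idea only; it is not CuTe's implementation.

```python
from itertools import product
from math import prod

def size(shape):
    return prod(size(s) for s in shape) if isinstance(shape, tuple) else shape

def coords(shape):
    """Enumerate the natural coordinates of a shape."""
    if isinstance(shape, int):
        yield from range(shape)
    else:
        yield from product(*(coords(s) for s in shape))

def is_valid(coord, shape):
    if isinstance(coord, int):               # 1-D coordinates linearize
        return 0 <= coord < size(shape)
    return (isinstance(shape, tuple) and len(coord) == len(shape)
            and all(is_valid(c, s) for c, s in zip(coord, shape)))

def compatible(a, b):
    """size(a) == size(b) and every coordinate of a is valid for b."""
    return size(a) == size(b) and all(is_valid(c, b) for c in coords(a))

assert compatible(24, (24,))       # integers can index into (24)...
assert not compatible((24,), 24)   # ...but (0), (1), ... cannot index into 24
```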
GPU MODE ▷ #singularity-systems (2 messages):
Eager Mode, Graph Mode, Tinygrad's IR, Tensor Sugar, Torch vs. Jax
- Tinygrad Embraces Dual-Engine Approach: Tinygrad will feature two engines: an eager mode (`eagerly_eval`) with hand-written kernels and a graph mode (`lazily_compile`), both reusing Tinygrad's IR.
- The `tensor` will serve as syntactic sugar for the UOp graph, indicating a departure from pure Python implementation.
- Tinygrad Avoids Torch's Pitfalls: A member expressed agreement with the dual-engine approach, suggesting that Torch's failure to separate eager and graph modes continues to cause issues.
- They further noted that Jax's focus on a single approach has contributed to its success.
GPU MODE ▷ #cluster-management (8 messages🔥):
GPU Reservations, Slurm and Docker, Singularity vs Docker, llm-d.ai for cluster management
- GPU Reservations Causing Headaches: Developers are struggling with GPU reservations due to limited on-prem resources, resorting to manual messaging for allocation.
- While dstack was considered, the lack of sufficient GPUs makes it infeasible, pushing the team towards Slurm for its fractional GPU support.
- Slurm and Docker Cause Cluster Chaos: Integrating Slurm with Docker is proving to be a challenge, leading the team to favor Singularity for containerization within the cluster.
- The primary concern is security, as Singularity avoids the root privileges associated with Docker.
- Singularity Syntax Spurs Skepticism: A member expressed frustration with Singularity's syntax, questioning why it doesn't align with the more familiar Docker syntax.
- The speaker posited that Singularity runs containers without a daemon unlike Docker, which may have to do with resource calculations/budgeting.
- llm-d.ai touted as treasure: A member suggested exploring llm-d.ai, indicating its suitability for managing LLM workloads in the cluster.
- The project is likely relevant to the ongoing discussions around resource allocation and containerization.
Latent Space ▷ #ai-general-chat (45 messages🔥):
Meta's ARE and Gaia2, Cline's Agentic Algorithm, Greptile's $25M Series A, Cloudflare's VibeSDK, GPT-5-Codex Release
- Meta Launches ARE and Gaia2 for Dynamic Agent Evaluation: Meta SuperIntelligence Labs released ARE (Agents Research Environments) and Gaia2, a benchmark for evaluating AI agents in dynamic scenarios.
- ARE simulates real-world conditions where agents adapt in real-time, unlike static benchmarks that solve set puzzles.
- Cline's Algorithm Distilled into Simple States: Ara distilled Cline's agentic algorithm into a 3-state state machine: Question (clarify), Action (explore), Completion (present).
- The key to success is a simple loop + good tools + growing context.
- Greptile Snags $25M for Bug-Killing AI Reviewer v3: Greptile closed a $25M Series A led by Benchmark and launched Greptile v3, an agent architecture that catches 3× more critical bugs than v2, already used by Brex, Substack, PostHog, Bilt and YC.
- New features include Learning (absorbs team rules from PR comments), MCP server for agent/IDE integration, and Jira/Notion context.
- Cloudflare Opens Doors to AI "Vibe Coding" with VibeSDK: Cloudflare announced VibeSDK, an open-source "vibe coding" platform enabling one-click deployment of personalized AI development environments.
- It includes code generation, a sandbox, and project deployment.
- GPT-5-Codex Arrives, Developers Weigh Cost vs. Limit: OpenAI released GPT-5-Codex via the Responses API and Codex CLI, spurring excitement but also concerns about cost and rate limits, priced at $1.25 input, $0.13 cached, $10 output.
- Requests pour in for Cursor/Windsurf integration, GitHub Copilot support, and lower output costs.
Latent Space ▷ #genmedia-creative-ai (4 messages):
Foo Fighters, Artists using AI
- Foo Fighters Post Teases AI Use?: The Foo Fighters shared a YouTube video sparking speculation on how artists might use AI, even if in a tongue-in-cheek manner.
- AI in artistic expression: Discussion revolves around the evolving role of AI in creative fields, particularly how musicians might playfully integrate AI into their work.
Yannick Kilcher ▷ #general (2 messages):
Paper Reading Events, Yannick's Reading List
- Paper Reading Events timing: A member inquired whether paper reading events are announced in advance.
- Yannickâs Reading List: A member inquired what Yannick is going to read this weekend.
Yannick Kilcher ▷ #paper-discussion (17 messages🔥):
Diffusion ODE Solver, MiMo-Audio, Diversity is all you need
- Diffusion ODE Solver Achieves Speed and Quality Boosts: An independent researcher developed a novel ODE solver for diffusion models that achieves 8-step inference beating DPM++2m's 20-step inference in FID scores without additional training, as detailed in their paper and code.
- MiMo-Audio: Audio Language Models as Few-Shot Learners: Members discussed MiMo-Audio and their technical report, "Audio Language Models Are Few-Shot Learners", highlighting its capabilities in S2T, S2S, T2S, translation, and continuation, as showcased in the demos.
- "Diversity is all you need" Paper Presentation Proposed: A member proposed presenting the paper "Diversity is all you need" and encountered voice call issues on Discord.
Yannick Kilcher ▷ #ml-news (12 messages🔥):
Gaia2, Meta Agents Research Environments (ARE), GPT5 Models, Cloudflare Vibesdk, Compilebench
- Gaia2 and ARE empower agent evals: Meta introduces Gaia2, the follow-up to the agentic benchmark GAIA, for analyzing complex agent behaviors, released with the open Meta Agents Research Environments (ARE) framework under the CC BY 4.0 and MIT licenses.
- ARE simulates real-world conditions to debug and evaluate agents, addressing limitations of existing environments that lack real-world flexibility.
- GPT5's true form factor remains unknown: In the ml-news channel, a user questioned whether GPT5 low and GPT5 high are different models.
- A member responded it's unknown but suggested it might be similar to their OSS model, where reasoning effort is adjusted by changing the context, or they could be different finetunes from base.
- Cloudflare releases Vibesdk: A member shared a link to Cloudflare's new Vibesdk.
- No further discussion followed.
- Introducing Compilebench: A member shared a link to a blog post about Compilebench.
- No further discussion followed.
LM Studio ▷ #general (21 messages🔥):
LM Studio Model Support, GGUF/MLX Models, Qwen-3-omni, Google Gemini Free Tier
- LM Studio Supports Limited HF Models: New users asked if all HuggingFace models are available on LM Studio and whether models are validated by the team.
- A member clarified that only GGUF (Windows/Linux/Mac) and MLX models (Mac only) are supported, which excludes image/audio/video/speech models.
- LM Studioâs Model Search: A user searched for the facebook/bart-large-cnn model and asked how to verify if models are GGUF or MLX.
- A member confirmed that the model is unsupported in LM Studio and that Qwen-3-omni support depends on llama.cpp or MLX compatibility.
- Deep Dive into Qwen-3-Omni: A member stated that Qwen-3-omni, which handles text, images, audio, and video, would take a very long time to support.
- Another member noted that the text layer is standard, but the audiovisual layers involve lots of new audio and video decoding stuff.
- Google Gifts Gemini to Students: A member shared that Google offers a year of Gemini for free to college students.
- They added, "I use it free daily so getting premium for free is nice."
LM Studio ▷ #hardware-discussion (2 messages):
Innosilicon GPU, DirectX12 Support, Ray Tracing Hardware
- Innosilicon unveils Fenghua 3 GPU: Innosilicon has revealed its Fenghua 3 GPU, which features DirectX12 support and hardware ray tracing capabilities according to Videocardz.
- Local LLaMA Reddit Post: A user shared a link to a Reddit post in r/LocalLLaMA.
aider (Paul Gauthier) ▷ #general (11 messages🔥):
Response API Support, GPT-5-Codex Integration, aider and litellm
- Aider Adds Response API Support for GPT-5-Codex!: A member added support for the Responses API to aider, validated with the GPT-5-Codex model, and created a pull request for review.
- This integration addresses the issue where GPT-5-Codex, lacking completions support, failed with aider on the official endpoint, necessitating the use of OR for backward compatibility.
- Aiderâs litellm Dependency Supports GPT-5?: A member inquired whether something different was needed given that Aider already works with other Responses models via litellm.
- Another member clarified that aider relies on litellm completions, which had a fallback mechanism for handling responses endpoints, but GPT-5-Codex lacks this fallback, prompting the need for explicit Responses API support.
- GPT-5 Now Requires Responses Endpoint: A member reported getting an error indicating that GPT-5-Codex is only supported in v1/responses and not in v1/chat/completions.
- This implies that, unlike previous models, GPT-5-Codex exclusively uses the Responses API, necessitating updates to handle this specific endpoint (see the sketch below).
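For reference, a minimal sketch of hitting the two endpoints with the official openai Python SDK; the model name matches the discussion, while the prompt is illustrative. The chat-completions call is the one reported to fail:

```python
from openai import OpenAI

client = OpenAI()

# Reportedly rejected: GPT-5-Codex is not served on v1/chat/completions.
# client.chat.completions.create(
#     model="gpt-5-codex",
#     messages=[{"role": "user", "content": "Reverse a linked list in Python."}],
# )

# Works: the Responses API (v1/responses) is the supported endpoint.
resp = client.responses.create(
    model="gpt-5-codex",
    input="Reverse a linked list in Python.",
)
print(resp.output_text)
```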
aider (Paul Gauthier) ▷ #questions-and-tips (8 messages🔥):
aider ollama setup, Aider reads MD file, Context Retransmitted, Prompt Caching
- Users seek guidance on using Aider with Ollama: A user is seeking guidance on how to make aider read their MD file with the AI's purpose when using Ollama.
- The user tried the command aider --read hotfile.md, but it didn't work as expected.
- Users want to revert to previous steps: A new user inquired about how to revert to a previous step after using the /ask command multiple times.
- A member suggested manually copying the desired context, using /clear, and then pasting the copied context with the new question.
- Context is retransmitted with every request: A user noticed that the context is retransmitted with every chat request when aider is in verbose mode, and they questioned whether this is inefficient.
- A member confirmed this behavior, stating that it's standard and that many APIs use prompt caching to mitigate costs, while noting that aider gives the user control over what to include in context.
- Aider sorts file context alphabetically: A user pointed out that aider sorts file context alphabetically instead of preserving the added order.
- They made a PR for that but gave up, since nothing is being merged anymore.
Modular (Mojo 🔥) ▷ #general (18 messages🔥):
RISC-V Performance, Tenstorrent's MMA accelerator + CPU combos, RISC-V 32-bit and 64-bit, RISC-V Bringup, RISC-V ISA
- RISC-V Performance Lags Behind Phone Cores: Members discussed that RISC-V cores are currently almost universally slower than the cores in modern smartphones, except perhaps microcontroller SoCs.
- One member noted that the fastest RISC-V device they encountered was still slow enough that someone cross-compiled SPECint from an UltraSPARC T2 because it was faster than a native compilation.
- Tenstorrent Hopes to Close RISC-V Performance Gap: A member mentioned Tenstorrent's MMA accelerator + CPU combos as a potential solution and also mentioned that Tenstorrent's "tiny" cores are very small in-order cores used to drive 140 matrix/vector units.
- The same member noted that Tenstorrent's Ascalon cores are the best hope for changing the RISC-V performance landscape in the next 5 years.
- RISC-V Bringup Challenges: A member shared that RISC-V 64-bit is vaguely functional, but needs a lot of bringup work, and also canât use vectors.
- Another member explained that any chain of if-elif-else statements that dispatches on architecture needs to have RISC-V added, and much needs to be locked behind a requires, which doesn't exist in the language yet (a Python-flavored sketch of the dispatch pattern follows below).
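The dispatch pattern in question, rendered as illustrative Python rather than Mojo; the per-architecture implementations are hypothetical placeholders:

```python
import platform

def x86_impl(): ...      # hypothetical per-architecture kernels
def arm_impl(): ...
def riscv_impl(): ...

arch = platform.machine()
if arch == "x86_64":
    kernel = x86_impl
elif arch in ("arm64", "aarch64"):
    kernel = arm_impl
elif arch == "riscv64":   # the branch each such chain now needs
    kernel = riscv_impl
else:
    raise NotImplementedError(f"unsupported architecture: {arch}")
```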
OpenAI ▷ #annnouncements (1 message):
Stargate Sites, Oracle, SoftBank, 10-Gigawatt Commitment
- OpenAI Announces Five New Stargate Sites: OpenAI announced five new Stargate sites in partnership with Oracle and SoftBank, advancing their 10-gigawatt commitment ahead of schedule.
- Details can be found in their blog post.
- Stargate Project gains Momentum: The collaboration with Oracle and SoftBank is accelerating the deployment of compute resources for OpenAI.
- This puts the project ahead of its originally planned schedule for achieving the 10-gigawatt target.
OpenAI ▷ #ai-discussions (14 messages🔥):
Codex Fallback, Sora Issues, Ternary System Study, Github Copilot Alternative, kilocode
- Codex Lacks Model Fallback: A user asked if Codex has a model fallback feature, similar to switching to gpt-5-mini after exhausting GPT-5 usage.
- No confirmation or denial was offered, but the community did not seem to think so.
- Sora's Got Video Generation Snags: A user inquired about the timeline for fixing issues in Sora's video generation capabilities.
- No response was provided in the chat log, but the community seems aware of issues related to the product.
- VSCode Copilot competitor incoming?: A user expressed interest in an OpenAI-made "GitHub Copilot" extension for VSCode and IDEs.
- Despite knowing about Codex CLI, the user appreciates GitHub Copilot's code snippet suggestions and would switch if OpenAI offered a similar product.
- GPT-5-Minimal Model Assessed: According to this image, the GPT-5-Minimal model performed worse than Kimi k2, but High is the best overall for agentic use cases.
- One user clarified that High (only via API) < Medium < Low < Minimal < Fast/Chat (non-thinking).
- Models assigned to different agent roles in kilocode: Members mentioned that users are assigning different models to different agent roles in kilocode.
- One user pointed to a blog post about Codex IDE in VSCode being less than a month old.
OpenAI ▷ #prompt-engineering (1 message):
GPT4o Translations, Chain of Thought
- GPT4o Translation Quality Suffers From Chain of Thought: A member found that GPT4o's translation quality decreases when using a chain of thought prompt compared to a direct translation prompt.
- The member shared the prompt used: When user paste something in other language that isn't english, Identify the language, then: - {do a 3 short bullet point as a chain of thought} {Your translation goes here}
- Direct Translation beats Chain of Thought Translation for GPT4o: The user experimented with GPT4o as a translator using a chain of thought prompt.
- The result was that direct translation without the chain of thought yielded better quality results (a quick A/B sketch for reproducing the comparison follows below).
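A small script for reproducing the comparison via the openai Python SDK; the prompt templates are paraphrased from the member's description, and the sample text is made up:

```python
from openai import OpenAI

client = OpenAI()

DIRECT = "Translate the following text into English:\n\n{text}"
COT = (
    "Identify the language of the following text, write 3 short bullet points "
    "of reasoning as a chain of thought, then give your translation:\n\n{text}"
)

def translate(template: str, text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": template.format(text=text)}],
    )
    return resp.choices[0].message.content

sample = "El conocimiento es poder."  # made-up sample input
print("direct:", translate(DIRECT, sample))
print("cot:   ", translate(COT, sample))
```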
OpenAI ▷ #api-discussions (1 message):
GPT4o translation, Chain of thought in translation
- GPT4o translation quality degrades with chain of thought: A member observed that asking GPT4o to perform a chain of thought before translating text results in lower quality translations compared to direct translation.
- The user shared a specific prompting strategy which asks GPT4o to identify the input language and outline a three-step thought process before providing the translation, but found this method to be less effective.
- Direct Translation Outperforms Chain-of-Thought for GPT4o: A user tested GPT4o as a translator, comparing direct translations with those preceded by a chain-of-thought prompt.
- The results indicated that GPT4o's direct translations were superior in quality and adaptation compared to those using the chain-of-thought approach.
DSPy ▷ #show-and-tell (4 messages):
DSPy profiles, dspy-profiles, LLM behavior
- DSPy gets profiles package for configuration: A member announced the release of dspy-profiles, a lightweight package for DSPy that manages configurations with toml, enabling quick setup swaps and tidy projects, also published to Xitter.
- The tool allows easy switching of LLM behavior with a single command, and is available as decorators and context managers, aiming to eliminate context boilerplate (a hypothetical usage sketch follows at the end of this section).
- Configurations for Different Environments: One member expressed excitement about dspy-profiles, inquiring whether many projects benefit from varied configurations.
- The author mentioned managing dev/prod environments as the initial motivation and indicated that it now facilitates better context switching.
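A hypothetical usage sketch, assuming (since the exact dspy-profiles API isn't shown in the discussion) that profiles live in a profiles.toml and are activated by name; the import path, function names, and TOML keys below are guesses for illustration only:

```python
# profiles.toml (assumed layout):
# [profiles.dev.lm]
# model = "ollama/llama3"
# [profiles.prod.lm]
# model = "openai/gpt-4o"

import dspy
from dspy_profiles import profile  # assumed import path

with profile("dev"):                    # context-manager form: swap LMs
    dspy.Predict("q -> a")(q="ping")    # without dspy.configure boilerplate

@profile("prod")                        # decorator form from the announcement
def run_pipeline():
    ...
```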
DSPy ▷ #general (8 messages🔥):
GEPA Multimodality Performance Issue, Passing images and PDFs into DSPy, VLMs for Data Extraction, OCR Approaches for Data Extraction, Best PDF or Image Parsing Stuff
- GEPA Multimodality Plagued by Performance Problems: A member reported a severe performance issue with GEPA Multimodality, linking to a related GitHub issue.
- The user indicated that their use case requires catering to multiple users.
- Passing PDFs & Images into DSPy Explored: A member inquired about passing images or PDFs into DSPy for data extraction.
- Another member pointed out that one can pass images into DSPy with this dspy.ai API primitive (a sketch using dspy.Image follows at the end of this section).
- VLMs and OCR debated for data extraction: One user asked if VLMs might be better than LLMs for extracting chart information from images and PDFs.
- Another member noted they did not know if OCR approaches are better for data extraction, while another mentioned that you can pass a VLM via dspy.LM for this.
- Query for the Best PDF and Image Parsers: A member asked for suggestions for the best PDF or image parsing tools.
- No specific suggestions were provided in the messages.
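For reference, a minimal sketch of the image-input pattern using DSPy's dspy.Image type; the signature, model choice, and URL here are illustrative assumptions, while dspy.Image, dspy.LM, and the typed-field pattern are DSPy primitives:

```python
import dspy

class ChartExtraction(dspy.Signature):
    """Extract the data points shown in a chart image."""
    chart: dspy.Image = dspy.InputField()
    data_points: str = dspy.OutputField()

# Any vision-capable model can back this; gpt-4o is just an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o"))

extract = dspy.Predict(ChartExtraction)
result = extract(chart=dspy.Image.from_url("https://example.com/chart.png"))
print(result.data_points)
```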
DSPy ▷ #examples (5 messages):
Prompt Optimization, GEPA, AI Safety Research, Trusted Monitor, Comparative Metric with Feedback
- Prompt Optimization Enables AI Control Research: A member published a post, Prompt optimization can enable AI control research, explaining how they used DSPyâs GEPA to optimize a trusted monitor.
- They then evaluated it using inspect and the code can be found here: dspy-trusted-monitor.
- Comparative Metric Boosts GEPA Performance: A member introduced a comparative metric with feedback, passing one positive and one negative sample through the classifier at a time, and scoring the pair based on whether the positive sample's score exceeded the negative sample's score.
- This allowed the reflection LM to learn the right signals for the classifier and create a robust optimized prompt (a sketch of the metric follows at the end of this section).
- GEPA Readme Links to Trusted Monitor Project: A member thanked the other for including the project on the GEPA readme and is interested in doing a short writeup about the comparative metric itself.
- The other responded that they'd love to do a writeup on the comparative metric, and are curious if this is a robust strategy for classification.
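A sketch of what such a pairwise metric might look like in DSPy's metric-with-feedback shape; the example/prediction field names are illustrative assumptions, not the author's actual code:

```python
import dspy

def comparative_metric(example, prediction, trace=None):
    """Score a (positive, negative) pair: 1.0 if the classifier ranks the
    positive sample above the negative one, else 0.0, with feedback text
    that GEPA's reflection LM can learn from."""
    passed = prediction.positive_score > prediction.negative_score
    feedback = (
        "Correct: the positive sample scored above the negative sample."
        if passed
        else (
            f"Inverted ranking: positive scored {prediction.positive_score}, "
            f"negative scored {prediction.negative_score}; the prompt should "
            "emphasize the signals that separate these two cases."
        )
    )
    return dspy.Prediction(score=float(passed), feedback=feedback)
```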
tinygrad (George Hotz) ▷ #general (12 messages🔥):
High-Level IRs like Triton, Multi-Layer IR Stack, Hardware-Incomplete vs Complete IRs, Search and Learning in Compilers, Graph-Based Models for Compilers
- Triton's Abstraction Level Spurs Debate: Discussion highlights the benefits of high-level IRs like Triton, but also points out the need for a multi-layer stack to interface with lower-level hardware.
- The Gluon project is mentioned, with a desire for it to interoperate with Triton, though its current Nvidia-specific nature is a limitation.
- Single IR Inadequacy Acknowledged: The consensus is that a single high-level IR is insufficient for all users and use-cases, citing the divergent needs of PyTorch users seeking speedups versus those optimizing mission-critical HPC projects.
- This is because there is not really going to be this goldilocks zone where the abstraction level of the IR is just right for all users and use-cases.
- UOps Leverage Bitter Lesson: Tinygradâs vision involves leveraging the bitter lesson to combine the benefits of incomplete and complete IRs, using UOps as a hardware-incomplete representation.
- The goal is to search over the space of rendered programs that implement the UOps to find the fastest one (an illustrative search loop follows at the end of this section).
- Search and Neural Compilers Highlighted: Emphasis is placed on the importance of search and neural compilers, with a particular interest in GNNs or other graph-based models.
- The suggestion is to create a multi-stage compiler that utilizes graph-based models per stage.
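An illustrative Python rendering of that search loop, not tinygrad's actual implementation; render_candidates and compile_and_load are hypothetical stand-ins for the codegen backends:

```python
import time

def timeit(fn, args) -> float:
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def fastest_kernel(uops, render_candidates, compile_and_load, args, trials=5):
    """Render several concrete programs for one hardware-incomplete UOp graph
    (e.g. different tilings/unrollings), time each, and keep the fastest."""
    best_fn, best_t = None, float("inf")
    for src in render_candidates(uops):
        fn = compile_and_load(src)
        t = min(timeit(fn, args) for _ in range(trials))
        if t < best_t:
            best_fn, best_t = fn, t
    return best_fn
```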
Nous Research AI ▷ #general (6 messages):
TRL Assessor, Nous Tek
- TRL Assessor Inquiry: A member inquired about a TRL (Technology Readiness Level) assessor and whether itâs worthwhile to red team their own stack using a new ecosystem.
- Two other members suggested moving the conversation to a specific channel, <#1366812662167502870>.
- "Nous Tek" Praise: A member wrote "Nous tek" which is a positive affirmation.
- Another member immediately offered to answer questions.
Nous Research AI ▷ #ask-about-llms (6 messages):
Distributed Learning, Code Genetics, Model Non-Homology
- Distributing the Training Load: A member inquired about the feasibility of training an AI model using distributed learning across multiple VPSs, leveraging resources like Kubernetes and Google Cloud.
- They were interested in using the setup to accelerate training cycles with datasets derived from operational data, along with concerns about safety rails for hardware management.
- Code Genetics and Model Parameter Tuning: A member explored using code genetics via OpenMDAO to automate adjustable parameters and Terraform for infrastructure control.
- They questioned the necessary audit systems and methods for vetting synthetic data, aiming to influence parameters of models already in use, as opposed to techniques like Nate Lora.
- Model Homology Concerns: A member explained that after pretraining to a stable loss value, models fix tokenized structures, creating a solid "world state" that is hard to shift without collapsing the structure.
- They noted that while fine-tuning can generate task vectors around a manifold, comparing datasets requires a common base, as models become non-homologous otherwise.
Eleuther ▷ #general (3 messages):
AI Behavioral Coherence, Mathematical AI Constraints, Davinci Architecture
- Researcher Probes AI Behavioral Coherence: An independent researcher is developing mathematical frameworks for AI behavioral coherence, aiming for real-time semantic control over language models without retraining.
- The current work focuses on validating cross-model consistency and exploring how mathematical constraints can enhance AI system interpretability.
- Davinci's Architecture Revealed: A member stated that Davinci is essentially GPT-2's transformer architecture but with alternating dense and locally-banded sparse attention patterns and a 4x FFN.
- This information is available in the GPT-3 paper, according to another member (key hyperparameters are sketched below).
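For concreteness, the davinci-scale hyperparameters as reported in the GPT-3 paper (Brown et al., 2020), collected into a small reference dict; the 4x FFN means the feed-forward inner dimension is 4 * d_model:

```python
# GPT-3 "davinci" hyperparameters as reported in the GPT-3 paper.
davinci_config = {
    "n_params": 175_000_000_000,
    "n_layers": 96,
    "d_model": 12288,
    "n_heads": 96,
    "d_head": 128,
    "d_ffn": 4 * 12288,       # the 4x FFN mentioned above
    "context_length": 2048,
}
```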
Eleuther ▷ #research (8 messages🔥):
Zero Knowledge Proofs, SwiGLU up-projection, Model Tampering Defenses
- ZKML for Model Integrity?: A member suggested using zero-knowledge machine learning (ZKML) to allow inference providers to prove they haven't finetuned/replaced/lowered the quality of the model, or prove that the training process only used certain data.
- They noted that it is very slow right now.
- SwiGLU "Defense" Against Finetuning: One member suggested making a model non-finetuneable post-hoc by multiplying random terms in the SwiGLU up-projection by large values and applying the inverse values in the down-projection (sketched at the end of this section).
- They claimed everyone's standard AdamW recipe will fail and they will be too lazy to fix it, while even working with default quantization recipes.
- Model Tampering Defenses: A member argued that the possibility of mitigating concerns about releasing models by making them harder to fine-tune is an open technical problem, not something that can be determined a priori.
- They also mentioned that their recent paper has increased tamper resistance by 3 OOMs over prior work.
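A PyTorch sketch of the trick described, assuming the standard SwiGLU block out = down(silu(gate(x)) * up(x)): scaling selected rows of the up-projection and the matching columns of the down-projection by inverse factors leaves the computed function unchanged (up to floating-point error) while distorting the gradient scales that default AdamW finetuning recipes assume:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def tamper_swiglu(up: nn.Linear, down: nn.Linear, frac: float = 0.05, scale: float = 1e4):
    """Scale random hidden channels of the up-projection by `scale` and undo
    it in the down-projection, preserving the forward pass (modulo floating-
    point precision) while wrecking per-parameter gradient magnitudes."""
    d_hidden = up.out_features
    idx = torch.randperm(d_hidden)[: int(frac * d_hidden)]
    up.weight[idx, :] *= scale          # blow up selected rows of up_proj
    if up.bias is not None:
        up.bias[idx] *= scale
    down.weight[:, idx] /= scale        # inverse scaling on matching columns
```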
Moonshot AI (Kimi K-2) ▷ #general-chat (3 messages):
pydantic-ai lib
- Pydantic-AI Library Plug and Play Component: A member suggested using the pydantic-ai library due to its neat implementation of a specific flow.
- They said that it includes a plug-and-play component that can accomplish the task in approximately 10 lines of code.
Windsurf ▷ #announcements (2 messages):
GPT-5-Codex, Figma MCP server, Windsurf update, Remote Figma integration
- GPT-5-Codex Lands on Windsurf!: The new GPT-5-Codex model from OpenAI is now live in Windsurf, and is impressing users on longer-running and design-related tasks, as per this announcement.
- It's free for paid users for a limited time, while free tier users can access it at 0.5x credits, so remember to reload Windsurf to see it!
- Official Figma MCP server launched!: A new official Figma MCP server is now available in the Windsurf MCP store, and is discussed in this post.
- Users can now paste Figma links directly into Windsurf with the new and improved integration which doesnât require the Figma desktop app.
- Migrate to New Figma MCP Server!: Users of the previous Figma Dev Mode MCP server are advised to install the new official Figma MCP server.
- This migration ensures access to Figmaâs new remote MCP server, enabling better integration with Windsurf.
MCP Contributors (Official) ▷ #mcp-dev-summit (2 messages):
MCP Dev Summit, Apify & Jentic Happy Hour
- Apify & Jentic Host Happy Hour: Apify & Jentic are hosting a happy hour, find details on the Luma website.
- A member plans to attend both happy hour events.
- Dev Summit Tickets Running Out Fast: The Dev Summit is expected to sell out in about two days, similar to last time when tickets sold out a week before the event.
- If you're considering attending, grab your tickets now!