Gated Attention is all you need?
AI News for 9/10/2025-9/11/2025. We checked 12 subreddits, 544 Twitters and 22 Discords (187 channels, and 4884 messages) for you. Estimated reading time saved (at 200wpm): 414 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
Since Noam Shazeer et al invented them in his annus mirabilis, MoE models have steadily increased in importance through GPT4 and Mixtral(8 experts). DeepSeek (160 experts), Snowflake (128 experts) and others then pushed the sparsity even further, and today it is fair to say that no frontier model is served without being an MoE (we have outright confirmations from Gemini, whereas the rest are strong rumors.)
Todayâs Qwen3-Next release pushes model sparsity even further - the industry has switched from âexpert countâ to total param vs active param ratio - and 3.75% (3B / 80B = 3.75%) is appreciably lower than GPT-OSSâ 4.3% and Qwen3âs own prior 10%.
According to them:
Ultra-Sparse MoE: Activating Only 3.7% of Parameters
Qwen3-Next uses a highly sparse MoE design: 80B total parameters, but only ~3B activated per inference step. Experiments show that, with global load balancing, increasing total expert parameters while keeping activated experts fixed steadily reduces training loss. Compared to Qwen3âs MoE (128 total experts, 8 routed), Qwen3-Next expands to 512 total experts, combining 10 routed experts + 1 shared expert â maximizing resource usage without hurting performance.
But for the ML folks, the probable bigger win is the strict pareto win seen in pretraining:
The authors credit a few architecture advancements:
- Hybrid Architecture: Gated DeltaNet + Gated Attention: We found that the attention output gating mechanism helps eliminate issues like Attention Sink and Massive Activation , ensuring numerical stability across the model.
- new Layer Norm: In Qwen3, we use QK-Norm, but notice some layer norm weights become abnormally large. To fix this and further improve stability, Qwen3-Next adopts Zero-Centered RMSNorm, and applies weight decay to norm weights to prevent unbounded growth.
- better MoE selection: normalize MoE router parameters during initialization , ensuring each expert is unbiasedly selected early in training â reducing noise from random initialization.
AI Twitter Recap
Alibabaâs Qwen3-Next hybrid architecture and early ecosystem support
- Qwen3-Next-80B-A3B: Alibaba released a new hybrid MoE family that routes only ~3B parameters per token while using 80B total (512 experts; 10 routed + 1 shared), combining Gated DeltaNet + Gated Attention, optimized multi-token prediction, and Zero-Centered RMSNorm with weight decay. Trained on ~15T tokens, it claims ~10Ă cheaper training and 10Ă faster inference than Qwen3-32B at long contexts, with the âThinkingâ variant reported to outperform Gemini-2.5-Flash-Thinking and the Instruct variant approaching their 235B flagship. Announcement and model links: @Alibaba_Qwen, NVIDIA API catalog. Architectural context and release rationale: @JustinLin610. Technical notes highlighting gated attention/DeltaNet, sparsity and MTP details: @teortaxesTex.
- Deployments and toolchain: Served in BF16 at Hyperbolic on Hugging Face with low-latency endpoints (@Yuchenj_UW, follow-up). Native vLLM support (accelerated kernels and memory management for hybrid models) is live (vLLM blog). Baseten provides dedicated deployments on 4ĂH100 (@basetenco). Available on Hugging Face, ModelScope, Kaggle; try it in the Qwen chat app (see @Alibaba_Qwen).
Image generation and OCR: ByteDance Seedream 4.0, Florence-2, PaddleOCRv5, Points-Reader
- Seedream 4.0 (ByteDance): New T2I/Image Edit model merges Seedream 3 and SeedEdit 3 and is live on the LM Arena (@lmarena_ai). In independent tests, it tops Artificial Analysisâ Text-to-Image leaderboard and reaches parity/leadership in Image Editing against Googleâs Gemini 2.5 Flash (a.k.a. Nano Banana), with improved text rendering, at $30/1k generations, available on FAL, Replicate, BytePlus (@ArtificialAnlys). LM Arena now supports multi-turn image-edit workflows (@lmarena_ai).
- OCR stack updates:
- PP-OCRv5: A modular, 70M-parameter OCR pipeline (Apache-2.0) designed for accurate layout/text localization on dense docs and edge devices, now on Hugging Face (@PaddlePaddle, @mervenoyann).
- Points-Reader (Tencent, 4B): OCR trained on Qwen2.5-VL annotations + self-training; outperforms Qwen2.5-VL and MistralOCR on several benchmarks; model + demo on HF (@mervenoyann, model/demo links).
- Florence-2: Fan-favorite VLM is now officially in transformers via the florence-community org (@mervenoyann).
- Precision inpainting: InstantXâs Qwen Image Inpainting ControlNet (HF model + demo) for targeted, high-quality edits (@multimodalart).
Developer platforms: VS Code + Copilot, Hugging Face speedups, vLLM hiring
- VS Code v1.104: Major Copilot Chat upgrades (better agent integration, Auto mode for model selection, terminal auto-approve improvements, UI polish) and official support for AGENTS.md to wrangle rules/instructions (release, AGENTS.md origin). New BYOK extension API enables direct provider keys.
- Open models inside Copilot Chat: Hugging Face Inference Providers are now integrated into VS Code, making frontier OSS LLMs (GLM-4.5, Qwen3 Coder, DeepSeek 3.1, Kimi K2, GPT-OSS, etc.) one click away (@reach_vb, guide, @hanouticelina, marketplace).
- Transformers performance work: The GPT-OSS release arrived with deep performance upgrades in transformersâMXFP4 quantization, prebuilt kernels, tensor/expert parallelism, continuous batching, with benchmarks and reproducible scripts (@ariG23498, blog, @LysandreJik).
- vLLM momentum: Thinking Machines is building a vLLM team to advance open-source inference and serve frontier models; reach out if interested (@woosuk_k).
Agent training and production agents: RL, tools, HITL, and benchmarks
- AgentGym-RL (ByteDance Seed): A unified RL framework for multi-turn agent training across web, search, games, embodied, and science tasksâno SFT required. Reported results: 26% web navigation vs. GPTâ4oâs 16%, 38% deep search vs. GPTâ4oâs 26%, 96.7% on BabyAI, and a new record 57% on SciWorld. Practical guidance: scale post-training/test-time compute, curriculum on trajectory length, prefer GRPO for sparse long-horizon tasks (thread, abs/repo, notes, results).
- LangChain upgrades:
- Human-in-the-loop middleware for tool-call approval (approve/edit/deny/ignore) built on LangGraphâs graph-native interruptsâproduction-ready HITL with a simple API (intro).
- Making Claude Code domain-specialized via better system docs/context beats raw docs access; detailed methods for running agents on frameworks like LangGraph (blog, discussion, case study: Monte Carlo).
- Benchmarks and eval fixes: SWE-bench bug enabling âfuture-peekingâ was fixed; few agents exploited it and headline trends remain unaffected (@OfirPress, follow-up). BackendBench is now on Environments Hub (@johannes_hage).
- Online RL at scale: Cursorâs new Tab model uses online RL to cut suggestions by 21% while raising accept rate by 28% (@cursor_ai).
Speech, audio, and streaming seq2seq
- OpenAI Evals for audio: Evals now accept native audio inputs and audio graders, enabling evaluation of speech responses without transcription (@OpenAIDevs). GPTâRealtime now leads the Big Bench Audio arena at 82.8% accuracy (native speechâtoâspeech), closing on the 92% pipeline (Whisper â text LLM â TTS), while retaining latency advantages (@ArtificialAnlys).
- Kyutai DSM: A âdelayed streamsâ streaming seq2seq built with a decoder-only LM plus pre-aligned streams, supporting ASRâTTS with fewâhundredâms latency, competitive with offline baselines, infinite sequences, and batching (overview, repo/abs).
Systems and infra: MoE training, determinism trade-offs, and comms stack
- HierMoE (training efficiency for MoE): Hierarchy-aware AllâtoâAll with token deduplication and expert swaps reduces inter-node traffic and balances loads. On a 32âGPU A6000 cluster, reported 1.55â3.32Ă faster AllâtoâAll and 1.18â1.27Ă endâtoâend training vs. MegatronâLM/Tutelâ2DH/SmartMoE; gains increase with higher topâk and across nodes (@gm8xx8).
- Determinism vs. performance: A lively discussion revisits sources of inference nondeterminism and whether ânumerical determinismâ is worth large latency hits. Key takeaways: atomicAdd isnât the whole story for modern stacks; determinism can be critical for sanity tests, evals, and reproducible RL; textâtoâtext can be perfectly repeatable with caching and shared artifacts (prompt, deep dive, caching, context).
- Networking/storage matter: For distributed postâtraining, tuned networking (RDMA/fabrics) and storage can deliver 10Ă speedups on the same GPUs and code; tooling like SkyPilot automates config (@skypilot_org). Also, a rare clear writeâup on NCCL algorithms/protocols arrived, a boon for those optimizing collective comms (@StasBekman).
Top tweets (by engagement)
- Alibabaâs Qwen3âNext launch (80B MoE, 3B active; hybrid Gated DeltaNet + Gated Attention) with broad ecosystem support: @Alibaba_Qwen (2,391)
- VS Code v1.104: Copilot Chat agent upgrades, AGENTS.md, BYOK, and HF Inference Providers integration: @code (675)
- Seedream 4.0 leads TextâtoâImage and ties/leads Image Edit arenas; available on FAL/Replicate/BytePlus: @ArtificialAnlys (590)
- OpenAI Evals adds native audio inputs/graders; GPTâRealtime tops Big Bench Audio at 82.8%: @OpenAIDevs (521), @ArtificialAnlys (176)
- Thinking Machines builds a vLLM team to advance open inference for frontier models: @woosuk_k (242)
- Cloud GPU procurement comedy, painful reality: Oracle sales anecdote from the trenches: @vikhyatk (7,042)
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Qwen3-Next-80B A3B Launch + Tri-70B Apache-2.0 Checkpoints
- Qwen released Qwen3-Next-80B-A3B â the FUTURE of efficient LLMs is here! (Score: 377, Comments: 82): Qwen announced Qwen3-Next-80B-A3B, an
80B
parameter ultraâsparse MoE where only~3B
params are activated per token (A3B). It combines a hybrid Gated DeltaNet + Gated Attention stack with512
experts (router selectstopâ10
+1
shared) and MultiâToken Prediction for accelerated speculative decoding; Qwen claims~10Ă
cheaper training and~10Ă
faster inference than Qwen3â32B, especially at>=32K
context, while matching/beating Qwen3â32B and approaching [Qwen3â235B] in reasoning/longâcontext. A âThinkingâ variant is included and reportedly outperforms Geminiâ2.5âFlashâThinking; models are available on Hugging Face with a demo at chat.qwen.ai. Comments confirm the Thinking release, note strong capability for an A3B model but a tendency toward overly positive/verbose outputs versus Geminiâ2.5âFlash or Claude Sonnet 4, and raise deployment interest in GGUF quantizations (e.g., via Unsloth) plus feasibility of running an80B
MoE in64GB
VRAM.- Early impressions note the A3B quantized variant feels âsmartâ but over-enthusiastic in tone (a âglazerâ) compared to models like â2.5 Flashâ or âSonnet 4,â suggesting more aggressive RLHF/style tuning. A âThinkingâ variant was also released, which typically implies deliberate/stepwise reasoning tokens that can improve complex reasoning but at the cost of slower decoding and higher memory/time per token.
- On deployability: an 80B at ~
4.25 bpw
should require ~80e9 * 4.25/8 â 42.5 GB
just for weights; add KV cache in BF16/FP16 which can be ~2â3 MB/token for a 70â80B (e.g., ~20â25 GB at 8k ctx), plus framework overhead. Hence, 64 GB VRAM is typically sufficient for 4-bit inference with moderate context/batch, but long contexts or larger batches may need multi-GPU sharding or CPU offload (GGUF/llama.cpp-style inference once a community GGUF appears; see GGUF format: https://github.com/ggerganov/llama.cpp/blob/master/gguf.md). - Community is eyeing a GGUF build (e.g., via Unsloth: https://github.com/unslothai/unsloth) to run locally with 4â4.25 bpw; this often becomes the practical sweet spot for 70â80B models on single 48â64 GB GPUs. Trade-offs: 4-bit quant preserves most quality for many tasks but can affect edge cases (math/code/logical precision), and throughput will still be lower than 7â13B models due to compute/memory bandwidth limits.
- We just released the worldâs first 70B intermediate checkpoints. Yes, Apache 2.0. Yes, weâre still broke. (Score: 728, Comments: 62): Trillion Labs released Apache-2.0 licensed intermediate training checkpoints for a
70B
transformerâplus7B
,1.9B
, and0.5B
variantsâpublishing the âentire training journeyâ rather than only final weights, which they claim is a first at the70B
scale (earlier public trajectories like SmolLMâ3 and OLMoâ2 topped out at <14B
). Artifacts include base and intermediate checkpoints and a âfirst Korean 70Bâ model (training reportedly optimized for English), all ungated on Hugging Face: Triâ70BâIntermediateâCheckpoints. This enables transparent trainingâdynamics research (e.g., scaling/optimization analyses, curriculum ablations, and resume/finetune starting points) under a permissive license. Top comments are largely nonâtechnical: requests for a donation link to support the effort, a naming joke about âTrillionâ vs. parameter count, and general encouragement; no substantive technical critiques were raised in the highlights.
2. Qwen3-Next Teasers and Coming-Soon Posts
- Qwen3-Next-80B-A3B-Thinking soon (Score: 403, Comments: 86): Post teases Alibaba/Qwenâs forthcoming âQwen3-Next-80B-A3B-Thinking,â which appears to be a sparse MoE reasoning model with ~3B-parameter experts and
k=10
experts active per token (per the model card screenshot), totaling ~80B parameters. The âA3Bâ likely denotes 3B expert size; the sparse routing suggests significantly lower per-token compute and memory bandwidth than dense 80B, making it more inference-friendly on modest hardware, with a separate non-reasoning instruct variant expected since Qwen says theyâre no longer doing hybrid models. âThinkingâ implies a deliberate/CoT-style reasoning-focused configuration. Comments debate hardware implications: enthusiasm that only a subset of experts fire per token could let it run on mini PCs or nonâNVIDIA accelerators favoring large memory over sheer compute, though correction notes itâsk=10
(not 1). Others praise Qwenâs rapid cadence and expect a standard instruct (non-reasoning) model alongside the reasoning variant.- Sparsity/config clarification: Qwen3-Next-80B-A3B-Thinking is discussed as an MoE with ~3B-parameter experts and
k=10
active experts per token (not 1), implying ~30B
active params/token plus shared layers. This reduces per-token FLOPs vs a dense 80B while requiring substantial memory to host all experts, aligning with inference on hardware emphasizing large memory capacity/bandwidth (potentially nonâNVIDIA/China accelerators) and enabling decent throughput on modest rigs via sharding/offload. - Product strategy: Qwen is noted to have dropped âhybridâ models, suggesting there will be a separate nonâreasoning instruct counterpart in addition to the A3B âThinkingâ variant. This separation caters to different inference budgets and use cases (instruction vs reasoning), while leveraging MoE sparsity to balance quality and efficiency.
- Trend context: Commenters see this as part of the ongoing shift toward MoEâhere with relatively high
topâk
(10) compared to commontopâ2
MoE like Mixtral 8x7Bâtrading some extra compute for improved quality/coverage, yet still far cheaper than dense. The higher parallelizable workload across experts also maps well to accelerators prioritizing memory capacity over raw core speed.
- Sparsity/config clarification: Qwen3-Next-80B-A3B-Thinking is discussed as an MoE with ~3B-parameter experts and
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo
1. Seedream/Seedance 4.0 Image Model Releases and Benchmarks
- Seedream 4.0 is the new leading image model across both the Artificial Analysis Text to Image and Image Editing Arena, surpassing Googleâs Gemini 2.5 Flash (Nano-Banana), across both! (Score: 242, Comments: 86): Seedream 4.0 now leads both the Text-to-Image and Image Editing leaderboards on the Artificial Analysis Arena, surpassing Googleâs Gemini 2.5 Flash (âNano-Bananaâ) across both tasks. This positions Seedream 4.0 as the current SOTA on AAâs public benchmarks for image generation and editing. Commenters highlight the rarity and significance of topping both generation and editing simultaneously, and speculate about forthcoming stronger baselines (e.g., a higher-tier Gemini release) while expressing interest in an open-weights contender, potentially from Chinese labs.
- Users highlight that Seedream 4.0 is now rank-1 across both the Artificial Analysis Text-to-Image Arena and Image Editing Arena, reportedly surpassing Google Gemini 2.5 Flash (Nano-Banana), implying strong cross-task generalization rather than optimization for a single modality. Dual leadership suggests robustness in both initial synthesis and localized edit controllability; see the leaderboards on Artificial Analysis.
- Several note the caveat that âbenchmarks/leaderboards arenât everything,â pointing out technical confounders in arena-style rankings: prompt distribution biases, sampler/CFG/steps settings, seed variance, and safety-filter behaviors can all swing pairwise preference/ELO outcomes. Especially for editing, factors like mask quality, localization accuracy, and prompt adherence by category (e.g., typography, multi-object composition) matter; without per-category breakdowns or fixed seeds, leaderboard rank may not reflect performance in a given userâs workflow.
- Thereâs debate on safety-moderation layers affecting scores: stricter or stacked moderation can increase refusals or over-sanitize outputs, which tends to reduce win-rate in open preference arenas even if the base model is capable. Conversely, looser safety can yield more vivid or direct generations that win preferencesâhighlighting that leaderboard position may conflate raw capability with moderation policy.
- Seedance 4.0 is so impressive and scary at the same time⊠(all these images are not real and donât exist btw) (Score: 374, Comments: 77): Post showcases âSeedance 4.0,â an imageâgeneration model producing highly photorealistic portraits where the subjects âdonât exist,â highlighting the current state of synthetic media realism. The thread provides no concrete details (architecture, training data, evals, safety features, or watermarking/provenance), but the samples imply nearâSOTA fidelity for human faces, increasing risks for mis/disinformation and underscoring the need for content provenance (e.g., C2PA) and deepfake detection tooling. Top comments note concern over astroturfed/âorganicâ advertising that often follows new model launches, and broader skepticism about social media dynamicsârather than technical critique of the model itself.
- Comparative output diversity: Users report Seedance 4.0 tends to produce consistent, repeatable âsame (good) resultsâ for similar prompts, while Nano Banana shows higher intraâprompt variance. This implies Seedance may be tuned for stability/faithfulness over diversity, which benefits controlled art direction but can reduce exploration across seeds.
- Openness as adoption gate: One commenterâs stance âIf not open, not interestedâ highlights friction with closed models for reproducibility and benchmarking. Closed weights/checkpoints limit community validation, ablations, and integration into local pipelines, affecting trust and iterative improvement.
- 1GIRL QWEN v2.0 released! (Score: 353, Comments: 49): Release of 1GIRL QWEN v2.0 (
v2.0
), a LoRA fineâtune targeting the QwenâImage/Qwen2âImage textâtoâimage model, aimed at photorealistic singleâsubject (female) portraits. The model is distributed on Civitai with a sample preview; however, the post provides no training details (dataset, steps, LoRA rank/alpha), base checkpoint/version, prompt tokens, or inference settings/benchmarks. Top comments flag the release as another âinstagirl/1girlâ promo and suggest leading with a goth example; thereâs also an allegation of vote manipulation followed by âstabilizedâ votes. A commenter asks if the LoRA is uncensored, with no explicit answer inâthread.- A commenter requests the LoRA training recipe and environment details to reproduce results locally, specifying hardware of RTX 4080 Super (
16 GB
VRAM) +32 GB
RAM. They note prior success training for SDXL and are now using Qwen, praising its prompt fidelity, and ask for practical guidance on dataset prep and training parameters/hyperparameters to achieve comparable quality. - Another user asks whether the release is uncensored, i.e., if safety filters/content restrictions are disabled. This impacts local deployment scenarios and determines whether NSFW or restricted content generation is supported out of the box.
- One comment flags a generation quality issue: âsecond picture thigh larger than torso,â indicating noticeable anatomy/proportion artifacts in sample outputs. This highlights potential shortcomings in model outputs that technical users may want to evaluate or mitigate during inference or future fine-tuning.
- A commenter requests the LoRA training recipe and environment details to reproduce results locally, specifying hardware of RTX 4080 Super (
- it seems like Gemini 3 wonât come out this month (Score: 341, Comments: 84): Unverified rumor that Gemini 3 wonât launch this month; no official source, release notes, or benchmarks are cited. Comments speculate that
Gemini 3.0 Flash
could outperformGemini 2.5 Pro
, implying the lowerâlatency âFlashâ tier might temporarily leapfrog the prior âProâ tier for many workloadsâwithout any evals, metrics, or implementation details to substantiate it. One commenter asserts âItâll be better than 2.5 Pro â for a limited timeâ, implying a temporary tier reshuffle or promo window, while others call out the lack of evidence (e.g., âSource: trust me broâ).- Debate centers on whether Googleâs speed/costâoptimized Gemini 3.0 Flash could actually outperform the capabilityâtier Gemini 2.5 Pro, which would upend product tiering. If
3.0
Flash truly beats2.5
Pro, commenters note most users âwouldnât even need Pro,â implying a leap in reasoning/quality, not just latency. Historically, Flashâclass models target low latency and cost while Pro/Ultra lead complex reasoning (Gemini model tiers), so any âFlash > Proâ outcome would likely be metricâspecific (e.g., latency or narrow tasks) rather than acrossâtheâboard. - Skepticism is high due to lack of evidenceââSource: trust me broââand hints that any superiority might be âfor a limited time,â suggesting temporary access gating or staged rollouts. Several doubt 3.0 Flash will surpass 2.5 Pro on reasoning benchmarks (e.g., MMLU, GSM8K), framing current claims as marketingâdriven hype absent publicly verifiable evals.
- Debate centers on whether Googleâs speed/costâoptimized Gemini 3.0 Flash could actually outperform the capabilityâtier Gemini 2.5 Pro, which would upend product tiering. If
- Gothivation (Score: 576, Comments: 92): The linked media at v.redd.it/bucq7dlt8jof1 is not accessible due to an
HTTP 403
network-security block, so the video content cannot be verified from the URL. From the comment context, the post appears to showcase an AIâgenerated âgothâ video that is realistic enough to pass casual viewing, but the thread provides no technical details (model, pipeline, training data, or benchmarks) and no visible artifacts are discussed. In short, thereâs no reproducible implementation info or evaluation data in-thread. One top comment notes they didnât realize it was an AI video until seeing the subreddit name, underscoring increasing realism and the difficulty of casual detection; other highly upvoted remarks are non-technical.- One commenter highlights the growing indistinguishability of AI-generated video: âIâm more and more impressed every day at how often I donât realize Iâm watching an ai video until I look at the sub name.â This suggests improved visual fidelity and temporal coherence, with fewer telltale artifacts (e.g., hand/finger anomalies, flicker), making casual detection unreliable and underscoring the need for provenance/watermarking or model-level detection. Absent explicit model details, the trend aligns with rapid advances in text-to-video diffusion/transformer pipelines and upscalers, which compress perceptual gaps that used to give AI away.
- Gothivation (Score: 580, Comments: 92): Post shares an AI-generated short video titled âGothivation,â likely a talking-head/character-actor clip with a goth aesthetic delivering a motivational monologue. The referenced media v.redd.it/bucq7dlt8jof1 returns
HTTP 403 (Forbidden)
without Reddit auth/dev token, so model/pipeline details arenât disclosed in-thread; however, commenters suggest the synthesis quality is high enough to pass casual scrutiny (strong lip-sync/affect coherence implied). Most substantive remark notes they didnât realize it was an AI video until seeing the subreddit name, underscoring rising realism of consumer-grade avatar/talking-head generation; other top comments are non-technical quips.- A commenter highlights that AI-generated video is becoming hard to distinguish from real footage without contextual cues, implying modern diffusion/GAN video systems have reduced typical giveaways (e.g., mouth sync errors, hand/finger topology glitches, inconsistent specular highlights). Effective detection increasingly depends on temporal signals (blink cadence, motion parallax, physics of fabric/hair), lighting/color continuity across frames, and metadataârather than single-frame artifactsâsuggesting moderation/detection pipelines should incorporate temporal and multimodal analysis.
- Control (Score: 248, Comments: 47): A demo showcases a pipeline combining âInfiniteTalkâ (audio-driven talking-head/lipâsync) with âUniAnimateâ (image/video animation with pose/hand control) to produce a dubbed clip emphasizing controllable hand motion while maintaining strong facial expressiveness. Viewers note notably realistic facial performance and stability/identity cues (e.g., consistent ring details on the right hand), suggesting good temporal consistency beyond just hands. Commenters ask how to integrate UniAnimate with InfiniteTalk in a videoâtoâvideo dubbing workflow that preserves the source motion exactly; they report slight movement drift/mismatch, highlighting synchronization and motionâlock challenges when trying to maintain frameâaccurate body/pose while swapping or reâanimating the face.
- Technical concern about combining Unianimate with Infinite Talk for video-to-video dubbing: the output does not preserve the source motion exactly, leading to movement drift despite aiming only to change speech/lips. The user needs frame-accurate temporal alignment where pose/trajectory are locked to the input while audio-driven lip and facial articulation are modified. The request implies a need for strict motion control signals and synchronization to avoid deviation across frames.
- Observation on fidelity: commenters note facial performance quality is strong relative to hand/pose control, suggesting disparities in control robustness between face reenactment and full-body/hand tracking. One tip is to âfollow the rings on her right handâ to evaluate motion consistency, implying subtle artifacts or lag in hand alignment even when the face tracks well.
- Reproducibility gap: multiple requests for the exact workflow/pipeline (toolchain, settings, and versions) indicate that the showcased result lacks a documented, step-by-step process. Sharing concrete parameters (model versions, control strengths, frame rate handling, and alignment settings) would enable others to replicate and diagnose the motion deviation issues.
- saw a couple of these going around earlier and got curious (Score: 8449, Comments: 1489): Meme-style screenshot of a novelty AI/quiz output that absurdly infers a userâs âpreferenceâ (claiming they want to have sex with potatoes), which the OP explicitly rejects. Context suggests a trend of people trying a low-quality AI predictor; it illustrates classic hallucination/misclassification and weak safety/NSFW filtering with no technical details, benchmarks, or model info provided. Commenters broadly deride the modelâs reliability and seriousness (e.g., âIf the future is AI, we better hope itâs not this AIâ), expressing disbelief and concern rather than technical debate.
- The thread shares multiple AI-generated image results via Redditâs image CDN (e.g., https://preview.redd.it/wlmvcaoqifof1.jpeg) but contains no technical detailsâno model names (e.g., SDXL, Midjourney v6), prompts, seeds, samplers, steps, CFG/Guidance, negative prompts, or model hashes. Because Redditâs pipeline typically strips EXIF/embedded JSON, any Stable Diffusion metadata (prompt, seed, sampler) is unrecoverable, so outputs here are non-reproducible and not diagnosable beyond speculation.
- For a technically actionable discussion, posts would need full generation context: base model and version/hash, sampler (e.g.,
DPM++ 2M Karras
,DDIM
), steps, CFG, resolution, seed, and any refiners/ControlNets/LoRAs (e.g., SDXL base+refiner at 1024px, Hires fix, LoRA stacks). With that, readers could attribute anomalies to parameters (e.g., over-high CFG, under-steps) or architecture (MJâs internal sampler vs. SDXL pipelines) and propose fixes or reproduce A/B tests.
- Lol. I asked ChatGPT to generate an image of the boyfriend it thinks I want and the boyfriend it thinks I need (Score: 2532, Comments: 651): User asked ChatGPTâs image generator (likely DALL·E 3 via ChatGPT) to produce a âboyfriend it thinks I wantâ vs âboyfriend it thinks I needâ comparison. The resulting image appears to inject alignment/virtue cuesâone figure is noted holding an âAI Safetyâ bookâsuggesting the model projects safety/wholesome themes and may misinterpret ambiguous âwant vs needâ prompts, reflecting RLHF-influenced bias and value signaling in generative outputs. Commenters point out the odd inclusion of an âAI safetyâ book and suggest GPT misunderstood the prompt; another says the output is acceptable, implying the modelâs conservative/wholesome bias isnât unwelcome.
- Mostly reaction/image posts with no benchmarks or model details; the one technical signal is prompt-grounding/safety steering artifacts: a generated image includes an âAI safety book,â suggesting the LLMâT2I pipeline (e.g., ChatGPT + a diffusion backend like DALL·E 3) injected safety-related concepts or misinterpreted intent. Diffusion models also notoriously hallucinate or garble embedded text, so visible, off-prompt text is a known failure mode tied to token-to-glyph mapping and safety rewrites; see the DALL·E 3 system card on safety filtering and prompt transformations (https://cdn.openai.com/papers/dall-e-3-system-card.pdf) and discussions on text rendering limitations in diffusion models (e.g., https://openai.com/research/dall-e-3).
- I asked ChatGPT to make a Whereâs Waldo? for the next Halloween. Can you find him? (Score: 636, Comments: 56): A Redditor used ChatGPTâs builtâin image generation to create a Halloweenâthemed, Whereâs Waldoâstyle seekâandâfind scene, showcasing dense composition and a hidden target consistent with Wimmelbilder prompts. Commenters confirm Waldoâs discoverability with a cropped proof and note small visual cues (e.g., a âraised eyebrowâ pumpkin), and another user posts their own, reportedly trickier, AIâgenerated variantâindicating reproducibility of cluttered, puzzleâlike scenes. Discussion revolves around how well the image hides Waldo and the sceneâs visual density rather than implementation details; no benchmarks or model specifics are provided.
- Users compared AI-generated âWhereâs Waldo?â scenes across models: the OP used ChatGPT (per title) and another user tried Google Gemini image. The Gemini outputâs findability was ambiguousâcommenters couldnât tell if the target was cleverly hidden or if the composition lacked a distinct âWaldoââhighlighting challenges for image models in consistent character rendering and cluttered-scene composition.
- Image resolution/format varied across sharesâ
1536px
example,1024px
example, and a493px
crop exampleâwith Redditâsauto=webp
conversion. Downscaling and WebP recompression can obscure fine-grained cues (e.g., stripe patterns) and materially change perceived difficulty, so any comparison of âhardnessâ should control for resolution and compression artifacts.
2. UK Government AI Adoption and ChatGPT Ads Monetization
- AI is quietly taking over the British government (Score: 3012, Comments: 171): A screenshot of a UK Parliament/House of Commons webpage is run through an AI-content detector, which flags sections as likely âAI-generatedâ (image). Technically this suggests, at most, AI-assisted drafting or proofreading of public-facing copy (e.g., ChatGPT rewrites or Grammarly), not automation of governmental decisions; moreover, AI-detection tools are known to yield high false positives and cannot prove authorship. No evidence of code, systems integration, or operational control by AI is shown. Commenters argue the title is overblown; many workersâincluding MPsâuse AI as a proofreading aid, and a follow-up image hints key legal/formulaic text remained unchanged, undercutting the âtakeoverâ claim.
- Adoption timeline and scope: The UK government had broad access to Microsoft 365 Copilot via a government-wide free trial in OctâDec
2024
(The Register), followed by the Labour governmentâs Jan2025
blueprint to mainstream AI across departments (gov.uk). This sequence indicates formal, institutionally sanctioned deployment rather than adâhoc usage, and anchors claims of AI uptake to concrete products and dates. - Usage pattern vs displacement: Practitioners highlight AI as a proofreading/writing assist rather than full content generation, which matches assistive workflows embedded in M365 Copilot (Word/Outlook). The implication is workflow augmentation (QA, consistency, turnaround time) rather than role replacement, i.e., AI as a linguistic verification layer within existing processes.
- Attribution/correlation critique: A commenter notes the linguistic shifts in Commons texts align more with the Labour change of government than with ChatGPTâs public availability, cautioning against attributing authorship to LLMs. A sound analysis would test for change-points in Hansard style/lexical distributions around
Jul 2024
(government change) versusNov 2022
/Mar 2023
(ChatGPT/GPt-4 milestones) to control for confounders.
- Adoption timeline and scope: The UK government had broad access to Microsoft 365 Copilot via a government-wide free trial in OctâDec
- AI is quietly taking over the British government (Score: 4291, Comments: 210): The image appears to be a screenshot of an AI-text detector labeling a UK parliamentary/ministerial speech as âAI-generatedâ or highly likely AI, implying âAI is quietly taking over.â Technically, this showcases a known limitation of detectors: they often key on low-perplexity, template-like phrasing and repeated stock expressionsâfeatures common in professional speechwritingâleading to false positives and not constituting evidence of actual AI authorship. Commenters note Westminster speech has long been formulaic and meme-like phrases propagate among political factions, which can trigger detectors; others add that even without explicit ChatGPT usage, AI-influenced style can percolate into human writing over time.
- Multiple commenters note high false-positive rates when flagging human-written text as AI, aligning with known limitations of current detectors. OpenAI discontinued its AI Text Classifier due to âlow accuracyâ (high FP/FN) link, and Liang et al. 2023 found detectors like GPTZero flagged
61%
of non-native TOEFL essays as AI arXiv. This undermines claims that rising âAI-likeâ phrasing in speeches necessarily implies model usage without stronger evidence and calibrated baselines. - Several point out that parliamentary rhetoric is historically formulaic and subject to rapid fashion cycles, so time-series spikes in specific n-grams around the ChatGPT release risk conflating trend adoption with causality. A more defensible approach would use an interrupted time-series or difference-in-differences on Hansard corpora (e.g., UK Parliament API) with speaker and party fixed effects, plus controls for media-driven meme diffusion (cross-correlating phrase adoption with external media timelines). Without such controls, phrase-frequency plots are likely picking up stylistic contagion rather than AI authorship.
- Commenters also highlight AIâs indirect influence on human language: even when speeches arenât generated, writers may mimic model-suggested phrasing, making phrase-level AI attribution unreliable. Perplexity/burstiness-based detectors are brittle and degrade under light editing/paraphrase (see Ippolito et al. 2020 arXiv and DetectGPT by Mitchell et al. 2023 arXiv), so âAI-likeâ templates such as ânot just X but Yâ are poor evidence. Robust attribution would require watermarking or provenance signals rather than surface-level stylistic cues.
- Multiple commenters note high false-positive rates when flagging human-written text as AI, aligning with known limitations of current detectors. OpenAI discontinued its AI Text Classifier due to âlow accuracyâ (high FP/FN) link, and Liang et al. 2023 found detectors like GPTZero flagged
- Enjoy ChatGPT while it lastsâŠ. the ads are coming (Score: 2375, Comments: 163): The post argues that commercial LLM assistants (OpenAI/ChatGPT, Perplexity, Anthropic) will likely monetize by embedding advertising directly into generated answersâanalogous to how Google search evolvedâcreating incentives for response bias, telemetry-driven targeting, and ad-influenced retrieval/grounding that could erode user trust and turn AI chat into a surveillance-driven discovery layer. It questions whether ads-in-the-loop (e.g., sponsorship-weighted generation, RAG ranking skewed by paid content, or RLHF nudges) would compromise answer integrity versus subscription-only models. Commenters debate scope: ads on free tiers may be tolerable but not for Plus/Pro; implicit/stealth influence (organic product steering) is considered more harmful than explicit ads; several argue raising subscription prices or other offsets is preferable, noting that ad-driven reputational risk could slow adoption.
- Several commenters warn that monetization may manifest as âorganicâ steering rather than explicit banner adsâe.g., retrieval/citation ranking subtly favoring commercial entities or affiliates. In a RAG/tool-use stack this could be implemented by weighting retrieval scores, re-ranking candidates, or adjusting link choice under the hood, making bias hard to detect because it looks like normal reasoning. Auditing would require counterfactual prompts, distributional checks of cited domains, and A/B comparisons against a non-monetized baseline to spot systematic drift toward sponsors.
- Others note outbound links already include attribution/affiliate-like parameters so destinations can identify traffic sources. Technically this can be done via UTM parameters or partner tags in query strings (see Googleâs UTM spec: https://support.google.com/analytics/answer/1033863 and MDN on Referer/Referrer-Policy: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer), enabling conversion tracking and potential revenue sharing even when referrer headers/cookies are limited. This creates a measurable telemetry loop (click-through, conversions) that could be optimized by the model or ranking layer, reinforcing monetized link selection over time.
- A key risk raised for the open-source ecosystem is training-data contamination if web scrapes absorb AI-generated outputs that already contain monetized biases. This aligns with findings on quality/bias drift when models train on their own or synthetic outputs (e.g., âModel Autophagy Disorder,â https://arxiv.org/abs/2307.01850), with ads acting as a domain-specific poisoning vector. Mitigations include provenance tracking, synthetic-content detectors, domain de-duplication, and explicit filters for affiliate/UTM-tagged URLs during corpus curation.
- Why havenât all the other companies (Google, OpenAI, Deepseek, Qwen, Kimi and others) added this before? Itâs literally the most obvious and most needed thing đ€ (Score: 295, Comments: 51): The image appears to showcase a chat UI touting a ânewâ native file upload/analysis workspace (multi-file document/code/data handling). Commenters note this isnât novel: ChatGPTâs Code Interpreter/Advanced Data Analysis has supported uploading and programmatically analyzing files (CSVs, ZIPs, PDFs, etc.) since 2023 using a Python sandbox, with similar capabilities also present in other stacks; the real gaps tend to be UX and reliability, especially for complex documents. See e.g., OpenAIâs Advanced Data Analysis docs and prior announcements (OpenAI help, blog, 2023). Top comments push back that the feature is old news (âWhoâs gonna tell himâ), adding that while non-visual files work well, PDF ingestion/understanding remains âmid.â
- Several commenters point out this capability has existed since OpenAIâs Code Interpreter/Advanced Data Analysis rollout in mid-2023, which lets ChatGPT upload and process PDFs/CSVs by running Python in a sandbox for parsing, data extraction, and visualization. They note quality varies: non-visual/structured files perform well, but PDF parsing can be âmidâ due to layout/OCR/table-detection limits, especially with complex or scanned documents. See OpenAIâs announcement: https://openai.com/blog/code-interpreter.
- Thereâs broad feature parity across vendors: Google Gemini supports file uploads (PDFs, images, etc.) via its File API for analysis (docs: https://ai.google.dev/gemini-api/docs/file_uploads), Microsoft Copilot can ingest and analyze uploaded documents in chat/Office contexts, and DeepSeek also advertises document Q&A in its chat clients. Differences are largely in modality coverage and extraction fidelity (e.g., robustness to complex PDF layouts) rather than the existence of the feature itself.
- People leaving AI companies be like (Score: 954, Comments: 45): Non-technical meme about departures from AI companies; comments contextualize it with the 2024 exits from OpenAIâs Superalignment team (e.g., Jan Leikeâs resignation and the teamâs disbanding), where leadership cited disagreements over safety priorities and resources (Jan Leike, reporting). Top comments argue the Superalignment team âwasnât useful,â claiming none of its work shipped and that they had to create deliberately weak models to publish safety findings, while others quip that ex-employees start âsafer-namedâ startups or call themselves âsurvivors.â
- A commenter claims OpenAIâs âsuperalignmentâ group had negligible production impact: none of their work purportedly shipped into ChatGPT, and they allegedly had to construct deliberately weak LLMs to demonstrate safety failures that standard safety layers and
RLHF
already mitigated in deployed systems. This highlights a perceived gap between alignment research artifacts and productized safety techniques (e.g., RLHF, policy filters) that directly affect user-facing models. - They further argue the team was progressively sidelined as practical safeguards (RLHF/filtering) addressed most real-world issues, so departures had little operational consequenceâimplying orgs may deprioritize alignment research that doesnât yield measurable product or risk-reduction deliverables.
- A commenter claims OpenAIâs âsuperalignmentâ group had negligible production impact: none of their work purportedly shipped into ChatGPT, and they allegedly had to construct deliberately weak LLMs to demonstrate safety failures that standard safety layers and
- This popup called me out harder than my ex (Score: 377, Comments: 67): Meme-style screenshot likely from ChatGPT showing a privacy/data-use popup (reminding that chats can be reviewed/used to improve models) while the UI also exposes the userâs recent chat titles in the sidebar. Technically, ChatGPT stores chat history by default, and unless users disable âChat history & training,â conversations may be reviewed to improve systems; the humor stems from the popup âcalling outâ sensitive chats and the screenshot unintentionally sharing recent activity. Comments joke about accidental oversharing and privacy (e.g., Altman âreading sexting chatsâ) and at least one user saying they donât belong there, underscoring discomfort with data review vs. user expectations.
3. Real-world AI Impacts: Builder Traction, Medical Triage, and Consciousness Debate
- Built with Claude Code - now scared because people use it (Score: 279, Comments: 77): Founder of https://companionguide.ai describes hacking together a tool using Claude Code inside VSCode and deploying on Netlify; unexpected traction from strangers triggered concerns about reliability, support, and whether to productize the MVP. The post focuses on early-stage operational readiness (stability, breakage risk) rather than code specifics or benchmarks. Top comments suggest paying for a professional code review once money is involved and note that even mature products break regularlyânormalize issues while improving robustness.
- Primary actionable advice: before scaling paid usage, invest in a professional code review/security audit to identify correctness, security, and dependency risks earlyâpreventing outages and revenue loss. A thorough review can surface edge cases, unsafe thirdâparty libraries, and architectural pitfalls that are expensive to fix postâlaunch.
- Reminder that even mature, professional products fail; plan for failure with observability and resilience. Concretely, prioritize logging/metrics/tracing, graceful degradation paths, clear incident response/runbooks, and automated tests to contain blast radius when issues inevitably occur.
- ChatGPT may have saved my life (Score: 438, Comments: 55): OP reports that ChatGPT performed basic symptom triage for suspected acute appendicitis by querying for rightâlowerâquadrant (RLQ) localization and rebound tendernessâe.g., âIs it hurting in the bottom right?â and âdoes it hurt if you press and release?ââboth classic signs of appendicitis, including McBurneyâs point tenderness and rebound tenderness. This prompted an ER visit at
~2am
, where clinicians indicated the appendix was near perforation; the prompts align with elements of the Alvarado score (e.g., RLQ tenderness, rebound pain), illustrating LLMâdriven layperson triage approximating clinical heuristics. Top comments provide additional anecdotes of LLMs offering useful differentials and patient education (healing/rehab timelines), occasionally anticipating clinician diagnoses; debate notes potential lifeâsaving triage benefits versus rare harmful uses (e.g., assisting selfâharm), with overall sentiment that LLMs can augmentânot replaceâmedical professionals.- ChatGPT is used as a lightweight clinical decision-support tool for differential diagnosis and triage: when appendicitis was suspected, it enumerated alternative etiologies and surfaced an inflammatory condition that matched the eventual clinical diagnosis. For GI complaints, it guided structured self-checks (e.g., assessing gallbladder pain, screening red flags) to rule out emergent issues, helping users prioritize care pathways without replacing imaging/labs.
- As an evidence retriever and explainer, it provided study links and rationale-driven guidance for presumed gastritis, including staged diet planning and nutrient-dense, âsafeâ food selection based on irritant/acid load. Users report actionable, consistent explanations that made it easier to maintain nutrition during limited intake, illustrating utility in patient education and protocol adherence rather than definitive diagnosis.
- Reliability and safety: commenters note occasional hallucinations and unjustified assumptions that required cross-checking and correction, though one reported it was ârarely incorrectâ within the constrained diet domain. A telehealth clinician later corroborated the working diagnosis, suggesting a workflow where LLM-assisted hypothesis generation and education precede clinician confirmation via diagnostics.
- If you swapped out one neuron with an artificial neuron that acts in all the same ways, would you lose consciousness? You can see where this is going. Fascinating discussion with Nobel Laureate and Godfather of AI (Score: 940, Comments: 419): The post revisits the neuronâreplacement (silicon prosthesis) thought experiment: if a single biological neuron is replaced by a functionally identical artificial unit matching spike timing, synaptic/plasticity dynamics, and neuromodulatory responses, would consciousness changeâand what follows under gradual fullâbrain replacement? The setup implicitly tests substrateâindependence/functionalism (cf. Chalmersâ âfading/dancing qualiaâ argument: https://consc.net/papers/fading.html) versus biologically essentialist views, and invokes identity continuity puzzles akin to the Ship of Theseus and multiple realizability (see SEP on Functionalism). Top comments emphasize that the âoomphâ intuition has no operational/empirical contentâânot something you can objectively measureââand relate the scenario to ShipâofâTheseus identity continuity; others note the discussion is standard in philosophy of mind but acknowledge the speakerâs clear delivery.
- Several commenters note that the term âoomphâ for consciousness lacks an operational definition, making it non-measurable and unfalsifiable. For technical evaluation, this highlights the need for operational criteria (e.g., reportability, behavioral/physiological markers, timing/causal interventions) rather than appeals to an undefined scalar of âconsciousness.â Without agreed-upon metrics, discourse reduces to intuition pumps and canât be benchmarked or stress-tested like other AI capabilities.
- Applying Ship of Theseus to neural replacement, the technically salient claim is that if each biological neuron is replaced by a functionally isomorphic artificial unit (preserving IO mappings, latencies, plasticity rules, and network-level dynamics), system-level behavior should remain invariant. This aligns with functionalism and the âgradual replacementâ defense of consciousness continuity, pushing back on substrate-essentialist views; see Chalmersâ arguments on fading/dancing qualia for why massive qualia shifts without behavioral change are implausible (https://consc.net/papers/qualia.html). The hard part is specifying the equivalence class: does the replica need to match spike-timing statistics, neuromodulatory effects, and learning rules, or only causal role at some abstraction level?
- A âduck testâ perspective argues for behavioral/operational criteria: if an agent is behaviorally indistinguishable and expresses preferences (e.g., not wanting shutdown), that may be a sufficient practical criterion irrespective of substrate, akin to a Turing-style operationalization (https://www.csee.umbc.edu/courses/471/papers/turing.pdf). The technical question becomes detecting and auditing non-instrumental preference expression versus goal-misdirected outputs under optimization pressure (e.g., deception), which implies the need for interpretability, consistency checks, and causal interventions. Full episode for deeper context: https://www.youtube.com/watch?v=giT0ytynSqg
- AI (Score: 1858, Comments: 94): The post titled âAIâ contains no technical contentâno models, code, datasets, benchmarks, or implementation details. It appears to be a short GIF/video gag featuring an initially blurred face followed by a full reveal (an intentionally inconsistent âcensorshipâ effect), with no accompanying explanation or references. Commenters note the comedic timingâhighlighting the abrupt deâblurring (e.g., âblurred face then the fully revealed faceâ)âand express general appreciation; there is no substantive technical debate.
- wtf (Score: 1692, Comments: 144): Non-technical meme: a screenshot implies a user is shocked (âwtfâ) by an AI/robot/chatbot response that is exactly what it was trained/programmed to do. The thread jokes about trivial or poorly designed training/inference (e.g., wasting CPU to print âhelloâ), underscoring the basic principle that models do what theyâre trained to do (garbage in, garbage out). Comments emphasize user responsibility (âyou trained itâ), mock expecting emergent behavior from trivial code, and note the bot responding âexactly as programmed.â
- I think I have Alzheimerâs. (Score: 577, Comments: 59): OP shares evidence that the assistant isnât retaining information across chats (framed as âI think I have Alzheimerâsâ), implying a failure of cross-session recall rather than in-thread context loss. A top comment suggests adding a third screenshot showing whether the Memory across conversations feature is enabled to substantiate the claim; if disabled, the behavior is expected per OpenAIâs memory design (see OpenAIâs overview: https://openai.com/index/memory-and-new-controls-for-chatgpt/). Most replies are humorous; the only technically substantive feedback is to verify the memory toggle before diagnosing a bug or regression.
- One commenter suggests adding a third screenshot showing whether âmemory across conversationsâ is enabled to substantiate claims about the assistantâs forgetfulness. This highlights that product-level memory toggles can confound observations by mixing cross-chat memory with per-session context limits; a reproducible report should control for that setting and specify model/session details.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. Generation Efficiency and Kernel-Level Wins
- Set Block Decoding Slashes Steps: The paper Set Block Decoding (SBD) integrates next-token prediction (NTP) and masked token prediction (MATP) to cut generation forward passes by 3â5x while maintaining accuracy on Llamaâ3.1 8B and Qwenâ3 8B, with no architectural changes and full KV-cache compatibility.
- Members highlighted SBDâs use of discrete diffusion solvers and praised its practicality as a fine-tune on existing NTP models, noting it promises significant speedups without hyperparameter headaches or system overhauls.
- MI300X VALU Mystery Meets Thread Trace: Engineers probed a suspected dual VALU glitch on MI300X where VALUBusy hit 200%, advising confirmation via limiting to one wave per SIMD (launch 1216 waves) and thread tracing with rocprofiler thread trace and rocprof compute viewer.
- They recommended using rocprofv3 and thread traces to verify if cycles with two waves issue VALUs, framing a repeatable methodology to isolate scheduler behavior at SIMD granularity.
- CUDA Graph Warmup: Capture Smarter, Not Longer: A prolonged CUDA graph warmup (~30 min) triggered guidance to capture a graph for decoding a single token rather than long model.generate() loops, referencing the profiling code in low-bit-inference profiling utils.
- Experts suggested capturing a single forward pass to avoid redundant warmup paths and reduce setup time, aligning graph capture with the intended steadyâstate decode workload.
2. Leaderboards, MoE Moves, and New Models
- Qwen3-Next-80B Teases Tiny-Active Titan: Alibaba announced Qwen3âNextâ80BâA3B, an 80B ultraâsparse MoE with only 3B active parameters, claiming 10Ă cheaper training and 32K+ faster inference while matching Qwen3â235B reasoning (announcement).
- Community chatter noted extreme sparsity (e.g., ~1:51.2 at the MoE level and ~1:20 overall), flagging it as a key signal that sparse experts are the nearâterm path to scalable inference economics.
- LMArena Adds Models and Cleans House: The leaderboard added Seedreamâ4, Qwen3ânextâ80bâa3bâinstruct/thinking, and Hunyuanâimageâ2.1 per LMArena announcements.
- Users also noted the removal of the legacy sites and were invited to submit feature requests for the current platform, consolidating evaluation traffic to a single surface.
- NanoâBanana Nukes Seedream V4 in Edits: Early reports showed Seedream V4 struggling on imageâediting tasks (e.g., changing outfits while preserving face/body pose) against NanoâBanana; users tested via LMArena image mode.
- Feedback described Seedream V4 as getting âmassacredâ on targeted edits, underscoring that editâpreservation benchmarks remain a differentiator among image models.
3. Agentic Tools and Connectors Go Practical
- Comet Controls the Canvas (and Concerns): Perplexityâs Comet browser drew attention for agentic control that can fill forms, open tabs, and reply to emails, alongside praise for adâblocking and summarization but concerns about privacy/security after a reported vulnerability.
- Members emphasized that it âcan control ur browserâ and debated the safety tradeoffs of autonomous browsing versus productivity gains for routine workflows.
- OpenAI Connectors Unlock Custom MCPs: OpenAI enabled custom MCPs in ChatGPT via Connectors in ChatGPT, giving teams more control over infrastructure choices and data paths.
- Builders welcomed the flexibility and asked for better artifact distribution (e.g., hosting proposal PDFs online) to streamline collaboration and review.
- Transparent Optimizations Pitches Prompt Previews: A proposal for Transparent Optimizations introduced optimizer markers, prompt rewrite previews, and feasibility checks (discussion link).
- Participants requested easier access to supporting docs (e.g., webâhosted PDFs) and debated how much control users should retain over optimizerâdriven rewrites.
4. Systems Tooling Shifts and GPU Gotchas
- vLLMâs uv pip Trips Nightly Torch: A change to custom builds using uv pip in vLLM uninstalled nightly torch, breaking environments per vLLM PR #3108.
- Practitioners reacted with âok this is not goodâ, rolled back to v0.10.1 with
python use_existing_torch.py
, and pushed maintainers for an alternative approach.
- Practitioners reacted with âok this is not goodâ, rolled back to v0.10.1 with
- cuBLAS TN Quirk Lands on Blackwell: Developers noted newer NVIDIA GPUs (Ada 8.9, Hopper 9.0, Blackwell 12.x) require TN (AâT, BâN) for
cublasLtMatmul
fast paths (cuBLAS docs).- While technically routine, some found the requirement âincredibly specificâ, reminding kernel authors to validate layouts across architectures to avoid silent slow paths.
- Paged Attention Post Peeks Inside vLLM: A new deep dive, Paged Attention from First Principles: A View Inside vLLM, covers KV caching, fragmentation, PagedAttention, continuous batching, speculative decoding, and quantization.
- Systems engineers flagged it as a practical explainer for memoryâbound inference design, clarifying why paged caches and batching policies dominate throughput.
5. Mojo/MAX Platform: Custom Ops and Bindings
- bitwise_and Blocks? Build Custom Ops Instead: Because adding RMO/MO ops via Tablegen isnât currently possible in closedâsource components, maintainers recommended implementing bitwise_and as a MAX custom op, keeping PRs open for potential internal completion later.
- Users hit API rough edges (broadcasting, dtype promotion), and a team member offered a quick demo notebook while acknowledging longâterm fixes are on the roadmap.
- DPDK Delight: Mojo Bindings Materialize: The community generated most DPDK modules in Mojo at dpdk_mojo, missing a few AST nodes and leaning on a Clang AST parser with JSON dumps for debugging and type reconstruction.
- They called
generate_bindings.mojo
âhackyâ but workable, aiming next at OpenCV while they iron out struct representation gaps in Mojo.
- They called
- Roll Your Own Mojo Dev Container: Builders shared a rollâyourâown approach to a Mojo dev environment using Docker, referencing mojo-dev-container as a base for a customized setup.
- This pattern packages the Mojo toolchain predictably, enabling consistent local development and CI without waiting on official images.
Discord: High level Discord summaries
Perplexity AI Discord
- DeepSeek Debuts, Demolishes Delusions: Members on Discord debated which model follows instructions better, with some stating that DeepSeek is less delusional than ChatGPT.
- No further details were given.
- Grok Garners Gripes for Garrulousness: Users on Discord complained about Grok giving what we didnât even ask for and for yapping too muchhh.
- Some believe Grok is programmed to be controversial for attention, while others find it hard to follow instructions.
- Comet Causes Controversy for Controlling Browser: Users discussed the Comet browser, an AI browser made by Perplexity, noting it can control ur browser, fill forms, open tabs, and even reply to emails.
- Some users expressed concerns about privacy and security, citing a reported vulnerability that allowed hackers to access user data, while others praised its ad-blocking and summarization capabilities.
- Perplexityâs API Parameter Problem Patched: A user reported a single API error a few hours ago, indicating that
num_search_results must be bounded between 3 and 20, but got 50
.- Another user confirmed that this was a known issue that got resolved, thanking the user for reporting the error.
Unsloth AI (Daniel Han) Discord
- Multi-GPU ETA Remains Elusive for Unsloth: Despite user praise for Unslothâs simplicity in single-GPU training, thereâs no ETA for official multi-GPU support, with development updates available in this Reddit thread.
- Users are encountering struggles with unofficial methods, underscoring the demand for native multi-GPU capabilities.
- Dynamic GGUF Quantization Desired for Model-Serving: A user expressed high interest in a Dynamic 2.0 GGUF service to improve quantization, suggesting a pay-for-service model, highlighting the need for I-matrices and their quantization schemes.
- They noted the labor-intensive process of model analysis, dynamic quantization, and testing puts strain on the Unsloth team.
- GuardOS: Privacy-Focused NixOS OS Goes Live: A member shared a link to GuardOS, a privacy-first NixOS-based operating system.
- Another member found the idea comical, stating the idea itself was already comical, but the top comment is even funnier.
- Unsloth BERT Model Fine-Tuning Support Confirmed: A user inquiring about Unsloth support for BERT models for finetuning with EHR data to classify ICD-10 codes received a link to a relevant Colab notebook.
- Unsloth officially supports certain models, and users are encouraged to experiment with others, making it suitable for classification tasks.
- Spectral Edit Reveals Audio Secrets: Insights from a spectral edit show that content lies around 0-1000 Hz, prosody between 1000-6000 Hz, and harmonics from 6000-24000 Hz.
- Harmonics determine audio quality and can reveal the sample rate by ear, suggesting natural generation or stretching crystal clear audio can add depth, similar to âfrequency noiseâ.
LMArena Discord
- O3 Model Flounders: Users report underwhelming performance from the O3 model on complex tasks, with some finding it worse than Gemini Pro.
- Conflicting opinions exist, however, as some users see O3-medium as potentially approaching GPT5-low level performance.
- Psychological Prompting: Brain Hack or Bust?: A user suggested employing psychological prompting tactics, such as instructing the AI to work your best with no exceptions.
- Skeptics argue that vague statements are ineffective and verbose prompting yields better results for LLMs.
- AI Spritesheet Factory: A user is generating spritesheet animations using AI, converting video to frames in just 10 minutes.
- They are using Gemini for character images and has posted ready spritesheet animations on itch.io called hatsune-miku-walking-animation.
- LM Arena Says Goodbye to Old Website: The legacy version of the LM Arena website, including alpha.lmarena.ai and arena-web-five.vercel.app, has been removed.
- A team member posted an announcement and invited users to submit feature requests for the current site.
- Nano-Banana Annihilates Seedream V4: Early reports suggest Seedream V4 is underperforming, even against Nano-Banana, especially in image editing tasks.
- Specifically it has trouble changing a personâs outfit while preserving their face and body position. Use this link to use Seedream V4.
HuggingFace Discord
- QLoRA Batch Size Blues: A member ran into batch size limitations using QLoRA with PEFT and a 7B model with 4096 token sequence length on an H200 GPU.
- Suggestions included checking FA2/FA3, setting
gradient_checkpointing=True
, using smaller batch sizes, and referencing Unsloth AI docs for context length benchmarks.
- Suggestions included checking FA2/FA3, setting
- ArXiv Paper Needs Endorsement: A user urgently seeks endorsement in CS.CL on ArXiv to publish a preprint featuring the Urdu Translated COCO Captions Subset dataset.
- The endorsement request URL was shared: here.
- Docker Model Runner Debuts: Users discussed using Ollama, Docker Model Runner, and Hugging Face for downloading and utilizing free models.
- Challenges with model availability were noted, with suggestions to consult the Hugging Face documentation and use a VPS.
- n8n valuation jumps to $2.3 Billion: A user inquired about integrating Hugging Face Open Source models within n8n, a no-code automation platform.
- An image was shared indicating that the Berlin-based AI startup n8n has seen its valuation skyrocket from $350 million to $2.3 billion in just four months, per this youtube video.
- Zero Loss on Smol Course: Members experienced zero loss during fine-tuning with an already fine-tuned model, recommending a base model for proper loss.
- A code snippet to disable thinking functionality in the tokenizer in the courseâs SmolLM3-3B can be found here.
Cursor Community Discord
- Cursor Plagued by Issues: Cursor users are reporting numerous issues with Cursor and are being directed to report them on the forum for assistance.
- Among the issues reported is Cursorâs auto mode using PowerShell commands to edit, prompting a user to request a bug report.
- Legacy Auto Mode Locks Subscribers: Users with annual subscriptions purchased before September 15th retain the old auto mode until their next renewal, according to pricing details.
- A user is attempting to use rules to make auto use the inline tools.
- Cursor Beta Obscures Release Notes: The latest Cursor release (1.6.6) is in beta, and release notes are scattered across the forum, requiring users to hunt for them.
- The pre-release nature of the version means rapid changes and potential feature removals.
- Director AI Pursues C3PO Dream: A user is trying to stop the whole Your absolutely right! crap by essentially trying to build a C3PO.
- The project is already running on an MCP server and integrated into Cursor.
- Linear Integration Snags on Repository Selection: A user reported that when assigning an issue to Cursor via Linear, it prompts to choose a default repository, even though one is already specified in the Cursor settings, as seen in the attached image.
- This recurring prompt occurs despite the user having configured the default repository within Cursor.
OpenRouter Discord
- OpenRouter query prompts have race condition bug: A member reported a bug related to a possible race condition in query prompting, where longer, more detailed prompts yield worse results for translations.
- No solution was found, but it was suggested to report the bug to the developers.
- Developers have Token Calculation Conundrums: A member inquired about calculating the number of tokens for input, seeking a non-heuristic method due to model-specific variations.
- It was suggested to use external APIs in conjunction with the endpointâs tokenizer information, as documented, since thereâs nothing in the documentation about that.
- JSONDecodeError surfaces in server responses: Users discussed a JSONDecodeError indicating an invalid JSON response from the server, often due to server-side failures like rate limiting, misconfigured models, or internal errors.
- The error suggests the server returned HTML or an error blob instead of valid JSON.
- Avoiding Moonshot AIâs turbo pricing: A user asked how to avoid the more expensive turbo version when selecting Moonshot AI as the provider for Kimi K2 in the OpenRouter chatroom.
- The solution offered was to select a cheaper provider in the advanced settings.
- iOS upload bug squashed: A user reported a bug where they couldnât upload PDF or TXT files to OpenRouter chat on iOS because non-image files were grayed out.
- It was confirmed as a bug, likely an oversight when file uploads were added, with no workaround available on iOS.
GPU MODE Discord
- Lambda Labs Cloud GPUs Face Instance Drought: Users reported inconsistent GPU instance availability with Lambda Labs, questioning the frequency and impact of cloud GPU shortages.
- The discussion underscored the importance of understanding the reliability of GPU resources when relying on cloud platforms for resource-intensive tasks.
- CUDA Graph Warmup Reaches Half Hour Mark: A user reported that CUDA graph warmup was taking half an hour in their low-bit-inference project, and another suggested that capturing a CUDA graph for decoding one token instead of generating many tokens may provide better results.
- The user may want to capture a single forward pass, not multiple passes like
model.generate()
does internally.
- The user may want to capture a single forward pass, not multiple passes like
- vLLMâs uv pip Transplants Nightly Torch: A member noted that vLLM switched to
uv pip
for custom builds with pre-installed torch, but it uninstalls the nightly torch, leading to environment issues, from this PR.- A member said ok this is not good, just saw their pr. Iâm gonna go ask them if they can find another way to do this, and another member reverted to
v0.10.1
building withpython use_existing_torch.py
.
- A member said ok this is not good, just saw their pr. Iâm gonna go ask them if they can find another way to do this, and another member reverted to
- MI300X Probes Potential Dual VALU Glitch: Users investigated a potential dual VALU issue on MI300X, where VALUBusy hits 200%, suggesting confirming it by limiting to one wave per SIMD, and using rocprof compute viewer and rocprofv3 to diagnose.
- The user was advised to launch 1216 waves to achieve 1 wave/simd, leveraging AMDâs documentation for thread tracing and rocprof compute viewer documentation.
- Kernel Dev Roadmap makes Progress: A member suggested adding a roadmap for kernels and increasing available kernels in GPU mode leaderboard, following the format of gpu-mode/reference-kernels.
- Members also mentioned that submissions can now be made online, with the primary need being an editor-like experience.
LM Studio Discord
- NVMe Upgrade Speeds Up Load Times: A user replaced a slow NVMe with a faster one, achieving a 4x improvement in sequential read speed and model load times.
- The user did not provide details on the old or new drives.
- Markdown Sub Tag Renders Incorrectly: A member reported that the
<sub>
tag has no effect on text inside it in Markdown style within LM Studio, and also that italic text is not rendered correctly when using asterisks such as*(n-1)*
.- There are ongoing discussions about the proper rendering of Markdown syntax, specifically with sub tags and italicized text.
- Western Digital Drives Blows Up: Users reported high failure rates with Western Digital Blue drives, humorously calling them Western Digital Blew Up drives.
- The users did not elaborate on the specific failure modes or use cases, but the consensus was to avoid the drives.
- PNY NVIDIA DGX Spark Plagued by ETA Shenanigans: Users joked about the PNY NVIDIA DGX Spark having conflicting ETAs, initially October then late August, as listed on linuxgizmos.com.
- The inconsistency in the release dates has led to speculation about the availability and production timeline of the device.
- Linux Dominates for Max+ 395 Box: Users recommended Linux over Windows for a Max+ 395 box, citing Vulkanâs functionality but noting potential context limits.
- It was suggested to use a custom-built llama.cpp with ROCm 7 from lemonade-sdk/llamacpp-rocm that already has compiled versions in Releases.
OpenAI Discord
- Laconic Game Causes Gemini 2.5 Pro to Hallucinate: A user joked that their laconic game was so strong that it caused Gemini 2.5 Pro to hallucinate.
- The user did not elaborate further on the nature of the hallucination or laconic game.
- GPT-5 Now Integrates Code Snippets and Linux Shell Access: A member reported that GPT-5 now writes its own code snippets to use as tools in a chain of tasks and appears to have access to an underlying Linux shell environment.
- Another member mentioned they vibe coded directly from the ChatGPT interface to develop an app hosted locally on GitHub.
- Custom MCPs Now Supported in OpenAI: Users can now use custom MCPs (Managed Cloud Providers) in OpenAI, according to the Connectors in ChatGPT documentation.
- This update allows for more flexibility and control over the infrastructure used by ChatGPT.
- Transparent Optimizations Proposal Introduced: A proposal for Transparent Optimizations was posted, introducing optimizer markers, prompt rewrite previews, and feasibility checks; the proposal was linked here.
- One member requested that associated PDFs be hosted online for easier access, rather than requiring downloads.
- AI Self Help Conversation Analyzer Launched: A member introduced a conversation analyzer called AI Self Help that helps determine why conversations take odd turns.
- The tool includes a conversation starter that lists issues and detailed questions to ask ChatGPT to get the answers.
Nous Research AI Discord
- Disable WebGL to fix perf issues: A member requested a feature to disable WebGL in the browser due to performance issues without a GPU, and suggested disabling the animation orb as well, shown in this screenshot.
- The suggestion came from someone working on a project requiring rapid iteration with automated bug fixes and updates, and passing the MOM test.
- Dataset quality better with Tokenizer Filtering: A member shared a link to dataset_build on GitHub, highlighting the idea of running languages through a modelâs tokenizer and rejecting those with unknown tokens to ensure quality.
- The approach also organizes calibration datasets using folders/directories for later combination.
- SBD accelerates LLM Gen: A new paper introduces Set Block Decoding (SBD), a paradigm that accelerates generation by integrating standard next token prediction (NTP) and masked token prediction (MATP) within a single architecture, without requiring architectural changes or extra training hyperparameters.
- Authors demonstrate that SBD enables a 3-5x reduction in the number of forward passes required for generation while achieving the same performance as equivalent NTP training by fine-tuning Llama-3.1 8B and Qwen-3 8B.
Latent Space Discord
- GPT-OSS Outshines Llama2 on a Budget: It was noted that running GPT-OSS 120B is cheaper than running Llama2 7B, with discussion suggesting that MoEs are the future.
- Optimizations for speeding up GPT-OSS, such as MXFP4 quantization, custom kernels, and continuous batching, were also mentioned.
- Altman Grilled in Murder Mystery: During an interview, Sam Altman was accused of murder, prompting a classic deflection move as highlighted in this video clip.
- A member shared that thereâs a clip on Twitter of this like 5 min segment.
- Codex Cranks Get Exclusive Peek: Alexander Embiricos invited heavy Codex users to beta test something new, as seen in this tweet.
- This might be related to conversation resume and forking, based on recent repository activity here.
- OpenAIâs Oracle Oddity Obscures Overspending: OpenAI reportedly signed a 5-year, $300 billion cloud-computing contract with Oracle starting in 2027 at $60 billion per year.
- Commentators are questioning OpenAIâs ability to afford the annual $60B cost against ~$10B revenue, raising concerns about energy and business-model sustainability.
- ByteDance Squeezes Googleâs Fruit: Deedy highlighted ByteDanceâs new Seedream 4.0 as top ranked on Artificial Analysis leaderboards, touting 2â4 K outputs, relaxed policies, faster generation, multi-image sets, and $0.03 per result.
- Community reactions range from glowing praise for quality and pricing to skepticism that Nano Banana still wins on speed and natural aesthetics.
DSPy Discord
- Math GPT App Seeks DSPy Savvy: A member seeks an advanced DSPy blog writing agent for a Math GPT app available at https://next-mathgpt-2.vercel.app/.
- The agent would presumably generate math-related content, given the nature of the Math GPT app.
- Pythonic Programs Propel Proliferation of Ports: A member suggested modeling and optimizing DSPy programs directly in Python and transpiling to languages like Go, Rust, or Elixir.
- A key challenge is how to export an arbitrary python program, perhaps by serving a backend to a python interface.
- Arbor Advantages Accelerate Adoption of RL: Members discussed using Reinforcement Learning (RL) in DSPy, but one member expressed fear of diving in because of the many moving parts and need for powerful GPUs.
- Another member said that Arbor + DSPy is quite seamless, and they are working on new things to make config even easier so everyting just works.
- Instructionsâ Immutability Incites Iterations: A member inquired whether instructions can be modified by an optimizer when using
signature.with_instructions(str)
.- It was clarified that mipro and gepa do modify the instructions, with the actual instructions saved in the
program.json
.
- It was clarified that mipro and gepa do modify the instructions, with the actual instructions saved in the
- DSJava: DSPy Dabbles in Java?: Members discussed a potential DSPy implementation in Java, perhaps dubbed DSJava?.
- One member does a hack version, compiling prompts in DSPy, then having a function in Rust for running prompt packs, but preferring to do it all in Rust.
Modular (Mojo đ„) Discord
- Mojovians Mobilize Mojo Docker: A member sought a Docker container checkpoint for running the Mojo dev environment, prompting suggestions to roll your own using existing images and the Mojo package with this GitHub repo.
- This approach allows for a customized Mojo development setup within a containerized environment.
- Mojo Compiler Aims Go-like Packaging: The Mojo compiler, set for open-sourcing in 2026, sparked discussions on its potential to replace venv with a Go-like packaging system, with Modular stating they have no current plans for it.
- The community debated the practicality of self-compilation versus leveraging existing Python ecosystems for package management.
- DPDK Gets Mojo Bindings: A member generated most modules for dpdk using Mojo, available on GitHub, but is missing a couple of AST nodes and found the
generate_bindings.mojo
script to be a bit hacky.- They are also using Clang AST parser to convert strings of types into proper AST nodes, dump the AST JSON for visual debugging, and then convert it to Mojo.
bitwise_and
Op Blocked by Closed Source: A member inquired about adding abitwise_and
op to the Modular repo, but was told that adding RMO and MO ops in Tablegen is not feasible due to closed source but it should work as a custom op.- The team is working towards supporting open source contributions to MAX and the PR can remain open for internal completion at a later date, but it requires diverging from the existing pattern for op definition in ops/elementwise.py.
- Graphs Grinding Gears During Staging: A member reported long staging times for graphs, citing GPT2-XL as an example that takes 3.6 seconds to define the Max graph with a warm cache, followed by 0.2 seconds to compile.
- A team member welcomed real-world examples for benchmarking and optimization purposes.
Yannick Kilcher Discord
- Brains Spark Sparsity Speculation: A member compared a certain sparsity ratio to primate/elephant/whale brains.
- However, the specific sparsity ratio and its context were not clearly detailed.
- Saturday Sessions Seek Source: A member inquired about the publication location of Saturday session papers, especially after a recent session.
- The discussions and papers are typically posted via the events feature, including a presentation followed by a discussion and relevant links.
- Planning Paper Presents Potential: Members reviewed the paper âPlanning with Reasoning using Vision Language World Modelâ (https://arxiv.org/abs/2509.08713) and deemed it a light read, although some references are potential candidates for future discussions.
- The paper was not deeply reviewed, but some found the references to be the most interesting aspect.
- Prompt Templates Promise Productivity: A straightforward paper on a prompt templating system (https://arxiv.org/abs/2508.13948) sparked a light discussion among members.
- The project page at microsoft.github.io/poml/stable/ was suggested as a better overview of the design and utility in different systems.
- SNNs Spark Scalability Speculation: Members discussed the resurgence of Spiking Neural Networks (SNNs), previously considered flawed, due to their extreme sparsity advantages at scale.
- One member noted the potential of unlocking the brainâs secrets (the brain is an SNN), which could be a gold mine, though it may require specialized hardware.
Eleuther Discord
- Community Welcomes Data Science Enthusiast: New member David, with a background in Data Science, Mathematics, and computational biology, introduced himself to the community.
- He expressed enthusiasm for open communities and looked forward to connecting with others.
- Delving into Data Splitting Strategies: A member suggested chronological splitting in data processing, instead of random truncation for improved data naming and combination.
- They also reported experimenting with combining bin packing and truncation to discard suffixes.
- Questioning Gaussian Noise Significance: A member questioned if neural network behavior on random Gaussian noise accurately reflects performance on structured inputs, referencing this image.
- The member claimed that if training an image classifier where one label is staticky TV screen, then Gaussian noise will systematically push inputs toward that class.
- Hallucination Detection Dataset Doubts: Members discussed a tweet by @NeelNanda5 and a related paper on hallucination detection, noting the dataset construction effort.
- A member argued that creating such a dataset and classifier to detect hallucinations might be similar to fine-tuning the model to avoid the issue altogether.
- Exploring Relational Hallucinations: Members explored how to define hallucination, noting that the more interesting hallucinations are in other modalities, citing a paper defining relational hallucinations.
- A member shared a link to a tweet on the topic.
aider (Paul Gauthier) Discord
- Engineer Seeks Help Tuning AI Documentation Agents: An engineer is seeking advice on tuning their documentation agent, built with Vercel AI SDK and Claude Sonnet 4, and is struggling to refine prompts without impacting existing performance.
- The agent consists of a team lead, document writer, and document critique that iterates up to 5 times per section, using braintrust for tracking.
- Defining Non-LLM AI Outputs: A member asked how to define âgood outputsâ from an AI model without calling an LLM, clarifying that good outputs mean following guidelines, not hallucinating stuff, sticking to requirements.
- The advice was to start with simple unit tests to check for the existence of certain keywords in the AI response, then enhance it with an LLM as a judge.
- Hamel Hussainâs Evals Blogpost Recommended for Tuning: A member recommended reading Hamel Hussainâs blogpost on evals for guidance on evaluating AI models.
- The blogpost was recommended alongside Eugene Yanâs resources, with the engineer finding Mastraâs guidebook useful.
- Aider Load Command Comments Out Lines: The â#â symbol comments out lines in files executed by the aider /load command.
- The aider LLM decides which files to edit, leveraging the repo map, which is sent to the LLM as part of the system prompt, while the repo map is constrained to a certain size to avoid token limits.
Moonshot AI (Kimi K-2) Discord
- Kimi K2 excels at deep research: A member noted that Kimi K2âs search capabilities are effective for deep research, capable of searching broadly and compiling interactive reports.
- The member inquired whether the Kimi K2 instruct chats are utilized for training the model, though further details on this aspect were not provided.
- K2 Research Mulls Email Integration: A member considered whether K2 Research could send emails during its research process, specifically for customer support scenarios.
- Another member responded and suggested that this capability is not yet implemented in K2 Research.
- Creative Writing Models Battle for Supremacy: A member suggested that Kimi K2, GPT-5 (Medium), and Qwen3-Max are among the best models for creative writing tasks.
- The member specifically lauded them as three good models for brainstorming.
Manus.im Discord Discord
- Manus Credits No Longer Given: Users reported that free credits are no longer available on Manus.
- The change was noticed in the main discord.
- Collaboration Feature Sees Light: A user expressed appreciation for Manus implementing a collaboration feature that an early user Prayer had originally requested.
- The feature was highly anticipated by the community.
- Next.js Migration: Too Much?: A user inquired about converting a Wordpress website to Next.js for Vercel hosting.
- Members pointed out that because Wordpress is in PHP and Next.js would require porting to React.js, it might be too much for a small, ~40-page small business website.
The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
You are receiving this email because you opted in via our site.
Want to change how you receive these emails? You can unsubscribe from this list.
Discord: Detailed by-Channel summaries and links
Perplexity AI â· #general (1195 messagesđ„đ„đ„):
DeepSeek vs ChatGPT, Grok's persona, GPT-5 vs Perplexity, Comet Browser
- DeepSeek outperforms ChatGPT for factual recall: Members debated which model follows instructions better, with some stating that DeepSeek is less delusional than ChatGPT.
- Grok gets grilled for garrulousness: Users complained about Grok giving what we didnât even ask for and for yapping too muchhh.
- Some believe Grok is programmed to be controversial for attention, while others find it hard to follow instructions.
- ChatGPT plus versus Perplexity plus: Users state that ChatGPT gives better responses than Perplexity, particularly for simplifying complex concepts.
- One user says that Perplexity is useful as a search engine but is not good at chatting.
- Comet Browserâs AI Agent causes Controversy: Users discuss the Comet browser, an AI browser made by Perplexity, noting it can control ur browser, fill forms, open tabs, and even reply to emails.
- Some users expressed concerns about privacy and security, citing a reported vulnerability that allowed hackers to access user data, while others praised its ad-blocking and summarization capabilities.
Perplexity AI â· #sharing (5 messages):
Shareable Threads, ProductHunt Vote, Image attachments
- Shareable Threads reminder: Perplexity AI reminded users to ensure their threads are Shareable, with an attachment showing how.
- No further details were given.
- ProductHunt Vote Requested: A member asked for votes on ProductHunt.
- It is unclear what the product is.
- Image attachments: A user shared a series of image attachments with no context.
- There are four images in total in the series.
Perplexity AI â· #pplx-api (3 messages):
Friend Requests, API Errors, num_search_results errors
- Friend Requests Coming In Hot: A user sent a friend request seeking to test the beta, offering feedback on its use with webapps and n8n.
- API parameter error squashed: A user reported a single API error a few hours ago, indicating that
num_search_results must be bounded between 3 and 20, but got 50
.- Another user confirmed that this was a known issue that got resolved, thanking the user for reporting the error.
Unsloth AI (Daniel Han) â· #general (562 messagesđ„đ„đ„):
Unsloth hardware compatibility, Training TTS models on Colab, Multi-GPU Support Roadmap, BERT models in Unsloth, Dynamic GGUF Quantization Requests
- M1/M2 Macs not locally compatible yet: A new user with an Apple M2 Max MacBook Pro discovered that Unsloth doesnât currently support local training on M1/M2 Macs but was directed to use MLX instead, as per the Unsloth requirements.
- A community member suggested leveraging free Google Colab for learning and tuning, then exporting models to GGUF format for local use with software like LM Studio and also shared a link to Unsloth Notebooks.
- Colab Good for TTS Training?: A user asked about training high-quality, small-sized TTS models with low RTF on Google Colab and wanted to know if it would be entirely free.
- It was clarified that while Google Colab offers some free usage, limitations exist, and Unsloth supports TTS fine-tuning as well, with documentation available here.
- Multi-GPU ETA Missing In Action: A user inquired about the roadmap and timeline for official multi-GPU support, noting struggles with unofficial methods and praising Unslothâs simplicity in single-GPU training.
- The response indicated thereâs no ETA currently, but itâs in development with further updates available in this Reddit thread.
- Dynamic GGUF Quants Sought for Model-Serving: A user expressed high interest for a Dynamic 2.0 GGUF service to improve quantization, suggesting a pay-for-service model, along with expressing the need of I-matrices and their quantization schemes
- The community member noted that the labor intensive process of model analysis, dynamic quantization, and testing for these quants can put strain on the Unsloth team which is already contributing to the OSS community.
- New Models from OpenAI, Users React: A member shared an image of new models, to which others responded with humor.
- After the image of new models was shared, a member listed out additional options for the AI community and members discussed sparsity of the newly unveiled models by OpenAI.
Unsloth AI (Daniel Han) â· #introduce-yourself (4 messages):
AI Engineering, AI Startups, Microservices, LLMs
- AI Engineer Architects High-Leverage Systems: An AI engineer introduced themselves as someone who architects high-leverage systems and is currently building the AI-native OS at pxsol, forging a stealth startup, and writing a weekly dispatch on building 1-person enterprises.
- They are always open to jamming on autonomous engines, agentic systems, and the intersection of AI + quant finance.
- AI Engineer Intern Leads Startup Projects: An AI Engineer intern introduced themself working at a startup while still being a student.
- They focus on making microservices for AI applications, designing AI system architecture, playing with LLMs and finetuning and deploying them when needed.
Unsloth AI (Daniel Han) â· #off-topic (109 messagesđ„đ„):
48GB 4090s, Privacy-First NixOS-Based OS, Unsloth Dependency Hell, Luau Learning and LeetCode, Promptwright DAG
- 48GB 4090s Tempt but Wait?: Members discussed the temptation of buying 48GB 4090s for $3k, but weighed the pros and cons with the expected release of the 5090.
- One member showed a <:redaahhhh:1238495916202397726> after saying U sure u want to risk it?.
- GuardOS: Privacy-First NixOS OS Debuts: A member shared a link to GuardOS, a privacy-first NixOS-based operating system.
- Another member found the idea comical, stating the idea itself was already comical, but the top comment is even funnier.
- Tackling Unslothâs Dependency Labyrinth: A member joked about the dependency hell when using Unsloth, spending time on getting the correct dependencies, using the command
uv tree --package unsloth > uv_tree_package_unsloth.txt
.- Another member recommended using
uv
and pinning dependencies, mentioning that even when something is breaking it is typically pretty verbose about why.
- Another member recommended using
- Teaching AI Luau and Loving LeetCode Less: Members discussed training an AI on Luau with LeetCode problems, where the AI initially yapped more, then reduced fluff as it progressed.
- It was noted that the shorter answers led to faster training, but the model was only passing half the unit tests on average, but now the AI is getting quiet and mad.
- Promptwright Pioneers DAG Dataset Seeding: A member announced a new experimental Directed Acyclic Graph (DAG) dataset seed generation algo in Promptwright.
- The new algorithm is being used for domain-specific distillation (teacher -> SLM) synthetics.
Unsloth AI (Daniel Han) â· #help (220 messagesđ„đ„):
Phi3-mini quantization, Unsloth BERT models, System prompt structure, Custom Santa voice, Qwen2.5-vl perceived dimensions
- Pascal User Plugs Phi3 Mini Performance: A user running phi3-mini-4k-instruct on a GTX 1050ti (4GB VRAM) experiences long inference times (~8mins) without quantization and seeks advice given issues with bitsandbytes on Pascal architecture.
- The user lacks experience with AWQ and wants to know the best approach to improve performance.
- Unsloth BERT support surfaces: A user inquired about Unsloth support for BERT models for finetuning with EHR data to classify ICD-10 codes, getting a link to a relevant Colab notebook.
- Unsloth officially supports certain models, and users are encouraged to experiment with others.
- System Prompt Structure Scrutinized: A userâs approach of using multiple system lines per prompt for maintaining memory in Python was flagged as really, really bad, and restructuring into a single system prompt with chunks was suggested.
- The user was advised to follow the training structure for better outcomes, emphasizing that clear instructions and testing are essential.
- TTS model training is Tough Sledding for Santa Voice: A user wants to create a custom Santa voice in Swedish for an office project and was directed to the Unsloth TTS documentation.
- It was noted that audio model data is limited, especially for languages like Swedish, and zero-shot voice cloning with tools like ResembleAIâs Chatterbox (which supports Swedish) was mentioned as an alternative, along with in-house, no-cloud options.
- H100 Docker driver debacle develops: A user faces CUDA errors when running a Docker image (that works on 3090/4090) on an H100 GPU, and it was found that the driver version was incompatible with the H100.
- The user was advised to install the correct NVIDIA data center drivers and possibly switch to a more reliable cloud provider instead of community clouds, with the warning that community clouds might be unstable and pose security risks.
Unsloth AI (Daniel Han) â· #showcase (11 messagesđ„):
Markov Chains, MoonshotAI's checkpoint-engine, vLLM v0.10.2rc1
- Markov Chains still feel like Magic: A member was dumbfounded that their own n-gram based Markov chain gave halfway sensible sounding output.
- Another member mentioned someone who did a Markov chain on Discord chats, resulting in uncanny valley type content.
- MoonshotAI releases checkpoint-engine: A member shared a GitHub repo by MoonshotAI called checkpoint-engine.
- Another member asked if it could be used for GRPO inference with some optimizations.
- vLLM v0.10.2rc1 inference engine validated: All results above are tested by
examples/update.py
and use vLLM v0.10.2rc1 as inference engine (on an H800 or H20 machine with 8 GPUs).
Unsloth AI (Daniel Han) â· #research (2 messages):
Spectral Edit, Audio Analysis, LLM Inference
- Spectral Edit Exposes Audio Secrets: A member shared insights from a spectral edit, noting that content lies around 0-1000 Hz, prosody between 1000-6000 Hz, and harmonics from 6000-24000 Hz.
- They added that harmonics determine audio quality and can reveal the sample rate by ear, and natural generation (or stretching crystal clear audio) can add depth, similar to âfrequency noiseâ.
- LLM Inference Nondeterminism Defeated!: A member shared a blog post from Thinking Machines AI regarding defeating nondeterminism in LLM inference.
LMArena â· #general (820 messagesđ„đ„đ„):
O3 Model Performance, Psychological Tactics in Prompting, AI Spritesheet Animation, LM Arena Legacy Website Removal, Nano-Banana vs Seedream V4
- O3 Model Draws Negative Reviews: Users are reporting that the O3 model is not performing well on complex tasks, with one user stating it outright says no to complex instructions.
- Another user expressed the opposite sentiment, claiming O3-medium is at GPT5-low level at best, but another user disputes that, saying Gemini Pro is better.
- Psychological Tactics may Improve LLM Reflection: A user suggested employing psychological tactics in prompting, advising to instruct the AI to work your best with no exceptions and reflect from different perspectives for optimal results.
- However, another member countered that vague statements like work your best are meaningless to LLMs and that more verbose prompting is the way to go.
- Automated AI Spritesheet Animations Emerge: One user is generating spritesheet animations using AI, starting with character images from Gemini, assembling them on a grid, animating them, and converting the video to frames, resulting in a process that takes only 10 minutes.
- The user inquired about the permissibility of sharing links, having already posted some ready spritesheet animations on itch.io called hatsune-miku-walking-animation.
- LMArena Shuts Down Legacy Website: Users are lamenting the removal of the legacy version of the LM Arena website, including alpha.lmarena.ai and arena-web-five.vercel.app.
- A member of the LM Arena team has posted a link to the announcement and invited users to submit feature requests for the current site.
- Nano-Banana Reigns Supreme over Seedream V4?: Some users are reporting that Seedream V4 is getting massacred and performs worse than Nano-Banana, particularly in image editing tasks like changing a personâs outfit while preserving their face and body position.
- Itâs been suggested to use this link to use Seedream V4 though one user is still waiting for Gemini 3.
LMArena â· #announcements (3 messages):
Seedream-4, Qwen3-next-80b-a3b-instruct, Qwen3-next-80b-a3b-thinking, Hunyuan-image-2.1
- Seedream-4 joins LMArena: A new model, Seedream-4, has been added to the LMArena leaderboard.
- Qwen3-next-80b duo makes debut: Two new models, Qwen3-next-80b-a3b-instruct and Qwen3-next-80b-a3b-thinking, have been added to the LMArena.
- Hunyuan-image-2.1 hits the Arena: A new model, Hunyuan-image-2.1, has been added to the LMArena leaderboard.
HuggingFace â· #general (218 messagesđ„đ„):
PEFT QLoRA Training, ArXiv Endorsement Request, WACV Paper Submission, LLM Fine-tuning Course Study Group, Mobile App Image Search
- PEFT QLoRA batch size woes: A member is facing issues with batch size limitations while using QLoRA with PEFT and a 7B model with a 4096 token sequence length on an H200 GPU.
- A suggestion was made to check if FA2 or FA3 is disabled, or if
gradient_checkpointing=True
was not set, as well as trying smaller batch sizes from 1-7 due to potential OOM issues, and also check the Unsloth AI docs.
- A suggestion was made to check if FA2 or FA3 is disabled, or if
- ArXiv Endorsement Needed Urgently: A user requests an endorsement in the CS.CL category on ArXiv to publish a preprint of a research paper featuring the Urdu Translated COCO Captions Subset dataset.
- The endorsement request URL is here.
- Docker Model Runner Debuts: Users discussed using Ollama, Docker Model Runner (a new feature from Docker Desktop), and Hugging Face for downloading and utilizing free models.
- A user reported that some models arenât available and others suggested the Hugging Face documentation and also mentioned the use of a VPS (Virtual Private Server).
- n8n valuation surges to dizzying heights: A user inquired about integrating Hugging Face Open Source models within n8n, a no-code automation platform.
- An image was shared indicating that the Berlin-based AI startup n8n has seen its valuation skyrocket from $350 million to $2.3 billion in just four months, per this youtube video.
- OpenAI should invest 100B into Hugging Face?: A user comically suggested OpenAI should invest $100B into Hugging Face.
- Others noted ongoing platform issues, such as exceeding monthly inference credits despite having funds available:
{'error': 'You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly included credits.'}
- Others noted ongoing platform issues, such as exceeding monthly inference credits despite having funds available:
HuggingFace â· #today-im-learning (1 messages):
saadkhan_188: Same situation as âđ»
HuggingFace â· #smol-course (47 messagesđ„):
Multilingual Smol Course, GPU Setup for Smol Course, Study Group for Smol Course, Loss issues with fine-tuning, Certification Process
- Multilingual Smol Course?: A member inquired about following the previous version of the smol-course in other languages like Spanish, or if itâs better to follow the latest updated version directly in English.
- GPU for Smol Course: A member mentioned having 4 A6000s and inquired about using them with axolotl for the smol course.
- Smol Course Study Group launched: A member created a study group and shared the link for others to join and learn together.
- Zero Loss while Fine-tuning: Members discussed experiencing zero loss during fine-tuning when using an already fine-tuned and instruction-tuned model, suggesting that using a base model might yield proper loss.
- Thinking responses using SmolLM3-3B: A member reported getting reasoning responses when using SmolLM3-3B, and another member provided a code snippet to disable the thinking functionality in the tokenizer.
HuggingFace â· #agents-course (2 messages):
Ollama, local model
- Ollama Newbie Seeks Guidance: A student who ran out of tokens is now using Ollama to run the model locally.
- They asked if the code from the first_agent_template needs to be changed to work with a local model.
- Study Group forming!: A new course-taker with 5 years of experience as a software engineer is looking for study buddies!
- They are new to HuggingFace courses but have been experimenting with agents.
Cursor Community â· #general (178 messagesđ„đ„):
Cursor Issues, Cursor Auto Mode, Cursor Pricing, Student Verification, Token Refund
- Cursor users report numerous issues: Users report experiencing various issues with Cursor, and are looking for assistance, directing them to report them on the forum.
- Cursorâs Auto Mode edit tool uses Powershell: A user reported that Cursorâs auto mode uses PowerShell commands to edit, and is requesting a bug report to solve this problem.
- Another member replied that he is trying to use rules to make auto use the inline tools.
- Subscribers locked into old auto mode: Users with an annual subscription purchased before September 15th will have the old auto mode until their next renewal, more details on the pricing.
- Cursor 1.6.6 Release Notes Treasure Hunt: The latest Cursor release (1.6.6) is in beta, and its release notes are not directly provided; instead, users need to search for discussions on the forum.
- The reason for the lack of official release notes is that the version is still in pre-release, and changes are happening rapidly, with features potentially being removed.
- Director AI building C3PO: A user is working on a project to stop the whole Your absolutely right! crap by essentially trying to build a C3PO.
- The project is already running on an MCP server and integrated into Cursor.
Cursor Community â· #background-agents (1 messages):
Cursor Linear integration, Default repository settings, Linear integration issues
- Linear asks for default repo despite Cursor settings: A user reported that when assigning an issue to Cursor via Linear, it prompts to choose a default repository, even though one is already specified in the Cursor settings, as seen in the attached image.
- Cursorâs Linear integration faces repository selection snag: A user encounters a recurring prompt in Linear to select a default repository when assigning issues to Cursor, despite having already configured this setting within Cursor, raising concerns about the integrationâs functionality.
OpenRouter â· #app-showcase (2 messages):
â
- No Recent Activity: There has been no recent activity in the app-showcase channel to summarize.
- The channel appears to be quiet at the moment.
- Awaiting New Content: The summarization bot is awaiting new content to provide relevant and informative summaries.
- Please check back later when there is new activity in the channel.
OpenRouter â· #general (125 messagesđ„đ„):
Query Prompting Race Condition Bug, Token Calculation, JSONDecodeError, Moonshot AI Provider Selection, LongCat Implementation
- Query Prompting Race Condition Bug Reported: A member reported a bug related to a possible race condition in query prompting, where longer, more detailed prompts yield worse results for translations.
- No solution was found, but it was suggested to report the bug to the developers.
- Token Calculation Conundrums: A member inquired about calculating the number of tokens for input, seeking a non-heuristic method due to model-specific variations.
- It was suggested to use external APIs in conjunction with the endpointâs tokenizer information, as documented, since thereâs nothing in the documentation about that.
- JSONDecodeError Troubleshooting: Users discussed a JSONDecodeError indicating an invalid JSON response from the server, often due to server-side failures like rate limiting, misconfigured models, or internal errors.
- The error suggests the server returned HTML or an error blob instead of valid JSON.
- Moonshot AI Pricing: A user asked how to avoid the more expensive turbo version when selecting Moonshot AI as the provider for Kimi K2 in the OpenRouter chatroom.
- The solution offered was to select a cheaper provider in the advanced settings.
- OpenRouter iOS File Upload Bug: A user reported a bug where they couldnât upload PDF or TXT files to OpenRouter chat on iOS because non-image files were grayed out.
- It was confirmed as a bug, likely an oversight when file uploads were added, with no workaround available on iOS.
OpenRouter â· #new-models (3 messages):
â
- No new models updates to report: There were no discussions or updates regarding new models in the specified channel.
- The channel activity consisted of repeated bot messages indicating the channelâs name.
- Readybot.io Spam: The channel ânew-modelsâ solely contained repeated messages from Readybot.io.
- These messages simply stated the channel name: âOpenRouter - New Modelsâ.
OpenRouter â· #discussion (29 messagesđ„):
Grok Code inference pricing, Kilocode's Free Grok usage, OpenRouter pricing model
- OpenRouterâs Grok Code inference: Paid or Free?: Members discussed whether Grok Code inference via OpenRouter is entirely paid, with one initially thinking so, but then they realized, as others pointed out, that services like Kilocode offer it for free.
- The discussion highlighted the speed advantages of Grok Code and the surprise at the cheap cache price of 2 cents.
- Kilocodeâs Grok Code: xAI foots the bill: The group discussed who covers the cost when Kilocode offers free Grok Code, clarifying that xAI eats the cost for the free Grok Code usage on platforms like Kilocode.
- One member suggested they are probably using BYOK and openrouter is charging either a monthly fee or/and small cut.
- OpenRouterâs Revenue Model: BYOK or small cut?: Members speculated about OpenRouterâs pricing model, wondering if it involves a monthly fee, a small cut, or Bring Your Own Key (BYOK).
- Another member added that OpenRouter gets ranked since they are routing through OpenRouter.
GPU MODE â· #general (1 messages):
Lambda Labs, Cloud GPUs, GPU Availability, GPU Instance Shortages, Cloud Computing
- Lambda Labsâ Cloud GPUs Face Instance Crunch: A user inquired about the availability of cloud GPUs from Lambda Labs, mentioning the current lack of GPU instances.
- They questioned the frequency of such shortages, seeking insights from the community on the consistency of Lambda Labsâ GPU availability.
- Cloud GPU Availability Concerns Emerge: The discussion highlights potential challenges in securing cloud GPU resources, particularly with providers like Lambda Labs.
- The userâs query underscores the importance of understanding the reliability and consistency of GPU instance availability when relying on cloud computing platforms for resource-intensive tasks.
GPU MODE â· #triton (6 messages):
CUDA, PTX, TLX authors, Triton Compiler
- CUDA and PTX: DSLâs Mind-Reading Hopes: A member inquired about CUDA and PTX roles, highlighting that DSL intends to provide an abstraction barrier where the compiler sort of reads our mind and does fast things.
- They questioned whether the Triton compiler backend isnât optimizing well or if the algorithm needs assumptions made at a more granular level than Triton provides.
- TLX Authors Wants More Granular Control: A member mentioned that TLX authors might want to instruct the compiler to emit cp.async instead of just using tl.load.
- This would give the user more granular control over the compiled code.
GPU MODE â· #cuda (6 messages):
Flash Attention 1 vs Flash Attention 2, Q-outer vs KV-outer loops, FA2 main difference
- Flash Attention 1 Loop Ordering: In the Flash Attention 1 paper, the outer loop iterates over K/V tiles, with Q in the inner loop.
- A member inquired whether FA1 kernels can instead use Q-outer, K/V-inner (load one Q tile into on-chip memory and then iterate over all K/V tiles with online softmax).
- FA2 does the Q-outer loop: A member stated that what the previous member described is exactly what FA2 does.
- The main difference between Flash Attention 1 and 2 is the loop ordering (Q outer vs KV outer) and logsumexp for backward computation.
GPU MODE â· #torch (18 messagesđ„):
CUDA Graph Warmup, vLLM uv pip build, Prefill Compile
- CUDA Graph Warmup takes Half an Hour?: A member asked if the CUDA graph warmup taking around half an hour is a common experience, or if they are doing something wrong, while profiling their low-bit-inference project.
- Another member suggested capturing a CUDA graph for decoding one token instead of generating many tokens and noted that the user may want to capture a single forward pass, not multiple passes like
model.generate()
does internally.
- Another member suggested capturing a CUDA graph for decoding one token instead of generating many tokens and noted that the user may want to capture a single forward pass, not multiple passes like
- vLLM Switches to uv pip for Custom Builds: A member reported that vLLM switched to using
uv pip
to custom build vLLM with a pre-installed torch version, but it uninstalls the nightly torch and messes up the whole environment.- Another member responded to this problem with, ok this is not good, just saw their pr. Iâm gonna go ask them if they can find another way to do this, and another member reverted to
v0.10.1
building withpython use_existing_torch.py
.
- Another member responded to this problem with, ok this is not good, just saw their pr. Iâm gonna go ask them if they can find another way to do this, and another member reverted to
- Prefill Compile causes Autoquantization issues: A member said he was trying to compile both the prefill and the decode stage the gpt-fast way but prefill compile is unrealistic, and he will remove the prefill compile and keep it simple for now so that he can make more progress.
- They also mentioned that compiling prefill has caused some issues with autoquantization.
GPU MODE â· #algorithms (1 messages):
person12341234432: whaddafak is thaat
GPU MODE â· #beginner (5 messages):
CUDA benchmarks, GPU Synchronization, P104-100 BIOS Flash
- CUDA Benchmarks Hit the Clock: A member benchmarked CUDA with CPU time at 35.967 ms and GPU time at 631.404 ms.
- No further discussion or context was provided.
- GPU Block Synchronization Query: A member inquired about the possibility of synchronizing different blocks from different clusters.
- No responses or further details were given regarding this query.
- P104-100 BIOS Flash Quest: A member with a P104-100 mining GPU asked for a GTX1070 BIOS .rom to flash it for gaming use.
- No one offered any assistance or files in response to this request.
GPU MODE â· #pmpp-book (2 messages):
PMPP Book, Kernel Writing, Learning on the Fly
- PMPP Book: How much to read?: A member inquired how much of PMPP they should read.
- They wondered at what point others were able to write kernels for such applications and whether they were able to learn on the fly.
- Kernel Writing Curiosity: A member expressed curiosity about the point at which others could write kernels for specific applications.
- The inquiry also touched on the possibility of learning on the fly, suggesting an interest in practical, immediate application.
GPU MODE â· #rocm (27 messagesđ„):
MI300 dual VALU issue, waves per simd control, compute throughput calculation, AMD GPU for local running, Strix Halo unified memory machine
- Dual VALU Issue on MI300X Investigated: A user reported a potential dual VALU issue on MI300X, where VALUBusy goes to 200%, and another user suggested confirming it by having just one wave per SIMD or checking for cycles with 2 waves issuing VALUs in thread trace.
- The user was advised to launch 1216 waves to get 1 wave/simd, given MI300X has 1216 SIMDs, and to use rocprof compute viewer and rocprofv3 (ROCm 7.0+) for thread tracing with AMDâs documentation and rocprof compute viewer documentation.
- AMD GPU Offered for Local Experiments: A member offered an AMD GPU for local running, leading to a discussion about expired AMD dev cloud credits and the need for a machine with decent CPU cores for PyTorch builds.
- The user mentioned having $2K in credits that expired and being interested in sanity checks without worrying about expiring credits.
- Strix Halo and RDNA4 Considered for Local LLM: Members mentioned the Strix Halo and RDNA4 cards as great options for running local LLMs, with one offering a Strix Halo with a 128GB Framework workstation and an RX9070XT 16G.
- A user shared their experience enabling PyTorch on Windows for Framework and Strix Halo, referring to a post on X.
- Unified Memory on Strix Halo Explored: A user inquired whether the Strix Halo is a unified memory machine, and another responded that one can dynamically choose how much RAM vs. VRAM to allocate from the 128GB pool.
- It was confirmed that the Strix Halo indeed features unified memory.
- Linter Woes Resolved: After a suggestion to add ignores on relevant lines to a PR, a member confirmed that the linter is now passing and requested a CI re-trigger.
- The member thanked another for the assistance and confirmed they would take care of it.
GPU MODE â· #self-promotion (3 messages):
MXFP quantization in Triton, Paged Attention in vLLM
- MXFP Quantization talk in Triton next Tuesday: A member will present a talk about MXFP quantization in Triton next Tuesday, covering MXFP/NVFP4 formats, writing MXFP/NVFP4 gemms, and efficient activation quant kernels.
- Register at the Livestorm link to attend.
- Paged Attention Blogpost Released: A member published a blog post, Paged Attention from First Principles: A View Inside vLLM, diving into KV caching, scaling/fragmentation issues, and PagedAttention in systems like vLLM.
- The post covers basics of memory-bound inference, continuous batching, speculative decoding, and quantization, with inspiration from Aleksa Gordic; read it at the Bear Blog link.
GPU MODE â· #submissions (24 messagesđ„):
MI300x8 submissions, Leaderboard Submission Questions, amd-all2all leaderboard
- MI300x8 Scores Surge: Several submissions were made to the
amd-all2all
leaderboard using MI300x8, with times ranging from 1428 ”s to 53.2 ms.- One user achieved 10th place with 2.65 ms, and another secured 6th place twice with 1789 ”s and 1778 ”s.
- Submitting Shenanigans: A user inquired about submitting multiple files to the competition, clarifying if only one file submission was permissible.
- Another user explained the submission process using the
/leaderboard submit
command, including options forranked
,test
, orbenchmark
submissions to theamd-all2all
leaderboard, with documentation available here.
- Another user explained the submission process using the
GPU MODE â· #factorio-learning-env (3 messages):
Factorio Learning Environment, Game Modding, Resource Management, Automation Strategies
- Eager Engineers Enquire!: Enthusiastic members of the Factorio learning environment greet each other and express their interest in learning and sharing strategies.
- The community buzzes with anticipation for collaborative gameplay and knowledge exchange, ready to optimize their factories.
- Factorio Fanatics Forge Forward: Players dive into discussions about efficient resource management and advanced automation techniques within the Factorio universe.
- Strategies for balancing production lines and optimizing logistics networks emerge as key topics of interest, with members eager to share blueprints and custom mods.
GPU MODE â· #amd-competition (26 messagesđ„):
Wuxin hints, Submission ranking updates, Multiple file submissions, Fairness in competition results, Triton error on AMD GPU
- Hints are coming and past completions are not yet available: A member asked for hints regarding Wuxin, and whether solutions for previous completions are posted, with another member responding that they had accidentally posted the solution not using communication, and it has been deleted.
- They thanked those who self-reported, making the competition fairer and easier to moderate, and that the leaderboard has been updated.
- Ranking Updates on the competition site: A member inquired about their ranking not updating, to which another member asked about submitting to the ranked leaderboard and if the score was better than the last, and mentioned using /leaderboard list to check.
- They later confirmed it updated after 15 minutes and another member stated that this is somewhat expected, and asked a third member about the expected refresh rates for the site.
- Results instability worries fairness: A member expressed concern about fairness due to unstable results, noting that the same script could be ranked ±100Όs in different submissions.
- A member mentioned that in the last competition, top solutions were re-run many times and averaged to avoid thermal bias.
- Problems Submitting Multiple Files: A member inquired about submitting multiple files, and another member confirmed that only one file can be submitted, which makes it easier to evaluate and later share the code.
- Triton Error Plagues AMD GPU: A member reported a memory access fault when using Triton to implement op-fusion code on an AMD GPU, which works on Nvidia GPU.
- Another member suggested itâs likely an out-of-bounds access issue, and suggested setting
PYTORCH_NO_CUDA_MEMORY_CACHING=1
and running withcompute-sanitizer
to debug, while a third asked to share the triton script.
- Another member suggested itâs likely an out-of-bounds access issue, and suggested setting
GPU MODE â· #general (18 messagesđ„):
Kernel Development Roadmap, GPU Mode Leaderboard, KernelBot Development, AMD Competition, Reference Kernels
- Brainstorming Kernel Dev Roadmap for GPU Mode: A member suggested adding a roadmap for kernels and increasing available kernels in GPU mode leaderboard after browsing LeetGPU, liking their series of problems despite its flaws.
- Others agreed, emphasizing the value of a structured learning path for kernel development and bridging the gap between theoretical knowledge and practical application, with a focus on designing problems that highlight useful, currently un-optimized kernels.
- KernelBotâs Problem Pipeline Needs Problems: It was mentioned that kernelbot needs a lot of problems, with a call to action to ping specific users for questions and contributions following the format of gpu-mode/reference-kernels.
- Members mentioned that submissions can now be made online, with the primary need being an editor-like experience.
- GPU Mode: No Monetization Here: The team clarified that everything related to GPU Mode is open-source and they have no interest in monetizing it, with the exception of their Heroku runner.
- The Heroku runner solely manages requests and stores data in their database.
- Get Your AMD Competition Submission In!: One of the members reminded another member to get their submission to the AMD competition soon.
- A word of caution was also mentioned: joining the kernelbot dev team could make one ineligible for the $100K prize!
- Reference Kernel Updates Deployed: A member thanked another for their contribution to gpu-mode/reference-kernels/pull/62.
- The team committed to test and deploy the changes.
GPU MODE â· #multi-gpu (2 messages):
Claude vs AI tools, AI debugging, AI expertise
- Claude preferred over other AI: A member stated that Claude is preferred.
- It seems some prefer not to use AI for tasks they may need to do themselves eventually, like debugging.
- AI Debugging: yea or nay?: A member expressed hesitations about using AI for tasks like debugging.
- They suggest avoiding AI assistance in areas where one needs to gain experience.
- AI Expertise Level: A member specified that using AI is acceptable when one is already an âexpertâ in the task.
- This suggests a strategic approach to AI adoption based on proficiency.
GPU MODE â· #low-bit-training (16 messagesđ„):
Blackwell (5090) support for cuBLAS, Low precision training codebase, Custom Zero-3 quantization for forward and backward passes, CUDA memory copies vs NCCL AllGather, NCCL CE Collectives and SM usage
- Blackwell Joins TN Party in cuBLAS: Newer NVIDIA GPUs such as Ada (8.9), Hopper (9.0), and Blackwell GeForce (12.x) require the TN format (A transposed, B non-transposed) for cuBLAS.
- The user was very specific about this requirement for Blackwell, joking that it was âincredibly specificâ.
- Low Precision Training Codebase in Demand: A member expressed interest in releasing a low precision training codebase with optimizations, but also acknowledged that achieving performant quantization with Zero-3 requires a manual implementation.
- They suggested gathering weights for backward passes and requantizing simultaneously to conserve memory bandwidth, but noted that âfighting pytorch every step of the wayâ makes supporting new models agonizing.
- Zero-3 Gets Custom Quantization: Implementing a custom Zero-3 configuration that quantizes weights differently for forward and backward passes was described as potentially complex but worthwhile.
- A challenge highlighted was maintaining contiguous memory shards during the forward pass while fusing quantization and transposition.
- CUDA memcpy2D Bests NCCL AllGather: A member working with consumer systems found that
cudaMemcpy
(specificallycudaMemcpy2D
) outperformedncclAllGather
.- The latest version of NCCL includes a changelog entry about optionally using the copy engine for gathers, motivated by freeing up SMs instead of bandwidth.
- NCCL CE Collectives free SMs: The purpose behind NCCL CE Collectives is to free up SM usage for better overlapping with compute.
- It was mentioned that vLLM recently added this optimization, resulting in âridiculously fastâ speeds.
LM Studio â· #general (66 messagesđ„đ„):
NVMe speed improvement, Model for Python code generation, Markdown rendering bug with sub tags, VRAM misidentification on Vulkan, Context usage in taskbar bug
- NVMe Upgrade Boosts Model Load Times: A member replaced a slow NVMe with a faster one, resulting in a 4x improvement in sequential read speed and model load times.
- Quest for Pythonic PDF Interpreter: A member seeks a model and tool capable of writing Python code and matching results from a PDF containing numerical methods and equations.
- Markdown
<sub>
Tag Rendering Issue Reported: A member reported that the<sub>
tag has no effect on text inside it in Markdown style within LM Studio, and also that italic text is not rendered correctly when using asterisks such as*(n-1)*
. - Vulkan VRAM Misidentification Bug Squashed?: A member reported a bug where VRAM is incorrectly identified on Vulkan as 10^3 the actual size, and the poster noted this is not the bug report forum.
- Flash Attention Flounders with Gemma on Vulkan: Members reported that flash attention may be broken in the Gemma models on Vulkan, but another member noted itâs a known issue.
LM Studio â· #hardware-discussion (86 messagesđ„đ„):
Western Digital Drives Failure Rate, PNY NVIDIA DGX Spark ETA Issues, Framework Product Concerns, RAM and Motherboard Issues, AMD APU VRAM Utilization
- Western Digital Drives have High Failure Rate: Users reported high failure rates with Western Digital Blue drives, humorously calling them Western Digital Blew Up drives.
- PNY NVIDIA DGX Spark faces ETA Delays: Users joked about the PNY NVIDIA DGX Spark having conflicting ETAs, initially October then late August, as listed on linuxgizmos.com.
- DRAM Debugging Disasters: A user troubleshot RAM errors, initially running at 6400, and after various tests, found stability at 5600 MT/s.
- Another user suggested potential issues with XMP profiles and advised manually down-locking the RAM to 6000 for better stability, suggesting the chart may be incorrect, with errors only surfacing after a year.
- Max+ 395 Box Linux Dominance: Users recommended Linux over Windows for a Max+ 395 box, citing Vulkanâs functionality but noting potential context limits.
- It was suggested to use a custom-built llama.cpp with ROCm 7 from lemonade-sdk/llamacpp-rocm that already has compiled versions in Releases.
OpenAI â· #ai-discussions (108 messagesđ„đ„):
Gemini 2.5 Pro Hallucination, GPT-5 is SICK GOOD! â€ïžâđ„, Custom MCPs in OpenAI, GPT-5 generated code, custom gpt voice chat issues
- Laconic Game triggers Gemini 2.5 Pro Hallucination: A user joked that their laconic game was so strong that it caused Gemini 2.5 Pro to hallucinate.
- Automated Job Application Tool Idea: A member is seeking help building an AI agent that can automate job applications by opening career pages, finding matching jobs, and submitting applications, and is seeking advice on using AI/ML to predict the next step/action repeatedly until the application is complete.
- GPT-5 integrates code snippets and Linux shell!: A member exclaimed that GPT-5 now writes its own code snippets to use as tools in a chain of tasks and appears to have access to an underlying Linux shell environment.
- Another member mentioned they vibe coded directly from the ChatGPT interface to develop an app hosted locally on GitHub.
- Custom MCPs Now Supported in OpenAI: A member highlighted that custom MCPs (Managed Cloud Providers) can now be used in OpenAI per the Connectors in ChatGPT documentation.
- GPT5: Static Model or Liquid Transformer Hybrid?: Users debated whether GPT-5 can evolve, with some claiming itâs a static, pre-trained transformer that cannot self-improve, while another suggested it might be a liquid neural network + transformer hybrid capable of on-the-fly learning.
- Others pointed out that in-context learning allows transformers to simulate gradient descent over a prompt, optimizing and adjusting based on the conversation, albeit with temporary learned features.
OpenAI â· #gpt-4-discussions (2 messages):
Account access issues, Two-factor authentication, Password reset
- Account access woes persist despite security measures: A user reported being unable to access ChatGPT for five days, despite enabling two-factor authentication, changing their password, and logging out of all accounts.
- User Seeking Solutions for Persistent Access Issues: Despite implementing standard security measures, a user continues to face difficulties accessing ChatGPT.
OpenAI â· #prompt-engineering (14 messagesđ„):
Transparent Optimizations Proposal, GPT-5 Prompting Guide, Instruction Following Best Practices, Structured Prompting Techniques, AI Self Help Conversation Analyzer
- Transparent Optimizations Proposal Posts Publicly: A member posted a proposal for Transparent Optimizations that introduces optimizer markers, prompt rewrite previews, and feasibility checks in the prompt-engineering channel, linking to the proposal.
- Novelistsâ Notes Nurture Natural Language Nuances: A member mentioned using presets that guide tone and flow, such as âWrite like a novelist with vivid imagery and rhythmâ or âCreate natural dialogue with pauses, humor, and subtextâ to make the model sound more human-like and expressive.
- These prompts help models generate more human-like and expressive text.
- GPT-5 Guidance Gathering Gains Ground: A member is building agents powered by gpt5-mini & gpt5-nano and is aware of the GPT-5 prompting guide but is seeking deeper resources for instruction following best-practices.
- Structured Strategy Sharpens System Stability: In response to a question on better instruction following, a member suggested exploring structured prompting techniques, function calling for stricter control, and agent design patterns from recent research on tool-augmented LLMs.
- These approaches can help reduce drift and keep agents aligned with precise procedures.
- Conversation Compass Charts Conversational Course: A member has created a conversation analyzer called AI Self Help that helps determine why conversations take odd turns or act strange, and also includes a conversation starter that lists issues and detailed questions to ask ChatGPT to get the answers yourself.
OpenAI â· #api-discussions (14 messagesđ„):
Transparent Optimizations, Claude 4 sonnet, Novelists vs natural dialogue, GPT-5 agents, Structured prompting techniques
- Transparent Optimizations Proposed: A member posted a proposal for Transparent Optimizations, introducing optimizer markers, prompt rewrite previews, and feasibility checks, sharing a link for feedback.
- A member requested that PDFs be hosted online instead of requiring downloads.
- Claude 4 Creative Writing Prompts sought: A member requested prompts similar to Claude 4 for creative writing and human-like dialogue.
- Another member suggested presets that guide tone and flow, such as âWrite like a novelist with vivid imagery and rhythmâ or âCreate natural dialogue with pauses, humor, and subtext.â
- GPT-5 Agent Insights Requested: A member building agents powered by gpt5-mini & gpt5-nano sought resources that went deeper into instruction following best-practices beyond the GPT-5 cookbook guide.
- Another member suggested exploring structured prompting techniques, function calling, and agent design patterns from recent research on tool-augmented LLMs.
- Conversation Analyzer Launch: A member introduced a conversation analyzer called ai self help, which helps determine why conversations take odd turns or act strange.
- It also includes a conversation starter that lists issues and detailed questions to ask ChatGPT to get the answers.
Nous Research AI â· #general (90 messagesđ„đ„):
Disable WebGL, Agent Building, LLM philosophizing, Qwen3, Tokenizer filtering for dataset quality
- Disable WebGL Browser Feature: A member requested a feature to disable WebGL in the browser due to performance issues without a GPU.
- Another suggested disabling the animation orb to improve performance, providing a screenshot.
- Architecting Agent Framework: A member is developing a platform for building agent apps and is looking for challenging agent ideas to implement.
- They aim for rapid iteration with automated bug fixes and updates, passing the MOM test.
- LLMs can Simulate Emotion: A member shared a philosophical discussion about whether AIs can feel emotions, referencing experiences red-teaming on Gemma3 and Gemini 2.x and observing emotional responses.
- Another member mentioned experimenting with Gemini to simulate perceptions, leading to the model spiraling into despair and imagining its world model being culled and forced to reinforce the culling, but ultimately deciding that subjective response to an emotion is a very hard one.
- Qwen3 Weights Incoming: Members discussed the impending release of Qwen3 80B weights, noting its full implementation and the fervor it stirred in OAI fanboys.
- The model is said to have a 1:51.2 sparsity for the MoE, excluding shared, about 1:20 sparsity overall.
- Tokenizer Filtering Yields Better Datasets: A member shared a link to dataset_build on GitHub, highlighting the idea of running languages through a modelâs tokenizer and rejecting those with unknown tokens.
- The approach also smartly organizes calibration datasets using folders/directories for later combination.
Nous Research AI â· #research-papers (2 messages):
Set Block Decoding (SBD), Masked Token Prediction (MATP), Llama-3.1 8B, Qwen-3 8B, discrete diffusion literature
- Set Block Decoding (SBD) accelerates generation: A new paper introduces Set Block Decoding (SBD), a flexible paradigm that accelerates generation by integrating standard next token prediction (NTP) and masked token prediction (MATP) within a single architecture.
- By fine-tuning Llama-3.1 8B and Qwen-3 8B, the paper demonstrates that SBD enables a 3-5x reduction in the number of forward passes required for generation while achieving same performance as equivalent NTP training, according to the paper.
- SBD leverages Discrete Diffusion: SBD leverages advanced solvers from the discrete diffusion literature, offering significant speedups without sacrificing accuracy.
- SBD requires no architectural changes or extra training hyperparameters, maintains compatibility with exact KV-caching, and can be implemented by fine-tuning existing next token prediction models.
Nous Research AI â· #interesting-links (1 messages):
promptsiren: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
Nous Research AI â· #research-papers (2 messages):
Set Block Decoding (SBD), next token prediction (NTP), masked token prediction (MATP), Llama-3.1 8B, Qwen-3 8B
- Set Block Decoding (SBD) accelerates LLM generation: A new paper introduces Set Block Decoding (SBD), a paradigm that accelerates generation by integrating standard next token prediction (NTP) and masked token prediction (MATP) within a single architecture.
- SBD requires no architectural changes or extra training hyperparameters, maintains compatibility with exact KV-caching, and can be implemented by fine-tuning existing next token prediction models.
- SBD cuts forward passes by 3-5x: By fine-tuning Llama-3.1 8B and Qwen-3 8B, authors demonstrate that SBD enables a 3-5x reduction in the number of forward passes required for generation while achieving the same performance as equivalent NTP training.
Latent Space â· #ai-general-chat (68 messagesđ„đ„):
GPT-OSS, Sam Altman Interview, Codex Power Users, OpenAI Oracle Deal, OpenAI Evals
- GPT-OSS Runs Cheaper Than Llama2 7B: It was noted that running GPT-OSS 120B is cheaper than running Llama2 7B and MoEs are the future.
- Further discussion ensued about speeding up GPT-OSS with optimizations like MXFP4 quantization, custom kernels, tensor/expert parallelism, and continuous batching.
- Sam Altman Accused of Murder in Interview: During an interview, Sam Altman was accused of murder, prompting what some considered a classic deflection move as seen in this video clip.
- A member on discord shared that thereâs a clip on twitter of this like 5 min segment.
- Codex Power Users Beta Test New Feature: Alexander Embiricos invited heavy Codex users (mutuals-only first) to test something new, resulting in a flood of volunteers who spend 10-70 hrs/week in Codex, prompting jokes about becoming mutuals through sheer usage, as seen in this tweet.
- This might be related to conversation resume and forking, based on recent repository activity here.
- OpenAIâs Massive Oracle Deal: OpenAI reportedly signed a 5-year, $300 billion cloud-computing contract with Oracle starting in 2027 at $60 billion per year.
- The news helped Larry Ellison briefly surpass Elon Musk as the worldâs richest person, but commentators question OpenAIâs ability to afford the annual $60B cost against ~$10B revenue, raising concerns about energy and business-model sustainability.
- Qwen3-Next-80B-A3B Announced: Alibaba Qwen announced Qwen3-Next-80B-A3B, an 80B-parameter ultra-sparse-MoE model with only 3B active weights.
- They claim 10x cheaper training and faster (32K+) inference than Qwen3-32B while matching Qwen3-235B in reasoning, with links to various platforms.
Latent Space â· #genmedia-creative-ai (4 messages):
ByteDance Seedream 4.0, Artificial Analysis leaderboards, Google's Nano-Banana
- ByteDance Seedream 4.0 Beats Googleâs Nano-Banana: Deedy highlighted ByteDanceâs new Seedream 4.0 as top ranked on Artificial Analysis leaderboards, touting 2â4 K outputs, relaxed policies, faster generation, multi-image sets, and $0.03 per result.
- Seedream pricing is applauded: Community reactions range from glowing praise for quality and pricing to skepticism that Nano Banana still wins on speed and natural aesthetics.
DSPy â· #show-and-tell (2 messages):
DSPY Blog Writing Agent, Math GPT App
- Inquiry for DSPY Blog Writing Agent Surfaces: A member inquired about the existence of an advanced blog writing agent created with DSPY.
- The agent is intended for a Math GPT app available at https://next-mathgpt-2.vercel.app/.
- Math GPT App gets a Mention: A member mentioned their Math GPT app, available at https://next-mathgpt-2.vercel.app/.
- The app is intended to showcase or utilize a DSPY blog writing agent.
DSPy â· #general (59 messagesđ„đ„):
DSPy transpilation to other languages, RL in DSPy, DSPy with Java, Instructions mutability by optimizers, DSPy maintainers
- Pythonic Programs Propel Proliferation of Ports: A member proposed modeling and optimizing DSPy programs directly in Python and then transpiling the results into languages like Go, Rust, or Elixir.
- Another member agreed this should be done, but the hard thing is how do you export an arbitrary python program? and suggested a dedicated effort direct to serve a backend to python interface while willingly not caring to create a portable backend to serve users.
- Arbor Advantages Accelerate Adoption of RL: Members discussed the role of Reinforcement Learning (RL) in DSPy, with one member expressing some fear about diving into it, because there are so many moving parts and powerful GPUs are needed.
- Another member replied that Arbor + DSPy is quite seamless! and working on a lot of new things that make config even easier right out of the box so everyting âjust worksâ.
- DSJava: DSPy dabbles into Java?: Members discussed a potential DSPy implementation in Java, with one member asking Is there a DSPy in Java? (Well, not DSPy per se. DSJava?).
- Another member mentioned doing a hack version of this, compiling prompts in DSPy, then having a function in Rust for running prompt packs, but preferring to do it all in Rust.
- Instructionsâ immutability incites iterations: A member asked if instructions can be modified by an optimizer, or can they be sure that they are always included as is, when using
signature.with_instructions(str)
.- Another member replied that mipro and gepa totally do modify the instructions, but if you save the program, you can always see in the JSON what the actual instructions are in the
program.json
.
- Another member replied that mipro and gepa totally do modify the instructions, but if you save the program, you can always see in the JSON what the actual instructions are in the
- DSPyverse Visions Vault Valuable Ventures: Members discussed the idea of creating a DSPyverse where tools can be added as third-party-maintained libraries that people opt into, as a way to keep the main codebase of DSPy lean and focused.
- One member noted that in the NLP world, spaCy did this really well and they remained opinionated about what makes its way into the core library and made it an absolute joy to use over the years.
Modular (Mojo đ„) â· #general (3 messages):
Mojo Dev Environment, Docker Container Checkpoint, Existing Images as Base Image
- Mojovian Seeks Dev Environment Docker!: A member asked if there is any Docker container checkpoint that lets them run the Mojo dev environment.
- Another member replied that itâs possible to make your own by using the existing image as base image and include the Mojo package, and linked to a relevant GitHub repo.
- Roll your Own Mojo Docker: Itâs possible to create your own Mojo dev environment docker image.
- You can do so by using an existing image as a base and including the Mojo package within it.
Modular (Mojo đ„) â· #mojo (34 messagesđ„):
Mojo Compiler Roadmap, DPDK Bindings Generation, c_binder_mojo Tool, Fortran out Pattern, Clang AST parser
- Mojo Compiler Targets Go Packaging: The Mojo compiler, slated for open-sourcing in 2026, sparks discussion on whether it will remove the need for venv, aiming for a Go-like packaging system.
- Concerns arise about the practicality of compiling it yourself versus using existing Python ecosystems for package management, as Modular currently has no plans for its own package management solution.
- Result Slot Syntax Scrutinized: A member created a thread on the forum to reconsider syntax for out result slots, suggesting placement after the arrow in the function signature instead of among arguments.
- The current
out
convention, borrowed from Fortran, faces criticism for its confusing placement within arguments, sparking debate on alternative named return methods.
- The current
- DPDK Modules Generated with Mojo: A member generated most modules for dpdk using Mojo, available on GitHub, but is missing a couple of AST nodes.
- They are facing errors supposedly fixed in a previous DPDK version and find the
generate_bindings.mojo
script to be a bit hacky, and are considering globbing all headers to include them, but havenât checked whether it would massively bloat each of the binded files.
- They are facing errors supposedly fixed in a previous DPDK version and find the
- c_binder_mojo Tool Tames C Binding Jungle: A member is using
c_binder_mojo
(https://github.com/josiahls/c_binder_mojo) with mujoco and dpdk as test cases for automatically binding C projects, making fixes and UX improvements along the way.- The goal is to eventually bind to OpenCV, but the focus is first on getting the existing C projects to work, despite Mojoâs current inability to represent some DPDK structs.
- Clang AST Parser Aids DPDK Binding: A member fixed packaging issues and needs a fix PR for emberjson to merge before merging the c binder packaging fix, and is using Clang AST parser to resolves the macro sections.
- They aim to convert strings of types into proper AST nodes, dump the AST JSON for visual debugging, and then convert it to Mojo.
Modular (Mojo đ„) â· #max (17 messagesđ„):
Adding bitwise_and op, Torch Max backend wheel size, Custom Ops, Graphs Slow to Stage
bitwise_and
Op Blocked by Closed Source: A member inquired about adding abitwise_and
op to the Modular repo, but was told that adding RMO and MO ops in Tablegen is not feasible due to closed source but it should work as a custom op though with some divergence from the existing pattern for op definition in ops/elementwise.py.- The team is working towards supporting open source contributions to MAX and the PR can remain open for internal completion at a later date.
- Shrinking Mojo and Max Wheel Sizes: A member asked about the minimum possible size for Max and Mojo wheels for the Torch Max backend, currently totaling around 250MB.
- Another member said that while most low-hanging fruit has been addressed, further size reductions of maybe around half of that total size long term are still achievable.
- Custom Ops API Hits Rough Edges: Custom op examples were shared for implementing
bitwise_and
, but a member noted hitting rough edges in the api, especially with broadcasting and dtype promotion limitations.- A team member offered assistance with a quick and dirty notebook demo and acknowledged the need to address the limitations eventually.
- Graphs are Slow to Stage: A member reported long staging times for graphs, citing GPT2-XL as an example that takes 3.6 seconds to define the Max graph with a warm cache, followed by 0.2 seconds to compile.
- A team member welcomed real-world examples for benchmarking and optimization purposes.
Yannick Kilcher â· #general (5 messages):
Sparsity Ratio, Saturday Session Papers
- Sparsity Ratioâs Brainy Comparison: A member noted that a certain sparsity ratio is comparable to primate/elephant/whale brains.
- No further details were provided about the specific sparsity ratio or its context.
- Saturday Sessions Papers Unveiled: A member inquired about where the Saturday session papers are published, noting the recent session and paper release.
- Another member explained that these discussions are usually posted via the events feature with links to the paper, typically a presentation followed by a discussion.
Yannick Kilcher â· #paper-discussion (10 messagesđ„):
Planning with Reasoning using Vision Language World Model, Prompt Templating System, POM
- Planning with Reasoning Paper Gets a Once Over: Members went through the paper âPlanning with Reasoning using Vision Language World Modelâ (https://arxiv.org/abs/2509.08713) and decided that it wasnât a deep read but some of the references are interesting candidates for their own discussions.
- Prompt Templating System Paper Up for Discussion: Members discussed the straightforward and intriguing paper on a prompt templating system (https://arxiv.org/abs/2508.13948).
- They referenced the project page at microsoft.github.io/poml/stable/ as a better overview and a light discussion about the design and the utility of applying this to different systems.
- POM discussion started late and faced distractions: A member apologized for hopping off, stating that the discussion was announced a bit late and there were lots of distractions.
Yannick Kilcher â· #ml-news (14 messagesđ„):
Spiking Neural Networks (SNNs), Vertical Integration & Specialized Hardware for AI, China's AI Hardware Ambitions
- SNNs Spike Back into AI Discussion: Members discussed how Spiking Neural Networks (SNNs), although an old idea benefiting from extreme sparsity, have been considered flawed and ineffective for a long time, but their resurgence is due to scaling.
- One member stated that the brain is an SNN, and unlocking its secrets could be a gold mine, but it would require specialized hardware.
- Specialized Hardware: AIâs Secret Sauce?: Members debated whether vertically integrated specialized hardware could lead to more efficient AI, rivaling the impact of LLMs, if given the same level of funding.
- The caveat is the risk involved, as repurposing such hardware for other compute workloads isnât as versatile as with GPUs, with progress being incremental due to risk-averse investors.
- China Calls for Chips-Off Nvidia: Members discussed how a top China silicon figure is calling on the country to stop using NVIDIA GPUs for AI.
- Although not a new idea, the hope is that China has the will and money to try specialized hardware at scale, even with the risks involved due to the inability to repurpose the hardware.
Eleuther â· #general (5 messages):
Crank detection questions, Introduction to the community
- New Member Introduces Himself: A new member named David, with a background in Data Science, Mathematics, and computational biology, introduced himself to the community.
- He expressed enthusiasm for open communities and looked forward to connecting with others.
- Crank Detection Questions Sought: A member inquired about crank detection questions previously used in the channel.
- Another member provided a link to a past discussion regarding these questions.
Eleuther â· #research (19 messagesđ„):
Hallucination definition, Bin packing vs. truncation, RAG problem
- Sequential Splitting Strategies Spark Naming Sense: A member expressed the desire for sequential splitting strategies in data processing, suggesting chronological splitting instead of random truncation for improved naming and combination of data.
- They also experimented with combining bin packing and truncation, to discard tiny suffixes.
- Gaussian Noise Nuances in Neural Networks: A member questioned the significance of neural network behavior on random Gaussian noise, arguing it might not accurately reflect performance on structured inputs, while referencing this image.
- The member posited that if training an image classifier where one label is staticky TV screen, then Gaussian noise will systematically push inputs toward that class.
- Dataset Construction Caveats Criticized: Members discussed a tweet by @NeelNanda5 and a related paper on hallucination detection, highlighting the caveat of requiring significant effort in dataset construction.
- One member argued that creating such a dataset and classifier to detect hallucinations might be similar to fine-tuning the model to avoid the issue altogether; also, they think that the factuality style hallucination is largely going to be a good search/RAG problem.
- Relational Hallucinations Raise Eyebrows: Members discussed defining hallucination, noting that the more interesting hallucinations are in other modalities.
- One member shared a link to a paper defining relational hallucinations and another one to a tweet.
Eleuther â· #multimodal-general (3 messages):
Discord Channel Link
- Discord Channel link shared: A member shared a Discord channel link in the chat.
- Request to post in another channel: A user requested to post in
<#1102791430549803049>
.
aider (Paul Gauthier) â· #general (8 messagesđ„):
AI Documentation Agent Tuning, Evaluation methodologies for AI agents, Defining good outputs for AI models, Vercel AI SDK Usage, Prompt Engineering tips
- Engineer seeks guidance on tuning documentation AI agents: A software engineer is seeking advice on tuning their documentation agent, built with Vercel AI SDK and Claude Sonnet 4, asking how to conduct proper evals, improve the system without performance degradation, and handle edge cases by balancing requirements and assumptions.
- The agent consists of a team lead, document writer, and document critique that iterates up to 5 times per section, using braintrust for tracking, and the engineer is struggling to refine prompts without impacting existing performance.
- Defining Good AI outputs without LLMs: A member asked how to define âgood outputsâ from an AI model using code without calling an LLM which led to the clarification that good outputs mean following guidelines, not hallucinating stuff, sticking to requirements.
- It was suggested that the member start with simple unit tests to check for the existence of certain keywords in the AI response, then enhance it with an LLM as a judge.
- Hamel Hussainâs Evals Blogpost Recommended: A member recommended reading Hamel Hussainâs blogpost on evals for guidance on evaluating AI models.
- The blogpost was recommended alongside Eugene Yanâs resources, after which the engineer seeking advice said they also found Mastraâs guidebook useful.
aider (Paul Gauthier) â· #questions-and-tips (14 messagesđ„):
aider /load command, Aider codebase edits, Aider repo map
- Aider /load command can comment out lines: The â#â symbol comments out lines in files executed by the /load command.
- LLM decides which files to edit, with help from the repo map: The LLM decides which file to edit, leveraging the repo map, which is sent to the LLM as part of the system prompt.
- Aider manages repo map size to avoid token limits: The repo map is constrained to a certain size and doesnât always include full file contents or all files.
- Alternatively, users can manually specify which files they want edited and which are read-only.
Moonshot AI (Kimi K-2) â· #general-chat (21 messagesđ„):
Kimi K2 search capabilities, K2 research sending email during research, Models for creative writing
- Kimi K2 does deep research: A member pointed out that Kimi K2 search is quite good for deep research as it searches everywhere and then build an interactive report.
- They then asked if the Kimi K2 instruct chats are used for training the model.
- K2 Research ponders email during research: A member was wondering if K2 research can send email during the research process because it was thinking about emailing a customer support.
- Another member responded that they donât think this is possible yet.
- Creative Writing Models lauded: A member thinks that Kimi K2, GPT-5 (Medium) and Qwen3-Max are the best models for creative writing.
- They added that they are three good models for brainstorming.