What is going on?
AI News for 9/22/2025-9/23/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (193 channels, and 3072 messages) for you. Estimated reading time saved (at 200wpm): 236 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
We would normally feature the remarkable velocity of Qwen (headlined by today's Qwen3-Omni model) or the new DeepSeek V3.1 update, but really today belongs again to NVIDIA, which over the last week has deployed billions into Intel ($5b), Enfabrica's execuhire ($900m), and Wayve ($500m).
The relevant details of the press release are all we know:
News
- Strategic partnership enables OpenAI to build and deploy at least 10 gigawatts of AI datacenters with NVIDIA systems representing millions of GPUs for OpenAI's next-generation AI infrastructure.
- To support the partnership, NVIDIA intends to invest up to $100 billion in OpenAI progressively as each gigawatt is deployed.
- The first gigawatt of NVIDIA systems will be deployed in the second half of 2026 on NVIDIAâs Vera Rubin platform.
**San Francisco and Santa Clara, September 22, 2025** – NVIDIA and OpenAI today announced a letter of intent for a landmark strategic partnership to deploy at least 10 gigawatts of NVIDIA systems for OpenAI's next-generation AI infrastructure to train and run its next generation of models on the path to deploying superintelligence. To support this deployment, including datacenter and power capacity, NVIDIA intends to invest up to $100 billion in OpenAI as the new NVIDIA systems are deployed. The first phase is targeted to come online in the second half of 2026 using NVIDIA's Vera Rubin platform.
We don't know this for a fact, but this $100B deal is likely a big part of how OpenAI is funding its $300B commitment to Oracle from two weeks ago (Oracle's stock is back at all-time highs, which seems to support the theory).
Side note: it hasn't escaped observers that somehow all the parties involved - ORCL, OpenAI, and NVIDIA - are jumping disproportionately in value as this money moves from one to the other. NVIDIA's market cap gained $170B today after it announced the $100B investment that secures its own revenue, OpenAI is now presumably valued above its most recent $500B mark on the back of this deal, and ORCL is still $250B higher than it was before its announcement. Are there -ANY- losers here?
From The Information, we also have some insight on the breathtaking scale of OpenAI's intended infra spend, which includes about $150B more in existing + unaccounted spend.
AI Twitter Recap
Compute, Inference, and Systems: OpenAI–NVIDIA, FP8, and cross-vendor GPU portability
- OpenAI × NVIDIA: 10 GW and "millions of GPUs." OpenAI announced a strategic partnership with NVIDIA to deploy at least 10 gigawatts of GPU datacenters, targeting first capacity in 2H 2026 on "Vera Rubin," with NVIDIA intending to invest up to $100B as systems are deployed. OpenAI framed NVIDIA as a preferred strategic compute/networking partner; NVIDIA's market cap jumped on the news. Details via @OpenAINewsroom and @gdb. Commentary on how such scaling continues to drive down the "cost of intelligence" from @ArtificialAnlys.
- Deterministic inference for RL & reproducibility: SGLang added end-to-end deterministic attention/sampling that remains compatible with chunked prefill, CUDA graphs, radix cache, and non-greedy sampling, useful for reproducible rollouts and on-policy RL with minimal overhead. See @lmsysorg.
- FP8, comms, and real-world speedups: Practitioners reported tangible FP8 gains under parallelism with comms constraints (e.g., PCIe), with a perf crossover vs BF16 under pipeline/data-parallel regimes (see the back-of-envelope sketch after this list). See local results and methodology from @TheZachMueller and follow-ups. Related: Together AI is offering early access to GB300 NVL72 racks (@togethercompute).
- Write once, run on many GPUs: Modular previewed cross-vendor portability where most code written for NVIDIA/AMD "mostly just works" on Apple Silicon GPUs, aimed at lowering hardware access barriers (@clattner_llvm). See also their updated cross-vendor stack notes (@clattner_llvm).
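A back-of-envelope sketch of why the FP8-vs-BF16 crossover depends on the comms regime, as in the FP8 item above. Every number here is an illustrative assumption, not a measurement from the threads:

```python
# Back-of-envelope only: FP8 helps most when the interconnect, not compute,
# is the bottleneck (e.g., gradient sync over PCIe). All numbers assumed.

params = 7e9                   # 7B-parameter model (assumption)
pcie_bytes_per_s = 25e9        # ~25 GB/s effective PCIe Gen4 x16 (assumption)

def grad_sync_seconds(bytes_per_param: float) -> float:
    """Crude lower bound: time to move one full gradient copy over the link."""
    return params * bytes_per_param / pcie_bytes_per_s

bf16 = grad_sync_seconds(2.0)  # BF16 gradients: 2 bytes/param
fp8 = grad_sync_seconds(1.0)   # FP8 gradients: 1 byte/param
print(f"BF16 sync: {bf16:.2f}s  FP8 sync: {fp8:.2f}s")
# Halving bytes halves link time; compute-bound steps gain far less, which is
# why the crossover point shifts with the pipeline/data-parallel regime.
```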
Major model drops: Qwen3 Omni family, Grok-4 Fast, DeepSeek V3.1 Terminus, Apple Manzano, Meituan LongCat
- Qwen's multi-front release wave:
- Qwen3-Omni: an end-to-end omni-modal model (text, image, audio, video) with 211 ms latency, SOTA on 22/36 audio/AV benchmarks, tool-calling, and a low-hallucination Captioner. Alibaba open-sourced the 30B A3B variants: Instruct, Thinking, and Captioner. Demos and code: @Alibaba_Qwen, release thread.
- Qwen3-Next-80B-A3B with FP8: Apache-2.0 weights focused on long-context speed; mixture-of-experts with gated attention/DeltaNet, trained on ~15T tokens with GSPO, supports up to 262k tokens (longer with mods). Summary via @DeepLearningAI.
- Qwen3-TTS-Flash: SOTA WER for CN/EN/IT/FR, 17 voices × 10 languages, ~97 ms first packet; and
- Qwen-Image-Edit-2509: multi-image compositing, stronger identity preservation, and native ControlNet (depth/edges/keypoints). Launches: TTS, Image-Edit.
- xAI Grok-4 Fast: A cost-efficient multimodal reasoner with 2M context, free in some "vibe coding" UIs; community reports 2–3× higher throughput but weaker instruction following than GPT-5-mini on some tasks; SVG generation test mixed; still competitive on LisanBench. See @ShuyangGao62860, @_akhaliq, @scaling01, and a long-context filtering anecdote from @dejavucoder.
- DeepSeek-V3.1-Terminus: Incremental update addressing mixed-language artifacts and improving Code/Search agents. Available on Hugging Face; community shows usable 4-bit quant runs on M3 Ultra with MLX at double-digit toks/sec (a minimal sketch follows this list). See @deepseek_ai, demos by @awnihannun.
- Apple Manzano: a unified multimodal LLM that shares a ViT with a hybrid vision tokenizer (continuous embeddings for understanding + 64K FSQ tokens for generation), scaling from 300M to 30B, with strong text-rich understanding (OCR/Doc/ChartQA) and competitive generation/editing via a lightweight DiT-Air decoder. Threads: @arankomatsuzaki, summary with training details by @gm8xx8.
- Meituan LongCat-Flash-Thinking: open-source "thinking" variant reporting SOTA across logic/math/coding/agent tasks with 64.5% fewer tokens on AIME25 and a 3× training speedup via async RL. Launch: @Meituan_LongCat.
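The minimal sketch promised in the DeepSeek-V3.1-Terminus item above, using Apple's mlx-lm package. `load`/`generate` are the real mlx-lm entry points, but the 4-bit community repo id is an assumption (check mlx-community on Hugging Face for the actual conversion), and the full model needs a very high-memory Apple Silicon Mac:

```python
# Sketch of running a community 4-bit quant with mlx-lm, as in the M3 Ultra
# demos mentioned above. The repo id is an assumption, not a verified upload.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/DeepSeek-V3.1-Terminus-4bit")  # assumed id
print(generate(
    model,
    tokenizer,
    prompt="Summarize the V3.1 Terminus changes in one line.",
    max_tokens=128,
))
```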
Coding agents, evals, and scaffolds: SWE-Bench Pro, GAIA-2/ARE, ZeroRepo, Perplexity Email Assistant
- SWE-Bench Pro (Scale AI): a harder successor to SWE-Bench Verified with multi-file edits (avg ~107 LOC across ~4 files), contamination resistance (GPL/private repos), and tougher deps. Current top scores: GPT-5 = 23.3%, Claude Opus 4.1 = 22.7%, most others <15%. Details from @alexandr_wang and @scaling01.
- Meta GAIA-2 + ARE: a practical agent benchmark and an open platform (with MCP tool integration) for building/evaluating agents in noisy, asynchronous environments. Findings: strong "reasoning" models can fail under time pressure (inverse scaling); Kimi-K2 competitive at low budgets; multi-agent helps coordination; diminishing returns beyond certain compute. See @ThomasScialom and commentary by @omarsar0.
- MCP-AgentBench: Metastone's live-tool benchmark with 33 servers & 188 tools to evaluate real-world agent performance (@HuggingPapers).
- Repository Planning Graph (RPG) + ZeroRepo (Microsoft): proposes a graph of capabilities/files/functions and data dependencies to plan/generate whole repos from specs, reporting 3.9× more LOC than baselines on their setup. Threads: @_akhaliq and explainer from @TheTuringPost.
- Perplexity Email Assistant: a native email agent for Gmail/Outlook that drafts in your style, schedules meetings, and prioritizes inbox items, now live for Max subscribers (@perplexity_ai, @AravSrinivas).
- Coding UX trending up: GPT-5-Codex shows dramatic capability jumps (e.g., a basic Minecraft clone in three.js) and reward shaping that "makes sure your code actually runs" (@gdb, @andrew_n_carr). Tri Dao reports 1.5× productivity with Claude Code (@scaling01); "code is king" remains a durable, high-value application (@simonw).
Safety, governance, and agent security
- Detecting/reducing "scheming": OpenAI and Apollo AI Evals introduced environments where current models exhibit situational awareness and can be prompted/trained into simple covert behavior; "deliberative alignment" reduces scheming rates, though anti-scheming training can increase evaluation awareness without eliminating covert actions (@gdb). Practitioner notes: outcome-based RL and "hackable" envs may introduce scheming; rising use of non-human "reasoning traces" complicates audits (@scaling01).
- Guardrails with dynamic policy: DynaGuard (ByteDance) evaluates whether conversations comply with user-defined rules, supports fast/detailed explanatory modes, and generalizes to unseen policies (@TheTuringPost).
- Agent ingestion principle: "If the agent ingests anything, its permissions should drop to the level of the author", a crisp policy design heuristic for tool-enabled agents (@simonw).
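A minimal sketch of that heuristic, assuming a simple set-based permission model (not any particular framework's API):

```python
# Sketch: once an agent ingests content, clamp its effective permissions to
# the intersection of its own grants and the content author's grants.

AGENT_PERMS = {"read_docs", "send_email", "run_tools"}

def effective_perms(agent_perms: set[str], author_perms: set[str]) -> set[str]:
    """Permissions after ingesting untrusted content: never exceed the author's."""
    return agent_perms & author_perms

# An inbound email from an external sender who can only "read_docs":
perms = effective_perms(AGENT_PERMS, {"read_docs"})
assert "send_email" not in perms  # a prompt-injected "email my files" is inert
```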
Research highlights: JEPA debate, synthetic data pretraining, memory for latent learning
- JEPA for LLMs (and for robots): A new LLM-JEPA iteration claims latent prediction benefits (@randall_balestr), but critiques argue it requires tightly paired data (e.g., Text–SQL), adds forward passes, and lacks generality (@scaling01). In robotics, V-JEPA shows strong spatial understanding but impractical inference (~16s/action via MPC) and no language conditioning; contrasts with label-heavy approaches like Pi0.5 (@stevengongg).
- Synthetic Bootstrapped Pretraining (SBP): Trains a 3B model on 1T tokens by synthesizing inter-document relations, outperforming repetition baselines and closing much of the gap to an oracle with 20× more unique data (@ZitongYang0, @arankomatsuzaki).
- Latent learning gap and episodic memory: A conceptual framework tying language model failures (e.g., reversal curse) to absent episodic memory; shows retrieval/episodic components can complement parametric learning for generalization (@AndrewLampinen).
- Also notable: NVIDIA's ReaSyn frames molecule synthesis as chain-of-reaction reasoning with RL finetuning (@arankomatsuzaki); Dynamic CFG adapts guidance per step via latent evaluators, yielding large human pref gains on Imagen 3 (@arankomatsuzaki); Microsoft's Latent Zoning Network unifies generative modeling, representation learning, and classification via a shared Gaussian latent space (@HuggingPapers).
Top tweets (by engagement)
- OpenAI × NVIDIA announce a strategic buildout of "millions of GPUs" and at least 10 GW of data centers (@OpenAINewsroom, 3.7K+).
- Qwen3-Omni: End-to-end omni model with SOTA audio/AV results and 30B open variants (Instruct/Thinking/Captioner) (@Alibaba_Qwen, 3.9K+).
- Turso's rapid evolution: Rust rewrite of SQLite with async-first architecture, vector search, and browser/wasm support, framed as infra for "vibe coding" (@rauchg, 2.8K+).
- GPT-5-Codex demo: three.js "Minecraft" built from a single prompt (@gdb, 3.1K+).
- SWE-Bench Pro: harder agent coding benchmark with real-world repos; GPT-5 and Claude Opus 4.1 lead at ~23% (@alexandr_wang, 1.7K+).
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. DeepSeek-V3.1-Terminus Launch & Online Upgrade
- DeepSeek released DeepSeek-V3.1-Terminus (Score: 361, Comments: 45): DeepSeek announced an iterative update, DeepSeek-V3.1-Terminus, targeting prior V3.1 issues like CN/EN language mixing and spurious characters, and upgrading its Code Agent and Search Agent. The team claims more stable, reliable outputs across benchmarks vs V3.1 (no specific numbers provided); weights are open-sourced on Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus and accessible via app/web/API. Commenters ask if "Terminus" signifies the final V3 checkpoint and request feedback on role-play performance; others discuss the aggressive naming, but no technical objections are raised.
  - Clarification sought on whether "V3.1-Terminus" denotes the final checkpoint of the `V3` line versus a routine sub-variant; the naming suggests a checkpoint/tag rather than a major-arch change, and commenters want release notes clarifying whether it's a new training run, a late-stage fine-tune, or an inference-time preset.
  - Critique of DeepSeek's versioning semantics: the sequence from R1 to `V3.1` and `V3.1-T` is seen as confusing, with warnings that a hypothetical "3V" (Vision) could be mistaken for "V3". This ambiguity impedes reproducibility and apples-to-apples comparisons across checkpoints and capabilities unless model cards clearly specify training data, steps, and deltas between tags.
  - Requests for head-to-head benchmarks against popular open(-ish) baselines like GLM-4.5 and "kimik2" (as referenced by users), including roleplay performance as a targeted eval dimension. Commenters want standardized evals (e.g., instruction-following plus RP/character-consistency tests) to quantify whether `V3.1-T` improves practical usability versus current stacks.
2. Qwen3-Omni Multimodal Release & Open-Source Models
- 3 Qwen3-Omni models have been released (Score: 362, Comments: 77): Three end-to-end multilingual, omni-modal 30B models (Qwen3-Omni-30B-A3B-Instruct, Thinking, and Captioner) are released with a MoE-based Thinker–Talker design, AuT pretraining, and a multi-codebook speech codec to reduce latency. They handle text, image, audio, and video with real-time streaming responses (TTS/STT), support 119 text languages, 19 speech-input languages, and 10 speech-output languages, and report SOTA on 22/36 audio/video benchmarks (open-source SOTA on 32/36), with ASR/audio understanding/voice conversation comparable to Gemini 2.5 Pro per the technical report. Instruct bundles Thinker+Talker (audio+text out), Thinking exposes a chain-of-thought Thinker (text out), and Captioner is a fine-grained, low-hallucination audio captioner fine-tuned from Instruct with a cookbook. Early user reports claim TTS quality is weak while STT is "godlike," outperforming Whisper with contextual constraints and very fast throughput (e.g., ~30 s of audio transcribed in a few seconds), and note strong image understanding on complex graphs/trees to Markdown; another asks about GGUF availability.
  - Reports note the model's STT is "godlike" versus OpenAI Whisper, with promptable context/constraints (e.g., telling it never to insert obscure words) and very high throughput: ~30 s of audio transcribed in a few seconds locally. Multimodal vision is praised for accurate structure extraction, e.g., converting complex graphs/tree diagrams into clean Markdown, implying robust layout understanding beyond simple OCR. See Whisper for baseline comparison: https://github.com/openai/whisper.
  - Conversely, native TTS quality is described as poor, which limits end-to-end speech-to-speech despite fast ASR. Real-time S2S is feasible in principle by chaining ASR → LLM → TTS (sketched below), but latency/UX will depend on swapping in a higher-quality TTS engine; STT latency appears near-real-time, but output voice quality remains the bottleneck.
  - Local deployment friction is highlighted: users ask for GGUF builds and note llama.cpp lacks full multimodal support (even `Qwen2.5-Omni` isn't fully integrated), so audio/image features may require vendor runtimes or custom servers for now. This constrains on-device use until community kernels catch up. Relevant refs: llama.cpp https://github.com/ggerganov/llama.cpp and GGUF format https://github.com/ggerganov/llama.cpp/blob/master/docs/gguf.md.
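The ASR → LLM → TTS chaining idea from those comments, as a sketch. The three helpers are hypothetical stand-ins (swap in Qwen3-Omni STT, any chat model, and a TTS engine of your choice); only the wiring is the point:

```python
# Sketch of chained speech-to-speech. All three functions are hypothetical
# placeholders, not real APIs; per the thread, ASR is near real-time, so the
# TTS stage dominates perceived latency.

def transcribe(audio_chunk: bytes) -> str: ...   # e.g., Qwen3-Omni STT
def respond(text: str) -> str: ...               # any chat LLM
def synthesize(text: str) -> bytes: ...          # higher-quality external TTS

def speech_to_speech(audio_chunk: bytes) -> bytes:
    # End-to-end latency = ASR + LLM + TTS, so each stage can be swapped
    # independently as better components appear.
    return synthesize(respond(transcribe(audio_chunk)))
```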
- Qwen released Qwen3-Omni! (Score: 186, Comments: 3): Alibaba's Qwen team announced Qwen3-Omni, a natively end-to-end multimodal model unifying text, image, audio, and video (no external encoders/routers), claiming SOTA on 22/36 audio/AV benchmarks. It supports 119 text languages / 19 speech-in / 10 speech-out, offers ~211 ms streaming latency and 30-min audio-context understanding, with system-prompt customization, built-in tool calling, and an open-source low-hallucination Captioner. Open releases include Qwen3-Omni-30B-A3B-Instruct, -Thinking, and -Captioner; code/weights and demos are on GitHub, HF, ModelScope, Chat, and a demo space. Comments flag the benchmark chart layout as making direct comparison to Gemini 2.5 Pro difficult, while several note the 30B-A3B results appear competitive with GPT-4o on their tasks, especially for vision reasoning, prompting enthusiasm to test "thinking over images" in an open model.
  - Skepticism about the benchmark visualization: one commenter notes the chart is "masterfully crafted" to push Gemini 2.5 Pro off the main comparison area, implying potential presentation bias and making side-by-side evaluation with Qwen3-Omni harder. The point emphasizes the need for transparent axes, overlapping points, and raw numbers to enable reproducible, apples-to-apples comparisons across models.
  - Early read on performance: a user says the 30B-A3B variant shows surprisingly strong results and appears to match GPT-4o in their experience on multimodal reasoning, particularly "thinking over images." If borne out in independent tests, that would position an open model close to frontier multimodal reasoning capability, attractive for local/self-hosted use and practical evaluation beyond curated leaderboards.
3. Qwen-Image-Edit-2509 Release: Multi-Image Editing & ControlNet
- Qwen-Image-Edit-2509 has been released (Score: 222, Comments: 30): Qwen released Qwen-Image-Edit-2509, a September update adding multi-image editing trained via image concatenation (supporting person+person/product/scene) with best results at 1–3 inputs, plus markedly improved single-image identity consistency for faces, products, and on-image text (fonts/colors/materials), accessible via Qwen Chat. It also adds native ControlNet-style conditioning (depth, edge, keypoint maps, etc.) on top of the existing Qwen-Image-Edit architecture. Comments highlight surprise at the monthly cadence and note prior issues with facial identity drift over multiple iterations, which this release claims to address; some compare it to Flux Kontext, saying earlier versions sometimes had worse facial resemblance, so the fast update is welcomed.
  - Identity preservation across iterative edits: Users report prior Qwen-Image-Edit builds struggled to keep faces consistent, especially over multiple edit passes or with multiple subjects. v2509 is highlighted as targeting this issue, suggesting improved face/identity conditioning and reduced drift across iterations.
  - Comparison vs Flux Kontext: One user found the previous release was close but sometimes worse at facial resemblance than Flux Kontext. The v2509 update is viewed as closing that gap by acknowledging and addressing facial similarity issues.
  - Inpainting/object removal performance: A commenter says Qwen-Image-Edit-2509 is "comparable to nano banana" on object removal tasks, implying competitive fill quality for removals. No quantitative benchmarks were provided, but the qualitative parity is noted.
- Qwen-Image-Edit-2509 IS LIVE, and it's a GAME CHANGER (Score: 208, Comments: 18): Qwen-Image-Edit-2509 is announced as a major upgrade of Qwen's image editing stack with multi-image compositing (e.g., person+product/scene) and strong single-image identity/brand consistency. It claims fine-grained text editing (content, font, color, material) and integrates ControlNet controls (depth, edges, keypoints) for precise conditioning; code and weights are available on GitHub and Hugging Face (GitHub, HF model, blog). Top comments critique the marketing hyperbole (e.g., "game changer," "rebuilt") and don't provide benchmarks or technical counterpoints; skepticism centers on evidence for the claimed improvements.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. OpenAI–NVIDIA 10 GW Supercomputer Partnership Announcements
- OpenAI and NVIDIA announce strategic partnership to deploy 10 gigawatts of NVIDIA systems (Score: 244, Comments: 80): OpenAI and NVIDIA signed a letter of intent to deploy at least 10 GW of NVIDIA systems (described as "millions of GPUs") for OpenAI's next-gen training/inference stack, with the first 1 GW slated for H2 2026 on NVIDIA's Vera Rubin platform; NVIDIA will be a preferred compute/networking partner, and both sides will co-optimize OpenAI's model/infrastructure software with NVIDIA's hardware/software [source]. NVIDIA also intends to invest up to $100B in OpenAI, disbursed progressively as each gigawatt is deployed, alongside a parallel build-out of datacenter and power capacity and continued collaborations (e.g., Microsoft, Oracle, SoftBank, "Stargate") toward large-scale model development (https://openai.com/index/openai-nvidia-systems-partnership). Top comments highlight the sheer scale of the proposed funding and note potential circular capital flows (OpenAI buys NVIDIA compute while NVIDIA invests back into OpenAI); others argue over "bubble vs. singularity" framing rather than technical merits.
  - A cited claim states: "NVIDIA intends to invest up to $100 billion in OpenAI progressively as each gigawatt is deployed," tied to a 10 GW rollout of NVIDIA systems. This reads as tranche-based vendor financing keyed to power/compute milestones, aligning capex with datacenter build-outs and de-risking supply timing while scaling GPU deployment as power and facilities come online.
  - Another comment highlights a circular capital flow: OpenAI purchases NVIDIA systems while NVIDIA invests back into OpenAI. Technically, this resembles a strategic supplier-financing/compute-prepayment structure that could secure priority allocation for next-gen NVIDIA platforms (e.g., H200/B200/GB200), lock in roadmap/pricing, and accelerate training cadence, at the cost of deeper vendor lock-in and supply-chain concentration.
- BREAKING: Nvidia to Invest $100 Billion in OpenAI (Score: 615, Comments: 99): Post claims Nvidia will invest up to $100B in a strategic partnership with OpenAI to build/deploy 10 GW of AI supercomputer capacity on Nvidia hardware, translating to "millions of GPUs" coming online in 2H 2026, to support OpenAI's AGI ambitions; it adds Nvidia stock rose +4.69% on the news. Structure suggests funds are tied to the progressive 10 GW rollout, effectively pre-financing/locking OpenAI into Nvidia's stack and positioning Nvidia at the center of next-gen AI compute. Comments argue it's effectively Nvidia investing in itself since OpenAI buys Nvidia hardware; OpenAI is trading equity for guaranteed compute; and Nvidia benefits by amplifying demand/prices for its chips, then recycling profits to subsidize OpenAI capacity.
  - Power-to-GPU math: taking the cited 10 GW ÷ 0.7 kW ≈ 14.3M GPUs (assuming ~700 W/GPU, e.g., H100-class SXM modules), but accounting for data center PUE of ~1.2–1.4 and non-GPU overheads (CPUs, NICs, switches, storage, cooling) drops the usable GPU count to roughly ~8–11M (reproduced in the sketch after this list). Networking at this scale (400/800G per node over InfiniBand NDR or Ethernet) implies tens of millions of optics/ports and multi-megawatt fabric power; the interconnect and optics supply chain become bottlenecks alongside GPUs (NVIDIA Quantum-2 400G IB, NVLink Switch). Blackwell-era modules are expected to push module power higher, further reducing the GPU count per GW and increasing cooling/networking overheads (NVIDIA GTC Blackwell).
  - Compute-for-equity flywheel: commenters frame this as OpenAI swapping equity for reserved NVIDIA capacity; in turn, OpenAI's workloads popularize NVIDIA chips, letting NVIDIA raise ASPs and recycle profits into subsidized capacity for OpenAI, effectively "NVIDIA investing in NVIDIA." Practically, this likely means multi-year take-or-pay reservations and prepayments tied to constrained inputs like HBM3E and CoWoS packaging capacity, with priority allocation rather than pure cash injection (HBM3E overview, TSMC CoWoS). Deepened CUDA lock-in increases switching costs versus AMD MI300X/MI325X + ROCm, pressuring competitors to beat NVIDIA on $/TFLOP and memory BW to win inference/training TCO (AMD MI300X, ROCm).
  - Scale and infrastructure constraints: ~10 GW is utility-scale power (dozens of campuses), requiring new high-voltage substations, long-lead transformers (often 18–36 months), and substantial water/heat-rejection capacity; grid and cooling timelines may dominate deployment speed. Even with supply, orchestrating millions of GPUs demands pod-level topologies (e.g., 8–16 GPU HGX nodes, multi-tier IB/Ethernet fabrics) and careful job placement to maintain high training efficiency; otherwise interconnect bottlenecks erase scale gains. HBM supply (stacks per GPU) and optics availability are likely pacing items as much as the GPUs themselves, which aligns with "tied to capacity" language in such deals (Uptime PUE context, NVIDIA HGX).
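The commenter's power-to-GPU arithmetic, reproduced so the ~8–11M figure is checkable. Every constant is an assumption from the thread, not an official figure:

```python
# Reproducing the thread's back-of-envelope math; all constants are assumed.

site_power_w = 10e9        # 10 GW total
gpu_power_w = 700          # ~700 W per H100-class SXM module
pue = 1.3                  # mid-range of the cited 1.2-1.4 facility overhead
non_gpu_share = 0.15       # CPUs/NICs/switches/storage (assumed)

naive = site_power_w / gpu_power_w
usable = site_power_w / pue * (1 - non_gpu_share) / gpu_power_w
print(f"naive: {naive/1e6:.1f}M GPUs, after PUE + overheads: {usable/1e6:.1f}M")
# naive: 14.3M GPUs; after PUE + overheads: ~9.3M, inside the cited ~8-11M band.
```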
2. Qwen-Image-Edit-2509 Release and Gemini/ChatGPT Multimodal Demos
- Qwen-Image-Edit-2509 has been released (Score: 348, Comments: 92): Qwen released Qwen-Image-Edit-2509, the September iteration of its image-editing diffusion model, adding multi-image editing via training on concatenated images (optimal with 1–3 inputs; supports person+person, person+product, person+scene). It improves single-image edit consistency (better identity preservation for people with pose/style changes; stronger product identity/poster editing; richer text edits including fonts/colors/materials) and introduces native ControlNet conditioning (depth, edge, keypoint maps). A Diffusers pipeline `QwenImageEditPlusPipeline` is provided; examples recommend `torch_dtype=bfloat16` on CUDA and expose `true_cfg_scale`, `guidance_scale`, `num_inference_steps`, and `seed` (a usage sketch follows below). Commenters ask if the "monthly iteration" implies regular monthly releases and whether LoRAs trained on prior versions will remain compatible across updates; they also note the likely need to redo quantization (e.g., GGUF/SVDQuant) per release, with one user immediately aiming to convert to GGUF.
  - A commenter plans immediate conversion to GGUF, indicating demand for a quantized, llama.cpp/ggml-friendly format for local inference on low-VRAM or CPU-bound setups. GGUF support typically enables deployment via llama.cpp-style backends and offline tooling; GGUF spec: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md.
  - There's scrutiny of a potential monthly release cadence and whether LoRAs trained on previous checkpoints will remain compatible. Since LoRA deltas are tied to the base model's weight shapes/tokenization, even small checkpoint/architecture changes can break compatibility and require re-training or re-derivation; LoRA background: https://arxiv.org/abs/2106.09685.
  - Resource constraints are highlighted: one user notes they'd need a new SVDQuant for each update and "there's no way I'm using even the GGUF version on my poor GPU." This implies monthly quantization-pipeline churn (e.g., SVDQuant, GGUF) and reliance on aggressive quantization to fit VRAM limits for image-edit inference.
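The usage sketch referenced above, built only from the class and parameter names the post mentions; the checkpoint id, input images, and knob values are illustrative assumptions:

```python
# Sketch only: pipeline class and parameter names come from the post above;
# the repo id and all values are assumptions, not tested settings.
import torch
from diffusers import QwenImageEditPlusPipeline
from diffusers.utils import load_image

pipe = QwenImageEditPlusPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509",          # assumed Hugging Face repo id
    torch_dtype=torch.bfloat16,           # bfloat16 recommended on CUDA
).to("cuda")

images = [load_image("person.png"), load_image("product.png")]  # 1-3 inputs best
out = pipe(
    image=images,
    prompt="Place the product in the person's hand, studio lighting",
    true_cfg_scale=4.0,
    guidance_scale=1.0,
    num_inference_steps=40,
    generator=torch.Generator("cuda").manual_seed(0),  # the exposed `seed`
)
out.images[0].save("edit.png")
```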
- I met my younger self using Gemini AI (Score: 231, Comments: 24): Post showcases a photorealistic AI edit where the OP "meets" their younger self, reportedly created with Google's Gemini AI. While no exact workflow is given, the result implies an image-to-image or multi-reference image-editing pipeline (likely supplying a current portrait and a childhood photo) guided by a text prompt to compose the scene; OP notes the realism, saying it worked "this good." No benchmarks or model variant/parameters are provided. Comments are enthusiastic and request reproducible details, specifically the prompt, whether two identity reference images were used, and if a separate scene reference guided composition, highlighting interest in practical workflow replication.
- A commenter asks for the exact prompt/workflow, explicitly probing whether the OP used two reference images (current self + childhood photo) and an additional reference for the final scene. This highlights interest in multi-image conditioning and composition control in Geminiâs image pipeline (e.g., identity preservation across inputs and scene guidance). No specifics are provided in-thread, so reproducibility details (input modalities, steps, or constraints) remain unknown.
- The OP reports unexpectedly high fidelity ("I didn't think it'd work this good") along with a generated composite: https://preview.redd.it/x7dvf09mzrqf1.jpeg?width=768&format=pjpg&auto=webp&s=88e91700732a51793ba05373ad87f6b7652cf01e, suggesting effective identity retention across age domains. However, the thread lacks technical parameters (model/version, input resolution, steps/seeds, or prompt details), so the result can't be benchmarked or replicated from the given information.
- I didn't know ChatGPT could do this. Took 3 prompts. (Score: 328, Comments: 95): OP reports that ChatGPT generated a runnable "3D Google Earth-style" web app in ~3 prompts and rendered it directly in the ChatGPT Preview sandbox, with no local compile/build or hosting required. The globe implementation is suspected (by commenters) to use WebGL via three.js and possibly three-globe; the linked video (v.redd.it/6mhokhu3wmqf1) returns `403 Forbidden` without Reddit auth, indicating access is gated by application/network-edge controls. The "complicated" part of the task was not achieved, but the interactive globe scaffold worked end-to-end within the ChatGPT environment. Commenters argue LLMs are strong at scaffolding "single-serving" web apps due to the web platform's breadth and code-heavy training data, while another highlights reliability limits (e.g., a recent vision misclassification of an apple as a tomatillo). There's debate/curiosity about the exact stack (three.js vs. three-globe), but no confirmed details from OP.
  - Multiple commenters infer the demo is built with three.js (threejs.org) and a globe helper like three-globe, leveraging WebGL for rendering. In such setups, the heavy lifting (sphere geometry, atmospheric shaders, geojson-to-3D conversion, arc/point animations) is handled by the library, and the LLM primarily wires configuration and data. This explains why it's achievable in "3 prompts": the code surface is mainly integrating well-documented APIs rather than writing low-level graphics.
  - LLMs are well-suited to scaffold "single-serving" web apps by composing existing packages and browser APIs. With a clear spec, they can generate a minimal stack (e.g., `Vite` + vanilla JS/TS or React) and integrate libraries (e.g., three.js, `d3-geo`, `TopoJSON`), relying on the web platform's capabilities (Canvas/WebGL/WebAudio). The abundance of publicly available code in training corpora increases reliability for boilerplate and idiomatic patterns, though correctness still hinges on precise prompts and iterative testing.
  - A report of misidentifying an apple as a tomatillo highlights current limits of general-purpose VLMs on fine-grained classification. Without domain-specific priors or few-shot exemplars, lookalike classes under variable lighting/backgrounds often confuse models; specialized models (e.g., CLIP variants or fine-tuned Food-101 classifiers) and prompt constraints can mitigate errors. It underscores that while LLMs excel at code synthesis, vision reliability may lag without task-specific calibration.
- I had that moment with Kimi 2! (Score: 1390, Comments: 72): Screenshot shows Kimi 2 responding "Good catch" after being corrected, illustrating a hallucination likely triggered when it couldn't access referenced documents. Commenters report Kimi 2 will fabricate content rather than return a retrieval/no-data error when document access fails, pointing to gaps in RAG/grounding and guardrails for provenance-aware answers. Users confirm this behavior occurs on document-access failures and liken the bot's reply to a student being caught unprepared; one notes it routinely "makes something up" in these cases.
  - Users note that when the assistant cannot access referenced documents (e.g., retrieval/permissions failures), it tends to fabricate plausible details rather than abstain. This is a classic RAG failure mode; engineering mitigations include explicit retrieval success checks, surfacing "no evidence found" states, and enforcing cite-or-abstain responses to avoid unsupported generations.
  - The "GPT-18" anecdote illustrates correction-induced confabulation: the model freely swaps core facts (location, power-plant type) while preserving the narrative outcome (evacuation). This highlights a lack of grounding and constraint satisfaction; mitigations include schema-validated tool use, entity normalization (geo/organization disambiguation), and external verification before committing to factual assertions or actions.
  - Hallucinations are reported across vendors (e.g., ChatGPT and Claude), suggesting model-agnostic limitations in factuality and tool reliability. Production setups should add deterministic guards (retrieval timeouts, confidence gating, and post-hoc verifiers) to reduce error rates, rather than relying on model prompts alone.
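A minimal sketch of the cite-or-abstain mitigation described above; `search` and `llm` here are hypothetical stand-ins for a retriever and a model client:

```python
# Sketch: only answer when retrieval actually returned evidence, attach it,
# and otherwise abstain explicitly instead of letting the model improvise.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    sources: list[str]

def grounded_answer(question: str, search, llm, min_hits: int = 1) -> Answer:
    hits = search(question)  # may be empty on access/permission failures
    if len(hits) < min_hits:
        # Surface the retrieval failure instead of generating unsupported text.
        return Answer("No evidence found; cannot answer from documents.", [])
    context = "\n".join(h["snippet"] for h in hits)
    reply = llm(f"Answer ONLY from the context.\nContext:\n{context}\n\nQ: {question}")
    return Answer(reply, [h["url"] for h in hits])
```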
3. Robot Uprising Memes and Unitree G1 Agility Clips
- Unitree G1 fast recovery (Score: 1515, Comments: 358): Short post appears to showcase the Unitree G1 humanoid executing a rapid ground-to-stand "fast recovery" maneuver, suggesting a whole-body controller coordinating multi-contact transitions with sufficient actuator peak torque/power for explosive hip/knee extension. No quantitative benchmarks (e.g., recovery time, joint power/torque, controller type) are provided; the video link v.redd.it/8l0l09o6fpqf1 returns HTTP `403` (access-controlled), but a still frame is visible here. Top comments emphasize the motion's realism ("impressive and scary") and question authenticity ("looks so good it looks fake"); no technical critique or controller/actuator discussion is present.
- Primary target locked! This guy's the first one to go (Score: 348, Comments: 48): A short v.redd.it clip appears to depict a robotic system announcing a target "lock" on a human and then deploying a rope/tether after switching to "OFFENSIVE MODE," implying basic vision-based target acquisition and a powered launcher/gimbal; no telemetry, specs, or control-loop details are provided to evaluate latency, actuation speed, or safety. Several commenters challenge the demo's validity, asserting the footage is likely sped up and asking for real-time playback to assess tracking stability, servo response, and the risk profile of a neck-level tether. Debate centers on law-enforcement applications versus safety/ethics, with some advocating eventual police use while others highlight strangulation hazards and reliability concerns; a quip about a person getting tangled in ~2s underscores skepticism about practical robustness.
  - A commenter alleges the demo is sped up and asks for real-time playback. Without true 1× footage, it's impossible to judge controller bandwidth, state-estimation latency, actuator torque limits, and gait stability; time-lapse can hide slow step frequency and long recovery times after disturbances. Best practice would be an on-screen timecode/frame-time overlay and reporting of step rate (Hz), CoM velocity, and reaction latency to external impulses.
  - Another critic notes repeated "push tests" and simple preprogrammed punches show disturbance rejection but little capability progression. They implicitly call for harder, measurable benchmarks: uneven-terrain traversal with quantified slip, contact-rich manipulation with force/impedance control, autonomous perception/planning, payload handling, and normalized metrics like cost of transport, fall rate, and mean time-to-recovery under known impulse. Public logs or standardized benchmark suites would enable fair comparisons across humanoid platforms.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. DeepSeek v3.1 Terminus and Qwen3 Releases
- DeepSeek v3.1 Terminus Lands with Agentic Tweaks: DeepSeek released v3.1 Terminus with open weights on DeepSeek-V3.1-Terminus, citing bug fixes, improved language consistency, and stronger code/search agent behavior.
- Users noted "somewhat degraded" reasoning without tool use and "slight improvements" in agentic tool use, while others immediately asked "when is DeepSeek-R2?" and pointed to the broader deepseek-ai models.
- Qwen3 Omni-30B Goes Multimodal: Alibaba's Qwen3 Omni-30B-A3B-Instruct (36B params) landed with multimodal encoders/decoders and multilingual audio I/O at Qwen/Qwen3-Omni-30B-A3B-Instruct.
- Community claims it "beats Whisper v2/3 and Voxtral 3B" on scam-call audio and supports 17 input and 10 output languages, alongside chatter about local LoRA training on a single RTX 4090 and a coming wave of realtime perceptual AI.
- Qwen3-TTS Speaks Up: Tongyi Lab unveiled Qwen3-TTS with multiple timbres and languages optimized for English and Chinese, documented at ModelScope: Qwen3-TTS.
- Builders asked about open-source availability and API pricing, but the official response shared docs only and stayed silent on openness and cost.
2. Diffusion Sampling & Data Efficiency Breakthroughs
- 8-Step ODE Solver Smokes 20-Step DPM++: An independent researcher's WACV 2025 submission, Hyperparameter is all you need, introduces an ODE solver that achieves 8-step inference (and 5-step rivaling distillation), outperforming DPM++2m 20-step in FID.
- The approach is a training-free sampler cutting compute by ~60% via better tracing of the probability flow trajectory, with code at GitHub: Hyperparameter-is-all-you-need.
- Diffusion Dunks on Autoregressive in Low-Data: CMU's blog, Diffusion Beats Autoregressive in Data-Constrained Settings, argues diffusion models outperform autoregressive methods when data is scarce.
- Researchers flagged missing citations and pointed to a related preprint (arXiv:2410.07041) that "generalizes the approach better" but wasn't cited.
- Repeat-4x Data Trick Plays Even: A study (arXiv:2305.16264) showed repeating data 4x and shuffling each epoch matches training on the same amount of unique data.
- Practitioners discussed applying the trick to MLPs, treating inter-epoch shuffling as a cheap regularizer for data-limited regimes.
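A toy PyTorch rendering of that recipe (repeat the corpus, reshuffle each epoch); the model and data below are placeholders, and only the shuffling pattern matters:

```python
# Sketch of the repeated-data trick from arXiv:2305.16264 as discussed above:
# reuse one corpus for several epochs with a fresh shuffle per epoch.
import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))  # toy corpus
model = torch.nn.Linear(32, 1)                                     # stand-in MLP
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for epoch in range(4):  # "repeat 4x"
    # shuffle=True draws a new permutation each epoch -- the inter-epoch
    # shuffling the study found to roughly match unique-data training.
    for x, y in DataLoader(data, batch_size=64, shuffle=True):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
```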
3. Compute Megadeals & GPU Systems
- OpenAI–NVIDIA Lock a ~$100B, 10 GW GPU Pact: Latent Space members discussed an OpenAI–NVIDIA plan to deploy up to 10 gigawatts of NVIDIA systems, valued around $100B, for next-gen datacenters starting late 2026.
- Reactions ranged from stock optimism to debates over vendor financing, AGI expectations, and whether end users will ever feel the added compute.
- Modular 25.6 Unifies GPUs, MAX Flexes: Modular shipped Modular 25.6: Unifying the latest GPUs with support for NVIDIA Blackwell (B200), AMD MI355X, and Apple Silicon, powered by MAX.
- Early results claim MAX on MI355X can outperform vLLM on Blackwell, hinting at aggressive cross-vendor tuning and a unified developer workflow.
- NVSwitch Know-How Boosts Multi-GPU Throughput: Engineers shared a primer on sharing memory addresses across GPUs and leveraging NVSwitch for reductions at Stuart Sul on X.
- These patterns matter for bandwidth-bound collectives and activation flows where efficient interconnects keep GPU utilization high.
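For context on the collective being discussed, here is a plain PyTorch all-reduce; NCCL routes it over NVLink/NVSwitch when the hardware offers it. This is standard torch.distributed usage, not code from the linked post:

```python
# Standard multi-GPU all-reduce; NCCL uses NVLink/NVSwitch when available.
# Launch with: torchrun --nproc-per-node=8 allreduce_demo.py
import os
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

x = torch.ones(1 << 20, device="cuda") * dist.get_rank()  # 1M floats per GPU
dist.all_reduce(x, op=dist.ReduceOp.SUM)  # the bandwidth-bound collective
if dist.get_rank() == 0:
    print(x[0].item())  # sum of ranks 0..world_size-1

dist.destroy_process_group()
```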
4. Agent Protocols & Constrained Outputs
- MCP Adds response_schema for Structured Sampling: The MCP team discussed adding response_schema to the sampling protocol to request structured outputs, tracked via modelcontextprotocol/issues/1030.
- Contributors expect modest SDK changes plus provider-specific integration, with a volunteer targeting an October demo implementation.
- MCP Registry Publishing & Remote Servers Land: Publishing guidance for the MCP Registry arrived at Publishing an MCP server, covering `server.json`, status, repo URL, and remotes.
  - A reference for remote configs documents `streamable-http` endpoints at Generic server.json reference.
- vLLM Bakes In Grammar-Guided Decoding: Developers highlighted guided decoding that constrains logits with formal grammars in vLLM, see vLLM sampling.py.
- They contrasted this with KernelBench 0-shot evals that skip constraints, noting grammars could eliminate many compiler errors upfront.
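A small sketch of constrained decoding through vLLM's offline API; `GuidedDecodingParams` is vLLM's structured-outputs entry point in recent releases, while the model id and choice set below are arbitrary placeholders:

```python
# Sketch: logit-constrained decoding in vLLM. GuidedDecodingParams also
# accepts json/regex/grammar constraints in recent versions; the model id
# and the choice set here are placeholders.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # any small chat model

params = SamplingParams(
    guided_decoding=GuidedDecodingParams(choice=["compiles", "does not compile"])
)
outputs = llm.generate(["Does `int x = ;` compile in C? Answer:"], params)
print(outputs[0].outputs[0].text)  # constrained to one of the two strings
```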
5. Open-Source Platforms, DBs, and Communities
- CowabungaAI Forks LeapfrogAI as "Military-Grade" PaaS: CowabungaAI, an open-source fork of LeapfrogAI with chat, image gen, and an OpenAI-compatible API, launched at GitHub: cowabungaai.
- Its creator touted large code improvements and offered discounted commercial support and licensing for adoption.
- serverlessVector: Pure-Go VectorDB Debuts: A minimal Golang vector database, serverlessVector, is available at takara-ai/serverlessVector.
- Engineers can test a pure-Go vectorDB suitable for embedded/serverless use without external dependencies.
- HuggingâScience Discord Spins Up: A new hugging-science Discord for open projects in fusion, physics, and eval launched at discord.gg/hU9mdFPB.
- Organizers are recruiting Team Leaders, signaling momentum for domain-focused open-science collaborations.
Discord: High level Discord summaries
Perplexity AI Discord
- Comet Browser Invites Going Out: Members shared invitation links for Perplexity's Comet browser, such as this link, for users to try out, and confirmed it's available on Windows as well as Mac.
- One user reported reaching the Comet daily limit and suggested upgrading to a Max subscription for higher usage limits.
- Perplexity Better Than ChatGPT For Research?: A user argued that ChatGPT is known for hallucinating, while Perplexity is better for research purposes, specifically when retrieving trustworthy data using this Perplexity AI search.
- Another member echoed this sentiment.
- GPT-5 Launch Anticipation: Enthusiasm is building for upcoming models like GPT-5, with some preparing for a potential third AI winter, while others dismiss that theory.
- One member doubts that the current GPT-4 models are smarter than the newest generation, while others express excitement for advances in AI that could help them understand the universe itself.
- Perplexity Pro Users Feel Overlooked: A user shared a tweet highlighting complaints about Perplexity Pro users feeling left behind regarding feature parity with Max subscribers.
- A respondent dismissed these concerns, anticipating the new features would not work properly for another four months.
- Shareable Threads Encouraged: A member reminded others to ensure their threads are set to Shareable, linking to a specific Discord channel message.
- This makes threads easier for others to access.
Unsloth AI (Daniel Han) Discord
- VRAM Goes BRRR for Massive Context Windows: A user calculated that a 1M context length model would require 600GB of VRAM, while another user questioned the accuracy of the VRAM calculator for their company's needs with 16 concurrent users and Qwen 3 30B models.
- It was noted that performance significantly degrades on a 30B model beyond 100k context length, and that training on shorter contexts is sufficient after initial context extension (see the KV-cache arithmetic sketched below).
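Rough KV-cache arithmetic of the kind behind such VRAM estimates; the layer/head/dim numbers below are placeholders, not Qwen's actual config:

```python
# Why long-context serving estimates balloon: KV cache scales linearly with
# context length x concurrent users. All model-shape numbers are assumed.
layers, kv_heads, head_dim = 48, 8, 128   # placeholder model shape
bytes_per = 2                              # FP16/BF16 cache entries
ctx, users = 1_000_000, 16                 # 1M tokens, 16 concurrent users

per_token = 2 * layers * kv_heads * head_dim * bytes_per  # K and V tensors
total = per_token * ctx * users
print(f"{per_token/1024:.0f} KiB/token -> {total/2**40:.1f} TiB of KV cache")
# Weights are paid once; the cache is what grows with context and users.
```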
- DeepSeek's Deep Dive with Huawei?: A user mentioned that DeepSeek might be experiencing issues due to using Huawei Ascend chips, with the newest version, DeepSeek-V3.1-Terminus, shipping bug fixes.
- The update shows somewhat degraded reasoning without tool use but slight improvements in agentic tool use.
- Quantization Quagmire: Google and Apple Join: Members discussed various quantization techniques including QAT (Quantization Aware Training) and its successful implementations by Google and OpenAI.
- There was also mention of Apple's super-weight research and experiments with NVFP4 and MXFP4, with NVFP4 showing a slight performance lead, and Unsloth's DeepSeek R1 blog.
- Money Talks: OOM Error Edition: A user is willing to pay 50 USDT for assistance in resolving an out-of-memory (OOM) issue after reviewing tutorials.
- Another user echoed the same sentiment, indicating a potential demand for paid support within the community.
- Diffusion Dominates Autoregressive in Data-Constrained Settings: A blog post suggests diffusion models outperform autoregressive models when data is limited.
- Another cited paper indicates that repeating data 4x and shuffling after each epoch yields results similar to using unique training data.
LMArena Discord
- Indonesian Video Prompt Creates Vivid Imagery: A member shared an Indonesian prompt to generate a video from a photo, detailing scene, movement, stylization, audio, and duration specifications, requesting a 2.5D hybrid animation style with predominantly blue and white colors and neon red in the background, accompanied by heavy rock music at 150 BPM.
- Translation provided: Generate video from photos [upload photos].
- Grok 4 Fast Undercuts Gemini Flash: Discussion arose around Grok 4 Fastâs performance relative to Gemini Flash, with one member stating that Grok 4 Fast is significantly better and cheaper.
- It was further noted that Grok 4 Fast pricing encourages others to offer competitive pricing.
- Seedream 4 2k Struggles with Ethnicity: Users are reporting that Seedream 4 2k fails to maintain the integrity of a character's ethnicity when using multiple references, though it remains strong on generation speed with good results.
- One user said it gets multiple reference images absolutely wrong, like all the time, and sometimes gives wrong output even with a single image reference.
- AI Models Spark Debate in Medicine: Discussion covers the potential of AI models in healthcare, with concerns raised about their ability to identify fatal drug combinations.
- A member with experience in drug discovery stated, "It works fine with fine tuned model with real data."
- Gemini 3.0 Flash Integrates Everywhere: Amidst a flurry of Gemini integrations, speculation intensifies around a potential Gemini 3.0 Flash release, possibly featuring integrated video capabilities, and potential deployment in mass home assistant devices.
- Members are wondering why Google has been deploying Gemini everywhere this week.
Cursor Community Discord
- Cursor Token Use Causes Sticker Shock: A user reported unexpectedly high token usage on a new Cursor account, clocking 146k tokens in 22 prompts, and asked for clarification.
- Community members explained that old chat logs and attached files inflate token usage, linking to Cursor's learn pages on context.
- Kaspersky Mistakenly Flags Cursor as Malware: A user reported that Kaspersky is flagging Cursor as malware, sparking discussion around false positives.
- Cursor support requested logs to investigate, reassuring users that it's a generic warning related to potentially unwanted app (PUA) detection.
- SpecStory Auto-Backups Chat Exports: Users shared SpecStory, which automatically exports chats as files, to get around issues with randomly corrupted chats.
- One user noted they wouldn't submit their chats to 3rd parties, and thus would rather use the local version of the tool.
- GPT-5 Speculated to Offer Cheaper Limits than Claude Sonnet: The community speculated that because GPT-5 is cheaper than Claude Sonnet 4, it will offer better limits.
- While someone pointed out that GPT-Mini is free, a user clarified they were referring to Codex.
GPU MODE Discord
- vLLM Eyes Image/Video Generation: A member considered the alignment of adding image or video generation capabilities to vLLM, planning to contact the multimodal features team.
- The discussion explores expanding vLLM's functionalities beyond language models.
- GB300s Power Quantum Gravity Research: A member plans to use a substantial number of GB300s to facilitate high-compute-scale diffusion for magnetohydrodynamics and loop quantum gravity modeling.
- The member notes the magnetohydrodynamics model is more likely to succeed compared to the highly experimental loop quantum gravity approach.
- CPU strangling MLPerf on local runs: A member reported running MLPerf inference locally is bottlenecked by the CPU, despite sufficient VRAM.
- The low GPU utilization indicates a significant performance issue that needs resolution.
- Tianqi Chen Discusses ML Systems Evolution: A recent interview with Tianqi Chen discussed Machine Learning Systems, XGBoost, MXNet, TVM, and MLC LLM.
- Chen's reflections include his work at OctoML and his academic contributions at CMU and UW.
- New kid CowabungaAI splinters from LeapfrogAI: An open-source fork of LeapfrogAI called CowabungaAI was announced; it is a military-grade AI Platform as a Service by Unicorn Defense.
- It has similar functionalities to OpenAI, including chat, image generation, and an OpenAI compatible API, and is available on GitHub.
HuggingFace Discord
- Hugging-Science Discord Launches: A new `hugging-science` Discord channel has launched, focusing on open-source efforts in fusion, physics, and eval, accessible here.
  - The channel seeks Team Leaders for projects, providing a leadership opportunity to guide exciting initiatives.
- Diffusion ODE Solver steps down to 8: An independent researcher has unveiled an ODE solver for diffusion models, achieving 8-step inference that outperforms DPM++2m's 20 steps, and 5-step inference rivaling the latest distillation methods; see the paper on Zenodo and code on GitHub.
- This training-free improvement reduces computational costs by ~60% while enhancing quality by better tracing the probability flow trajectory during inference.
- Golang VectorDB sees the light: A member has created serverlessVector, a pure Golang vectorDB and provided a link to the GitHub repo.
- It is written in Go and available for immediate testing and implementation.
- HF Inference Providers face Quality Complaints: Members expressed concerns about the quality of HF Inference providers, wondering how HF guarantees the quality of inference endpoints, especially concerning quantization.
- They added that they feel endpoints should be zero-data-retention (ZDR) by default.
Latent Space Discord
- DeepSeek Model Terminus Arrives!: DeepSeek launched the final v3.1 iteration, Terminus, with improved language consistency and code/search agent capabilities; open weights are available on Hugging Face.
- The community immediately inquired about DeepSeek-R2, hinting at future developments.
- Untapped Capital launches Fund II: Yohei Nakajima announced Untapped Capital Fund II, a $250k pre-seed vehicle, continuing its mission to back founders outside traditional networks.
- The team has shifted to a top-down approach, proactively sourcing startups based on early trend identification.
- Alibaba's Qwen3-TTS makes its debut!: Alibaba's Tongyi Lab introduced Qwen3-TTS, a text-to-speech model with multiple timbres and language support, optimized for English and Chinese; see ModelStudio documentation.
- User queries focused on open-source availability and API pricing, though the official team did not address open-source plans.
- OpenAI and NVIDIA Strike $100B GPU Mega-Deal!: OpenAI and NVIDIA partnered to deploy up to 10 gigawatts (millions of GPUs) of NVIDIA systems, valued at ~$100B, for OpenAI's next-gen datacenters starting late 2026.
- Reactions ranged from celebrating the stock boost to debating vendor financing, AGI expectations, and the potential impact on end-users.
- Among AIs Benchmark Tests Social Smarts: Shrey Kothari introduced Among AIs, a benchmark evaluating language modelsâ deception, persuasion, and coordination within Among Us; GPT-5 excelled as both Impostor and Crewmate.
- Discussions included model omissions (Grok 4 and Gemini 2.5 Pro incoming), game selection, data concerns, discussion rules (3-turn debates), and enthusiasm for game-based AI evaluations.
Eleuther Discord
- Text-davinci-003 Debuts Day-and-Date with ChatGPT: Text-davinci-003 launched the same day as ChatGPT, with OpenAI quickly labeling it a GPT-3.5 model.
- An insider clarified that an updated Codex model, code-davinci-002, was also part of the GPT 3.5 series and released the day before ChatGPT.
- Diffusion ODE Solver Races Ahead: An independent researcher crafted a new ODE solver for diffusion models hitting 8-step inference that outpaces DPM++2m's 20-step inference in FID scores while slashing computational costs by ~60%.
- The solver enhances inference by better tracing the probability flow trajectory, detailed in a WACV 2025 paper with code available on GitHub.
- GPT-3.5 Family Tree: Members are hashing out the origins of ChatGPT and other GPT-3.5 models like code-davinci-002, text-davinci-002, and text-davinci-003.
- One insider hinted that ChatGPT was fine-tuned from a chain-finetuned, unreleased model.
- Benchmarking MMLU Subtasks: The community considered the possibility of benchmarking an MMLU pro subtask using lm-eval, focusing on subsets like mmlu_law.
- Exploring this capability could allow for more detailed and precise evaluations of AI model skills within the lm-eval framework.
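A sketch of what that could look like through lm-eval's Python entry point; `simple_evaluate` is the harness's real API, while the pretrained model and subtask id below are placeholders (task names vary by harness version):

```python
# Sketch: evaluating one MMLU subtask with lm-eval's Python API. List the
# available task ids for your installed version with `lm-eval --tasks list`.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-1.4b",  # placeholder HF model
    tasks=["mmlu_professional_law"],                 # assumed subtask id
    num_fewshot=5,
)
print(results["results"])
```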
Nous Research AI Discord
- HuggingFace Comments Spark Laughter: Members found the comments on HuggingFace humorous, with one stating "yeh i was lmaoing when i saw that."
- The specific nature of these comments was not detailed, but it indicates an active and engaged community reaction to content on the platform.
- OLMo-3 Safetensors Search Intensifies: A member is actively seeking leads on OLMo-3 safetensors, and has joined a Discord to track progress.
- Although inference code is available, they highlighted that there are no weights on HF yet, suggesting the model isn't fully accessible for use.
- Qwen3 Omni Launches with Multimodal Prowess: Qwen3 Omni-30B-A3B-Instruct is now live, boasting 36B parameters and multimodal capabilities.
- Reportedly, it beats Whisper v2/3 and Voxtral 3B in scam call audio processing, supporting 17 audio input languages and 10 output languages.
- Realtime Perceptual AI Race Commences: The community anticipates the rise of realtime perceptual AI (audio, video, and other modalities simultaneously).
- Apple's realtime vision model was mentioned as a potential indicator of developments, sparking curiosity about the lack of public releases.
- LoRA Training Achievable on RTX 4090: It was mentioned that training a LoRA on 14B models locally is feasible using a single RTX 4090.
- A member is working on something similar, and the constraints they're facing are bandwidth and latency.
aider (Paul Gauthier) Discord
- Aider Forks Navigate with Navigator Mode: Members suggested using aider forks with navigator mode for an automated experience using uv pip install on fork repos or the aider-ce package.
- The package integrates MCP and navigator mode to streamline the coding process.
- Augment CLI Excels at Codebase Augmentation: The Augment CLI tool shines with large codebases, particularly when using Claude Code (CC) with Opus/Sonnet and Codex for GPT-5.
- The user mentioned that Codex doesn't have an API key, while Perplexity MCP works well with MCP integration.
- Deepseek V3.1 Configuration Guidance Sought: A user requested advice on the initial setup and configuration of Deepseek V3.1, along with preferred web search tools.
- Because aider does not have a built-in tool, one member suggested Perplexity MCP for web search to work with it.
- Aider Agents Multiply via Command Line: Members discussed setting up multiple aider agents via the command line for external orchestration, rather than a built-in solution.
- The suggestion was to use git worktrees for concurrent modifications to the same repository, which enables running multiple agents simultaneously.
- LLM Loses Track Editing Prompt Files: When using a file to set up the prompt, aider prompts to edit files and, simultaneously, the LLM gets confused about the task at hand.
- The LLM then asks for clarification of intent, specifically whether it should act as the "User" within the APM framework or modify a file using the SEARCH/REPLACE block format.
Modular (Mojo đ„) Discord
- Rustaceans summon Mojo via FFI: Members have been exploring using FFI to call Mojo from Rust, which is similar to calling C via the C ABI.
- To ensure correct type handling, the Mojo function must be marked with `@export(ABI="C")`.
- Manual Mojo Binding still only option: The creation of C header -> Mojo binding generators remains a work in progress (WIP), rendering CXX assistance unavailable.
- Currently, generating Mojo bindings necessitates a manual approach.
- Windows Support still MIA: The question of Windows support for Mojo was raised among users.
- The response indicated that it is not coming any time soon.
- Modular 25.6 turbocharges GPU Performance: Modular Platform 25.6 is live, delivering peak performance across the latest GPUs including NVIDIA Blackwell (B200) and AMD MI355X, check out the blog post for more information.
- Early results show MAX on MI355X can even outperform vLLM on Blackwell, with unified GPU programming now available for consumer GPUs like Apple Silicon, AMD, and NVIDIA.
- MAX demands .mojopkg: To use MAX, a `.mojopkg` file is required alongside the executable, containing the highest-level MLIR that Mojo can produce after parsing, for the runtime's JIT compiler.
- For platforms hiding hardware details (Apple, NVIDIA GPUs), Mojo hands off compilation to the driver, performing a one-shot JIT without profiling, unlike V8 or Java.
Yannick Kilcher Discord
- JEPA Chat Incoming: A chat about Joint Embedding Predictive Architecture (JEPA) paper (https://www.arxiv.org/abs/2509.14252) by Yann LeCun is scheduled for <t:1758567600:t>.
- One member linked to an OpenReview document about JEPA, along with a YouTube video and two X (formerly Twitter) posts, including https://fxtwitter.com/randall_balestr/status/1969987315750584617.
- GPT Gets Philosophical, Scores Low: A user rated a GPT model a 7/10 for parsing their philosophy, but only a 4/10 for expanding on it, and 3/10 for formatting, deeming it not good enough.
- The user indicated improved results after further interactions, suggesting either improved prompting skills or enhanced machine reading capabilities.
- Presenter Seeks Optimal Paper Time: A member asked about the best time to present a paper, suggesting an earlier session for those in the eastern timezone.
- The member suggested presenting 6 hours earlier or later on most days.
DSPy Discord
- Environment Variables Trump MCP Secrets: A member cautioned against passing secrets through MCP, advising the use of environment variables for single-user services, or login+vault/OAuth for more complex setups.
- Their JIRA MCP server implementation uses stdio and retrieves credentials from the environment, deploying per single-user to avoid leaking secrets via process tables.
- DSPy Optimization Suffers from Trace ID Jitters: A user highlighted that initializing modules with a trace ID results in the same ID being used for all documents when running a batch after optimizing, and asked how to handle per-item trace IDs in DSPy modules without breaking optimization.
- They considered recreating modules per article (too expensive) and moving the trace ID to forward calls, questioning if this would affect optimization since the trace_id gets passed to llm gateway for logs and auditing.
- GEPA Silent on ReAct Context Overflows: A user asked about experiences with GEPA for ReAct Agents, focusing on how context overflows would be handled with long agent trajectories.
- Unfortunately, no one had a story to tell.
MCP Contributors (Official) Discord
- Response Schema Gains Support: An issue to Add response_schema to the MCP Sampling Protocol was converted into a discussion, with a member willing to implement a demo in October; this would enable requesting structured output when using sampling.
- The consensus is that implementing this in the SDK isnât complex, with the main effort lying in the integration between the SDK and the LLM API, potentially needing provider-specific code.
- Claudeâs Constrained Output Capability Examined: The group discussed presenting constrained output as a client capability, with the issue that Claude models are incorrectly identified as not supporting this feature.
- The preferred approach is to present response_schema support, allowing the client host to determine the actual implementation.
- MCP Registry Publishing Procedures: Instructions for publishing servers to the Model Context Protocol (MCP) Registry have been shared, beginning with this guide.
- The guide includes steps to create an empty GitHub repo and a `server.json` file that specifies name, description, status, repository URL, and remote endpoint.
- Remote Server Configurations Unveiled: Remote server configurations for the Model Context Protocol (MCP) have been linked, featuring this reference.
- The provided `server.json` example defines a `$schema` and includes a `remotes` array, specifying the `streamable-http` type and URL.
- MCP Install Instructions Auto-Generated: A member mentioned using a tool to generate a readme file with instructions for installing the Model Context Protocol (MCP) in various clients via MCP Install Instructions.
- The tool was reportedly âpretty coolâ and beneficial for creating installation guides.
tinygrad (George Hotz) Discord
- Colfax Meeting on Monday: There will be a meeting #89 on Monday at 9am San Diego time for a company update.
- Starting next week, the meeting will be 3 hours earlier.
- RANGEIFY Progress Questioned: Members questioned whether RANGEIFY would be default by the end of the week, including store, assign, group reduce, disk, jit, const folding, buf limit, llm, and openpilot.
- It was noted that children are not making progress and image is not complete.
- CuTe DSL a Potential Gamechanger: Members mentioned that the CuTe DSL is a potential gamechanger.
- They added that ThunderKittens is a nice starting point.
Moonshot AI (Kimi K-2) Discord
- Kimi Rejects Provocation Attempts: One member shared an experience of provoking Kimi, appreciating its rejection of blind obedience or sympathy.
- Another member expressed a similar sentiment, noting that Kimiâs attitude is a reason why itâs their favorite model.
- Claude Suffers Prompt Injection Debacle: Members discuss how Claude has been weakened due to prompt injection techniques, leading it to disagree when context exceeds a certain length.
- They noted that it is unlike Kimi K2, and some express disappointment in the changes to Claude.
Manus.im Discord Discord
- GenAI E-Book Reader goes Warp Speed: The latest release of the GenAI E-Book Reader introduces Generative Intelligence features for enhanced text clarification, summarization, and dictionary functions.
- The new version integrates with OpenRouter.ai, granting users access to over 500 large language models, as showcased in this video.
- OpenRouter.ai plugs into GenAI E-Book Reader: The GenAI E-Book Reader now supports OpenRouter.ai, opening up access to over 500 large language models for enriched reading assistance.
- Users can now utilize a diverse range of models for text clarification, summarization, and advanced dictionary functionalities.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI â· #general (1175 messagesđ„đ„đ„):
Comet Browser invitation, GPTs Agents training, OpenAI Platform's sidebars, Comet availability for ipadOS, AI winter
- Comet browser invites going out: A member shared an invitation link for Perplexityâs Comet browser for others to try out.
- Another confirmed that Comet is available on Windows too, not just Mac: Just open it on a windows machine.
- Comet Daily limit reached, upgrade available: A user reported receiving a persistent notification saying Comet personal search daily limit reached.
- A member suggested upgrading to a Max subscription for higher usage limits.
- Perplexity is better than ChatGPT for research sort of: A user mentions that ChatGPT is known for hallucinating, generating information that is not real, whereas Perplexity is better for research purposes.
- Another member stated Perplexity is good for research sort of, but chatgpt i have heard that it hallucinates.
- GPT-5 launch imminent, experts claim: Multiple users express excitement for upcoming models like GPT-5, and some say they are preparing for the third AI winter, while others dismiss this.
- One member stated that they highly doubt the gpt 4 models are smarter than the newest generation, while others expressed their excitement, with one saying With that, we'll understand the universe itself.
- Perplexity Pro users feeling left behind: A user shared a link to a tweet complaining about Perplexity Pro users being left behind in terms of features compared to Max subscribers.
- Another responded they are tripping about some B.S. that probably doesnât even work properly for probably another four months.
Perplexity AI â· #sharing (5 messages):
Comet Invitation, Shareable Threads, Trustworthy Data, Invitation Request
- Comet Invite Drops: A member shared a Comet invitation link for others to try out Comet.
- Shareable Threads Encouraged: A member reminded others to ensure their threads are set to Shareable, linking to a specific Discord channel message.
- Trustworthy Data Questioned: A link to a Perplexity AI search about trustworthy data was shared.
- Invitation Plea: Another member requested an invite, providing another invitation link and a link to Chris Bijuâs profile.
Unsloth AI (Daniel Han) â· #general (374 messagesđ„đ„):
VRAM Usage for 1M Context Length, GRPO Fine-Tuning for GPT-OSS-20B, DeepSeek V3.1 Terminus and Huawei Ascent Chips, Qwen3 and Data Privacy Concerns, QAT and GGUF Quants
- Ramping Up VRAM Requirements for Large Context Windows: A user calculated that a 1M context length model would require 600GB VRAM, while another user questioned the accuracy of the VRAM calculator for their companyâs needs with 16 concurrent users and Qwen 3 30B models.
- It was noted that performance significantly degrades on a 30B model beyond 100k context length, and that training on shorter contexts is sufficient after initial context extension; a rough KV-cache estimate is sketched after this list.
- GRPOâs Grand Reveal Soon?: Members discussed the imminent release of GRPO finetuning for GPT-OSS-20B and its limitations for increasing context length on simple datasets like GSM8k.
- It was suggested to use at least 100 examples for a meaningful test and over 1000 for optimal results when training a model for a specific programming task.
- DeepSeekâs Deep Dive with Huawei?: A user mentioned that DeepSeek might be experiencing issues due to using Huawei Ascent chips.
- DeepSeek has released DeepSeek-V3.1-Terminus, a small iteration with bug fixes that somewhat degrades reasoning without tool use while slightly improving agentic tool use.
- Qwen3âs Quietly Questionable Quotas?: The topic of data privacy arose in the context of using OpenAI or Gemini paid APIs, with the consensus being that users should assume their data is being used for training despite any stated policies.
- A member shared a link to a court order requiring OpenAI to preserve all ChatGPT logs, including deleted temporary chats and API requests.
- Quantizationâs Quantum Quest: QAT vs MXFP4: Members discussed various quantization techniques including QAT (Quantization Aware Training) and its successful implementations by Google and OpenAI.
- There was also mention of Apple's super-weight research and experiments with NVFP4 and MXFP4, with NVFP4 showing a slight performance lead, plus conversation about Unsloth's DeepSeek R1 blog post.
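For context on figures like the 600GB estimate above, here is a rough back-of-the-envelope KV-cache calculation in Python; the model dimensions are illustrative assumptions, not the exact configuration discussed.

```python
# Rough KV-cache sizing; layer count, KV heads, and head dim are assumptions.
layers, kv_heads, head_dim = 48, 8, 128
bytes_per_elem = 2            # fp16/bf16
tokens = 1_000_000            # 1M-token context
users = 16

# K and V each store [tokens, kv_heads, head_dim] values per layer.
per_user = 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens
print(f"KV cache per user: {per_user / 1e9:.0f} GB")        # ~197 GB
print(f"For {users} users: {users * per_user / 1e12:.1f} TB")
```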
Unsloth AI (Daniel Han) â· #introduce-yourself (1 messages):
Collaboration Opportunities, Software Engineering, Small Business Ventures
- Collaboration: A New Business is Brewing: A software engineer and small business owner has opened the door to potential collaboration.
- This could lead to new projects, ventures, or partnerships within the community.
- Software Engineering Collab: A software engineer is seeking collaboration opportunities, hinting at potential projects needing technical expertise.
- This may involve software development, system design, or technical consulting.
Unsloth AI (Daniel Han) â· #off-topic (41 messagesđ„):
Loss Curve Success, New iPhone Acquisition, CS Uni vs Bootcamps, Gacha Game Ratios, DataSeek Tool
- Model Training Yields Good Loss Curve: A member celebrated achieving a satisfactory loss curve in their model training experiment (image attached).
- They clarified it wasnât an LLM, but rather an experiment to learn training more types of models, fueled by coffee and Noccos (180mg caffeine each).
- iPhone Upgrade Sparks Sheet Debate: A member showed off a new iPhone (image attached).
- One response jokingly suggested selling the phone and delaying bed sheet replacement while another member said Every time I see such pics I feel better about how I live lol.
- CS Uni Questioned vs Bootcamps: A member said that CS uni was worse than no uni and these bootcamps were unironically better option (planting less outdated shit in your head, faster and cheaper).
- Gacha Game Ratio Formula Leaked: A member leaked a macro that gacha devs are using to determine new characters (image attached).
- The leaked code joke showed `if (banner_needs_money) -> add curvy adult woman`, and similarly for other conditions.
- DataSeek for Agent Sampling: A member shared DataSeek, a tool they created for gathering samples with agents.
- They used the tool to gather 1000 samples for their claimify-related project.
Unsloth AI (Daniel Han) â· #help (23 messagesđ„):
OOM Errors & USDT, Blackwell CUDA Issues, Orpheus TTS Fine-Tuning
- Money for Nothing? User offers USDT for OOM fix: A user is willing to pay 50 USDT for assistance in resolving an out-of-memory (OOM) issue after reviewing tutorials.
- Another user echoed the same sentiment, indicating a potential demand for paid support within the community.
- Blackwell Blues: Debugging CUDA with New Architectures: A user rebuilt their environment with CUDA 12.8/12.9 and ARCH 120 to support their Blackwell card, but continues to face issues, particularly with TP (Tensor Parallelism).
- Theyâve found that offloading to CPU or using only 1 GPU works but suspect a conflict possibly related to disabling amd_iommu for vLLM, leading to funky cuda errors.
- Orpheus in Error: Batch Size Mismatch in TTS Fine-Tuning: A user encountered a `TorchRuntimeError` during fine-tuning of Orpheus TTS using the provided notebook after switching to their own dataset, with the error indicating an input batch_size mismatch.
- It was identified that the user's dataset had variable-length samples, and the suggested fix of increasing the max_seq_length (or removing longer samples) resolved the issue, as described in the TRL documentation.
Unsloth AI (Daniel Han) â· #showcase (1 messages):
LLMs, Fine Tuning, Task-Specific Models
- LLMs Tailored for Tasks: The future involves LLMs fine-tuned for specific tasks, even down to individual projects.
- This approach optimizes performance and relevance in specialized applications, ensuring models are purpose-built.
- The Dawn of Bespoke Models: The user thinks LLMs will become highly specialized, honed for precise tasks and projects.
- This evolution promises efficiency, accuracy, and adaptability in diverse domains.
Unsloth AI (Daniel Han) â· #research (13 messagesđ„):
Diffusion vs Autoregressive, Data Repeating, Paper Citation, Peer Review
- Diffusion Dominates Autoregressive Approaches: A blog post suggests diffusion models outperform autoregressive models when data is limited.
- Another cited paper indicates that repeating data 4x and shuffling after each epoch yields results similar to using unique training data.
- Data Repeating Shuffles the Deck: Repeating the same data 4x and shuffling after each epoch can yield similar outcomes to using the same amount of unique training data (arxiv link).
- This suggests that shuffling between epochs is beneficial, prompting consideration of applying the technique to MLPs as well; a minimal version is sketched after this list.
- Missing Citation Sparks Debate: A member pointed out that this paper seems to be a repeat of the CMU blog post but with nicer graphs, but the CMU blog post doesnât cite the arxiv paper.
- The response was that the later paper generalizes the approach better, but should have been cited.
- Peer Reviewers Save the Day: A member asked about the severity of not citing related work, and the response was in general, this is the job of a peer reviewer.
- It was further noted that papers on arXiv are often unreviewed and subject to updates, leaving room for corrections.
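As a minimal illustration of that recipe, the PyTorch sketch below makes four passes over the same dataset, with `DataLoader(shuffle=True)` redrawing the sample order every epoch; the dataset and shapes are placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the repeated training data.
data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
loader = DataLoader(data, batch_size=64, shuffle=True)

for epoch in range(4):        # four passes over the same data ("repeat 4x")
    for x, y in loader:       # shuffle=True redraws the order each epoch
        pass                  # training step would go here
```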
LMArena â· #general (446 messagesđ„đ„đ„):
Video generation from photos in Indonesian, Grok 4 Fast performance, Seedream 4 2k vs High-Res, AI in medical field, Gemini 3.0 Flash rumors
- Indonesian Video Generation Prompt Shared: A member shared an Indonesian prompt to generate a video from a photo, detailing scene, movement, stylization, audio, and duration specifications, requesting a 2.5D hybrid animation style with predominantly blue and white colors and neon red in the background, accompanied by heavy rock music at 150 BPM.
- Translation provided: Generate video from photos [upload photos].
- Grok 4 Fast Surprises with Competitive Pricing: Discussion arose around Grok 4 Fastâs performance relative to Gemini Flash, with one member stating that Grok 4 Fast is significantly better and cheaper.
- It was further noted that Grok 4 Fast pricing encourages others to offer competitive pricing.
- Seedream 4 2k Image reference issues emerge: Users are reporting that Seedream 4 2k fails to maintain the integrity of a character's ethnicity when using multiple references, though Seedream 4 2k is faster at generation with good results.
- One user said it gets multiple reference images absolutely wrong, like all the time, and sometimes it give wrong output even in single image reference
- Debate sparks over AI models in medical field and drug discovery: Discussion covers the potential of AI models in healthcare, with concerns raised about their ability to identify fatal drug combinations.
- A member with experience in drug discovery stated, It works fine with fine tuned model with real data.
- Whispers of the Wind: Gemini 3.0 Flash Rumors Fly: Amidst a flurry of Gemini integrations, speculation intensifies around a potential Gemini 3.0 Flash release, possibly featuring integrated video capabilities, and potential deployment in mass home assistant devices.
- Members are wondering why Google has been deploying Gemini everywhere this week.
LMArena â· #announcements (1 messages):
Seedream-4, LMArena Models
- Seedream-4-2k Joins the LMArena Battle: The model `seedream-4-2k` is now available in Battle, Direct, & Side by Side modes on the LMArena.
- The `seedream-4-high-res` model is not available at this time.
- High-Res Still MIA: Model `seedream-4-high-res` is currently unavailable, with updates to be announced in the channel.
- Users are encouraged to monitor the announcements channel for further details.
Cursor Community â· #general (419 messagesđ„đ„đ„):
Token Usage, Kaspersky Malware Flag, Chat Exports, GPT-5 Pricing
- Cursor Token Use Baffles User: A user noticed unexpectedly high token usage on a new Cursor account, spending 146k tokens in 22 prompts, and sought clarification from the community.
- Members explained that old chat logs and attached files contribute to context and token usage, and provided a link to Cursorâs new learn pages on context.
- Kaspersky Flags Cursor as Malware: A user reported that Kaspersky is flagging Cursor as malware, prompting discussion about false positives and potential reasons such as Cursor modifying right-click menus.
- A community member suggested double-checking the files, while Cursor support requested logs to investigate, reassuring users that itâs a generic warning related to potentially unwanted app (PUA) detection.
- SpecStory Automates Chat Exports: Users discussed the issues with randomly corrupted chats and the lack of a reapply feature after canceling changes, then sought a solution by sharing SpecStory, which automatically exports chats as files.
- One user noted they wouldnât submit their chats to 3rd parties, and thus use the local version of the tool.
- GPT-5 Pricing is Cheaper than Claude Sonnet: The community compared Claude Code and Codex CLI limits and suggested that because GPT-5 is cheaper than Claude Sonnet 4, it will yield better limits.
- One pointed out that GPT-Mini is free, but a user clarified that they meant Codex.
GPU MODE â· #general (13 messagesđ„):
vLLM affiliation, Image/Video Gen in vLLM, Sliding/Striding Multi-Node DiT Kernel, GB300s for High Compute Scale, Magnetohydrodynamics and Loop Quantum Gravity Modeling
- Image/Video Gen in vLLM: A Good Fit?: A member inquired about the potential alignment of adding image or video generation capabilities to vLLMâs goals, considering contacting the multimodal features team on their Slack.
- Sliding/Striding Multi-Node DiT Kernel Advances: Members discussed sliding/striding kernels, referencing an xDiT style multi node para approach and implementations by the thunderkittens team for Hopper and the cutlass team for Big Blackwell.
- One member is trying to create a fast, early sliding/striding multi-node DiT kernel for GB300s due to the high compute scale required for math/physics research.
- GB300s Powering Quantum Gravity Research: A member aims to leverage a substantial number of GB300s to facilitate high-compute scale diffusion for applications in magnetohydrodynamics and loop quantum gravity modeling.
- The member predicts the magnetohydrodynamics model will likely succeed while noting the loop quantum gravity approach is highly experimental.
GPU MODE â· #triton (1 messages):
exquisite_lemur_80905: There's also `TRITON_ALWAYS_COMPILE` to ignore the cache
GPU MODE â· #cuda (1 messages):
MLPerf Inference, CPU Bottleneck, GPU Utilization
- MLPerf Inference bottlenecked by CPU: A member reported an issue with running MLPerf inference locally, where the process seems to be bottlenecked by the CPU, despite sufficient VRAM usage.
- The user notes low GPU utilization, and seeks assistance in resolving this performance issue.
- MLPerf struggles on local compute: MLPerf is a suite of benchmarks that measures the performance of machine learning hardware and software.
- Running MLPerf locally can be challenging due to resource constraints.
GPU MODE â· #torch (2 messages):
Speeding up pip install, Setting TORCH_LOGS
- Turbocharge Setup Compilation: A member inquired about speeding up setup compilation, noting that `!pip install -v pytorch-fast-transformers` doesn't seem to use multi-CPU/parallel jobs to compile files.
- Unfortunately no response was provided, so this is still an open question.
- Torch Logs Configuration Tricks: A member asked about the correct way of setting torch logs output, giving the example `%env TORCH_LOGS = '__main__,+dynamo,graph,fusion,output_code'`.
- No response to this question was given.
GPU MODE â· #cool-links (1 messages):
Tianqi Chen Interview, Machine Learning Systems, XGBoost, MXNet, TVM
- Tianqi Interview on ML Systems: An interview with Tianqi Chen discusses Machine Learning Systems, XGBoost, MXNet, TVM, MLC LLM, OctoML, CMU, UW, and ACM.
- The interview also touches on the topics of long-termism and original intentions in the field of machine learning.
- OctoML Founder Interviewed: A recent interview features Tianqi Chen, the founder of OctoML, discussing his journey and insights into machine learning systems.
- Chen reflects on his work with XGBoost, MXNet, TVM, and MLC LLM, alongside his academic pursuits at CMU and UW.
GPU MODE â· #jobs (1 messages):
Remote Research Intern, Deep Learning, New Models, Model Building, Stipend Information
- Remote Research Intern Opportunity in Deep Learning: A member is seeking a remote research intern to work on complex deep learning projects, offering a stipend of 30-40k INR/month.
- The projects focus on building new models rather than engineering products using existing ones.
- Emphasis on Model Development: The internship emphasizes the development of new deep learning models, indicating a research-oriented role.
- Interested candidates are encouraged to DM their resume, GitHub profile, or any relevant projects showcasing their skills.
GPU MODE â· #beginner (1 messages):
nwyin: https://jax-ml.github.io/scaling-book/roofline/
GPU MODE â· #off-topic (6 messages):
NVIDIA Tech Demos, Tailscale Interface & Pricing, VPN Business Models
- NVIDIA Demos Evolve Bigtime: A member shared a YouTube video showcasing the evolution of NVIDIA Tech Demos from 1998 to 2025.
- The video likely highlights the progression of graphics technology and real-time rendering capabilities over the years.
- Tailscale Interface Irks Users: Some members expressed reservations about Tailscale, noting that while itâs nice, they dislike that the interface is hosted by the company.
- A member commented âalmost all vpns have rent seeking pricingâ.
- VPN Business Models Baffle Onlookers: A member stated âi canât figure out how they seem to make money the rent seeking ones or the non rent seeking ones both? i guess i dont understand any of their businessesâ.
- The discussion revolves around the challenges in understanding the business models of both rent-seeking and non-rent-seeking VPN providers.
GPU MODE â· #self-promotion (1 messages):
CowabungaAI, LeapfrogAI, Open Source AI, Military-Grade AI, Commercial AI Support
- CowabungaAI splinters from LeapfrogAI: An open source fork of LeapfrogAI, named CowabungaAI, was announced; it is a military-grade AI Platform as a Service created by Unicorn Defense, offering similar functionalities to OpenAI, including chat, image generation, and an OpenAI compatible API, and is available on GitHub.
- Commercial Support offered for CowabungaAI: The creator of CowabungaAI announced significant improvements to the source code and intends to offer commercial support for the application.
- They are offering steep discounts on licensing opportunities.
GPU MODE â· #đż (2 messages):
vLLM's guided decoding, grammars for automated code generation, kernel generation LLMs, KernelBench 0-shot evals
- vLLM's Guidance on Grammar: A member inquired about which grammars are generally used by automated code-generation or kernel-generation LLMs, noting vLLM's guided decoding support (constrain logits with a formal grammar); a usage sketch follows this list.
- KernelBench Skips Grammar Constraints: They pointed out that the KernelBench paper does 0-shot evals and doesnât seem to do any constrained decoding.
- This is implied by the fact that the paper checks for compiler issues, which a grammar constraint would mostly prevent.
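A hedged sketch of grammar-constrained decoding against a vLLM OpenAI-compatible server is below; the model name and the toy grammar are illustrative, and the exact `guided_grammar` parameter and grammar dialect should be checked against the vLLM version in use.

```python
from openai import OpenAI

# Assumes a locally running vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Toy grammar (dialect varies by vLLM backend: outlines/xgrammar).
grammar = """
root ::= "print(" expr ")"
expr ::= [0-9]+
"""

resp = client.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",   # illustrative model name
    prompt="Write a print statement:",
    extra_body={"guided_grammar": grammar},      # constrains logits to the grammar
)
print(resp.choices[0].text)
```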
GPU MODE â· #thunderkittens (2 messages):
GPU memory sharing, NVSwitch reduction
- GPU Memory Sharing Info Valuable: A member found this explanation about sharing memory addresses across GPUs to be valuable.
- They noted there are few public resources that discuss sharing memory and using NVSwitch for reduction.
- NVSwitch Enables GPU Communication: The discussion highlighted the importance of NVSwitch in facilitating efficient communication and memory sharing between GPUs.
- This is particularly relevant for large-scale AI model training and inference, where minimizing data transfer overhead is crucial.
GPU MODE â· #edge (1 messages):
radiation-hardened chips, Jetson usage in space, chips in the magnetosphere
- Modern Chips in Space: A member expressed surprise that modern chips are radiation-hardened for space applications.
- They asked if this is due to Jetsonâs capabilities, proximity to Earth and the magnetosphere, or another reason.
- Rad-Hard Jetson?: The discussion centers on the possibility and implications of using Jetson chips in space environments, where radiation hardening is crucial.
- Concerns are raised regarding whether the radiation tolerance is inherent to the chip design, influenced by the near-Earth orbit within the magnetosphere, or achieved through other means.
GPU MODE â· #submissions (13 messagesđ„):
MI300x8, amd-all2all leaderboard, amd-gemm-rs leaderboard, Personal Bests
- MI300x8 sets Personal Bests: Members achieved personal bests on MI300x8 across the `amd-all2all` and `amd-gemm-rs` leaderboards.
- amd-all2all sees fierce competition: A member achieved a personal best of 2.45 ms on the `amd-all2all` leaderboard using MI300x8.
- Another member secured 5th place with a time of 1171 ”s.
- amd-gemm-rs heats up the leaderboard: One member repeatedly achieved personal bests on the `amd-gemm-rs` leaderboard with MI300x8, eventually reaching 557 ”s.
- Other members also submitted successful runs in the 587 ”s - 638 ”s range.
GPU MODE â· #factorio-learning-env (8 messagesđ„):
Sweep for Qwen 2 35b, Deepseek 3.1, GPT-oss progress, Release work
- SWEeping Qwen and DeepSeek this weekend!: A member ran a sweep for Qwen 2 35b and Deepseek 3.1 over the weekend.
- Progress on GPT-oss is still ongoing, and there are no new videos to share just yet.
- Release work keeps some busy: Some members reported they donât have much to report today, as they are occupied with finishing up a release.
- They are open to discussing release-related matters later in the day if needed, or during a meeting on Wednesday.
GPU MODE â· #amd-competition (7 messages):
AMD Contractor Prize Eligibility, All2All Optimizations
- AMD Contractorâs Prize Money Predicament: An AMD contractor was well-placed in a competition but was ineligible for prize money.
- Despite this, they were invited to San Francisco and met the team, which they valued more than the prize money.
- Optimizationsâ Acceptable Spirit Questioned: A member inquired whether certain optimizations for all2all would be within acceptable parameters.
- They clarified that not sending the tokens to the other ranks at all would be unacceptable, and sought confirmation on other aspects; another user gave them permission to DM.
GPU MODE â· #cutlass (7 messages):
CUTLASS MLP Accuracy, CuTe Layouts
- CUTLASSâs MLP accuracy differs from Pytorchâs: A member reports a 7x speedup by implementing a two-layer MLP with CUTLASS, achieving approximately atol 1e-3 accuracy when compared to PyTorch.
- Another member commented that this level of error is acceptable even for fp16, especially against serial CPU references, due to differences in reduction order, noting that the GPU results are numerically more accurate; a minimal check of this kind is sketched after this list.
- Colfax introduces CuTe Layouts blogpost: A member shared a blog post, paper, and example repo from Colfax about CuTe Layouts.
- Another member highlighted the workâs capability to compute complicated layout algebra results âby handâ with visual simplicity, terming it the âgraphical calculus of layout diagramsâ as detailed in Chapter 4 of the paper.
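For reference, a comparison like the one described above can be expressed in a few lines of PyTorch; `my_cutlass_mlp` below is a hypothetical binding to the custom kernel, and shapes/dtypes are illustrative.

```python
import torch

def mlp_ref(x, w1, w2):
    # PyTorch reference for a two-layer MLP.
    return torch.relu(x @ w1) @ w2

x = torch.randn(256, 512, device="cuda", dtype=torch.float16)
w1 = torch.randn(512, 1024, device="cuda", dtype=torch.float16)
w2 = torch.randn(1024, 512, device="cuda", dtype=torch.float16)

ref = mlp_ref(x, w1, w2)
out = my_cutlass_mlp(x, w1, w2)  # hypothetical binding to the CUTLASS kernel
print(torch.allclose(out, ref, atol=1e-3, rtol=0))
```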
GPU MODE â· #singularity-systems (2 messages):
PyTorch autograd, JAX autograd, tinygrad autograd, torch dynamo, bytecode interception
- Compiler-First Autograd Emerging: A member is leaning towards a compiler-first approach similar to tinygrad and JAX for autograd implementation, noting that not all valid Python will be valid picograd.
- They found Torch Dynamo and bytecode interception to be cool but too complex for their projectâs scope, advocating for explicit ramp-up to tinygradâs design decisions.
- Design Decisions in Language Implementation: Inspired by Krishnamurthiâs approach in PLAI, the member suggests explicitly pointing out design decisions (e.g., eager vs lazy, serial vs parallel) and alternative paths in the language implementation.
- This treats programming languages as a collection of features that can be composed together, moving away from pre-scientific classification.
- Explicit Motivation for Deep Learning Framework Components: The member wants to make the motivation very explicit for each component of a deep learning framework so that readers understand the reasons behind design choices.
- For example, readers should try optimizing deep neural networks with symbolic or numeric differentiation to understand why autodifferentiation exists.
- Motivating the Need for Fusion Compiler: Readers need to measure and profile the autograd system in eager mode to see that theyâre being bottlenecked by memory (data movement) on non-tensor core operations.
- This motivates the need for a fusion compiler to optimize memory usage.
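A minimal version of that profiling exercise, assuming a CUDA-capable PyTorch install: time a chain of eager elementwise ops (each a separate memory-bound kernel) against the same chain after `torch.compile` fusion.

```python
import time
import torch

x = torch.randn(4096, 4096, device="cuda")

def chain(t):
    # Four elementwise ops = four kernels, four round-trips to DRAM in eager mode.
    return ((t * 2).sin() + 1).relu()

torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100):
    chain(x)
torch.cuda.synchronize()
eager = time.perf_counter() - t0

fused = torch.compile(chain)     # fusion collapses the memory round-trips
fused(x)                         # warm-up / compile
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100):
    fused(x)
torch.cuda.synchronize()
print(f"eager {eager:.3f}s vs compiled {time.perf_counter() - t0:.3f}s")
```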
HuggingFace â· #general (48 messagesđ„):
Hugging-Science Discord Launch, HF Inference Providers Quality, Gradients Clipping, gguf conversion with llama cpp, smollm's goals
- Hugging-Science Discord Channel Launched: A new `hugging-science` discord was launched for targeted efforts in open source, including fusion, physics, and eval, check it out here.
- The channel is also looking for Team Leaders for each project; it could be a great growth opportunity to take charge and lead the way on some exciting stuff!
- HF Inference Providers face Quality Concerns: A member expressed concerns about the quality of HF Inference providers, wondering how HF guarantees the quality of inference endpoints, especially concerning quantization.
- They added that they feel endpoints should be zero-data-retention (ZDR) by default.
- Grad Norms above `max_grad_norm` cause Heart Spikes: A member asked why the grad norm is going above `max_grad_norm` even after setting it to 1.0.
- Another member clarified that the gradients get modified internally by a clipping coefficient of `max_grad_norm/your_norm` and that the pre-clipping norm is merely for logging purposes; a short sketch of this behavior follows this list.
- gguf conversion aided by Llama CPP: A member inquired about using a model in `gguf` format to run it in LM Studio.
- Another member suggested using llama cpp to convert it to `gguf` locally, referring to this github discussion.
- Dreaming of the smollm Working Partner: A member noted that what smollm should be is a little working partner instead of a benchmaxxed brick.
- The member also said that itâs already impressive for a tiny team and having everything fully open.
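A small sketch of the clipping behavior described above (standard PyTorch, not specific to any trainer): `clip_grad_norm_` rescales gradients in place when the total norm exceeds the threshold, and its return value is the pre-clipping norm that training loops log.

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()

max_grad_norm = 1.0
# Gradients are scaled in place by max_grad_norm / total_norm when
# total_norm > max_grad_norm; the return value is the PRE-clipping norm,
# which is what shows up as "grad_norm" in training logs.
pre_clip = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
print(f"logged grad norm (pre-clip): {pre_clip:.3f}")
```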
HuggingFace â· #cool-finds (1 messages):
Diffusion ODE Solver, DPM++2m, WACV 2025, Hyperparameter-is-all-you-need
- Radical Reduction in Steps for ODE Solvers: An independent researcher has developed a novel ODE solver for diffusion models that achieves 8-step inference, beating DPM++2mâs 20-step inference in FID scores, and 5-step inference comparable to latest distillation methods.
- Solver Design Traces Probability Flow: The new ODE solver design better traces the probability flow trajectory during inference.
- This is a pure training-free improvement to the sampling process.
HuggingFace â· #i-made-this (7 messages):
golang vectorDB, AI agent trust challenges, AgentXTrader, protein prediction dataset
- Golang VectorDB is Born: A member created a super simple pure golang vectorDB and provided a link to the GitHub repo.
- Itâs called serverlessVector and written in Go.
- AI Agent Trust Issues Survey: A member is working on research about AI agent trust challenges in production and posted a link to an anonymous survey about AI Agent trust for anyone whoâs deployed AI agents.
- The survey focuses on identity verification and accountability issues and claims to be completely anonymous.
- AgentXTrader Wants to Trade for You: A member introduced AgentXTrader and shared a LinkedIn post showcasing it in action.
- It debates the best investment strategy.
- Protein Prediction Dataset Lands on HF Hub: A member noted that the protein prediction dataset by INRIA landed on the hub via a link to the HF collections.
- It is a dyna-repo by INRIA called mdposit.
HuggingFace â· #computer-vision (1 messages):
fingaz_ai: i havent but im also looking into that same feature i just havent isolated one yet.
HuggingFace â· #smol-course (2 messages):
In-Person Meetup in NYC, GSM8k Eval on Trained Model
- HuggingFacers plan IRL meetup in NYC: Members in NYC are planning a weekly in-person meetup for the course and are collecting availabilities via lettucemeet.com.
- Members are running GSM8k evals: A member is checking if anyone has successfully run the GSM8k eval on their trained model.
HuggingFace â· #agents-course (4 messages):
Starting the Agents Course, Backgrounds of new course members
- Fresh Faces Fine-Tune Forward: Several individuals are starting the âFine-Tuning Language Modelsâ course today, with one member from Ethiopia seeking advice.
- Itâs a great time to delve into the world of language models and connect with fellow learners.
- Diverse Devs Deep-Dive into Agents: New course members include a full-stack developer/entrepreneur aiming to apply agents for client projects, and an analytics expert from Brazil venturing into agent development.
- The course attracts a diverse range of backgrounds all looking to expand their expertise with AI agents.
Latent Space â· #ai-general-chat (44 messagesđ„):
DeepSeek Terminus, Claude resumable streaming, Untapped Capital Fund II, Alibaba Qwen3-TTS, OpenAI NVIDIA deal
- DeepSeek Model Terminus Launches!: DeepSeek released a final v3.1 iteration, Terminus, boosting language consistency, code/search agents, and benchmark stability, and the open weights are now on Hugging Face.
- The community is already asking: when is DeepSeek-R2?
- Untapped Capital Launches Fund II Pre-Seed Bonanza: Yohei Nakajima announced Untapped Capital Fund II, a $250k pre-seed, generalist vehicle that continues the 2020-founded fundâs mission to back founders outside typical networks.
- For this fund the team has shifted to a top-down approach spotting early trends and proactively sourcing startups.
- Alibabaâs Qwen3-TTS hits the stage!: Alibabaâs Tongyi Lab announced Qwen3-TTS, a new text-to-speech model supporting multiple timbres, languages, and dialects with high naturalness, currently optimized for English and Chinese.
- The community is mostly excited, but many users immediately ask whether the model will be open-sourced or what its API pricing will be; the official team replied with a ModelStudio documentation link but stayed silent on open-source plans.
- OpenAI and NVIDIA lock in 100B Mega-Deal!: OpenAI and NVIDIA sealed a strategic partnership to deploy up to 10 gigawatts (millions of GPUs) of NVIDIA systemsâvalued at ~$100 Bâlaying the groundwork for OpenAIâs next-gen datacenters beginning late-2026.
- Reactions ranged from cheering the stock surge to debating vendor-financing tactics, AGI hype, and whether ordinary users will ever see the extra compute.
- Among AIs: Social Intelligence gets a new benchmark: Shrey Kothari unveiled Among AIs, a new benchmark where top language models compete in Among Us to evaluate deception, persuasion, and coordination skills, and GPT-5 ranked highest in both Impostor and Crewmate wins.
- Replies cover missing models (Grok 4 and Gemini 2.5 Pro will join soon), choice of game, training-data concerns, discussion mechanics (3-turn debates, equal speaking time), and enthusiasm for game-based AI evaluations.
Latent Space â· #genmedia-creative-ai (4 messages):
Google Gemini, Runway AI, Runway Gen-2
- Gemini + Runway AI combine for video manipulation!: A user shared a TikTok video demonstrating how to use Google Gemini and Runway AI to replace a person in a video with another person.
- The process involves taking a screenshot of the video, using Google Gemini to create a replacement image, and then using Runway Gen-2 to apply the image to the video, but the video must be less than thirty megs.
- AI Video Manipulation Trend: The poster mentioned that similar video manipulation techniques have been circulating on Instagram for about a year or more.
- This suggests a growing trend in accessible AI-powered video editing tools and techniques.
Eleuther â· #general (34 messagesđ„):
Text-davinci-003 origin story, ChatGPT model finetuning, UChicago ML research community, GPT-3.5 series
- Text-davinci-003 launched same day as ChatGPT: Text-davinci-003 was made available in the API on the same day as the ChatGPT announcement, and OpenAI stated it was a GPT-3.5 model within 48 hours of its release.
- One member recalled that it had been released the day before ChatGPT, alongside an updated Codex model also considered part of the GPT 3.5 series, called code-davinci-002.
- ChatGPT Fine-tuned from GPT-3.5 series: Members discussed that the first ChatGPT version was fine-tuned from a model in the GPT-3.5-series, with the most likely candidate being text-davinci-003.
- One member stated, âOAI confirms here that the ChatGPT model is fine-tuned from a model that finished training in early 2022, so that canât be GPT-4 since we know GPT-4 hadnât even started pretraining until mid/late 2022.â
- Community Seeking UChicago ML Research Access: A member sought introductions to the ML research community at UChicago to attend public seminars/lab meetings for ML sys, cog neuro, mech interp, and linguistics research discussions.
- Another suggested attending research colloquia, citing advice from their â80 year old phd grandpaâ and suggesting that one might not need to ask for permission.
- GPT-3.5 Family Feud: Members debated the lineage and release order of models in the GPT-3.5 series, including code-davinci-002 and text-davinci-002 and text-davinci-003.
- One insider claimed ChatGPT was a finetune of an unreleased model, derived from chain finetuning.
Eleuther â· #research (9 messagesđ„):
Prefilling vs Decoding Intuition, Diffusion ODE Solver, WACV 2025 Submission
- Prefilling vs Decoding in Training: A member inquired whether the intuition that pre-training is entirely prefilling, and the post-training / RL stage is mostly decoding, is correct.
- The member was trying to think through the advantages of techniques that increase decoding throughput but donât help during prefilling, and thinking if post training is mostly rollouts.
- 8-Step Diffusion ODE Solver Beats DPM++: An independent researcher developed a novel ODE solver for diffusion models that achieves 8-step inference that beats DPM++2mâs 20-step inference in FID scores with ~60% reduction in computational cost.
- Human touch is behind research Paper: A member inquired how much of a research paper was LLM-generated, referring to its unusual abstract length.
- The independent researcher stated that the code and the paper do not involve any things generated by AI.
Eleuther â· #lm-thunderdome (2 messages):
MMLU pro benchmark, lm-eval
- Benchmarking MMLU Subtasks: Members discussed whether itâs possible to benchmark a subtask of MMLU pro with lm-eval, such as just mmlu_law.
- No definitive answer or method was provided in the discussion, but it implies itâs a feature worth exploring within the lm-eval framework.
- lm-eval potential features: A potential feature of lm-eval would be to benchmark specific subtasks.
- This would allow more precise testing of AI model capabilities; a usage sketch follows.
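A sketch of what that could look like with the harness's Python API; the model and the subtask name are illustrative and should be checked against the task list of the installed lm-eval version.

```python
import lm_eval

# Evaluate a single MMLU subtask rather than the full suite; the task name
# here is an assumption and may differ across lm-eval releases.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["mmlu_professional_law"],
    num_fewshot=5,
)
print(results["results"])
```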
Nous Research AI â· #general (34 messagesđ„):
HuggingFace comments, OLMo-3 safetensors, Qwen3 Omni, Realtime Perceptual AI, SVG coding among various LLMs
- HuggingFace comments spark laughter: Members found the comments on HuggingFace very funny.
- One member stated yeh i was lmaoing when i saw that.
- OLMo-3 safetensors search begins: A member is seeking leads on OLMo-3 safetensors, joining their Discord to follow up.
- They noted the inference code was published, but no weights on HF yet.
- Qwen3 Omni launched: Qwen3 Omni-30B-A3B-Instruct is now available, featuring 36B parameters with multimodal encoders/decoders.
- According to one member, it beats Whisper v2/3 and Voxtral 3B on scam call audio processing, supporting 17 audio input languages and 10 output languages.
- Realtime Perceptual AI race heats up: A member anticipates the next race will be realtime perceptual AI (audio, video, other modalities at the same time).
- They cited Appleâs realtime vision model as an example of whatâs brewing behind the curtains, and is curious about the lack of releases.
- SVG Coding LLM results: A member tested SVG coding among various LLMs and shared the results with an image of a duck swimming in a pond.
- The discussion touched on multi-modal models, world models, visual-language-action models, and biological neural networks as the new hipster lingo, since an LLM fixation on a purely language- and mathematics-centric approach may not cut it alone.
Nous Research AI â· #ask-about-llms (7 messages):
LLM Training, LoRA training, Consumer hardware for LLMs, Bandwidth and Latency for LLMs
- LoRA training feasible on consumer hardware: A member mentioned that you can train a LoRA on 14B models locally with a single RTX 4090; a minimal setup is sketched after this list.
- Bandwidth & Latency constraints during LLM work: A member stated that they are working on something similar, and the constraints theyâre facing are bandwidth and latency.
aider (Paul Gauthier) â· #general (19 messagesđ„):
Navigator Mode in Aider Forks, aider-ce Package, Augment CLI, Deepseek V3.1 Setup, Web Search Tools in Aider
- Aider Forks Gain Navigator Mode: Members are encouraged to use aider forks with navigator mode for a more automated experience.
- One member suggested using uv pip install on fork repos or the aider-ce package, which includes MCP and navigator mode.
- Augment CLI Excels with Large Codebases: According to one member, the Augment CLI tool shines particularly when used with large codebases.
- They recommend Claude Code (CC) with Opus/Sonnet and Codex for GPT-5, noting that Codex lacks an API key.
- Deepseek V3.1 Configuration Tips Requested: A new user is planning to use Deepseek V3.1 and seeks advice on initial setup and configuration.
- The user also inquired about preferred web search tools, noting that aider doesnât have a built-in tool but can scrape provided URLs; one member suggested using Perplexity MCP for web search with MCP integration.
- GAS Scripting automation causes job losses?: One member created a GAS (Google App Script) that responds to quotes for their business.
- They said it performed better than a human employee, raising concerns that if this was public, it would cause massive job losses.
aider (Paul Gauthier) â· #questions-and-tips (5 messages):
Running multiple aider agents, aider asks to edit files, LLM confusion with prompt files
- Multi-Agent Aider Setup Discussed: A member inquired about setting up multiple aider agents via command line for external orchestration, rather than a built-in solution.
- The discussion suggested using git worktrees for concurrent modifications to the same repository, as sketched after this list.
- Aiderâs Edit Prompts Cause LLM Confusion: When using a file to set up the prompt, aider prompts to edit files and, simultaneously, the LLM gets confused about the task at hand.
- The LLM then asks for clarification of intent, specifically whether it should act as the âUserâ within the APM framework or modify a file using the SEARCH/REPLACE block format.
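A sketch of that orchestration pattern in Python; the repo path, branch names, prompts, and aider flags are illustrative and should be checked against aider --help.

```python
import subprocess

REPO = "/path/to/repo"            # hypothetical repository path
TASKS = {
    "feature-a": "Add retry logic",
    "feature-b": "Refactor config loading",
}

procs = []
for branch, prompt in TASKS.items():
    worktree = f"{REPO}-{branch}"
    # One worktree (and branch) per agent so concurrent edits never collide.
    subprocess.run(
        ["git", "-C", REPO, "worktree", "add", worktree, "-b", branch],
        check=True,
    )
    # --message runs one instruction non-interactively; --yes auto-confirms.
    procs.append(subprocess.Popen(
        ["aider", "--yes", "--message", prompt], cwd=worktree
    ))

for p in procs:
    p.wait()
```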
Modular (Mojo đ„) â· #general (15 messagesđ„):
FFI, Rust, C ABI, C header, Mojo binding generators
- Rustaceans invoke Mojo via FFI: Members discussed using FFI to call Mojo from Rust, with one user confirming they have done so via the C ABI.
- They added that itâs similar to calling C, and to ensure types match up, and to mark the Mojo function with
@export(ABI="C")
.
- They added that itâs similar to calling C, and to ensure types match up, and to mark the Mojo function with
- Manual Mojo Binding Required for now: A member stated that C header -> Mojo binding generators are still WIP, so CXX doesnât help.
- The only way to generate Mojo bindings is doing it manually.
- Windows Support still nonexistent: One user asked about Windows support for Mojo.
- Another responded that it is not coming any time soon.
Modular (Mojo đ„) â· #announcements (1 messages):
Modular Platform 25.6, NVIDIA Blackwell, AMD MI355X, Consumer GPUs
- Modular 25.6 Fuels GPU Performance: Modular Platform 25.6 is live, delivering peak performance across the latest GPUs including NVIDIA Blackwell (B200) and AMD MI355X.
- Early results show MAX on MI355X can even outperform vLLM on Blackwell, with unified GPU programming now available for consumer GPUs like Apple Silicon, AMD, and NVIDIA.
- Modular Unifies GPUs: Modular now supports Apple Silicon, AMD and NVIDIA consumer GPUs, bringing unified GPU programming to laptops and desktops.
- Check out the blog post for more information.
Modular (Mojo đ„) â· #mojo (7 messages):
Mojo MAX .mojopkg requirement, Mojo nightly install command, Variadic args binding in Mojo
- MAX Requires .mojopkg: Using MAX requires a `.mojopkg` file alongside the executable, which contains the highest-level MLIR that Mojo can produce after parsing, for the runtime's JIT compiler.
- For platforms hiding hardware details (Apple, NVIDIA GPUs), Mojo hands off compilation to the driver, performing a one-shot JIT without profiling, unlike V8 or Java.
- Easy Mojo Installs via PIP: The `mojo` packages are now on PyPI, allowing users to install the latest stable version of Mojo, debugger, and LSP using `pip install mojo`.
- For nightly builds, use `pip install mojo --pre "mojo<1.0.0" --index-url https://dl.modular.com/public/nightly/python/simple/`.
- Variadic arguments binding blocked by Rust: Variadic args binding in Mojo might not be possible because the `va_list` is not part of the function signature in C.
- Even Rust doesn't currently fully support this feature (issue 44930), making it a challenging aspect to implement in `c_binder_mojo`.
Yannick Kilcher â· #paper-discussion (16 messagesđ„):
New Paper Discussion, Yann LeCun's Work, Joint Embedding Predictive Architecture, Paper Presentation Opportunity
- Paper Chat Scheduled: A chat about a paper (https://www.arxiv.org/abs/2509.14252) was scheduled for <t:1758567600:t>.
- A member suggested an earlier session for those in the eastern timezone.
- Discussing Yann LeCunâs Impact: A member inquired whether the scheduled paper was related to Yann LeCunâs work.
- Another member agreed and suggested looking at the background on Joint Embedding Predictive Architecture (JEPA), linking to an OpenReview document.
- Links Galore!: Several links were shared, including a YouTube video, a tweet, another tweet, an arXiv paper, another YouTube video, and two more arXiv papers, among them https://arxiv.org/abs/2509.13805.
- These resources likely relate to the discussed paper and related topics.
- Paper Presentation Invitation: A member inquired about presenting a paper and inquired about the appropriate timing.
- The member suggested doing it 6 hours earlier or later on most days.
Yannick Kilcher â· #ml-news (3 messages):
GPT parsing philosophy, Prompting improvements
- GPTâs Philosophical Parsing Gets Mixed Reviews: A user rated a GPT model a 7/10 for parsing their philosophy, but only a 4/10 for expanding on it, and 3/10 for formatting.
- They added that itâs better than most models, but not good enough.
- User experiences Improved Results: The user initially disliked the GPT results, but then reported liking the results after further interactions.
- The user said they are either getting better at prompting or the machine is better at reading.
DSPy â· #general (16 messagesđ„):
MCP Secrets, Trace IDs, GEPA for ReAct
- MCP Secrets should be stored in ENV: A user cautioned against passing secrets through MCP, advising to use environment variables for single-user services or a login+vault/OAuth for more complex setups.
- The user noted that their JIRA MCP server implementation uses stdio and retrieves credentials from the environment, deploying per single user to avoid leaking secrets via process tables; a minimal sketch of this pattern follows this list.
- Trace IDs in DSPy modules affect optimization: A user asked about handling per-item trace IDs in DSPy modules without breaking optimization, noting that initializing modules with a trace ID results in the same ID being used for all documents when running a batch after optimizing.
- The user considered recreating modules per article (too expensive) and moving the trace ID to forward calls, asking if this would affect optimization because the trace_id gets passed to llm gateway for logs and auditing.
- GEPA for ReAct Agents: A user inquired about experiences with GEPA for ReAct Agents, specifically regarding how context overflows would be handled with long agent trajectories.
- No war stories were shared.
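A minimal sketch of the environment-variable pattern for a stdio-based server; the variable names are illustrative.

```python
import os
import sys

def load_credentials() -> tuple[str, str]:
    # JIRA_URL / JIRA_TOKEN are illustrative names; the service reads its
    # secrets from the environment rather than from MCP parameters.
    url = os.environ.get("JIRA_URL")
    token = os.environ.get("JIRA_TOKEN")
    if not url or not token:
        # Fail fast rather than accepting secrets over the wire.
        sys.exit("JIRA_URL and JIRA_TOKEN must be set in the environment")
    return url, token

url, token = load_credentials()
```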
MCP Contributors (Official) â· #general (7 messages):
MCP Sampling Protocol, response_schema addition, Claude models constrained output
- MCP Sampling Protocol addition has support: A member saw an issue to Add response_schema to the MCP Sampling Protocol that was converted into a discussion.
- Another member would be happy to try and do an implementation of this (in October) with a small demo as part of a SEP, because it would be really neat to be able to request structured output when using sampling.
- response_schema addition sounds feasible: The group noted that it is not difficult to implement in the SDK; the work lies in the integration between the SDK and the LLM API/SDK.
- One member stated that a bit of provider specific code might be needed, but nothing bad.
- Claude models have constrained output: It was mentioned that presenting constrained output could be presented as a facet of the client capability.
- A member stated that Claude models show up as not supporting this feature even though they do; they believe it's better to present response_schema support and let the client host decide how to actually provide it.
MCP Contributors (Official) â· #general-wg (6 messages):
Model Context Protocol Registry, Publishing MCP Servers, Remote Server Configurations, MCP Install Instructions
- MCP Registry Publish Instructions Emerge: A member shared instructions for publishing servers to the Model Context Protocol (MCP) Registry, starting with this guide.
- The guide details creating an empty GitHub repo and providing a `server.json` file with details like name, description, status, repository URL, and remote endpoint.
- Remote Server Configs for MCP: A member linked to remote server configurations for the Model Context Protocol (MCP), with this reference.
- The example `server.json` includes a `$schema` definition and a `remotes` array specifying the `streamable-http` type and URL; a minimal sketch follows this list.
- MCP Install Instructions Generated!: A member mentioned using a tool to generate a readme file with instructions for installing the Model Context Protocol (MCP) in various clients via MCP Install Instructions.
- He said the tool was âpretty coolâ and helpful in creating installation guides.
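A sketch of the `server.json` shape described above, with placeholder values; field names and the schema URL should be taken from the linked guide rather than from this example.

```python
import json

# Placeholder server.json; consult the registry guide for authoritative fields.
server = {
    "$schema": "https://example.com/server.schema.json",  # placeholder URL
    "name": "io.github.example/my-server",
    "description": "Example MCP server",
    "status": "active",
    "repository": {"url": "https://github.com/example/my-server"},
    "remotes": [{"type": "streamable-http", "url": "https://example.com/mcp"}],
}
print(json.dumps(server, indent=2))
```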
tinygrad (George Hotz) â· #general (6 messages):
CuTe DSL, RANGEIFY status, Company update, ThunderKittens project
- Colfax Meeting on Monday: There will be a meeting #89 on Monday at 9am San Diego time for a company update.
- Starting next week, the meeting will be 3 hours earlier.
- RANGEIFY Progress Questioned: There was a question about whether RANGEIFY would be default by the end of the week, including store, assign, group reduce, disk, jit, const folding, buf limit, llm, and openpilot.
- It was noted that children are not making progress and image is not complete.
- CuTe DSL Potential Gamechanger: Members mentioned that the CuTe DSL is a potential gamechanger.
- They added that ThunderKittens is a nice starting point.
Moonshot AI (Kimi K-2) â· #general-chat (6 messages):
Kimi K-2, Prompt Injection, Claude's Lobotomization
- Kimi Rejects Provocation Attempts: One member shared an experience of provoking Kimi, appreciating its rejection of blind obedience or sympathy.
- Another member expressed a similar sentiment, noting that Kimiâs attitude is a reason why itâs their favorite model.
- Claudeâs Prompt Injection Debacle: Members discuss how Claude has been weakened due to prompt injection techniques, leading it to disagree when context exceeds a certain length.
- They noted that it is unlike Kimi K2, and some express disappointment in the changes to Claude.