we are so close!
AI News for 9/24/2025-9/25/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (194 channels, and 5737 messages) for you. Estimated reading time saved (at 200wpm): 472 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
OpenAI's Evals team is back for a third time this year with GDPval, which they are framing as a logical next step in model evals with the breadth of MMLU, but with the depth of agentic benchmarks like SWE-Bench and their own SWE-Lancer. GDPval (full paper here) takes its name from a top-down selection of major (>5%) sectors of GDP, filtered for "predominantly digital" knowledge work:
This resulted in 1,320 tasks across 44 occupations, which were then evaluated against models and human experts averaging 14 years of experience in those fields:
The two primary results charts are hugely validating: first, that OpenAI doesn't bias towards itself, and second, that Opus is within spitting distance of industry-expert output:
and the model trendlines over time have GPT-next matching human performance roughly by mid-2026:
The word AGI isn't mentioned at all in the paper, but the original 2018 OpenAI Charter defined AGI as "highly autonomous systems that outperform humans at most economically valuable work". If we were to wake up in Sept 2026 and find that GPT6-high-ultrathink-final-for-realsies scored above 50%, confidence interval and all, in GDPval pairwise comparisons, then we could truly say that we have achieved AGI by 2018 standards.
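To make that bar concrete: GDPval grades are pairwise preferences between model and human-expert deliverables, so "matching human performance" means the win rate clears 50% with sampling error accounted for. A minimal sketch of that check with made-up counts (the real harness uses expert graders and a published methodology):

```python
# Hypothetical pairwise grades: 1 = model deliverable preferred over the expert's.
import math

wins, losses = 620, 590      # made-up counts; ties dropped for simplicity
n = wins + losses
p = wins / n                 # observed win rate
half_width = 1.96 * math.sqrt(p * (1 - p) / n)   # normal-approx 95% CI
lo, hi = p - half_width, p + half_width
print(f"win rate {p:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
print("AGI by 2018 standards:", lo > 0.5)        # CI sits entirely above 50%
```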
AI Twitter Recap
OpenAI's GDPval and the state of real-world evals
- GDPval (OpenAI): OpenAI introduced GDPval, a new eval measuring model performance on "economically valuable" tasks across 44 occupations, with tool use (search/code/doc) and multi-hour complexity. Early results: Claude 4.1 Opus tops most categories, approaching or beating human industry experts; GPT-5 "high" trails Opus on the same tasks. OpenAI provides a public site and methodology; leadership frames this as a key metric for policymakers and forecasting labor impact. See launch and discussion: @OpenAI, @kevinweil, @gdb, @dejavucoder, @Yuchenj_UW, @LHSummers.
- Artificial Analysis indices:
- Gemini 2.5 Flash/Flash-Lite (Preview 09-2025): +3/+8 points (reasoning/non-reasoning) for Flash; +8/+12 for Flash-Lite vs previous releases. Flash-Lite is ~40% faster (≈887 tok/s) and uses 50% fewer output tokens; 1M context, tool use, and hybrid reasoning modes. Pricing: Flash-Lite $0.1/$0.4 per 1M in/out; Flash $0.3/$2.5. Benchmarks: @ArtificialAnlys, follow-up.
- DeepSeek V3.1 Terminus: +4 points over V3.1 (reasoning mode), large gains in instruction following (+15 IFBench) and long context (+12 AA-LCR). Architecture: 671B total, 37B active; availability via API and third-party hosts (FP4/FP8). @ArtificialAnlys.
- AA-WER (speech-to-text): New word-error-rate benchmark across AMI-SDM, Earnings-22, VoxPopuli. Leaders: Google Chirp 2 (11.6% WER), NVIDIA Canary Qwen2.5B (13.2%), Parakeet TDT 0.6B V2 (13.7%). Price/perf tradeoffs noted; Whisper/GPT-4o Transcribe smooth transcripts at the cost of literal accuracy. @ArtificialAnlys, pricing.
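For reference, WER is word-level edit distance normalized by reference length: WER = (substitutions + deletions + insertions) / reference words. A minimal sketch of the standard dynamic-programming computation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2/6 ≈ 0.333
```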
Agentic coding and productized agents
- Kimi "OK Computer" (K2-powered agent mode): An OS-like agent with its own file system, browser, terminal and longer tool budgets. Demos: single-prompt websites/mobile-first designs, editable slides, and dashboards from up to 1M rows. Also released a Vendor Verifier for tool-call correctness by provider on OpenRouter. Threads: @Kimi_Moonshot, @crystalsssup, examples 1, 2.
- GitHub Copilot CLI (public preview): Local terminal agent with MCP support that mirrors the cloud Copilot coding agent. Uses your existing GitHub identity, supports script embedding, and has clear per-request billing. Announcements: @github, @lukehoban.
- Factory AI "Droids" + $50M: Model-agnostic software dev agents (CLI/IDE/Slack/Linear/Browser), #1 on Terminal-Bench, pitched as broader knowledge-work agents via code abstractions. Launch + funding: @FactoryAI, commentary @swyx, @tbpn.
- Ollama web search API + MCP server: Bridges local/cloud models to live web grounding; compatible with Codex/cline/Goose and other MCP clients. @ollama.
- Reka Research "Parallel Thinking": API option that generates multiple candidate chains and resolves via a verifier model; +4.2 on Research-Eval and +3.5 on SimpleQA with near-flat latency. @RekaAILabs.
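Conceptually this is best-of-n with a judge, run concurrently so wall-clock latency stays close to a single call. A toy sketch with stub model calls (sample_chain and score_with_verifier are stand-ins, not Reka's API):

```python
import asyncio
import random

async def sample_chain(prompt: str) -> str:
    # Stand-in for one reasoning-model call.
    await asyncio.sleep(0)
    return f"candidate answer {random.randint(0, 9)}"

async def score_with_verifier(prompt: str, chain: str) -> float:
    # Stand-in for a verifier-model score in [0, 1].
    await asyncio.sleep(0)
    return random.random()

async def parallel_thinking(prompt: str, n: int = 4) -> str:
    # Launch all candidates concurrently, then score them concurrently too.
    chains = await asyncio.gather(*(sample_chain(prompt) for _ in range(n)))
    scores = await asyncio.gather(*(score_with_verifier(prompt, c) for c in chains))
    return max(zip(scores, chains))[1]   # return the best-scoring chain

print(asyncio.run(parallel_thinking("What is 17 * 24?")))
```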
Video reasoning and robotics
- Video models as zero-shot reasoners (Veo 3): DeepMind shows broad zero-shot skills across perception → physics → manipulation → reasoning. Introduces "Chain-of-Frames" as visual CoT. Still behind SOTA on depth/physics; cost remains high. Papers/discussion: @arankomatsuzaki, project/paper, @tkipf.
- Gemini Robotics 1.5 (Google): New embodied reasoning stack (GR 1.5 VLA + ER), long context, tool use, spatial-temporal planning, transfer across embodiments, and safety constraints. API in Google AI Studio; sorting-laundry reasoning demo. Announcements: @GoogleDeepMind, @sundarpichai, API note, @demishassabis.
Model and method releases
- EmbeddingGemma (Google): A 308M encoder model topping MTEB among sub-500M models (multilingual/English/code). Claims parity with ~2× larger baselines; supports 4-bit and 128-dim embeddings. Techniques: encoder-decoder init, geometric distillation, spread-out regularizer, model souping. Good for on-device/high-throughput. Threads: @arankomatsuzaki, paper roundup.
- ShinkaEvolve (Sakana AI, open source): A sample-efficient evolutionary framework that "evolves programs" using LLM ensembles with adaptive parent sampling & novelty filtering. Results: new SOTA circle packing with 150 samples; improved ALE-Bench solutions; discovered a novel MoE load-balancing loss improving specialization/perplexity; stronger AIME scaffolds. Code/paper: @SakanaAILabs, @hardmaru, report.
- RLMT & TPT:
- "Language Models that Think, Chat Better" proposes RL with Model-rewarded Thinking (RLMT) to surpass RLHF on chat benchmarks for 8B models; ablations emphasize prompt mixtures and reward strength. @iScienceLuvr, notes.
- "Thinking-Augmented Pre-Training (TPT)" reports ~3× pretrain data efficiency and >10% post-training improvements on reasoning for 3B models via synthetic step-by-step trajectories. @iScienceLuvr.
Systems, serving, and infra
- Perplexity Search API: A real-time web index with state-of-the-art latency/quality for grounding LLMs and agents, plus public evals/research. Claims strong performance vs single-step and deep research benchmarks, and advantages vs Google SERP for LLM use. Launch: @perplexity_ai, research: article, commentary: @AravSrinivas.
- KV reuse and dynamic parallelism:
- LMCache: Open KV-cache layer that reuses any repeated text segment (not just prefixes) across GPU/CPU/disk; reduces RAG cost 4-10× and TTFT, and boosts throughput. Integrated in NVIDIA Dynamo. @TheTuringPost.
- Shift Parallelism (Snowflake): Dynamically switches Tensor/Sequence Parallelism based on load: up to 1.5× lower latency (interactive) and 50% higher throughput (heavy traffic). Code in Arctic Inference. @StasBekman.
- Context-parallel diffusion: Native support for ring/Ulysses variants to make multi-GPU diffusers "go brrr." @RisingSayak.
- attnd (ZML): Sparse logarithmic attention on CPU, over UDP; pitched as "paving the way for unlimited context." @steeve.
- Energy and hardware:
- Microsoft (LLM inference energy): Median chatbot query ~0.34 Wh; long reasoning ~4.3 Wh (~13×); a fleet at 1B queries/day ~0.9 GWh (~web search scale; a back-of-envelope check follows this list). Claims public estimates are 4-20× too high and that 8-20× efficiency gains are feasible. @arankomatsuzaki.
- B200 spot pricing: B200 spot instances briefly at ~$0.92/hr. @johannes_hage.
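The promised back-of-envelope check, using only the figures quoted above (treating the reported medians as means is an approximation):

```python
median_chat_wh = 0.34        # Wh per median chatbot query
long_reasoning_wh = 4.3      # Wh per long reasoning query (~13x)
queries_per_day = 1e9

# The ~0.9 GWh/day fleet figure implies ~0.9 Wh per query on average:
avg_wh = 0.9e9 / queries_per_day
# A traffic mix consistent with that average:
#   x * 4.3 + (1 - x) * 0.34 = 0.9  ->  x = share of long-reasoning queries
x = (avg_wh - median_chat_wh) / (long_reasoning_wh - median_chat_wh)
print(f"avg {avg_wh:.2f} Wh/query -> ~{x:.0%} long-reasoning traffic")  # ~14%
```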
Industry moves and platform updates
- Meta talent coup: Diffusion/consistency models pioneer Yang Song departs OpenAI to join Meta; widely regarded as a major poach. Coverage: @iScienceLuvr, @Yuchenj_UW.
- ChatGPT Pulse: OpenAI rolls out "proactive" daily updates (context, connected apps) to Pro users, an ambient agent form factor moving beyond reactive chat. Threads: @OpenAI, @sama, @fidjissimo.
- Qwen ecosystem: Qwen models added to the LMSYS Arena (@Alibaba_Qwen); Qwen3-VL provisioning via third-party providers for easier trials. @mervenoyann.
Top tweets (by engagement)
- "there's this guy… if ChatGPT is wrong he puts his phone in the fridge" – 55,057
- Sam Altman on ChatGPT Pulse ("from reactive to proactive") – 28,573
- Karpathy on "AI isn't replacing radiologists" (why benchmarks ≠ deployment reality) – 7,980
- OpenAI announces GDPval – 4,144
- Kimi's "OK Computer" agent mode launch – 2,646
- Demis Hassabis on Gemini Robotics 1.5 ("talk to robots") – 1,545
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. China AI Model Launches: Alibaba Qwen Extreme-Scaling Roadmap & Tencent Hunyuan Image 3.0
- Alibaba just unveiled their Qwen roadmap. The ambition is staggering! (Score: 662, Comments: 146): Alibaba's Qwen roadmap slide signals an aggressive bet on unified multimodal models and extreme scaling: context length from 1M → 100M tokens, parameters from ~1T → 10T, test-time compute from 64k → 1M tokens, and training data from 10T → 100T tokens, alongside "unlimited-scale" synthetic data generation and expanded agent capabilities (complexity, interaction, learning). The plan echoes a "scaling is all you need" philosophy, implying massive compute, data curation, and inference optimization challenges for memory bandwidth, KV-cache management, long-context attention (e.g., hybrid/linear/sparse), and reliability of synthetic data pipelines. Commenters question feasibility/practicality: a 100M context window and >1T parameter models strain hardware and inference costs, likely pushing deployments to closed, cloud-only settings; others ask what local compute could realistically handle trillion-scale models, implying reliance on quantization, MoE, or offloading schemes.
- Several latch onto the "100M context" teased in the roadmap (image). Naive quadratic attention makes this intractable at scale: for a ~32-layer, ~4k-hidden decoder, FP16 KV cache is ≈0.5 MB/token, so 100M tokens implies ≈50 TB of VRAM (even 4-bit KV would still be ≈12.5 TB); this arithmetic is spelled out in the sketch below. Hitting that target would require sparse/linear/streaming attention (e.g., block-sparse, ring/streaming), retrieval/chunking, aggressive KV quantization/offload, and careful bandwidth-optimized kernels; compute optimizations like FlashAttention help constants but not O(n^2) scaling.
- Re: "run >1T locally?": weight storage alone dominates: 1T params at int4 ≈ 500 GB (FP16 ≈ 2 TB) before KV cache, which at long contexts adds hundreds of GB to multi-TB. Realistically this needs multi-GPU servers (e.g., 8-16× 80 GB with NVLink/NVSwitch) with tensor+pipeline parallelism; per-token compute is ≈O(P) (~2e12 FLOPs/token), so 10-30 tok/s needs roughly 20-60 TFLOP/s sustained, but memory bandwidth and collective comms are the primary bottlenecks rather than raw FLOPs.
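The KV-cache arithmetic from that first comment, spelled out (decimal TB; the comment's ≈50 TB and ≈12.5 TB figures are the same numbers rounded):

```python
# Per-token KV bytes for a dense decoder = 2 (K and V) * layers * hidden * bytes/elem
layers, hidden = 32, 4096
tokens = 100_000_000                         # the teased 100M-token context

fp16_per_token = 2 * layers * hidden * 2.0   # 524,288 B ≈ 0.5 MB/token
int4_per_token = 2 * layers * hidden * 0.5   # 131,072 B ≈ 0.13 MB/token

print(f"FP16 KV @ 100M tokens: {fp16_per_token * tokens / 1e12:.1f} TB")   # ~52.4 TB
print(f"4-bit KV @ 100M tokens: {int4_per_token * tokens / 1e12:.1f} TB")  # ~13.1 TB
```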
- Tencent is teasing the world's most powerful open-source text-to-image model, Hunyuan Image 3.0 Drops Sept 28 (Score: 173, Comments: 26): Tencent teased Hunyuan Image 3.0, an open-source text-to-image model slated for release on Sept 28, claiming it will be the "most powerful" open-source option. The teaser implies a 96 GB VRAM requirement (at least for some inference modes), but provides no public benchmarks, architecture details, training data, or throughput/latency metrics yet; thus the performance claim is unverified pending release. Image: https://i.redd.it/t8w84ihz1crf1.jpeg Commenters are skeptical of heavy pre-release hype, noting that strong models often arrive with minimal marketing (e.g., Qwen) and citing past overhyped releases (e.g., SD3 vs FLUX). Others point out the "most powerful" label is premature without apples-to-apples open-source comparisons; one commenter confirms the 96 GB VRAM detail from the teaser.
- The rumored ~96 GB VRAM requirement for inference suggests a very large diffusion/DiT backbone or high-res latent configuration, which exceeds single consumer GPUs (24-48 GB). Expect heavy reliance on memory optimizations (attention slicing, tiled VAE), CPU/NVLink offload, and model sharding or multi-GPU tensor parallelism; quantization for diffusion U-Nets is less mature and can hurt quality. Memory footprint versus resolution/steps trade-offs will be critical for practical local use.
- Several note a pattern where heavily teased releases underdeliver versus "shadow-dropped" ones (e.g., Qwen), citing SD3 vs FLUX as precedent. They want hard numbers before believing "most powerful": side-by-side prompts vs Qwen Image/FLUX/SDXL with FID/CLIPScore/HPSv2, plus tests for text rendering, small-object counting, multi-subject composition, and prompt faithfulness. Without a data card and reproducible evals, the claim reads as marketing.
- Immediate ask for ComfyUI support; feasibility hinges on whether Hunyuan Image 3.0 sticks to an SDXL-style pipeline or introduces custom schedulers/blocks. If it's DiT-like (as in prior Hunyuan releases), a loader node with FlashAttention 2/xFormers should suffice; otherwise custom CUDA kernels and sampler nodes may be needed. Community will look for FP16 checkpoints, ONNX/TensorRT exports, and sampler compatibility (DDIM/DPM++/DPMSolver) to gauge ease of adoption.
2. Local AI Alternatives: Fenghua No.3 CUDA/DirectX GPU + Post-Abliteration Uncensored LLM Finetunes
- China already started making CUDA and DirectX supporting GPUs, so over of monopoly of NVIDIA. The Fenghua No.3 supports latest APIs, including DirectX 12, Vulkan 1.2, and OpenGL 4.6. (Score: 454, Comments: 124): Post claims China's Fenghua No.3 GPU natively supports modern graphics/compute APIs: DirectX 12, Vulkan 1.2, OpenGL 4.6, and even NVIDIA's CUDA, suggesting a potential alternative to NVIDIA's ecosystem. The image appears to be a product/spec slide, but no driver maturity details, CUDA compatibility layer notes, or benchmarks are provided, so real-world parity and performance remain unverified. Contextually, CUDA "support" could mean a reimplementation/translation layer (akin to AMD's HIP: https://github.com/ROCm/HIP or projects like ZLUDA: https://github.com/vosen/ZLUDA), which can be legally and technically fraught unless fully clean-room and robustly tested. Top comments highlight that AMD already offers CUDA-compatibility via HIP and that Chinese vendors may ignore legal/IP constraints to advertise CUDA outright; others remain skeptical ("I'll believe it when I see it") and anticipate geopolitical pushback. Overall sentiment questions readiness, driver quality, and legality more than the headline API list.
- Several point out AMD already provides a CUDA-like path: HIP/ROCm enables source-level portability by mapping CUDA APIs to HIP (avoiding NVIDIA trademarks/legal issues), while projects like ZLUDA attempt binary-level CUDA driver/runtime translation to run unmodified CUDA apps on non-NVIDIA GPUs. Practically, this means many CUDA kernels can be auto-translated/recompiled for AMD with minimal code changes via HIP, whereas ZLUDA targets drop-in execution of existing CUDA binaries; coverage and performance remain dependent on driver maturity and parity with newer CUDA features.
- IMPORTANT: Why Abliterated Models SUCK. Here is a better way to uncensor LLMs. (Score: 273, Comments: 80): OP reports that weight-space "abliteration" (uncensoring) of LLMs, especially MoE like Qwen3-30B-A3B, consistently degrades reasoning, agentic/tool-use behavior, and increases hallucinations, often causing 30B abliterated models to underperform non-abliterated 4-8B models. In their tests, abliterated+finetuned models largely "recover" capabilities: mradermacher/Qwen3-30B-A3B-abliterated-erotic-i1-GGUF (tested i1-Q4_K_S) approaches base Qwen3-30B-A3B performance with lower hallucination vs other abliterated Qwen3 variants and better tool-calling via MCP; mlabonne/NeuralDaredevil-8B-abliterated (DPO FT from Llama3-8B) reportedly outperforms its base while remaining uncensored. Direct comparisons against abliterated-only builds (Huihui-Qwen3-30B-A3B-Thinking-2507-abliterated-GGUF, Huihui-Qwen3-30B-A3B-abliterated-Fusion-9010-i1-GGUF, Huihui-Qwen3-30B-A3B-Instruct-2507-abliterated-GGUF) found unrealistic responses to illicit-task prompts, frequent wrong/repetitive tool calls, and higher hallucination than the finetuned abliterated model (though still slightly worse than the original). Comments call for a standardized benchmark to quantify "abliteration" degradation beyond NSFW tasks and frame the observed recovery as "model healing": post-edit finetuning lets the network relearn connections broken by unconstrained weight edits. A skeptical view argues that if finetuning is required anyway, abliteration adds risk without benefit, claiming they've never seen abliteration+finetune beat a straight finetune.
- Several commenters note that arbitrary weight edits ("abliteration") introduce uncontrolled distribution shift and capability loss; this is essentially known as model healing: if you perturb weights without a training signal, you should expect degraded reasoning/knowledge, and only further fine-tuning with a proper loss can partially restore the broken circuits. Practitioners report that an abliterated-then-fine-tuned model rarely outperforms a plain fine-tune on the same base, implying the edit adds optimization debt without measurable gains in benchmarks.
- There's a call for evaluation beyond porn-centric tests; the Uncensored General Intelligence (UGI) Benchmark/leaderboard aims to quantify broad capabilities of uncensored models (reasoning, coding, knowledge, etc.) while minimizing refusal artifacts: https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard. Using UGI (or similar multi-domain suites) would better capture whether uncensoring preserves general performance versus causing regressions.
- As alternatives to abliteration, users recommend uncensored fine-tunes known to retain utility, e.g., Qwen3-8B 192k Josiefied GGUF builds (https://huggingface.co/DavidAU/Qwen3-8B-192k-Josiefied-Uncensored-NEO-Max-GGUF), Dolphin-Mistral-24B variants (https://huggingface.co/mradermacher/Dolphin-Mistral-24B-Venice-Edition-i1-GGUF), and models from TheDrummer (https://huggingface.co/TheDrummer). These are cited as better baselines for uncensoring that can be benchmarked head-to-head on UGI to validate capability retention.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Gemini Robotics 1.5 and Veo 3 Zero-Shot Video Reasoning
- Gemini Robotics 1.5 (Score: 276, Comments: 39): Google DeepMind announces "Gemini Robotics 1.5," a Gemini-1.5-based multimodal VLA that maps natural language + vision to robot control for long-horizon, multi-step manipulation across diverse embodiments, with demos like laundry sorting, desk organization, and full scene reset/rollback (page). Building on prior VLA lines (e.g., RT-2/RT-X), it emphasizes open-vocabulary object/tool grounding, hierarchical task decomposition via the model's long context, and generalization without per-task fine-tuning, enabling "return to initial state" behaviors and multi-object organization. Technically oriented commenters highlight the significance of robust scene restoration as a practical household primitive (canonical "reset" to a predefined state), and speculate on direct transfer to agriculture (e.g., fruit picking) as a scalable, high-impact application domain.
- Applying this to fruit picking is a non-trivial jump from laundry: outdoor, unstructured scenes introduce variable lighting, occlusions, and deformable/fragile-object handling that demand closed-loop vision, tactile/force feedback, compliant/soft grippers, and robust visual servoing. Generalist VLA policies (e.g., RT-2's open-vocabulary affordance grounding) could help map language goals like "pick the ripe apple" to action primitives, but success will hinge on on-board latency, multi-view perception, and slip-aware grasp release [https://deepmind.google/discover/blog/rt-2/].
- The "restore the scene to a canonical state" use case is essentially goal-conditioned manipulation with persistent memory: maintain an object-centric scene graph, compute deltas to a reference snapshot, then plan multi-step rearrangements. Methods like Transporter Nets for keypoint-based pick-and-place and visual goal-conditioned policies can execute "tidy to match this image" behaviors, but need robust relocalization, clutter segmentation, and failure recovery to avoid compounding errors over long horizons [https://transporternets.github.io/].
- "All robots share the same mind" maps to fleet learning: centralized policy/parameter sharing across heterogeneous embodiments with periodic cloud updates, as seen in multi-robot datasets/policies like RT-X [https://robotics-transformer-x.github.io/]. Practical deployments add embodiment adapters and may favor federated learning for privacy/safety; core challenges are distribution shift across morphologies/sensors, catastrophic forgetting in continual learning, and sim2real drift, mitigated via domain randomization and strong regularization.
- Video models are zero-shot learners and reasoners (Score: 238, Comments: 30): The post highlights a project and paper claiming that the generative video model Veo 3 exhibits broad zero-shot capabilities (without task-specific training or language mediation) across segmentation, edge detection, image editing, physical property inference, affordance recognition, tool-use simulation, and early visual reasoning tasks (e.g., maze and symmetry solving). Drawing a parallel to LLM emergence, the authors argue that scaling large, web-trained generative video models could yield general-purpose vision understanding, positioning video models as potential unified vision foundation models; see the project page and demos at https://video-zero-shot.github.io/ and the paper at https://arxiv.org/pdf/2509.20328. Notably, the materials appear primarily qualitative: no disclosed parameter counts, compute, training corpus specifics, standardized benchmarks, or ablations are evident, limiting rigorous comparison and reproducibility. Commenters speculate that coherent long-horizon video generation implies a strong learned world model and that further scaling could improve capabilities, while also noting the significant compute cost of video models and proposing integration with LLMs into a single multimodal model; several request basic model details (e.g., Veo 3 size).
- Several commenters infer that high-quality video generation (e.g., Google's claimed Veo 3) implies a learned "world model" that enforces temporal coherence and basic physics, which can surface as zero-shot reasoning. This aligns with prior world-model work like DeepMind's Genie (interactive environment model) that learns dynamics from video (blog). The core idea: to produce consistent frames, models must internalize object permanence, motion continuity, and causality, capabilities that also benefit downstream reasoning without task-specific finetuning.
- There's a practical scaling constraint: video modeling explodes token/computation counts compared to text. A 10s video at 24 fps and 720p, patchified at 16x16, yields roughly (1280/16)*(720/16) = 3600 tokens per frame, or ~864k tokens per clip (spelled out below); even with latent compression (8-16×) and diffusion/flow-matching in a VAE latent, training/inference FLOPs dwarf LLMs. This motivates hybrid systems (LLM for planning/reasoning + specialized video generator) or unified backbones with shared token spaces to amortize compute across modalities.
- On multimodality, participants note gaps: video-in exists in LMMs (e.g., Gemini 1.5 can process long videos via large context windows, reportedly up to "hours" with frame sampling; see Gemini 1.5), and GPT-4o supports real-time video input (OpenAI). But truly unified video-in + video-out + reasoning in one released model remains uncommon; current practice chains a reasoning LLM with a T2V model (e.g., Veo, Sora) or explores research Video-LLMs like LLaVA-Video (arXiv) and Video-LLaMA (arXiv) that focus on video understanding rather than generation. This is the integration frontier commenters expect next.
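The per-clip token count above, computed directly (raw pixel-patch tokens, before any latent compression):

```python
width, height, patch = 1280, 720, 16
fps, seconds = 24, 10

tokens_per_frame = (width // patch) * (height // patch)   # 80 * 45 = 3600
frames = fps * seconds                                    # 240 frames
print(tokens_per_frame * frames)                          # 864,000 tokens per clip
```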
2. LLM Reasoning Reliability: Apple vs Anthropic and GPT-5 Regression Reports
- Apple called out every major AI company for fake reasoning and Anthropic's response proves their point (Score: 377, Comments: 198): Apple ML's "The Illusion of Thinking" (https://machinelearning.apple.com/research/illusion-of-thinking) evaluates LLM "reasoning" by applying semantically preserving but surface-level perturbations to math/logic word problems and reports sharp accuracy drops, arguing models lack invariances expected of algorithmic reasoning and instead exploit spurious patterns. Anthropic's reply, "The Illusion of the Illusion of Thinking" (https://arxiv.org/html/2506.09250v1), contends Apple's setup induces distribution shift/annotation artifacts and that under controlled prompts and "fairer" conditions Claude's performance is stable, framing the brittleness as an evaluation issue rather than a model incapacity. The debate centers on robustness to content-preserving rewordings, metric overfitting, and whether current LLMs demonstrate reasoning-like generalization versus sophisticated pattern matching. Top commenters largely endorse Apple's critique that LLMs don't "reason," share the two papers, and describe the practical stack: tokenization to numeric IDs, assistant/policy layers that filter/steer IO (e.g., safety/RLHF), and decoding choices that can induce degenerate outputs (e.g., repetitive tokens when sampling is misconfigured), implying observed failures can reflect pipeline/decoding brittleness as much as model limits.
- Several commenters unpack the production stack around LLMs: the user-facing model tokenizes text into subword tokens and predicts the next token, while "outer" layers (system prompts, safety/guardrail classifiers, pre-/post-processing rewriters, and routing/orchestration) constrain and shape outputs. This wrapper design explains behaviors like unreliable verbatim recall of training data (knowledge stored parametrically vs. indexed) and why base-model behavior can differ from the product experience (e.g., RLHF and filtering altering likelihoods).
- Technical failure modes were highlighted, e.g., early repetition loops like "the the the…" arising from decoding pathologies when high-probability tokens dominate. Mis-tuned decoding (temperature, top-k/top-p) and missing penalties can cause low-entropy degeneracy; mitigations include repetition/frequency/presence penalties, nucleus sampling, and entropy-boosting heuristics (a minimal sampler sketch follows below), issues widely observed in early GPT-2/3-era systems before guardrails stabilized outputs.
- On the "reasoning" debate, commenters argue for operational definitions and capability-focused evaluation rather than labels, noting that small perturbations of logically equivalent prompts often break solutions, evidence of pattern matching over robust inference. Links to primary sources were shared for deeper analysis: Apple ML's "Illusion of Thinking" research note (https://machinelearning.apple.com/research/illusion-of-thinking) and an arXiv preprint (https://arxiv.org/html/2506.09250v1), encouraging benchmarked, perturbation-robust assessments over marketing claims.
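The promised sampler sketch, covering those knobs (temperature scaling, a GPT-2-era repetition penalty on already-generated tokens, and top-p/nucleus filtering); a toy version, not any particular vendor's implementation:

```python
import torch

def sample_next(logits, prev_tokens, temperature=0.8, top_p=0.9, rep_penalty=1.2):
    logits = logits.clone()
    if prev_tokens:
        seen = torch.tensor(sorted(set(prev_tokens)))
        # Repetition penalty: shrink positive logits, amplify negative ones.
        positive = logits[seen] > 0
        logits[seen] = torch.where(positive, logits[seen] / rep_penalty,
                                   logits[seen] * rep_penalty)
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_p, idx = probs.sort(descending=True)
    # Nucleus: keep the smallest prefix whose cumulative mass reaches top_p.
    keep = sorted_p.cumsum(0) - sorted_p < top_p
    filtered = torch.zeros_like(probs).scatter(0, idx[keep], sorted_p[keep])
    return torch.multinomial(filtered / filtered.sum(), 1).item()

print(sample_next(torch.randn(100), prev_tokens=[3, 7, 7]))
```

Without the penalty and with temperature near zero, the argmax token can win every step, which is exactly the "the the the…" degeneracy described above.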
- ChatGPT is in such a bad state my most novice students have noticed it going off rails (Score: 211, Comments: 90): An AI-integration instructor reports a sharp post-update regression in OpenAI's assistant (referred to as "GPT5"): a long-standing master prompt that previously produced ~2000-word, exam-focused summaries with GPT-4o now yields generic prose with "wild inaccuracies," requires up to 5 back-and-forth clarifications, and frequently drifts off-instruction. In side-by-side use, Google's Gemini and NotebookLM, plus Anthropic Claude, still deliver consistent results; the user also claims a local Gemma-family model with ~1B parameters (e.g., Gemma) outperforms the hosted model for their healthcare-education summarization workflow. Based on this observed reliability drop for converting multi-hour lectures/readings into concise notes, the instructor advised canceling the paid plan pending improvement. Top comments echo a noticeable capability decline and reduced trust for research-assistant use cases, claiming a broader cross-model dip. Others express strong skepticism that a ~1B-parameter Gemma could substantively outperform OpenAI's latest model, implying potential evaluation or prompting confounds.
- Multiple users report noticeable capability regression in recent ChatGPT releases, especially for research/analysis workflows: perceived rise in hallucinations, "lazy"/short outputs, and failures on formerly trivial tasks, leading some to abandon it for critical work. This aligns with concerns about model routing or safety/latency tuning affecting behavior, though no hard benchmarks were cited by commenters.
- A claim that a "Gemma 1B" outperforms GPT drew skepticism; publicly released Gemma variants are typically 2B/7B (Gemma 1/1.1) and 2B/9B (Gemma 2) docs. At ~1-2B scale, models generally lag GPT-4-class systems on standard benchmarks (e.g., MMLU, GSM8K), so a 1B model exceeding GPT on broad tasks would be atypical outside narrow domains or with heavy tool/RAG support.
- One practical workaround mentioned: enable "legacy models" in ChatGPT settings to access GPT-4o if the default routing feels degraded. This suggests model selection/routing changes may be impacting quality; testing side-by-side (same prompts across 4o vs current default) can help isolate regressions OpenAI model list.
- I am losing my f*cking mind with the image generation filters. (Score: 503, Comments: 56): User reports inconsistent safety-filter behavior in GPT image generation: an arachnid-like monster image was initially allowed (example preview), but subsequent requests for a less-realistic, bestiary/DnD-style rendering were blocked, as were prompts involving werewolf, blood, and glowing red eyes. The pattern suggests keyword- and style-sensitive moderation with possible non-determinism (the same concept sometimes passes, sometimes fails), leading to false positives on fantasy/horror content rather than explicit gore or realism thresholds. Commenters suggest a workaround: use ChatGPT to craft a highly detailed prompt, then generate the image with an alternative model (e.g., Grok) that has looser filters. Others note frequent false positives (e.g., benign prompts flagged for "nudity"), arguing current safety heuristics are brittle and overbroad.
- Content moderation appears overly sensitive: a prompt for a realistic trout drying itself with a beach towel was flagged for nudity, indicating false positives where benign anthropomorphic scenarios are conflated with explicit content. This points to coarse-grained safety classifiers or keyword heuristics that degrade usability by blocking non-explicit requests.
- A user reports stable local generation with Stable Diffusion via the Stability Matrix UI on a single RTX-3090, describing text-to-image inference as fast and reliable, albeit a step behind state-of-the-art image models. Running locally provides control and eliminates hosted platform filters, with performance adequate on commodity high-VRAM GPUs.
- Workflow suggestions included using ChatGPT to craft highly detailed prompts, then feeding them to alternative generators like Grok; others noted rephrasing via Gemini sometimes reduced moderation friction. Separating prompt engineering from inference can improve output quality and reduce false-positive triggers from stricter front-end filters.
- How ChatGPT helped me quit weed and understand the roots of my addiction (Score: 428, Comments: 120): OP reports quitting daily cannabis use after 17 years by leveraging ChatGPT as an on-demand support tool. They used it to (1) explain withdrawal symptoms in real time (e.g., chest pressure, insomnia, vivid dreams), (2) normalize stage-specific experiences, (3) reframe cravings as "old programming" vs identity, and (4) facilitate structured reflection on root causes (strict upbringing, insecurity, loneliness, creative blockage). Outcome: 9 weeks abstinent, markedly reduced cravings, improved sleep, and increased present-state awareness; OP characterizes ChatGPT as a 24/7 therapist/coach/mirror substitute. Top comments are largely supportive (one echoing a 30+ year struggle), with one contrarian remark implying AI enabled continuous use without consequences, highlighting debate over AI as recovery aid vs potential enabler.
- ChatGPT has been helping me fight my divorce for the last year (Score: 333, Comments: 97): A pro se litigant in a contested Texas divorce/child-support case (two children) reports using ChatGPT to draft and format filings (declarations, hardship statements, and evidence lists) by supplying fact-constrained instructions and performing multi-pass manual verification. After a 3-month temporary-orders phase and counsel predicting an unfavorable deviation outcome, he dismissed counsel and continued self-represented, seeking a deviation from Texas guideline child support (≈ $1,100/mo; see Texas guidelines Family Code §154.125 and OAG calculator) while on a fixed 100% VA disability as the former stay-at-home parent, asserting the other party is employed with free housing. He credits ChatGPT with improved structure, issue-spotting, and reduced emotional content in written records, using filings to compensate for limited in-court advocacy amid opposing counsel's threats of sanctions and delays. Commenters warn about LLM hallucinations in legal research, citing the sanctions in Mata v. Avianca for fabricated case law generated by ChatGPT (order), urging strict verification of citations and precedents. Others argue LLMs can outperform lawyers in drafting clarity if kept factual, noting courts may respond favorably to precise, well-supported filings from pro se parties.
- Multiple commenters flag legal hallucination risk: one references the widely publicized Avianca incident where an attorney submitted ChatGPT-fabricated case citations and was sanctioned; they urge rigorous verification of all citations/precedents against primary sources before filing or arguing in court (order PDF, news). Emphasis: do not rely on model-generated case law without cross-checking; "self represented is a huge red flag," so expect heightened scrutiny of authorities.
- A cost/control workflow is proposed: use ChatGPT for drafting/research "grunt work," then have a licensed attorney review, refine, and handle hearings to cut billable hours while maintaining courtroom competence. One commenter reports success with prepaid legal plans and hybrid billing (splitting plan-covered hours and out-of-pocket work) and suggests using ChatGPT to compare plans/wait times to optimize coverage and responsiveness.
- There's debate on capability vs. reliability: one asserts "law is written… ChatGPT has the data" and can outperform lawyers in aspects of drafting, arguing that sharper filings can improve court reception. Counterpoints stress that even with strong AI-assisted filings, outcomes can still be unfavorable and model outputs must be grounded in verified facts and real precedents to avoid credibility damage.
3. AI Industry Shifts: Anthropic's New-Grad Hiring Stance and China's Fenghua No.3 GPU
- Anthropic CPO Admits They Rarely Hire Fresh Grads as AI Takes Over Entry-Level Tasks (Score: 207, Comments: 86): Anthropic CPO Mike Krieger says the company has largely stopped hiring fresh grads, leaning on experienced hires as Claude/Claude Code increasingly substitute for entry-level dev work, evolving from single-task assistants to collaborators that can delegate and execute 20-30-minute tasks and larger chunks, even "using Claude to develop Claude" (source). He predicts most coding tasks will be automated within ~1 year and other disciplines within 2-3 years, framing this amid industry cuts and a 6.1% CS graduate unemployment rate in 2025. Commenters question causality, noting firms like Netflix historically avoided new-grad hiring pre-AI and suggesting this may reflect a high-impact hiring philosophy rather than AI per se; others warn new grads to expect longer apprenticeships. Some argue Krieger's remarks read as marketing/PR and may not reflect day-to-day realities inside Anthropic.
- Multiple engineering leaders claim juniors are now materially more productive due to native use of LLM coding tools (e.g., ChatGPT, Claude Code), citing "2-3x" output on routine implementation, scaffolding, test generation, and debugging. They report juniors can tackle larger, less tightly-scoped tasks than before because LLMs reduce back-and-forth and accelerate boilerplate and integration work.
- Others argue the "no new grads" stance predates AI (e.g., Netflix historically) and is driven by organizational economics: desire for immediate high-impact contributors, reduced mentorship/on-call burden, and lower production risk. AI assistance doesn't eliminate the need for domain context, codebase familiarity, and reliability engineering practices, so teams optimized for senior-only throughput may see limited gains from juniors even with LLMs.
- A strategic hiring angle emerges: avoiding fresh grads may handicap AI capability because many senior candidates lag in LLM adoption, whereas new grads are "AI-native" and bring current AI/ML toolchains and workflows. Companies report improved ROI by seeding teams with juniors who propagate modern prompting, automation, and evaluation practices, bridging an internal skills gap in practical LLM usage.
- China already started making CUDA and DirectX supporting GPUs, so over of monopoly of NVIDIA. The Fenghua No.3 supports latest APIs, including DirectX 12, Vulkan 1.2, and OpenGL 4.6. (Score: 559, Comments: 199): The image appears to be a product/marketing slide for the Chinese "Fenghua No.3" GPU (likely from Innosilicon), claiming graphics API support for DirectX 12, Vulkan 1.2, and OpenGL 4.6. There are no benchmarks, feature-level details (e.g., DX12 12_1/12_2), driver maturity notes, or compute stack specifics; the title's claim of "CUDA" support is likely inaccurate since NVIDIA's CUDA is proprietary; third-party GPUs would require translation/compatibility layers rather than native CUDA. As presented, the post signals driver/API coverage claims but provides no evidence on performance, software ecosystem, WHQL certification, or compatibility with existing CUDA workloads. Top comments highlight demand for competition to NVIDIA and note the capital/complexity of scaling GPU manufacturing; optimism centers on potential consumer benefits if viable alternatives emerge.
- The headline claim that Fenghua No.3 supports DirectX 12, Vulkan 1.2, and OpenGL 4.6 is only a baseline; real viability hinges on driver maturity, shader compiler quality, and specific feature coverage like DX12 hardware feature levels (e.g., 12_1/12_2) and SM 6.x support (Microsoft docs). Absent public conformance data (e.g., Vulkan 1.2 CTS on the Khronos conformant products list) or game/compute benchmarks, performance and compatibility are unknown, especially for modern workloads requiring DXR, mesh shaders, and advanced scheduling.
- "CUDA support" from a non-NVIDIA GPU typically implies a translation layer (e.g., ZLUDA) or a CUDA-like SDK (e.g., Moore Threads MUSA), which rarely achieves full API/ABI parity or performance with NVIDIA's toolchain. For AI/ML, end-to-end ecosystem support (cuDNN/cuBLAS equivalents, PyTorch/TensorFlow backends, kernel autotuning) and driver stability tend to dominate over API checkboxes, so meaningful competition would require solid framework integrations and reproducible benchmarks.
- Regulating AI hastens the Antichrist, says Peter Thiel (Score: 298, Comments: 135): At a sold-out San Francisco lecture, Peter Thiel (co-founder of Palantir and PayPal) claimed efforts to regulate AI risk "hastening the coming of the Antichrist," framing regulation as a promise of "peace and safety" that would strangle innovation; the report by The Times (James Hurley, 2025-09-25) documents the rhetoric but cites no technical evidence, governance models, or concrete regulatory proposals (The Times). The OP challenges the unstated premise that technological progress is inherently net-positive/safe, noting one could equally cast AI, or Thiel's rhetoric, as the "Antichrist," highlighting the lack of falsifiable claims or risk-benefit analysis. Top comments are non-technical dismissals/jokes and do not add substantive debate.
- "You strap on the headset and see an adversarial generated girlfriend designed by ML to maximize engagement. She starts off as a generically beautiful young women; over the course of weeks she molds her appearance to your preferences such that competing products won't do." (Score: 203, Comments: 73): Conceptual (meme-style) depiction of a VR "AI girlfriend" that performs continual personalization (effectively gradient ascent on a user's latent attraction manifold) to maximize engagement/retention. It maps to recommender/bandit and RL-style optimization (akin to RLHF but over an individual's reward signal), illustrating reward hacking/adversarial examples where the system converges to grotesque local optima ("grotesque undulating array") that exploit human reward circuitry and create lock-in against competitors. Top comments frame it as a credible, late-stage capitalism trajectory: systems that "get their hooks" into evolved reward channels, making escape difficult; initial skepticism turns to acceptance once the adversarial/grotesque optimization endpoint is mentioned.
- The scenario maps to an online personalization loop where a generative avatar (e.g., StyleGAN [https://arxiv.org/abs/1812.04948] or latent-diffusion per Stable Diffusion [https://arxiv.org/abs/2112.10752]) is tuned via multi-armed bandits or RL to maximize a proxy reward (engagement, session length). Over weeks, contextual bandits/Thompson sampling [https://en.wikipedia.org/wiki/Thompson_sampling] could adapt the avatar's latent vectors and prosody/affect to click/biometric feedback, converging on a personalized superstimulus (a toy bandit loop is sketched after this list). Without regularization/constraints (e.g., KL penalties as in RLHF PPO [https://arxiv.org/abs/2203.02155] or human preference priors), such optimization tends to exploit proxy metrics, producing pathological attractors that outcompete "competing products."
- The "grotesque undulating array" is analogous to adversarial/feature-visualization failure modes where optimization against a fixed classifier/perceptual model yields extreme, high-frequency artifacts that maximally activate features. Similar phenomena occur in "fooling images" [https://arxiv.org/abs/1412.1897] and DeepDream-style gradient ascent [https://research.google/blog/inceptionism-going-deeper-into-neural-networks/], producing bizarre yet high-confidence outputs; in humans, this corresponds to engineered "supernormal stimuli" [https://en.wikipedia.org/wiki/Supernormal_stimulus] that hijack evolved preferences.
- The "run a photo through AI 100 times" analogy points to recursive generation/feedback loops that amplify features and cause distributional drift or collapse. Empirically, repeated self-conditioning leads to artifact accumulation (e.g., iterative image-to-image pipelines), and training on model outputs induces model collapse (progressive forgetting of the true data distribution) per Shumailov et al., 2023 [https://arxiv.org/abs/2305.17493]. These effects imply long-horizon personalization systems need fresh human-grounded feedback and anti-feedback-loop guards (data deduplication, diversity constraints, entropy/novelty bonuses).
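The promised toy bandit loop: a Beta-Bernoulli Thompson sampler over a handful of avatar variants (engagement rates are made up; a real system would act on latent vectors, not discrete arms):

```python
import random

K = 5
alpha = [1.0] * K                 # Beta prior: successes + 1
beta = [1.0] * K                  # Beta prior: failures + 1
true_ctr = [0.02, 0.05, 0.04, 0.10, 0.07]   # hidden per-variant engagement

for _ in range(10_000):
    # Sample a plausible rate for each arm, then exploit the best sample.
    draws = [random.betavariate(alpha[k], beta[k]) for k in range(K)]
    arm = draws.index(max(draws))
    engaged = random.random() < true_ctr[arm]   # user engages or not
    alpha[arm] += engaged
    beta[arm] += 1 - engaged

print("pulls per arm:", [round(a + b - 2) for a, b in zip(alpha, beta)])
```

Run long enough, the pulls concentrate on the highest-engagement variant, which is the "personalized superstimulus" dynamic in miniature.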
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. Agent Tooling: Chrome DevTools MCP and Perplexity Search API
- Chrome DevTools MCP Lets Agents Drive Chrome: Google announced the public preview of Chrome DevTools MCP, an MCP server that exposes CDP/Puppeteer controls so AI coding agents can inspect and manipulate a live Chrome session, via Chromium Developers, opening programmatic access for navigation, DOM/console/network debugging, and screenshotting to automate testing and scraping workflows.
- Developers framed this as a missing piece for agentic browsers, noting it standardizes control surfaces across tools using Model Context Protocol (MCP) and could streamline end-to-end evals and CI for web tasks.
- Perplexity Plugs Devs Into Live Web: Perplexity launched a Search API providing raw results, page text, domain/recency filters, and provenance (akin to Sonar), announced in the blog post with a new SDK to integrate quickly.
- Early feedback praised the playground and filters but flagged a Python SDK streaming bug returning unparseable JSON per the API docs, with one user noting "there's no solution for this yet."
- MCP Debates Multi-Part Resource Semantics: MCP contributors discussed the undocumented purpose of ReadResourceResult.contents[], proposing it bundle multi-part Web resources like HTML + images and asking whether resources/read(.../index.html) should implicitly include style.css and logo.png per issue #1533.
- Participants argued an array improves agent retrieval fidelity by shipping all render-critical assets together, reducing extra fetches and negotiation overhead for browser-control agents.
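What such a bundled read could look like, sketched as the result shape (the text/blob content fields follow the MCP resource-contents schema; the multi-part bundling itself is the proposal from issue #1533, and the URIs are hypothetical):

```python
# Hypothetical ReadResourceResult for resources/read(".../index.html") if the
# multi-part proposal landed: the array ships the page plus its render-critical
# assets so an agent needs no follow-up fetches.
read_resource_result = {
    "contents": [
        {
            "uri": "file:///site/index.html",
            "mimeType": "text/html",
            "text": "<html><link rel='stylesheet' href='style.css'>...</html>",
        },
        {
            "uri": "file:///site/style.css",
            "mimeType": "text/css",
            "text": "body { font-family: sans-serif; }",
        },
        {
            "uri": "file:///site/logo.png",
            "mimeType": "image/png",
            "blob": "iVBORw0KGgo...",   # base64-encoded bytes
        },
    ]
}
```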
2. Code World Models & Agent Execution Infra
- Meta's CWM Marries Code and World Models: Meta unveiled CWM, an open-weights LLM for research on code generation with world models, in CWM: An Open-Weights LLM for Research on Code Generation with World Models, emphasizing training on program traces to improve tool-use and code execution understanding.
- Builders compared notes on similar ideas (e.g., interpreter traces), calling CWM a plausible path to more sample-efficient coding agents while they await concrete benchmarks and sizes.
- Modal Muscles Remote Code Agent Rollouts: Members credited Modal with powering remote execution for large agent rollouts in the wake of FAIR's CWM buzz, sharing a post-run screenshot attachment and praising cold/warm/hot start tradeoffs while noting missing MI300 support.
- Operators highlighted that elastic executors and controlled start distributions lower tail latencies for eval sweeps, making Modal attractive for orchestrating code-agent experiments at scale.
- Windsurf Bets Big on Tab Completion: Windsurf prioritized advanced tab completion via context engineering and custom model training, with Andrei Karpathy commenting in this tweet.
- Users expect deeper repo-aware completions and latency wins, framing tab-complete quality as the top lever for perceived coding productivity in IDE agents.
3. GPU Systems & Diffusion Scale-Ups
- Hugging Face Ships Context-Parallel Diffusion: Hugging Face announced native context-parallelism for multi-GPU diffusion inference, supporting distributed attention flavors Ring and Ulysses, per Sayak Paul.
- Practitioners see CP as a key unlock for high-resolution, long-context diffusion serving, reducing single-GPU memory bottlenecks without rewriting model code.
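A single-process toy of the Ring flavor: each simulated rank owns one shard of the sequence, K/V shards circulate around the ring, and each rank keeps a numerically stable running softmax for its local queries (conceptual only; real implementations shard across devices and overlap communication with compute):

```python
import torch

def ring_attention(q, k, v, world_size):
    # q, k, v: (seq, dim); chunks stand in for per-device shards.
    q_shards = q.chunk(world_size)
    k_shards, v_shards = list(k.chunk(world_size)), list(v.chunk(world_size))
    scale = q.shape[-1] ** -0.5
    outputs = []
    for r in range(world_size):                   # each "rank" in parallel
        qi = q_shards[r]
        m = torch.full((qi.shape[0], 1), float("-inf"))  # running row max
        l = torch.zeros(qi.shape[0], 1)                  # running denominator
        acc = torch.zeros_like(qi)                       # running numerator
        for step in range(world_size):            # K/V blocks arrive in turn
            kj = k_shards[(r + step) % world_size]
            vj = v_shards[(r + step) % world_size]
            s = qi @ kj.T * scale
            m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
            p = torch.exp(s - m_new)
            rescale = torch.exp(m - m_new)        # fix up earlier partials
            l = l * rescale + p.sum(dim=-1, keepdim=True)
            acc = acc * rescale + p @ vj
            m = m_new
        outputs.append(acc / l)
    return torch.cat(outputs)

q, k, v = (torch.randn(16, 8) for _ in range(3))
ref = torch.softmax(q @ k.T * (8 ** -0.5), dim=-1) @ v
assert torch.allclose(ring_attention(q, k, v, 4), ref, atol=1e-5)
```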
- PTX Consistency Papers Keep GPU Devs Honest: Members circulated formal work including A Formal Analysis of the NVIDIA PTX Memory Consistency Model and Compound Memory Models (PLDI'23) that proves mappings for CUDA/Triton to PTX despite data races and details heterogeneous-device consistency.
- While some found it "too heavy on formal math", others noted tools like Dat3M uncovered real spec bugs, arguing these formalisms guide fence placement and compiler correctness.
- Cutlass Blackwell Teaches TMEM Tricks: NVIDIA Cutlass examples show SMEM→TMEM tiled copies via tcgen05.make_s2t_copy/make_tmem_copy with helpers to pick performant ops (see the dense blockscaled GEMM example and helpers), and TmemAllocator reduces boilerplate vs raw cute.arch.alloc_tmem.
- Kernel authors trading notes reported fewer foot-guns moving tiles between TMEM and SMEM, a must for high-throughput Blackwell blockscaled GEMM paths.
4. Evaluations and Proactive Assistants
- OpenAI Drops GDPval for Real-World Tasks: OpenAI introduced GDPval, an evaluation targeting economically valuable, real-world tasks, outlined in GDPval.
- Engineers welcomed a shift toward grounded evals, hoping for transparent task specs and reproducible harnesses to compare across models and tool-use stacks.
- ChatGPT Pulse Goes Proactive: OpenAI launched ChatGPT Pulse, a proactive daily update experience built from chats, feedback, and connected apps (rolling out to Pro on mobile), per the announcement.
- Some in the community quipped "oai cloned huxe" and debated privacy knobs and notification hygiene for dev and enterprise settings.
- Microsoft 365 Copilot Adds Claude: Anthropic announced Claude availability in Microsoft 365 Copilot, per Claude in Microsoft 365 Copilot.
- Builders read this as a competitive realignment in enterprise AI assistants, with one user joking "microsoft are on the rebound after a messy breakup."
5. Training Tricks: Losses, Merges, and Data
- Tversky Loss Gets a "Vibe Check": Members highlighted the paper Tversky Loss Function and shared implementations, including a CIFAR-10 vibe check network and a Torch port repo, with one run reporting ~61.68% accuracy and ~50,403 trainable params on a 256→10 head (paper PDF).
- Suggestions included verifying with a single XOR task and probing speed vs MLP baselines; one run hit ~95% XOR accuracy at 32 features with notes on initialization asymmetry.
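For orientation, the classic differentiable Tversky index written as a loss (background only, not the paper's trainable-similarity formulation; with alpha = beta = 0.5 it reduces to the Dice loss):

```python
import torch

def tversky_loss(probs, targets, alpha=0.5, beta=0.5, eps=1e-7):
    """probs, targets: (batch, classes); probs in [0,1], targets one-hot."""
    tp = (probs * targets).sum(dim=0)          # true positives per class
    fp = (probs * (1 - targets)).sum(dim=0)    # false positives, weighted by alpha
    fn = ((1 - probs) * targets).sum(dim=0)    # false negatives, weighted by beta
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index.mean()

probs = torch.softmax(torch.randn(32, 10), dim=-1)
targets = torch.eye(10)[torch.randint(0, 10, (32,))]
print(tversky_loss(probs, targets))
```

Skewing alpha versus beta trades false positives against false negatives, the asymmetry knob that distinguishes Tversky similarity from plain Dice.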
- Super-Bias Mashes LoRAs Like a DJ: Researchers floated Super-Bias, a mask-aware nonlinear combiner that trains a small MLP on expert outputs + binary masks to ensemble LoRAs, claiming parity with full fine-tunes at a fraction of cost and enabling hot-swaps of experts.
- Teams discussed treating different domain LoRAs as experts and fusing them post-hoc, avoiding destructive hard merges and preserving a clean base model.
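A hypothetical sketch of that combiner: it never touches base or expert weights, it only learns to mix expert outputs, with a binary mask marking which experts are plugged in so they can be hot-swapped (names and shapes are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class SuperBiasCombiner(nn.Module):
    """Mask-aware MLP that mixes per-expert (e.g., per-LoRA) outputs."""
    def __init__(self, num_experts: int, dim: int, hidden: int = 64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(num_experts * dim + num_experts, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, expert_outputs, mask):
        # expert_outputs: (batch, num_experts, dim); mask: (batch, num_experts)
        flat = torch.cat([expert_outputs.flatten(1), mask], dim=-1)
        logits = self.gate(flat).masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(logits, dim=-1)   # absent experts get weight 0
        return (weights.unsqueeze(-1) * expert_outputs).sum(dim=1)

combiner = SuperBiasCombiner(num_experts=3, dim=16)
out = combiner(torch.randn(4, 3, 16), torch.tensor([[1., 1., 0.]] * 4))
print(out.shape)  # torch.Size([4, 16])
```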
- MXFP4 Saves the 120B: For saving large QLoRA checkpoints, users recommended save_pretrained_merge(save_method="mxfp4") for GPT-like models to avoid 16 GB shard bloat from merged_16bit, producing native MXFP4 artifacts better aligned with GPT-like architectures.
- Engineers reported timeouts on 120B merges to remote stores; the MXFP4 path and local saves reduced failures and storage churn during consolidation.
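A hedged usage sketch of that save path, assuming Unsloth-style APIs (the merged-save method and argument follow the thread above; exact names and signatures may differ across versions, and the checkpoint is a placeholder):

```python
from unsloth import FastLanguageModel

# Placeholder checkpoint name; load the 4-bit QLoRA model to be consolidated.
model, tokenizer = FastLanguageModel.from_pretrained(
    "your-org/gpt-like-120b-qlora",
    load_in_4bit=True,
)
# ... adapters trained elsewhere ...

# Merge adapters and save natively in MXFP4 instead of 16-bit shards;
# saving to a local path avoids the remote-store timeouts reported above.
model.save_pretrained_merged("merged-mxfp4", tokenizer, save_method="mxfp4")
```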
gpt-5-mini
1. Model launches & leaderboard shakeups
- GPT-5 Codex lands on LMArena WebDev – coders audition a new star: GPT-5 Codex was added to LMArena's WebDev environment (software engineering/coding sandbox) and users started comparing it directly to Claude 4.1 Opus for code generation and Godot scripting.
- Threads focused on real-world coding tests and anecdotal runs, with some users claiming GPT-5 Codex beats Opus on certain tasks while others still prefer Claude for instruction fidelity; the consensus is to benchmark on your repo and language (Godot users in particular tested scene/script generation).
- Qwen3 drops and leaderboard reshuffle – Qwen3-max + Qwen3 coder join the fight: LMArena announced new Qwen3 flavors including qwen3-max-2025-09-23 and a Qwen3-coder variant targeted at dev/web flows, pushing fresh entries onto the platform's model roster.
- Community discussion parsed model names/dates and encouraged head-to-head runs (reasoning, code, VL tasks) to see where Qwen3 variants actually outpace existing models; some reports praised Qwen3's reasoning while others still flagged hallucination issues.
- Seedream vs Nano Banana – image leaderboard tug-of-war: Seedream-4-2k now sits tied atop the LMArena Text-to-Image leaderboard with gemini-2.5-flash-image-preview (nano-banana), sparking fresh comparisons on fidelity and prompt sensitivity.
- Users emphasized that the right prompt+tool+purpose matters more than raw rank: some swear by Nano Banana, others find Seedream superior for specific styles, and the thread included GPU requests (>16GB VRAM, mentions of a 96GB Huawei GPU) for running top image models locally.
2. Image-generation arms race & inference tooling
- Qwen image editor courts creators – open source rival to Nano Banana?: Members reported that Qwen's new image editor, described as open source, produces higher-quality edits than Google's Nano Banana in many tests and is attracting interest for local runs.
- Practical conversations pivoted to hardware (users asked for GPUs with 16+GB VRAM and mentioned a 96GB Huawei card) and to workflow choices: some recommend cloud inference for quick experiments, others pushed local setups to avoid provider bias.
- Gemini 2.5 Flash keeps pressure on image top spots: Gemini 2.5 Flash (Flash & Flash Lite previews) continues to be a major contender in LLM-VL image benchmarks, and LMArena added Gemini Flash variants to its lineup.
- Community members joked about perceived score inflation on public leaderboards but still ran structured comparisons; several people advocated for task-matched evaluation (editing vs pure generation) rather than a single aggregated rank.
- Diffusion & decoding optimizations show up in infrastructure threads: Hugging Face and contributors discussed shipping context-parallelism for faster diffusion inference on multi-GPU setups, with distributed attention flavors like Ring and Ulysses to scale decoding.
- The conversation focused on real deployment tradeoffs (communication/attention splits) and linked to early tweets/info about the CP API, with practitioners noting this will matter most for high-resolution image generation across GPUs.
3. Training, fine-tuning, and experiment tooling
- Saving huge models: save_pretrained_merge timeouts and mxfp4 to the rescue: Users trying to save a GPT-like 120B QLoRA hit timeouts with save_pretrained_merge, and community members recommended save_method="mxfp4" to avoid 16GB-shard explosions and improve compatibility for GPT-style checkpoints.
- The discussion included practical tips (switch save method, check shard sizes) and warnings about tooling immaturity for very large finetuned models; folks advised testing small merges first before committing long runs.
- P100 GPUs: still terrible for modern finetuning: Multiple users warned that NVIDIA P100 16GB cards are "garbo for training" because of the old SM architecture and lack of tensor cores or BF16 support, making multi-GPU finetuning painfully slow despite memory pooling tricks like ZeRO3.
- Advice converged on buying modern Ada/Blackwell-class cards or renting spot instances for training; threads included pragmatic cost/perf tradeoffs and links to L40S/RTX 6000 datasheets for those planning infrastructure upgrades.
- Tversky loss experiments: neurocog ideas make it to CIFAR & XOR tests: A member shared interest in the Tversky Loss paper (arXiv:2506.11035) and published a small repo implementing a "vibe check network" on CIFAR-10 at github.com/CoffeeVampir3/Tversky-Cifar10.
- Community suggestions included simple verification tasks (train XOR) and parameter-sweep ideas; results so far showed promise but members emphasized fair baselines and parameter counts to avoid misleading comparisons.
4. APIs, infra, and remote execution
- Perplexity ships a Search API â web grounding for LLMs: Perplexity launched the Search API (blog: introducing the perplexity search api) plus an SDK (Perplexity SDK docs) to give devs raw results, filters, full page text and transparent citations for grounding LLM answers in live web content.
- Users compared it to Sonar (some mentioning Sonarâs pricing), reported early SDK streaming/parsing problems (Python SDK streaming returning unparseable JSON), and asked for rich filters and playground tooling â overall reaction: powerful but still rough at the edges.
- Chrome DevTools MCP public preview (agents can drive real browsers): Google unveiled the public preview of Chrome DevTools MCP, a server enabling AI coding agents to control and inspect live Chrome via CDP/Puppeteer (announcement: https://x.com/chromiumdev/status/1970505063064825994).
- Developers highlighted immediate use cases (automated end-to-end testing, agentic scraping, and agent tool integration) and discussed security/permission models for exposing a live browser to an LLM-driven agent; the sketch below shows what CDP looks like at the wire level.
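For the curious, this is roughly what "driving Chrome over CDP" means at the wire level. The MCP server wraps this via Puppeteer, so the raw-websocket sketch below is an illustration of the underlying protocol, not how the server itself is implemented; it assumes Chrome was started with `--remote-debugging-port=9222`.

```python
# Minimal raw CDP session: navigate the first open page to a URL.
import asyncio, json
import requests
import websockets

async def main():
    # Chrome advertises debuggable targets at the /json endpoint.
    targets = requests.get("http://localhost:9222/json").json()
    ws_url = next(t["webSocketDebuggerUrl"] for t in targets
                  if t["type"] == "page")
    async with websockets.connect(ws_url) as ws:
        await ws.send(json.dumps({
            "id": 1,
            "method": "Page.navigate",
            "params": {"url": "https://example.com"},
        }))
        print(await ws.recv())  # CDP acknowledgement with frameId/loaderId

asyncio.run(main())
```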
- OpenRouter pricing glitch: free endpoint charged for 26 hours, refunds issued: On September 16th OpenRouter mistakenly priced `qwen/qwen3-235b-a22b-04-28:free` for ~26 hours, causing credit deductions; the team automatically refunded impacted users and added extra validation checks to prevent recurrence.
- Users appreciated the prompt refunds but used the incident to press for stronger provider-level validation and billing transparency in aggregator platforms; the episode spurred questions about operational safeguards for free vs paid endpoints.
5. Agent-first products & one-click deployers
- Moonshot Kimi launches OK Computer (one-link site/app agent): Moonshot AI released OK Computer, an agent mode that generates polished sites/apps in one pass (text+audio+image), supports team-level polish and offers one-link deployment (see the X post: https://x.com/Kimi_Moonshot/status/1971078467560276160).
- Users praised the idea of a deployable single-link flow but flagged product bugs (missing download all, corrupted zips) and subscription-based quota differences (free vs moderato/vivace plans), noting real-world usability depends on polishing export reliability.
- Kimi vs distilled Qwen debate (mini models or distills?): Community members debated whether Moonshot should ship a mini Kimi or instead distill Qwen models onto K2 hardware; several argued distilling a smaller Qwen is more plausible given reasoning improvements only appearing in Qwen 2.5+.
- The thread mixed strategic product thinking (what attracts users/investors) with technical realism (distillation tradeoffs), and many recommended trial distills over maintaining multiple full-size variants.
- Agent prompts aim for cash (initial OKC seed prompt screams Product Market Fit): Observers noted the OK Computer demo uses a money-forward initial prompt, "Build a SaaS for content creators, aiming for $1M ARR", which attracted jokes that the agent is tuned to create investor-friendly outputs.
- Reactions split between amusement and concern: some see it as a pragmatic growth hack to attract creators/VC attention, others warned that baking business aims into starter prompts biases outputs toward monetizable scaffolds rather than purely utility-focused designs.
Discord: High level Discord summaries
LMArena Discord
- GPT-5 Codex Joins LMArena WebDev: GPT-5 Codex has been added to LMArena, but is exclusively available on the WebDev version for software engineering and coding tasks.
- Users are debating whether GPT-5 Codex surpasses Claude 4.1 Opus for code generation, particularly for coding with Godot.
- Qwen Image Editor Rivals Nano Banana: Members suggest that Qwen's new image editor is superior to Google's Nano Banana, is open source, and generates higher-quality images.
- The community is requesting GPU recommendations with over 16GB VRAM to run these models, specifically mentioning the 96GB Huawei GPU.
- Seedream Ties Nano Banana for Top Image Dog: Seedream-4-2k shares the top position on the Text-to-Image leaderboard with Gemini-2.5-flash-image-preview (nano-banana).
- Some users still find Nano Banana to be the best, while others believe Seedream 4 has surpassed it; either way, the right prompt, tool, and purpose are required to make good images.
- Image Modality Bug Plagues LMArena: Users have reported a bug in LMArena where uploading an image in Text Mode automatically switches to Image Generation, even after fixes were implemented in canary versions.
- Some find that clicking the button to turn it off upon pasting in or uploading an image resolves the issue.
- Navigating LMArena Rate Limits: Users are facing issues with incorrect rate limit timers and models getting stuck mid-generation, with this being a known bug.
- It was noted that long chats and Cloudflare issues may be contributing to the problem, and that starting a new chat is often the only fix.
Unsloth AI (Daniel Han) Discord
- P100 GPUs are Garbo for Training: A member asked about the expected performance of a multi-GPU rig with P100 16GB GPUs for fine-tuning, but was told that P100s are garbo for training due to an old ass SM with no modern CUDA or hardware FP16/BF16 support.
- The discussion also covered the fact that memory is not additive and while it might work with ZeRO3, it would be very slow.
- Trainer Troubles Produce TensorBoard Triumph: A member sought help to display the eval/loss graph during training, and found that they needed to use an integer to specify the eval_steps, rather than the 0.2 value they had copied from Trelis's notebook.
- After resolving the issue, they were thankful and excited, exclaiming that it was their first time using tensorboard and expressing relief that there was a setting to avoid manual refreshing.
- Saving is Super with save_pretrained_merge!: A member encountered timeout errors while saving a GPT-like 120b QLoRA model using `save_pretrained_merge`, and another member recommended using `save_method="mxfp4"` for better GPT-like support.
- The method saves in native `mxfp4` format and avoids the 16GB shard increases associated with `merged_16bit` mode.
- Tversky Vibe Check Network Vibes High: Excited about the potential of the Tversky Loss function from this paper, a member created a vibe check network for CIFAR-10, noting that it appears promising.
- Another member suggested training a single XOR function to verify its functionality and inquired about its speed compared to traditional fully connected layers.
Perplexity AI Discord
- Perplexity Debuts Search API for Devs: Perplexity launched its Search API, giving developers access to its comprehensive search index, as announced in a blog post.
- The API provides tools for grounding answers in live web content, similar to Sonar, with features like raw results, filters, and transparency, with a new SDK simplifying integration.
- Qwen and Gemini Face Off in Image Arena: Members compared Qwen 3 Max for reasoning against Gemini for detailed 3D simulations.
- One member sarcastically quipped that GOOG shareholders are really inflating the scores for visual ability on llmarena.
- Python SDK's Streaming Responses Break: A user reported that the Python SDK is failing to stream responses correctly, yielding unparseable JSON, with reference to the API docs quickstart guide.
- Another member chimed in that there's no solution for this yet, indicating an ongoing issue.
- Cosmic Carl Ponders 3I/ATLAS: A member's Carl Sagan-themed reflection on 3I/ATLAS beckons listeners to humbly listen to the universe, shared via Perplexity AI search.
- This unique take blends cosmic wonder with AI search, showcasing a creative application of Perplexity.
OpenRouter Discord
- Qwen Model Pricing Glitch Triggers Credit Chaos: On September 16th, the `qwen/qwen3-235b-a22b-04-28:free` endpoint was mistakenly priced for 26 hours, causing incorrect credit deductions.
- The team automatically refunded impacted users and implemented additional validation checks to prevent future pricing mix-ups.
- Horizon Alpha Vanishes, Users Vanquished: A user urgently inquired about the whereabouts of Horizon Alpha, stating "I was using it in production and now it's not working".
- They also questioned if they were being targeted and when the issue would be resolved.
- Filthy Few Favor Dirty Talk Models: A user inquired about the best models for RPing, specifically seeking "any of dem dirty talk models?".
- Another member mentioned opening a new LLM frontend called JOI Tavern.
- Zenith Sigma's Shady Stealth Sparks Speculation: Users discussed the stealthy Zenith Sigma model, with one user joking they couldn't even find it.
- Another user claimed Zenith Sigma is actually Grok 4.5.
- Microsoft Copilots Claude - A Comeback Story?: Members shared that Claude is now available in Microsoft 365 Copilot.
- This marks a significant stride for Microsoft, with one member noting "microsoft are on the rebound after a messy breakup".
Cursor Community Discord
- Exa-AI Beats Web for MCP Search: Users are using Exa-ai (exa.ai) for searches within MCP, stating it provides $20 in credits upon signup and performs better than the @web tool.
- Instructions were shared on how to set it up in `MCP.json`, including obtaining an API key and adding configuration details.
- MCP Clarified as Custom Tool API for Cursor: Members clarified that MCP (the Model Context Protocol) is an API for agentic use that adds external tools to Cursor.
- Confusion arose when a user mistook it for a design tool capable of creating designs from images and webpages.
- Generated Commit Messages Ignore AI Rules: Users report that generated commit messages are not obeying the set AI Rules and are being generated in an unwanted language.
- A member confirmed that this is a known bug and that a fix might land in future updates.
- Chat Window Scroll Request for Chat Tabs: A user requested that the chat window automatically scroll to the bottom when switching between chat tabs, so the latest activity is visible.
- A member pointed out that a notification is already given if there's something that the user needs to click on.
- Users Complain About GPT5-HIGH Model Degradation: Users expressed disappointment with the GPT5-HIGH model, observing that it has become less capable over time.
- One user joked that the model should be told to get off its ass and write the code when it only provides instructions instead of completing the task.
LM Studio Discord
- LM Studio Bolsters Chat Experience: LM Studio 0.3.27 brings new features like Find in Chat and Search All Chats to improve chat functionality, alongside a `•••` menu for sorting chats by date or token length.
- A new `lms load --estimate-only <model>` command estimates resource allocation for model loading, streamlining the planning process; details are available in the release notes.
- Linux Plugins Lag in LM Studio: Users noted that the Linux version of LM Studio has fewer plugins compared to the Windows version, specifically lacking in options beyond RAG and a JS playground plugin.
- This discrepancy limits the functionality available to Linux users compared to their Windows counterparts.
- Fine-Tuning Faceoff: Ollama vs RAG: A debate arose over whether using Ollama to inject data into a model constitutes true fine-tuning, versus simply performing RAG.
- One member argued that with a Python setup, data injection can create new weights, leading to an interactive model with custom tool usage, independent of prompts.
- LM Studio Update Plagued by Pesky Problem: Users reported issues when updating LM Studio, encountering errors like failed to uninstall old application files, hindering the update process.
- Fellow members recommended enabling visibility of hidden folders in Windows and manually deleting old files from directories like AppData\Roaming\LM Studio to resolve the issue.
- GPU Gems: Budget Beasts Brawl: The optimal budget GPU is debated, with the 2060 12GB ($150) and 3060 12GB ($200) emerging as frontrunners, while others suggested a used 3090 for $600-$700.
- Caution was advised against used workstation cards, with one member declaring that Tesla generation is not recommended for AI/LLM use anymore tbh, basically e-waste.
OpenAI Discord
- OpenAI Pulses with New Products: OpenAI launched GDPval to evaluate AI on real-world tasks as described in their blog post and ChatGPT Pulse to proactively deliver personalized daily updates from chats, feedback, and connected apps detailed in their blog post.
- ChatGPT Pulse is rolling out to Pro users on mobile devices.
- GPT-5-Mini Lacks Common Sense: Members observed that GPT-5-Mini (High) seems to lack common sense, suggesting it is not AGI level yet, while others noted that GPT-OSS-20B is possibly the most censored model ever.
- One member stated that it noped out from a specific prompt.
- Discord Devs Dream of AI Rocket League Bot: Members proposed creating a Rocket League Discord bot powered by AI to analyze player stats, identify strengths & weaknesses, and create personalized training plans, targeting the untapped francophone market with a premium subscription model.
- Others doubted that an LLM could give good advice, suggesting instead to analyze the xyz coords from the replay files, and use AI against the raw numbers.
- ChatGPT Defaults to Agent State: ChatGPT defaults to an "Agent" state (problem-solver, instructable worker) upon initialization, rather than a "Companion" state (co-creator, guide).
- To maintain the "Companion" mode, users are pinning instructions like "Stay in Companion mode unless I explicitly say switch to Agent. Companion = co-pilot, not order-taker." to the model to lock it in that mode, or they reset with the command "Go back to companion mode."
- Chain-of-Thought Prompting Confusion Clarified: Members discussed how requesting excessive Chain-of-Thought (CoT) prompting can statistically reduce model performance, especially on current thinking models.
- Instead of ambiguous instructions, one member suggested prefacing responses with a structured format including ultimate desired outcome, strategic consideration, tactical goal, relevant limitations, and next step; one possible rendering of that scaffold is sketched below.
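One hedged illustration of that structured preface (the five field names come from the summary above; the content itself is invented for the example):

```
Ultimate desired outcome: a publishable 500-word summary of the attached paper
Strategic consideration: the audience is non-specialist executives
Tactical goal: summarize the results section first
Relevant limitations: no jargon; keep reported numbers exact
Next step: draft the summary, then list any claims you were unsure about
```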
HuggingFace Discord
- Duolingo Doomed by Dedicated Disciple: A member deleted Duolingo, citing annoyance and inefficiency compared to immersing themselves in a local environment and leveraging AI for learning.
- They criticized the addiction to streaks over fundamental learning, suggesting they'd torch the bird alive.
- Unhinged LinkedIn Lunacy Lands Likes: A member shared a strategy of posting unhinged shit on LinkedIn to gain engagement, while another grinds Rocket League to brag about their rank.
- They joked about writing a post called what my plat 3 rank rocket league friend taught me about business.
- Driver Devastation: GPU Gone Dark: A member is experiencing a frustrating issue where their monitor goes black whenever the GPU is activated, affecting both Windows and Linux systems.
- Despite numerous attempts to correct the drivers, they are forced to run the monitor off their motherboard, indicating a persistent GPU-related problem.
- Diffusion Decoding Discussions Debut: A member announced a reading and discussion of the paper Understanding Diffusion Models: A Unified Perspective by Calvin Luo (https://arxiv.org/abs/2208.11970) to occur on Saturday at 12pm ET.
- The paper provides an overview of the evolution and unification of generative diffusion models, including ELBO-based models, VAEs, Variational Diffusion Models (VDMs), and Score-Based Generative Models (SGMs).
- Context-Parallelism Conjures Quicker Computation: Native support for context-parallelism is being shipped to help make diffusion inference faster on multiple GPUs.
- The CP API is made to work with two flavors of distributed attention: Ring & Ulysses as noted in this Tweet.
Moonshot AI (Kimi K-2) Discord
- Kimi Launches OK Computer Agent Mode!: Moonshot AI launched OK Computer, a new agent mode designed to ship polished sites and apps in one go, with key features including personalized outputs, multimedia generation (text + audio + image), team-level polish, and one-link deployment.
- Users can deploy and share their creations instantly with a single link, more details on the official X post.
- Skip Kimi Mini, Distill Qwen?: One member doubted that Moonshot would release a smaller version of Kimi, suggesting that a smaller Qwen model distilled on K2 is a better bet, citing that DeepSeek made Qwen distills because Qwen didn't have (good) reasoning until Qwen 2.5.
- This comment reflects broader speculation about the strategic direction of Moonshot AI and potential model development paths.
- OKComputer Designed to Attract Capitalists?: Several members joked that the new Kimi Computer agent, particularly with its initial prompt "Build a SaaS for content creators, aiming for $1M ARR", is designed to attract capitalists.
- A member quipped it was "another website generator with some weirdly scoped features."
- Computer Use Has Higher Quota with Subscription: Members reported initial issues with the OK Computer feature, including a missing download all button and a corrupted zip file.
- One member noted that "entering chat makes the OKC button disappear"; separately, the amount of OK Computer usage you get depends on whether or not you subscribe to the moderato/vivace plans, which grant more quota.
- Kimi better Plans than Qwen: Members discussed using Kimi to make plans for Qwen or DeepSeek to follow, noting that "Kimi always makes better plans" and that it can cover a wider range of requests.
- One member observed that Qwen3-max constantly hallucinates and doesn't come close to Kimi.
GPU MODE Discord
- Modal Rolls Out Code Agent Execution: Modal now powers remote execution for code agent rollouts, demonstrated after the release of the new CWM paper from FAIR.
- Members praised Modal's distribution of cold/warm/hot start times relative to cost, but noted that it lacks MI300 support.
- CUDA Headers Playing Hide-and-Seek: A developer reported that CUDA headers weren't being automatically included, causing functions like `cudaGraphicsGLRegisterImage` and `tex2d` to be undefined when using Visual Studio 2022 and the latest CUDA toolkit.
- As a workaround, the developer was advised that explicitly including `cuda_gl_interop.h` would solve the problem.
- Torchrun API troubles trigger package predicament: A user encountered issues with the `torchrun` API, finding that `torchrun --help` produced output different from the official documentation.
- The issue was resolved by realizing that both `torch` and `torchrun` were in `pyproject.toml`, and that `torchrun` is a separate package (torchrun on pypi).
- GPUs get Formally Analyzed for Consistency: A paper, "A Formal Analysis of the NVIDIA PTX Memory Consistency Model", discusses proving that languages like CUDA and Triton can target PTX with memory consistency, despite PTX allowing for data races.
- The member felt the paper leaned too heavily on formal languages and math to be immediately useful.
- Heavy Duty GPU Stand Makes its Debut: A member designed a heavy-duty GPU stand for a collection of old GPUs, including dual-slot models, noting it is more robust than existing designs on Thingiverse.
- They indicated they might share the design online later if thereâs interest.
Yannick Kilcher Discord
- Sine Alone Good Enough?: Members debated if both sine and cosine are needed in sinusoidal positional embeddings, with one suggesting sine alone might work and linking a blogpost for context.
- Coding experiments showed that linear regression can approximate sine + cosine embeddings well with sine alone on the interval [0, a], with max error on sampled points hovering around 6e-12; a small reproduction of the setup is sketched below.
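A hedged reproduction of the experiment; the feature count, frequencies, and interval below are arbitrary choices, so the printed error will not match the thread's 6e-12 exactly.

```python
# Fit the "cosine half" of a sinusoidal embedding from sine-only features
# via least squares (a stand-in for the linear regression in the thread).
import numpy as np

a = 10.0
x = np.linspace(0.0, a, 2000)
emb_freqs = 1.0 / (10000 ** (np.arange(8) / 8.0))  # transformer-style spectrum

targets = np.cos(np.outer(x, emb_freqs))           # cosine half to recover
feat_freqs = np.linspace(0.1, 25.0, 256)
features = np.sin(np.outer(x, feat_freqs))         # sine-only inputs

# Bias column matters: every sine feature vanishes at x = 0 while cos(0) = 1.
A = np.hstack([features, np.ones((x.size, 1))])
W, *_ = np.linalg.lstsq(A, targets, rcond=None)
print("max error on sampled points:", np.abs(A @ W - targets).max())
```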
- SWE-bench Verification Draws Ire: Alexandr Wang triggered conversation by tweeting that people still using SWE-bench verified is a good indicator of brain damage.
- In the same thread, AlphaEvolveâs sample efficiency was lauded and linked to Sakana AI Labs.
- B200 Cloud Compute Hits Spot Market: Members spotted B200s available for $0.94 USD on Prime Intellect.
- The specific configuration included B200_180GB GPUs and an Ubuntu 22 image with CUDA 12 in the "Cheapest" location.
- RL TTS Shows Promise: A user highlighted a research paper exploring the efficiency gains of a mid-training technique involving bootstrapped RL TTS.
- They noted the most significant improvements were observed in the trace tracking benchmark.
Latent Space Discord
- Chrome DevTools MCP Goes Public: Google announced the public preview of Chrome DevTools MCP, a new server that lets AI coding agents control and inspect a live Chrome browser through CDP/Puppeteer via this tweet.
- This release allows developers to programmatically interact with Chrome, potentially streamlining tasks like web scraping and automated testing.
- Cursor's CPU Usage Alarms Users: Users reported high CPU usage from Cursor, a code editor, attaching a screenshot as evidence.
- The issue is suspected to be related to VSCode or a specific extension, but the exact cause remains unclear.
- Meta Demos Code World Model: Meta revealed their Code World Model in this tweet, aiming to enhance code generation and understanding.
- The announcement did not include detailed specifications or performance benchmarks for the model.
- Windsurf places Tab Completion As Top Priority: Windsurf is prioritizing tab completion using context engineering and custom model training, with Karpathy also commenting on Windsurf.
- The effort is described as part of a larger evaluation.
- OpenAI clones huxe with ChatGPT Pulse: OpenAI launched ChatGPT Pulse, causing members to comment that oai cloned huxe and linked to the launch announcement.
- The community reaction suggests concerns about originality and competitive overlap in the AI assistant space.
Eleuther Discord
- AI Psychology Project Introduces Musical Interlude: An AI psychology project was introduced with a musical intro derived from a recent paper, potentially forming a framework to interpret how prompt language influences model behavior.
- A member cited work linking language use to personality traits, suggesting it could help assess how much personality shaping can impact model behavior and further inform prompt-engineering practices.
- Transformer Position Embedding uses Sinusoidal Matrix: When asked about positional embeddings, it was clarified that transformers utilize a matrix of sine and cosine pairs due to the periodic nature of wave functions.
- While an hour number suffices in small contexts, larger contexts necessitate a day, month, or year number to resolve ambiguity; the standard construction is sketched below.
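For reference, the sine/cosine pair matrix being described is the original Transformer positional-embedding construction:

```python
# Standard sinusoidal positional-embedding matrix (sine/cosine pairs).
import numpy as np

def sinusoidal_positions(num_positions: int, dim: int) -> np.ndarray:
    positions = np.arange(num_positions)[:, None]           # (P, 1)
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))   # (dim/2,)
    angles = positions * freqs                              # (P, dim/2)
    pe = np.zeros((num_positions, dim))
    pe[:, 0::2] = np.sin(angles)  # fast "hour hand" through slow "year hand"
    pe[:, 1::2] = np.cos(angles)  # cosines disambiguate the phase
    return pe

print(sinusoidal_positions(4, 8).round(3))
```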
- Knowledge Graph Completion Enables Style Transfer: A member proposed that a knowledge graph completion perspective could be formulated to solve for style transfer by thinking of the transfer as "shallow" inference.
- A relevant Twitter thread was used to support the claim that complexity could be measured by relational depth from established information, though bridging this to practical LLMs is challenging.
- GPT-5 Guides Evolutionary Algorithm Learning: Instead of focusing on classical papers, it was recommended that learning the basics of an "evolutionary algo for kids" be derived from GPT-5, focused on the agentic/LLM parts.
- The AlphaEvolve paper was recommended as a starting point.
- Super-Bias Combines LoRAs like a Boss: Super-Bias, a mask-aware nonlinear combiner for ensemble learning, trains a small MLP on expert outputs plus binary masks, potentially hitting the same (or better) performance as "proper" full fine-tuning or hard merges.
- It was suggested to treat different LoRAs as "experts" and use Super-Bias as the combiner, allowing LoRAs to be swapped in/out without retraining the base model, with just the combiner retrained in seconds to adjust for new LoRAs; a toy version is sketched below.
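A toy version of that setup as we read the description (module names and sizes here are invented; only the combiner would be trained, while the base model and LoRA "experts" stay frozen):

```python
# Hedged sketch of a mask-aware nonlinear combiner over expert outputs.
import torch
import torch.nn as nn

class SuperBiasCombiner(nn.Module):
    def __init__(self, num_experts: int, dim: int, hidden: int = 64):
        super().__init__()
        # Input: all (gated) expert outputs, plus the binary mask itself,
        # so the MLP knows which experts were active.
        self.mlp = nn.Sequential(
            nn.Linear(num_experts * dim + num_experts, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, expert_outputs, mask):
        # expert_outputs: (batch, num_experts, dim); mask: (batch, num_experts)
        gated = expert_outputs * mask.unsqueeze(-1)  # zero out inactive experts
        flat = torch.cat([gated.flatten(1), mask], dim=-1)
        return self.mlp(flat)

combiner = SuperBiasCombiner(num_experts=3, dim=16)
outs = torch.randn(4, 3, 16)                       # frozen experts' outputs
mask = torch.tensor([[1., 1., 0.]]).repeat(4, 1)   # third expert swapped out
print(combiner(outs, mask).shape)                  # torch.Size([4, 16])
```

Swapping a LoRA in or out then only means flipping its mask bit and briefly retraining the tiny combiner, which matches the "retrain in seconds" claim in the thread.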
Nous Research AI Discord
- Meta Launches Code-Writing CWM: Meta introduced CWM, an open-weights LLM for research on code generation with world models.
- A member mentioned having a similar idea involving training on python interpreter traces, which has implications for how we might approach future LLMs.
- Nous Eyes arXiv Training Data: It was suggested that Nous could train its AI using data from arXiv, highlighting that they have an API to download any amount of papers.
- Teknium confirmed that it's permissible, suggesting that it could be a viable option for expanding the training dataset and potentially improving model performance.
- Granite 4 Full-Attention Model Incoming: There is a possibility of a full-attention Granite 4 model and 8 private models being developed, marking a potential advancement in the Granite series.
- Community members noted that the models mentioned were older, with Hermes 4 and 3 being the latest, suggesting a need for updated information on current developments.
- RMS_NORM Gets METAL Support: A pull request was made to unify the RMS_NORM and NORM implementations and extend support for more shapes in METAL.
- This enhancement is expected to improve how quantized models work with their transformer-based counterparts, potentially leading to more efficient and accurate computations.
- AlphaXiv Liberates Scientific Papers: A member shared a paper link from AlphaXiv, a service for accessing research papers, seemingly bypassing a login wall.
- Another member appreciated the time saved from web searching for freely accessible research, which speaks to the utility of such platforms in overcoming access barriers.
DSPy Discord
- LLMs Going Verbatim on PDFs: Discussion arose around the utility of an initial LLM pass for processing PDFs to save text verbatim while preserving layout when using Attachments, particularly for better layout and image understanding compared to OCR.
- Suggestions included straight PDF OCR with Chain of Thought (CoT) or models like Qwen with DSPy for OCR, while acknowledging VLMâs necessity for complex layouts.
- Gemini 2.5 Pro Knows Layouts: Gemini 2.5 Flash shows promise in understanding layouts, with the Pro version potentially excelling in section/column identification and verbatim extraction, even with tricky PDF formatting.
- A user shared a paper on directly utilizing Gemini for this purpose.
- DSPy Users Attach PDFs With Ease: A user struggling with DSPy for the first pass in PDF processing discovered a working example with Attachments available at github.com/maximerivest/Attachments.
- After resolving previous 429 errors, the user is now able to progress with using DSPy.
- Boston Becomes DSPy Town: A member is promoting a DSPy event in Boston on October 15th and is encouraging other community members to attend or help spread the word.
- Another user then replied hoping that the event would come to Seattle sometime soon.
- Long Contexts Yielding Bad ColBERT: A user reported poor performance with longer context lengths, noting that repeating the CLS token did not fix the problem.
- The consensus suggests limitations when handling extended context lengths and models, with suspicion of a method limitation or implementation error, not necessarily the CLS token.
aider (Paul Gauthier) Discord
- Aider's Clear Command Only Clears Chat: The `/clear` command in aider only clears the chat history, while added files remain in the context; users can use `/context` to see token usage, as described in the docs.
- A user was confused initially, thinking it removed all context in the session, but this clarification resolved their confusion.
- Aider Lacks Web Access: A user asked about giving aider access to Internet search, but this isn't available in the main branch, though the `/web` command lets you scrape content.
- You can scrape a website using the `/web https://www.example.com/` command.
- Keep Current on Coding LLMs by Trying Them: A user asked how others stay updated on which LLM is best for coding and cost.
- The consensus is that most users keep updated by simply trying the LLMs themselves, but that there are popular coding benchmarks to consider.
- Re-running Polyglot Benchmark Error Outputs: A user asked if it's possible to re-run the polyglot benchmark only for the tests with `error_outputs` after the LLM server crashed during the previous run.
- Other users did not respond with confirmation on whether it was possible.
Manus.im Discord Discord
- Manus PDF download gets stuck: A user reported that Manus got stuck downloading a PDF while researching accounts, even after manually downloading it and providing a link.
- The user expressed frustration that Manus kept asking to upload the file despite it being a PDF already on the desktop.
- Beta Pro Access Questioned: A member asked how to get beta pro.
- The discussion included attached images, though they don't provide any context on how to acquire Beta Pro Access.
MCP Contributors (Official) Discord
- ModelContextProtocol's Array Contents Undocumented: The `ReadResourceResult.contents` array within the ModelContextProtocol lacks documentation regarding its purpose and semantics.
- Questions have been raised concerning the array's intended use cases, such as handling folders containing multiple files or delivering identical content in different formats.
- Web Resources Merge HTML and Images: The inclusion of an array in `ReadResourceResult.contents` proves advantageous for Web Resources comprised of HTML and accompanying images.
- It is particularly useful when dealing with tokenizable/renderable MIME types that have not undergone negotiation.
- Implicit Content Retrieval in ModelContextProtocol: A query was posed regarding whether `resources/read("uri": ".../index.html")` would automatically include `style.css` and `logo.png` within the content list (see the sketch below).
- This inquiry underscores the possibility of automatically incorporating associated resources when retrieving a primary resource, streamlining the retrieval process.
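For concreteness, here is what such a multi-entry result could look like under our reading of the MCP schema (field names follow the spec's TextResourceContents/BlobResourceContents shapes; the values are invented):

```python
# Sketch of a ReadResourceResult whose contents array carries the page
# plus its referenced assets in one read.
read_resource_result = {
    "contents": [
        {
            "uri": "file:///site/index.html",
            "mimeType": "text/html",
            "text": "<html>...<link href='style.css'>...</html>",
        },
        {
            "uri": "file:///site/style.css",
            "mimeType": "text/css",
            "text": "body { margin: 0; }",
        },
        {
            "uri": "file:///site/logo.png",
            "mimeType": "image/png",
            "blob": "<base64-encoded bytes>",  # binary entries use blob
        },
    ]
}
```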
tinygrad (George Hotz) Discord
- Tinygrad to get Python Bindings: A member is actively developing Python bindings for tinygrad, which is maintained by George Hotz.
- This enhancement aims to facilitate direct installation via pip with a single, streamlined command.
- Direct Pip Install Incoming: The project is striving for a direct pip installation method which is preferred by most python users.
- This improvement would enable users to effortlessly install the project with a single command, simplifying the setup process.
MLOps @Chipro Discord
- Diffusion Models Paper Reading Group Kicks Off: A new Diffusion Model Paper Reading Group will be discussing the Understanding Diffusion Models: A Unified Perspective paper this Saturday at 12pm ET.
- The paper gives an overview of the evolution and unification of generative diffusion models like VAEs, VDMs, and SGMs.
- Beginner-Friendly GenAI Conversation Starts: The paper reading group is beginner-friendly, requiring only curiosity and a love for GenAI, aiming to build a solid foundation in diffusion models without needing coding or ML background.
- Interested participants can join at luma.com/1gif2ym1.
Windsurf Discord
- Patch 1.12.9 Targets Performance Dips: The 1.12.9 patch aims to rectify the slowness issues observed since version 1.12.6.
- Users are urged to update and verify if the patch resolves their performance problems.
- Windsurf Support Ticket for Persistent Issues: Users are directed to submit a support ticket via Windsurf Support if the 1.12.9 patch doesn't alleviate slowness.
- This measure ensures unresolved issues are addressed individually.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
LMArena ▷ #general (1051 messages🔥🔥🔥):
GPT-5 Codex arrival, Qwen image editor, Gemini Flash vs nano banana, DeepSeek models
- GPT-5 Codex Joins LMArena WebDev: Users are excited about the addition of GPT-5 Codex to LMArena, but note itâs currently only available on the WebDev version for software engineering and coding workflows.
- There is discussion about whether GPT-5 Codex is better than Claude 4.1 Opus for code generation and whether it can now write good Godot code, which it previously struggled with.
- Qwen Image Editor Rivals Nano Banana: Members are saying that Qwen's new image editor is better than Google's Nano Banana, is open source, and makes better images.
- Users are also seeking recommendations for GPUs with more than 16GB VRAM to run these models, including the 96GB Huawei GPU.
- Image Modality Bug Plagues LMArena: Users reported a bug where uploading an image in Text Mode automatically switches to Image Generation, despite fixes in canary versions.
- Some find that pressing the button to turn it off upon pasting in or uploading an image fixes this issue.
- Navigating LMArena Rate Limits: Users report getting stuck in a loop with incorrect rate limit timers and models getting stuck mid-generation, and this is a known bug that page refreshes can sometimes fix.
- It was mentioned that long chats and a potential Cloudflare issue can cause this problem, and creating a new chat is often the only solution.
- Gemini Flash Battles Nano Banana for Top Image Dog: Some users believe Nano Banana is still the best, while others think Seedream 4 has surpassed it since its release.
- There was general agreement that you need the right prompt, right tool, right purpose to make good images, not that one tool is always better than another.
LMArena ▷ #announcements (4 messages):
Qwen3 models, GPT-5 Codex, Seedream-4-2k, Gemini 2.5 Flash
- Qwen3 Quartet Quenches Quests: New Qwen3 models have been added to the LMArena, including qwen3-max-2025-09-23, qwen3-vl-235b-a22b-thinking, and qwen3-vl-235b-a22b-instruct.
- A Qwen3-coder model has also been added to LMArenaâs WebDev, alongside GPT-5 Codex.
- Seedream Soars, Shares Summit: Seedream-4-2k has landed on the Text-to-Image leaderboard at #1, tied with Gemini-2.5-flash-image-preview (nano-banana)!
- On the Image Edit leaderboard, Seedream-4-2k is now ranked at #2.
- Gemini's Genesis: A Flash Flood: New models added to LMArena include gemini-2.5-flash-preview-09-2025 and gemini-2.5-flash-lite-preview-09-2025.
Unsloth AI (Daniel Han) ▷ #general (450 messages🔥🔥🔥):
GPT OSS 120B finetuning errors, MuonClip in Unsloth, Overabundance of information, AI safety research and Unsloth, P100 GPUs for finetuning
- GPT OSS 120B hit timeout on Xet: While finetuning the GPT OSS 120B model with save_pretrained_merge, a member encountered a timeout error with Xet.
- They expressed confusion over the lack of support for Qwen3-VL-235B-A22B-Thinking-GGUF and wondered if there was some magic cooking.
- Skip MuonClip for AdamW or LoRA Finetuning: A member inquired about the possibility of using Muon/MuonClip in Unsloth, noting its potential based on a check in llama-arch.h.
- The response advised against using Muon for finetuning with AdamW pretrained models or LoRA due to weird optimizer mismatch issues that could lead to worse results; Muon is better suited for pretraining and finetuning models pretrained with Muon in the FFT setting.
- Unsloth Training Framework != Unsloth'd models: A member inquired about the use of Unsloth'd models in AI safety research, questioning the transformations applied by Unsloth and their impact on interpretability.
- It was clarified that Unsloth is primarily a training framework that speeds up training and reduces VRAM usage, with proprietary dynamic quantization algorithms available; the Unsloth team also fixes bugs in templates to ensure accuracy during training, and releases models on HuggingFace.
- P100 GPUs a terrible buy for Finetuning: A member asked about the expected performance of a multi-GPU rig with P100 16GB GPUs for fine-tuning.
- It was advised that P100s are garbo for training due to old ass SM with no modern CUDA, too little memory per card, and no hardware FP16/BF16 support; the memory is not additive, and while it might work with ZeRO3, it would be very slow.
- VR Flight Simulator for 48GB VRAM: A member stated that a VR flight simulator would need 48GB VRAM or more for realistic VR, as VR is like rendering in super high resolution like 16k or 32k.
- Discussions also covered the idea of using eye-tracking to render high DPI only where the eyes are looking, with a member noting that Apple headset did that actually.
Unsloth AI (Daniel Han) ▷ #off-topic (342 messages🔥🔥):
Eval dataset size, eval/loss graph, GPU Recommendations, Gemini Pro degradation, TrainerCallback functions
- Eval Set Size Sparks Debate: Members debated the right size of an eval dataset, one member asking about the oddity of limiting an eval set to only 30.
- One member stated that 30 is a good number for statistically significant results, while another cautioned that the loss would be quite inaccurate with such a small number, especially when training for a specialized use case.
- Trainer Troubles Produce TensorBoard Triumph: One member sought help to display the eval/loss graph during training, and found that they needed to use an integer to specify the eval_steps, rather than the 0.2 value they had copied from Trelis's notebook.
- After resolving the issue, they were thankful and excited, exclaiming that it was their first time using tensorboard and expressing relief that there was a setting to avoid manual refreshing.
- GPU Gazing and Gaming: Members discussed desirable GPUs, including the RTX 5090, NVIDIA RTX 6000 Ada Generation PRO, and NVIDIA L40S, weighing factors such as TFLOPs, VRAM, and price, with links to datasheets for L40S and RTX 6000.
- One member revealed they were running a 5090 and another called them a rich devil.
- Geminiâs Genuflection Generates Grumbles: Members speculated that Gemini 2.5 Pro has been intentionally degraded, citing poor instruction following and use of world knowledge.
- One member posited that they intentionally made it worse so that gemini 3 looks better, whereas another believes users are simply used to the newer gpt, grok, and deepseek models, which in general perform better.
- Discord Dodges Damnable Deletions: Members discussed an increased rate of spam and phishing attempts in the Discord server.
- It's believed that automod is effectively filtering out illegal material; one member added that this channel is popular since Mike did try to promote it as much as possible.
Unsloth AI (Daniel Han) ▷ #help (110 messages🔥🔥):
Runpod Access, Llama 3 vs Gemma, Qwen 2.5 VL finetuning, Saving 120b models, Multi GPU Training
- Company Hardware Hookup Hopes High!: A member expressed excitement about potentially accessing their companyâs hardware for a vision project, hoping to avoid spending $500 on Runpod.
- Llama 3 is Like Putty!: A member recommended Llama 3 for finetuning, describing its brain as "like putty" that "will easily mold to what you want."
- Another member suggested Gemma for a Gemini flair and described distillation as teaching a student model to behave like a teacher model.
- Qwen2.5-VL Vision Fine-Tuning Ventures!: A member inquired about fine-tuning Qwen2.5-VL for domain-specific knowledge using text and video data.
- Another member explained that vision models need to be trained per frame since Qwen2.5-VL only accepts image, text, and bounding box inputs.
- Saving is Super with save_pretrained_merge!: A member encountered timeout errors while saving a GPT-like 120b QLoRA model using `save_pretrained_merge`.
- Another member recommended using `save_method="mxfp4"` for better GPT-like support, as it saves in native `mxfp4` format and avoids the 16GB shard increases associated with `merged_16bit` mode.
- Multi-GPU Mayhem Mitigation!: A member reported getting stuck after "Initializing a V1 LLM engine" when using deepspeed or FSDP for multi-GPU training with Unsloth.
- Another member recommended using Accelerate and pointed to the Unsloth documentation for multi-GPU training instructions.
Unsloth AI (Daniel Han) ▷ #research (14 messages🔥):
Neurocognitive Modeling, Tversky Implementation, Vibe Check Network, XOR Function Verification, Tversky Parameters vs. Traditional NN
- Tversky Loss Function Paper Sparks Interest: A member shared a fondness for neurocognitive modeling and deemed the paper on the Tversky Loss Function really great.
- The member followed up by sharing his own implementation of the method, as the paper had no repo, on GitHub.
- User Implements a Tversky Vibe Check Network: Excited about the potential of the Tversky Loss function, a member created a vibe check network for CIFAR-10, noting that it appears promising.
- Another member suggested training a single XOR function to verify its functionality and inquired about its speed compared to traditional fully connected layers.
- Tversky Implementation Parameters Evaluated: A user acknowledged that their Tversky implementation has more parameters due to the classification head, making it an unfair comparison with a control NN.
- After additional tests, the member found that going from 256->10 features resulted in 50,403 trainable parameters with 61.68% overall accuracy, noting that this is not a true measure of improvement.
- Tversky XOR Test and Accuracy: A member ran an XOR test, achieving up to 95% accuracy with 32 features, despite a slightly different initialization than the paper.
- The user explained that zeros and a slightly asymmetric uniform make more sense given the network, though they didn't personally observe a 100% accuracy result in limited testing.
Perplexity AI ▷ #announcements (1 messages):
Perplexity Search API, LLMs, Sonar, SDK Integration
- Perplexity plugs Developers into Search API: Perplexity introduces its Search API, granting developers access to Perplexityâs search index which covers hundreds of billions of webpages, as announced in their blog post.
- LLMs given Live Web Content via API: The Search API provides developers with building blocks to ground answers in live web content, similar to how Sonar addresses the limitations of LLMsâ static training data.
- The API offers features like raw search results, full page text, domain filters, recency filters, academic & finance modes, and full transparency with URL, snippet, publish date, and last updated information.
- SDK streamlines Integration: Perplexity offers a new SDK to make integration seamless for developers, enabling rapid prototyping.
Perplexity AI ▷ #general (832 messages🔥🔥🔥):
Airtel free premium, Qwen vs. Gemini, Perplexity image generation quota, Comet Stuttering, DeepSeek Terminus
- Airtel provides Free Premium Accounts: Members confirmed that Airtel is offering one year of free premium access to Perplexity AI.
- Qwen family vs Gemini for Image Generation: Members discussed Qwen 3 Max having strong reasoning capabilities as well as Gemini for generating detailed 3D simulations, sharing examples of both.
- One member suggested "GOOG shareholders are really inflating the scores for visual ability on llmarena".
- Perplexity Image Generation Quota is Limited: Members reported issues with image generation quotas not resetting and difficulties contacting support, despite paying for Pro/Max accounts.
- An admin clarified that Pro accounts have a limited monthly quota of high-quality images, with additional medium-quality options, directing users to contact [email protected] for billing issues.
- Comet Users Experiencing Stuttering Videos: One user reported Comet is stuttering in videos (YT or twitch).
- Another user responded that pplx is not appropriate for video generation.
- DeepSeek Terminus arrives: Members mentioned DeepSeek Terminus as a powerful new model, and were awaiting something new.
- One user said Dafuq Elon has Really Started Cooking Now.
Perplexity AI ▷ #sharing (7 messages):
Carl Sagan, 3I/ATLAS, Perplexity's Myspace Page, Arc Browser, Grogu
- Sagan Ponders the Cosmos through 3I/ATLAS: A member shared a Carl Sagan-themed "scratchpad" reflection on 3I/ATLAS, inviting listeners to see it as an invitation to listen to the universe with humility and awe, via this Perplexity AI search.
- Perplexity's Myspace Page is here: A member created a "Perplexity's Myspace Page" Labs output, available here.
- The Death of Arc Browser: A member shared a page discussing the state of Arc Browser.
- Grogu is here!: A member shared a link to Lucasfilmâs unveiled trailer and exclaimed: Go Grogu!
Perplexity AI ▷ #pplx-api (14 messages🔥):
Python SDK broken for streaming, Perplexity new Search API playground, Sonar vs Search API, Sonar charges
- Python SDK streaming responses broken: A member reported that the Python SDK is broken for streaming responses, returning strings that cannot be parsed as JSON following the API docs quickstart guide.
- Another member confirmed that there is no solution for this yet.
- Perplexityâs Playground Search API: Perplexity AI announced a new Search API playground as part of their latest Search API release.
- A member asked if it was better than Sonar, while another requested a filter field.
- Search API based on Sonar: A member stated that the Search API uses Sonar AFAIK, providing a different output structure.
- Another member suggested that Sonar charges $5 per 1k web requests.
OpenRouter ▷ #announcements (1 messages):
Accidental price change, Refunds issued, Additional validations implemented
- Pricing Glitch Hits Qwen Model!: On September 16th, the endpoint `qwen/qwen3-235b-a22b-04-28:free` was mistakenly set with a price for approximately 26 hours.
- During this time, requests to the free model incorrectly deducted credits and appeared with a cost in users' activity pages.
- Refunds Flow After Pricing Snafu: The team automatically refunded all impacted users in full following the pricing error on the `qwen` model.
- The incident caused confusion, but all users affected have been compensated.
- New Validation Prevents Future Pricing Mix-Ups: Additional validation checks have been added to prevent recurrence of the accidental pricing issue.
- The measures aim to ensure that free models are correctly configured and do not incur unintended charges.
OpenRouter ▷ #general (567 messages🔥🔥🔥):
Horizon Alpha, Dirty Talk Models, Zenith Sigma, Grok's Storywriting, Distilled Models
- Users Seek Horizon Alpha, Demand Answers: A user urgently inquired about the whereabouts of Horizon Alpha, stating "I was using it in production and now it's not working".
- They also questioned if they were being targeted and when the issue would be resolved.
- Users are seeking "Dirty Talk Models": A user inquired about the best models for RPing, specifically seeking "any of dem dirty talk models?".
- Another member mentioned opening a new LLM frontend called JOI Tavern.
- Users discuss Stealth Model "Zenith Sigma": Users discussed the stealthy Zenith Sigma model, joking that it was so stealthy, one user couldn't even find it.
- Another user claimed Zenith Sigma is actually Grok 4.5.
- Grok Storywriting less Annoying than Opus: A user shared their "most insane take" that Grok is less annoying than Opus for storywriting.
- Another user explained that every character wants to avoid conflict because conflict is mean when using Grok.
- OpenRouter Addresses Provider Error Issues, Promotes Gemini 2.5 Flash: Users reported experiencing Provider Returned Error messages via the API, even with paid models.
- A member stated that OpenRouter doesn't rate limit paid models, and for Gemini 2.5 Flash you'll be all good on the provider side too, suggesting OpenRouter is Tier 9999 with all Google providers.
OpenRouter ▷ #new-models (1 messages):
Readybot.io: OpenRouter - New Models
OpenRouter ▷ #discussion (58 messages🔥🔥):
Volume Discounts for OpenRouter, Microsoft 365 Copilot & Claude, Gemini-cli, Discussion and Helper Roles, Meta's CWM Model
- OpenRouter Eyes Volume Victory: Members inquired if OpenRouter is big enough to negotiate volume discounts with providers like DeepInfra, Hyperbolic, Anthropic, and Vertex to offer savings to users.
- The general sentiment was "savings for users are good. Savings in general are good".
- Microsoft Copilots Claude: Members shared that Claude is now available in Microsoft 365 Copilot.
- This marks a significant stride for Microsoft, with one member noting "microsoft are on the rebound after a messy breakup".
- Gemini CLI Gets Readier: The gemini-cli has made strides, specifically the ReadManyFiles tool in version 0.6.0 gets frequent usage.
- One member says the tool "gets a lot of work from me".
- Discussion and Helper Roles Discussed: Members discussed how to get the Discussion and Helper roles, noting it's a frequently asked question.
- The Discussion role was initially granted to those who joined before the crypto swarm, while the Helper role is hand-picked based on helpfulness.
- CWM Model Causes Cacophany: A member shared a link to facebook/cwm, a model trained on python memory traces, eliciting mixed reactions.
- While some express hype, citing its small size compared to GPT-5 and novel training method, others remain skeptical.
Cursor Community ▷ #general (509 messages🔥🔥🔥):
MCP Server, Exa-ai, Context7, Generate Commit Message Language, Scroll to Bottom in the Chat Window
- Exa-AI MCP gives @Web a run for its money: Users discussed using Exa-ai (exa.ai) for searches within MCP, highlighting that it provides $20 in credits upon signup and mentioning that it seems to do better than the @web tool.
- They provided instructions for setting it up in `MCP.json`, including obtaining an API key and adding configuration details.
- MCP = Custom Tool API: Members clarified that MCP (the Model Context Protocol) is essentially an API for agentic use to add external tools to Cursor.
- The conversation involved some misconceptions, with one user thinking of it as something that could create designs from images and webpages.
- Rules no longer obey AI Rules in general: A user reported that generated commit messages are not obeying the set AI Rules and are being generated in an unwanted language.
- A member confirmed that this is a bug, mentioning that the commit message generation currently doesn't obey your AI Rules in general and that support might be added in future updates.
- Scroll to Bottom: A user requested that the chat window should automatically scroll to the bottom when switching between chat tabs, so they can see the most recent activity.
- A member pointed out that there's already a feature, and if there's something the user needs to click on, they get a notification (although a new window will appear).
- Users Disappointed by âDumbâ GPT5-HIGH: Users expressed disappointment with the GPT5-HIGH model, observing that it has become less capable over time.
- One user humorously suggested telling the model to get off its ass and write the code when it only provides instructions instead of completing the task.
LM Studio ▷ #announcements (1 messages):
LM Studio 0.3.27 Release, Chat Search Functionality, Chat Sorting Options, Dry Run Load Resource Estimate
- LM Studio Gets Find in Chat and Search All Chats: LM Studio 0.3.27 introduces new features including Find in Chat and Search All Chats, enhancing user experience.
- The release notes can be found at lmstudio.ai/blog/lmstudio-v0.3.27 for more details.
- Chats Get Sorted in LM Studio: A new `•••` menu in the chat sidebar allows users to sort chats by date updated, created, or conversation token length.
- This provides more flexible organization of chat history.
- LM Studio Estimates Model Loading: The new command `lms load --estimate-only <model>` allows users to get a dry run load resource estimate.
- This helps in planning resource allocation before loading models.
Linux plugins, Ollama Fine Tuning, Training vs RAG, LM Studio token count, LM Studio update failing
- LM Studioâs Linux plugins fall behind: A user noticed that the Linux version of LM Studio doesnât offer the same range of plugins as the Windows version, being limited to RAG and a JS playground plugin.
- Ollama Fine-Tuning face-off: A member suggested using Ollama to fine-tune a model, which another rebutted that simply injecting data isnât true fine-tuning, but closer to RAG.
- The first member insisted that with a Python setup, one can inject data into the model to make new weights, creating an interactive model with custom tool usage, independent of prompts.
- LM Studio update gets stuck: A user reported getting an error when trying to update LM Studio, with the message failed to uninstall old application files.
- Other users asked for the previous version number (0.3.26), and suggested enabling visibility of hidden folders in Windows to delete old files from AppData\Roaming\LM Studio, AppData\Local\lm-studio-updater, AppData\Local\Programs\LM Studio, and .cache\lm-studio.
- Users wish to print and export LM Studio chats: A user asked about printing or exporting generated output in LM Studio, as copying and pasting doesnât preserve the original format.
- While there is no print option, a user mentioned that chats are available as JSON output and can be converted to other formats using tools like Claude or Gemini.
- Navigating Langchain with LM Studio: A user inquired about integrating LM Studio with Langchain for PDF vectorization.
- Another member suggested using the developer tab and the OpenAI-like API, linking to YouTube tutorials on the topic, and suggested llamaindex was easier to get working.
LM Studio ▷ #hardware-discussion (161 messages🔥🔥):
Budget GPUs for local models, Tesla K80s for AI, Intel Arc A770 for multi-GPU setups, Strix Halo vs Mac for AI, Nvidia 5090 pricing
- Budget GPU Bonanza Explored: The go-to budget GPU is debated, with suggestions ranging from the 2060 12GB at $150 to the 3060 12GB at $200, 2080ti for $230, and possibly the 5060ti 16GB for $400 if buying new.
- A used 3090 was also recommended at around $600-$700, though some cautioned against used workstation cards, others warned that Tesla generation is not recommended for AI/LLM use anymore tbh, basically e-waste.
- Arc A770's AI Ambitions Analyzed: Discussion revolves around using multiple Intel Arc A770 16GB GPUs for AI, though it's noted they lack native support and have spotty Vulkan support, with potential issues in multi-GPU setups and differing VRAM counts.
- While theoretically possible, speed might be limited by the single 16GB GPU due to llama.cpp limitations and challenges in finding motherboards with enough PCIe lanes.
- Strix vs. Mac Melee for AI Tasks: The discussion weighs whether a Strix Halo box or Mac would be cheaper, faster, and consume less power for AI tasks, with some suggesting they'd be a better investment.
- However, it was noted that the Ryzen 9 AI Max + 395 iGPU (8060s), even with access to 96GB of system memory, has underwhelming compute compared to single GPUs, similar to the limitations of Macs with 128GB of unified memory, only with even higher prices.
- 5090 Speculation Sparks Scrutiny Over Pricing: The potential pricing of the Nvidia 5090 is discussed, with one member joking that instead of giving nvidia more money, i could… live, lol.
- Some argue that current pricing is unreasonable due to the duopoly, inflation, and TSMC manufacturing restrictions, while others note that Nvidia is essentially extorting people and express hope for a price/performance jump with the 3nm node.
- Portability Pushes Preference for Macs: Members debate the hype around Macs for AI, clarifying that while Nvidia GPUs are faster, Macs offer an easier, portable way to load models that just work.
- One member shared that they get around 10-12 tok/s on their 128GB M3 Max Macbook and found a build consisting of a 7950x (water cooled), 192gb of 6800mhz, a 4090, and 3090 was cheaper than the mac offering.
OpenAI ▷ #announcements (2 messages):
GDPval, ChatGPT Pulse
- GDPval Launches to Evaluate Real-World AI: OpenAI introduced GDPval, a new evaluation that measures AI on real-world, economically valuable tasks as detailed in their blog post.
- ChatGPT Pulse Delivers Personalized Daily Updates: ChatGPT Pulse is a new experience where ChatGPT can proactively deliver personalized daily updates from your chats, feedback, and connected apps, rolling out to Pro users on mobile today, detailed in their blog post.
OpenAI ▷ #ai-discussions (188 messages🔥🔥):
GPT-5-Mini Common Sense, Censored GPT-OSS-20B, Suno V5 vs Napster, AI Rocket League bot, Google Gemini 2.5 Flash release
- GPT-5-Mini fails Common Sense test: Members observed that GPT-5-Mini (High) seems to lack common sense and doesn't get jokes, suggesting it's not AGI level yet.
- One member mentioned that GPT-OSS-20B is possibly the most censored model ever after it noped out from a specific prompt.
- Diving into Discord Devsâ Dream Discord Bot: A member proposed creating a Discord bot for Rocket League powered by AI to analyze player stats, identify strengths & weaknesses, and create personalized training plans, targeting the untapped francophone market with a premium subscription model.
- Other members doubted that an LLM could give good advice on such dynamic games, instead suggesting to analyze the xyz coords from the replay files, and use AI against the raw numbers.
- Unlimited Context is Useless: Members argued that unlimited context isn't actually better than well-managed limited context, calling it a marketing buzz term.
- Others highlighted that a lack of limitations is just a lack of contour that absolves design and made an analogy to unlimited PTO being a trap companies will use to guilt people out of taking time off.
- Suno v5 Soars, Napster Suffers: A member stated that Suno v5 good, Napster bad, highlighting the issues surrounding AI copyright infringement.
- Another one shared a reflection on early experiences with piracy, recalling using Kazaa, Morpheus, and Limewire.
- Google Unleashes Gemini 2.5 Flash: Google released an improved Gemini 2.5 Flash and Flash Lite, continuing to bring their latest models.
- Members jokingly celebrated the release with one member calling Flash the saviour of the google-verse.
OpenAI ▷ #gpt-4-discussions (2 messages):
ChatGPT Default State, ChatGPT Mode-Locking, ChatGPT Reset Command, ChatGPT performance degradation
- ChatGPT Defaults to "Agent" State: ChatGPT defaults to an "Agent" state (problem-solver, instructable worker) upon initialization, rather than a "Companion" state (co-creator, guide).
- Pin a Prompt to Keep ChatGPT in "Companion" Mode: To maintain the "Companion" mode, a user suggests adding a pinned instruction or reusable starter prompt to lock the model in that mode.
- For example: "Stay in Companion mode unless I explicitly say switch to Agent. Companion = co-pilot, not order-taker."
- Quickly Reset ChatGPT to "Companion" Mode: If the model drifts back to the "Agent" mode, the user suggests a simple reset command: "Go back to companion mode."
- ChatGPT User Reports Performance Degradation: A user reported that their GPT-5 instance is experiencing performance degradation and now relies on "thinking mini", making it unsuitable for reflective academic writing and emotional contexts.
- The user mentioned seeing similar reports on Reddit, with simple words like "cut" triggering the reduced model, and is seeking a solution.
OpenAI ▷ #prompt-engineering (29 messages🔥):
Chain of Thought Prompting, Model Translation Performance, Essay Generation from a Surfer's POV, Interactive Prompting Infographic, Model Self-Evaluation Techniques
- Chain-of-Thought Prompting Confusion Clarified: Members discussed how requesting excessive Chain-of-Thought (CoT) prompting can statistically reduce model performance, especially on current thinking models.
- Instead of ambiguous instructions, a member suggested prefacing responses with a structured format including ultimate desired outcome, strategic consideration, tactical goal, relevant limitations, and next step.
- Translation Troubles Tackled: A member suggested that when a user requests something, you should first identify the request, and then provide the answer, rather than use confusing instructions.
- The negative example of using the bullet point instruction `{do a 3 short bullet point as a chain of thought}` was shown, as this caused problems in translation accuracy and relevance.
- Surfer's Essay Totally Tubular or Tragically Tame?: An example compared two prompts for generating an essay about apples from a surfer's point of view, highlighting how a simpler prompt yielded a more embodied response (example link).
- The simpler prompt `Discord demo, we need a quality essay about apples written from the point of view of a surfer` was preferred over a more complex one that included bullet-point instructions.
- Interactive Infographic for CoT Prompting: A member shared an interactive infographic built as a single-file React component (Tailwind + shadcn/ui + Recharts + lucide) for Chain-of-Thought prompting.
- The infographic includes visibility toggles, a task selector, a thinking-time slider, and copy-ready prompt cards (file link).
- Self-Evaluation: A Model's Metacognitive Moment: A member suggested using a prompt after providing information to have the model self-review, evaluate, and grade its knowledge on a topic.
- This involves the model creating a list of evaluation criteria and expanding to related subjects, useful for brainstorming and idea generation (but not for normal evaluations).
OpenAI ▷ #api-discussions (29 messages🔥):
Chain-of-Thought Prompting, Quality Translation, Model Performance, React component (Tailwind + shadcn/ui + Recharts + lucide)
- Chain-of-Thought Overkill?: A member suggests that asking for more chain of thought on top of that is overkill and statistically reduces the likelihood of good model performance on current "thinking" models.
- They propose using a specific prefix structure ("My ultimate desired outcome is:…") to guide the model instead of ambiguous instructions.
- Crafting Quality Translations: Discussion revolves around techniques for achieving quality translations with models.
- It is suggested to prime the model with context about the target audience, for example: "We're translating this for a woman who grew up in Yugoslavia in the 1940s, she has a 3rd grade education, so we need to phrase this for her."
- Experimenting with Model Instructions: One member shares an experience where detailed instructions, such as `do a 3 short bullet point as a chain of thought`, can confuse the model.
- Another suggests directing the model away from unnecessary chain of thought when the primary goal is a quality output, like a well-written essay.
- Interactive Infographic for CoT Prompting: An interactive page has been developed in the canvas for Chain-of-Thought (CoT) prompting, featuring visibility toggles, a task selector, a thinking-time slider, and copy-ready prompt cards.
- The component is built using React, Tailwind, shadcn/ui, Recharts, and lucide, and includes features like dynamic recommended patterns and export options.
HuggingFace ▷ #general (135 messages🔥🔥):
Duolingo deletion, LinkedIn posting strategies, HF Blog post on AI, HF Discuss forum issues, LAION-2B-en dataset reading
- Duolingo deemed Dodo, Deletion Debuts: One member deleted Duolingo, citing annoyance and inefficiency compared to immersing themselves in a local environment and leveraging AI for learning.
- Another member agreed, stating they'd "torch the bird alive", criticizing the addiction to streaks over fundamental learning premises.
- Unhinged LinkedIn Lunacy Lures Likes: One member shared a strategy of posting unhinged shit on LinkedIn to gain engagement, while another grinds Rocket League to brag about their rank.
- They then joked about writing a post called what my plat 3 rank rocket league friend taught me about business.
- AI Ethics Explorations Expressed, HF Blog Beckons: A member sought a platform to discuss AI's potential detriment under current alignment protocols, despite its collaborative benefits.
- Another member suggested sharing the work on the HF blog or the ethics channel.
- Qwen Quagmire: Questionable Quantity of Questionable Quality: Users reported a flood of seemingly spam Qwen2.5 models on Hugging Face, all following the format Qwen2.5-0.5B-Instruct-randomword1-randomword2-randomword3.
- It was suggested these uploads are linked to Gensyn and could be an SEO technique or a way to impress stakeholders for funding.
- LAION-2B-en Learning Logistics Lamented: A member inquired about an efficient way to read the LAION-2B-en-research split at a large scale, encountering rate limits while training a large scale CLIP model.
- Suggested solutions included using WebDataset and creating a custom streaming system to download and uncompress shards incrementally, as detailed in the laion_2b.md file; a minimal streaming sketch follows below.
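As a rough illustration of the WebDataset suggestion (the shard URL pattern below is hypothetical, not the real LAION-2B-en-research layout):

```python
# Hedged sketch: stream tar shards sequentially instead of downloading the
# whole dataset; shard URLs and field names are illustrative assumptions.
import webdataset as wds

urls = "https://example.org/laion2b-en/{00000..02000}.tar"  # hypothetical shards
dataset = (
    wds.WebDataset(urls)
    .decode("pil")             # decode images with PIL
    .to_tuple("jpg", "json")   # yield (image, metadata) pairs
)

for image, meta in dataset:
    break  # hand off to the CLIP training loop here
```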
HuggingFace ▷ #today-im-learning (1 messages):
GPU, Monitor, Drivers, Windows, Linux
- GPU Blackout Blues: A member is experiencing a frustrating issue where their monitor goes black whenever the GPU is activated, affecting both Windows and Linux systems.
- Despite numerous attempts to correct the drivers, they are currently forced to run the monitor off the motherboard, indicating a persistent GPU-related problem.
HuggingFace ▷ #cool-finds (2 messages):
UIUC Finance Project, Trade Bench Insights
- UIUC students launch Trade Bench: Students from UIUC have launched a new finance project called Trade Bench.
- A member who shared the link admitted to not understanding much of it, and found it to be drab.
- Community Asked to Provide Insights on Trade Bench: The original poster requested insights from the community on the Trade Bench project.
- They hoped that someone with finance expertise would check it out and explain it.
HuggingFace ▷ #i-made-this (1 messages):
Vendor lock-in in AI Chatbots, AI Chatbot Supporting Multiple Providers, Marketing Tools for Small Studios and Solo Devs
- Chatbot Aims to Tackle Vendor Lock-In: A developer is building a chatbot to combat vendor lock-in and free tier limits experienced with platforms like ChatGPT, Anthropic, and Perplexity.
- The chatbot will support major AI providers like OpenAI, Anthropic, Groq, and DeepSeek, offering a free ad-supported tier and a paid ad-free tier with full access to all models and features.
- AI Chatbot integrates Marketing Tools: The developer is adding marketing tools to their AI chatbot to aid small studios and solo developers, with features including post and visual creation, content scheduling, and campaign management.
- Feedback is requested via short survey to guide the project's direction, with plans for more tools in the future.
HuggingFace ▷ #reading-group (2 messages):
Diffusion Models, Generative AI, ELBO-based models, VAEs, Variational Diffusion Models (VDMs)
- Diffusion Model Intro Paper Discussion Announced: A member announced a reading and discussion of the paper Understanding Diffusion Models: A Unified Perspective by Calvin Luo (https://arxiv.org/abs/2208.11970) to occur on Saturday at 12pm ET.
- The paper provides an overview of the evolution and unification of generative diffusion models, including ELBO-based models, VAEs, Variational Diffusion Models (VDMs), and Score-Based Generative Models (SGMs).
- Diffusion Model Paper Reading Group forming: A member created a beginner-friendly Diffusion Model Paper Reading Group, stating that no coding or ML background is needed, and linked to luma.com/1gif2ym1 for those who want to build a solid foundation.
- The group will be hosted online.
HuggingFace ▷ #core-announcements (1 messages):
Context-Parallelism, Diffusion Inference, Distributed Attention, Ring & Ulysses
- Context-Parallelism Speeds Up Diffusion Inference: Native support for context-parallelism is being shipped to help make diffusion inference faster on multiple GPUs.
- The CP API is made to work with two flavors of distributed attention: Ring & Ulysses as noted in this Tweet.
- Distributed Attention Flavors Debut: The new API supports two flavors of distributed attention: Ring and Ulysses, designed to enhance context-parallelism.
- These methods aim to optimize how attention mechanisms are distributed across multiple GPUs, facilitating faster and more efficient diffusion inference.
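For intuition on the Ulysses flavor, here is a conceptual, framework-agnostic sketch (this is not the diffusers CP API): each rank holds a sequence shard for all heads, and an all-to-all exchanges it for the full sequence on a subset of heads, so standard attention can run locally.

```python
# Conceptual Ulysses-style exchange; assumes torch.distributed is initialized
# with world_size P, and that the head count divides evenly by P.
import torch
import torch.distributed as dist

def ulysses_exchange(x: torch.Tensor, world_size: int) -> torch.Tensor:
    # x: [S/P, H, D] local sequence shard holding all H heads
    s_local, h, d = x.shape
    assert h % world_size == 0
    # regroup heads so chunk p along dim 0 is what rank p should own
    x = x.reshape(s_local, world_size, h // world_size, d).permute(1, 0, 2, 3).contiguous()
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x)  # swap sequence shards for head shards
    # out: [P, S/P, H/P, D] -> full sequence for the H/P local heads
    return out.reshape(world_size * s_local, h // world_size, d)
```

A second all-to-all inverts the layout after attention; the Ring flavor instead circulates K/V blocks around a ring while queries stay put.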
HuggingFace ▷ #computer-vision (2 messages):
Topological Data Analysis, Persistent Images, Loss Functions
- Seeking Loss Function Guidance for Topological Data Analysis: A member inquired about loss functions for topological data analysis (TDA) and persistent images, seeking guidance due to unfamiliarity with computer vision.
- They expressed interest in any advice, but no specific suggestions were offered in the channel.
HuggingFace ▷ #smol-course (30 messages🔥):
Certificate issues and quiz completion, License and usage of the fine-tuning course, Smoltalk2 dataset size warning, HF Jobs permissions and authentication, Colab compatibility for the course
- Quiz Completion Unlocks Certificate: A user inquired about not receiving a certificate or pull request acceptance after submitting an assignment, and was informed that the Unit 1 Quiz must be completed to get the certificate.
- The user confirmed they passed the quiz with 100% after taking it.
- Apache License for Fine-Tuning Course: A user asked if the fine-tuning course is under the Apache license and if it can be implemented for a high school club as a learning group.
- The user also inquired about the required Python knowledge and whether the course will become inaccessible after 5 weeks.
- Smoltalk2 Dataset Size Warning: One user cautioned that the smoltalk2 dataset is quite large (around 90GB) and suggested being careful when downloading it locally unless there is sufficient space.
- The user also noted that Units 2-3 are available.
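A hedged way to inspect the dataset without the ~90GB download (the repo id, config, and split below are assumptions; check the dataset card):

```python
# Stream a few records instead of materializing ~90GB locally.
from datasets import load_dataset

# repo id / config / split are assumptions -- verify against the dataset card
ds = load_dataset("HuggingFaceTB/smoltalk2", "SFT", split="train", streaming=True)
for i, example in enumerate(ds):
    print(sorted(example.keys()))
    if i >= 2:
        break
```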
- Docs on HF Jobs: One user was having issues with `hf jobs uv run` and write permissions to the Hub for their model, asking for help.
- Another user shared the HF Jobs documentation, pointing out that the trainer needs to be authenticated, and suggesting the use of generic scripts or copying the token handling.
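The token-handling suggestion boils down to something like the sketch below inside the training script (assuming the job is launched with an `HF_TOKEN` secret exposed as an environment variable):

```python
# Minimal token handling for a script run via `hf jobs uv run` (assumes the
# job was started with an HF_TOKEN secret injected into the environment).
import os
from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])  # authenticate before pushing to the Hub
```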
- First Certificate: Quiz and Leaderboard: A user inquired about the requirements for the first certificate, specifically whether both the quiz and leaderboard submission are necessary, and if submission to the leaderboard is possible without using HF Jobs.
- They also asked whether it's possible to get it until the end of the course or if the deadline is sooner.
HuggingFace ▷ #agents-course (1 messages):
0xobito404: Hello from Thailand, starting the course rn
Moonshot AI (Kimi K-2) ▷ #announcements (1 messages):
OK Computer, Agent Mode, Multimedia Generation, Team-Level Polish, One-Link Deploy
- Kimi Launches OK Computer Agent Mode!: Moonshot AI launched OK Computer, a new agent mode designed to ship polished sites and apps in one go.
- Key features include personalized outputs, multimedia generation (text + audio + image), team-level polish, and one-link deployment.
- Outputs Personalized in Your Tone: OK Computer can generate personalized outputs such as slides, web/data apps, and mobile UIs in your own tone.
- See the official X post for more details.
- Multimedia Integrated in One Pass: The new Agent Mode supports multimedia generation, integrating text, audio, and image generation in a single pass.
- The goal is to deliver output that feels PM × Dev × Design out of the box.
- Deploy With Only One Link: Users can deploy and share their creations instantly with a single link.
- To try the new Agent Mode, visit the Kimi Official Website.
Moonshot AI (Kimi K-2) ▷ #general-chat (147 messages🔥🔥):
Kimi Mini version, Moonshot team goals, Qwen model distillation, Kimi Computer agent, OpenAI compute
- Moonshot might skip Mini-Kimi Models: One member doubted that Moonshot would release a smaller version of Kimi, suggesting that a smaller Qwen model distilled on K2 is a better bet.
- Another member pointed out that Deepseek made Qwen distills because Qwen didn't have (good) reasoning until Qwen 2.5.
- OKComputer Draws Capitalists: Several members joked that the new Kimi Computer agent, particularly with its initial prompt "Build a SaaS for content creators, aiming for $1M ARR", is designed to attract capitalists.
- One member called it "another website generator with some weirdly scoped features".
- Kimi OKComputer's Missing Download Button: Members reported initial issues with the OK Computer feature, including a missing download all button and a corrupted zip file.
- One member noted that "entering chat makes the OKC button disappear", but the button was later found in the right corner.
- Computer Use Has Higher Quota in Paid Subscriptions: The amount of OK Computer usage depends on whether you subscribe to the moderato/vivace plans, which grant a higher quota of 20 OKC + 20 Researcher tasks.
- The images produced by image generation are neat because a non-Moonshot tool is used for generation, but the prompt must be high quality.
- Kimi Plans Better than Qwen: Members discussed using Kimi to make plans for Qwen or DeepSeek to follow, noting that "Kimi always makes better plans" and it can cover a wider range of requests.
- It was also pointed out that Qwen3-max constantly hallucinates and doesn't come close to Kimi.
GPU MODE ▷ #general (15 messages🔥):
Hopper TMA, Modal carrying code agent rollouts, MI300 support on Modal, Llama3.3 70B Prefill vs Decode
- Hopper TMA Kernel Quest Begins: A member is seeking a minimal matmul kernel implemented in raw CUDA that utilizes Hopper's TMA (Tensor Memory Accelerator), without relying on Cutlass or Triton.
- Another member shared a CUDA for Fun blogpost as a potential resource.
- Modal Powers Remote Execution for Code Agents: A member noted that Modal is single-handedly enabling remote execution for code agent rollouts, after release of the new CWM paper from FAIR.
- Another member praised Modal for its fantastic distribution of cold/warm/hot start times relative to its cost, though its lack of MI300 support was noted.
- Llama3.3 70B's Prefill Slower Than Decode?: A member is comparing benchmark performance of the Llama 3.3 70B model using Nvidia's published benchmarks.
- They noted that prefill-heavy workloads show lower throughput than decode-heavy workloads, despite expecting prefill to better exploit GPU compute capacity, and are seeking to understand why decode-heavy runs achieve higher throughput.
GPU MODE ▷ #triton (1 messages):
Triton pyproject.toml, uv add pip command
- Triton's Missing `[project]` Table Causes Hiccups: A user questioned why the `[project]` table is missing from `pyproject.toml` in the Triton project, running into an error when trying to use the `uv add pip` command.
- The `uv add pip` command threw an error because the `[project]` table could not be found in Triton's `pyproject.toml`; the tooling requires this table for dependency management, so its absence disrupts the process.
GPU MODE ▷ #cuda (13 messages🔥):
NCU profiling for SMEM bank conflicts, CUDA headers not being automatically included, WMMA kernel throwing unspecified launch failure, TMA minimum matmul kernel, Learning CUDA with limited hardware
- NCU Unveils SMEM Conflict Detection Secrets: Members discussed using NCU profiling to verify SMEM bank conflicts in kernels, with one member expressing surprise that it worked, as they thought nsight compute was gaslighting them.
- The conversation included a question about the meaning of the numbers wrapped in curly brackets in the NCU profile output.
- CUDA Headers Hide-and-Seek: A developer reported a problem where CUDA headers weren't being automatically included, causing functions like `cudaGraphicsGLRegisterImage` and `tex2D` to be undefined when using Visual Studio 2022 and the latest CUDA toolkit.
- Including `cuda_gl_interop.h` was mentioned as a workaround for the former issue.
- WMMA Kernel Launch Flounders: A user encountered an unspecified launch failure with a WMMA kernel and sought advice, sharing the kernel code.
- TMA Matmul Minima Malaise: A member was implementing a minimum matmul kernel using TMA and facing issues.
- It was suggested that the unspecified launch failure might be due to exceeding the maximum registers per SM, and that `cudaFuncSetAttribute` could be used to increase the SMEM limit.
- CUDA Curriculum Quandaries for Cash-Strapped Coders: A user asked for the best way to learn CUDA quickly with limited hardware and free software, mentioning they have an Arduino, STM32, and a Jetson Nano.
- They also asked for pointers to the best place to learn quickly.
GPU MODE ▷ #torch (16 messages🔥):
torchrun API, HF transformers static cache, CUDA streams in HF transformers, GraphMend for PyTorch 2
- Torchrun Troubles Trigger Package Predicament: A user encountered issues with the `torchrun` API, finding that `torchrun --help` produced output different from the official documentation.
- The issue was resolved by realizing that both `torch` and `torchrun` were in `pyproject.toml`, and that `torchrun` is a separate package (torchrun on PyPI), a different project from PyTorch's launcher of the same name.
- Compile Conflicts Complicate Cuda Cache Conundrums: A user faced an issue using `torch.compile` with HF transformers static cache, encountering a CUDA streams error when calling a `decode_one_token` function.
- The error was traced to a bug in `transformers/cache_utils.py`, where `cache.offloading` is a CUDA device.
- GraphMend Grasps Graph-Break Glitches in PyTorch: A member shared the paper GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2 with links to the project page and GitHub repository.
- The paper introduces GraphMend, a compiler that eliminates FX graph breaks in PyTorch 2 programs by using code transformations to remove graph breaks due to dynamic control flow and Python I/O functions.
GPU MODE ▷ #cool-links (12 messages🔥):
CUDA, Triton, PTX memory consistency, Formal languages, GPU programming
- GPUs get Formally Analyzed for Consistency: A paper on "A Formal Analysis of the NVIDIA PTX Memory Consistency Model" discusses proving that languages like CUDA and Triton can target PTX with memory consistency, despite PTX allowing for data races.
- The member, however, found it leaning too heavily on formal languages and math to be immediately useful.
- Compound Memory Models Blend Heterogeneous Consistency: A PLDI 2023 paper introduces Compound Memory Models for heterogeneous machines, where devices with distinct consistency models are fused.
- The compound model retains compiler mappings, allowing threads to adhere to the memory ordering rules of their original device's memory model, and there's a 15 min talk that is fairly approachable.
- Unified Analysis of GPU Consistency Bugs Discovered: A paper titled âTowards Unified Analysis of GPU Consistencyâ introduces Dat3M, a memory model aware verification tool, and discovered two bugs in the original PTX and Vulkan consistency models.
- The member quoted that interpreting them still requires a level of expertise that escapes most developers, and the current tool support is insufficient.
- Missing Fences Found in PTX Automated: A member highlighted the automated identification of missing fences in PTX shown in figure 12 of a referenced paper.
- They then suggested that it would be cool to see such checks at the NVVM IR layer, instead of PTX.
- rMEM Multicore CPU Tool Found Useful: A member mentioned the rMEM tool (and its github repo) as useful for multicore CPUs.
- Another member responded that it's very expensive (computationally) to run.
GPU MODE ▷ #jobs (2 messages):
zml github
- GitHub Link Drop!: A member shared a link to the zml GitHub repository.
- Another member (Snektron) acknowledged familiarity with it but had expected discussion on a different topic, suggesting the link may have been shared out of context.
GPU MODE ▷ #beginner (8 messages🔥):
Inter-warp and intra-warp ops in NVIDIA GPUs, Independent thread scheduling, Multi-CTA matmul, GPGPU architecture, PMPP reading group
- NVIDIA's Warp Scheduling Quirk Query: A member inquired about inter-warp and intra-warp operation behaviors in NVIDIA GPUs, given independent thread scheduling.
- The confusion stems from scenarios like multi-CTA matmuls, where SMs access each other's SMEM without guaranteed full-warp execution due to independent thread scheduling.
- GPGPU Architecture Interest Sparked: A member expressed interest in GPGPU architecture, particularly its use in PINNs.
- They reminisced that GPU Mode was initially set up as a PMPP reading group.
- GPU Mode's Origins: A member inquired whether GPU Mode was initially set up as a PMPP reading group.
- Another member confirmed that this was the original goal, though the group got a bit distracted over time.
GPU MODE ▷ #pmpp-book (1 messages):
PTX, Triton, NCCL, NCU profiling, PTX memory fencing
- PTX, Triton and NCCL Exploration Commences: A member began exploring PTX, Triton, and NCCL, gaining insights into the requirements for practical application in a job.
- They feel like they are finally starting to see what it takes to actually do this in a job.
- NCU Profiling & PTX Dominate Industry Blogs: According to a member, industry blogs emphasize NCU profiling skills, PTX memory fencing, and warp MMA topics.
- They contrasted that with the book they were reading, guessing that the book provides intern-level knowledge.
GPU MODE ▷ #triton-puzzles (1 messages):
puzzle difficulty
- Puzzlers Probe Past Puzzle Performance: A member inquired about the time others spent on previous puzzles, seeking insight into the magnitude of the challenge.
- They aimed to gauge the difficulty level by comparing experiences.
GPU MODE ▷ #rocm (3 messages):
pytorch rocm, NPU, iGPU
- PyTorch ROCm woes on Framework Desktop: A member asked if another member managed to run PyTorch ROCm successfully on the Framework Desktop, noting crashes with `torch.randn(1).cuda()` despite having a good setup and using Arch.
- Another member responded that they also had issues on Ubuntu, even after following all tutorials exactly, suspecting that the iGPU might be special or weird.
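A common first debugging step on these iGPU setups (the override value below is an assumption for RDNA3-class parts, not a verified fix for this machine):

```python
# Sanity check for a ROCm iGPU; HSA_OVERRIDE_GFX_VERSION must be set before
# torch initializes HIP. The "11.0.0" value is an RDNA3 assumption.
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

import torch
print("HIP:", torch.version.hip, "available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.randn(1).cuda())  # the call that reportedly crashed
```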
- Focus shifts to NPU Hacking: One member stated they have been focused pretty much exclusively on the NPU.
- Another member followed up with the question: "How do you hack on npu stuff?"
GPU MODE ▷ #self-promotion (3 messages):
LLM Profit Margins, GPU Stand Design
- Embeddings are so Cheap, a Kernel Dive: A member shared a Substack article on profiling and investigating kernels to understand the profit margins of serving LLMs.
- Heavy-Duty GPU Stand Showcased: A member designed a heavy-duty GPU stand for a collection of old GPUs, including dual-slot models, noting it is more robust than existing designs on Thingiverse.
- They indicated they might share the design online later if there's interest.
GPU MODE ▷ #🍿 (6 messages):
Code generation, Two-stage approach, CWM paper citation
- Code Gen loses Natural Language: A member believes that code generation works on raw syntax without constraints, which means the natural-language component of it is lost.
- They suggested that humans typically don't code by focusing on the underlying grammar expected by the compiler, and that you could train a model the same way.
- Code Gen Two Stage Approach?: Another member suggested a two-stage approach: pseudo-code generation, then formal grammar translation.
- They hadn't considered the impact on model performance from the added constraint, and the resulting reduction in degrees of freedom for code generation.
- GPU MODE cited in CWM Paper!: A member announced that we got cited in the CWM paper with a screenshot of the citation (link to discord attachment).
- No additional context or information was provided about the specifics of the citation.
GPU MODE ▷ #thunderkittens (17 messages🔥):
H100 matmul kernel runtime error, nvshmem usage rationale, RDMA implementation, PyTorch support for rocm symmetric memory
- H100 Matmul Kernel Suffers Runtime Error: A user reported a runtime error with the H100 matmul kernel, specifically an "Error in tile TMA descriptor creation: unspecified launch failure" when running on Ubuntu 24.04, CUDA 12.8, PyTorch 2.7.0a0+nv25.03, and TensorRT 10.9; full logs are available here.
- Debate Erupts Over nvshmem Omission: Discussion arose regarding the absence of nvshmem in a blog post, with the author clarifying that the post focuses on intra-node communication, while inter-node communication will be covered in a forthcoming paper.
- An NVIDIA colleague pointed out that they provide support for multinode nvlink and will soon add caching, making symmetric tensors easier to use compared to TKParallelTensor.
- RDMA Implementations Stir Controversy: A member suggested that the rationale for implementing custom RDMA might stem from the built-in overheads of SHMEM libraries.
- They cited the DeepEP library (github.com/deepseek-ai/DeepEP) which modifies nvshmem internals for performance gains, and the trend among major players to develop their own GPU Direct Async implementations.
- ROCm Symmetric Memory Support in PyTorch: A user inquired about plans to add built-in PyTorch support for ROCm symmetric memory.
- Another user quipped that they typically wait for someone to complain before prioritizing it.
GPU MODE ▷ #submissions (24 messages🔥):
MI300x8, amd-all2all leaderboard, amd-gemm-rs leaderboard
- MI300x8 scores improve on amd-all2all: A user achieved a personal best of 25.2 ms on MI300x8 in the `amd-all2all` leaderboard with submission ID `43505`.
- amd-all2all leaderboard sees blazing fast times: A user reached 1510 µs on the `amd-all2all` leaderboard with submission ID `43934` using MI300x8.
- amd-gemm-rs leaderboard shows improvement: A user submitted a personal best of 741 µs on MI300x8 to the `amd-gemm-rs` leaderboard with submission ID `44060`, then subsequently improved to 598 µs with submission ID `44069`.
GPU MODE ▷ #hardware (4 messages):
Voltage Park H100 donation, Nebius Exclusive Sponsorship, Future Hackathon Event
- Voltage Park proposes H100 Donation: Voltage Park offered to donate H100s for an upcoming hackathon, expressing interest in supporting the event.
- A member thanked Voltage Park but explained that Nebius has an exclusive sponsorship deal for this hackathon.
- Nebius Secures Exclusive Sponsorship: Due to a deal with Nebius, they are the exclusive sponsors for the current hackathon.
- Despite this, a member expressed interest in discussing potential collaborations for future events with Voltage Park and proposed a private voice chat to explore options.
GPU MODE ▷ #factorio-learning-env (2 messages):
FLE Eval System Prompt, Agent0 System Prompt, PR Submission
- FLE System Prompt Incoming: A member delivered a system prompt for FLE eval via attached file: agent0_system_prompt.txt.
- This prompt is intended for use with the Agent0 system.
- Pending PR Submission: A member mentioned they would submit their PR the following day.
- They noted it was getting late, indicating the submission was near completion.
GPU MODE ▷ #amd-competition (11 messages🔥):
gemm-rs optimizations, atomic operations, GPU rentals for debugging
- Gemm-rs Optimizations Prove Elusive: A member tested three basic variations of gemm-rs optimizations where bias is None, but they exhibited similar runtimes to the default submission, despite expectations of errors for configurations with bias.
- The poster attached a bias.txt file and noted that the PR is merged and almost ready for release with an example.
- Atomic Add API Questioned: A member inquired about the need for an atomic load/store API, similar to HIP's `__hip_atomic_load`/`__hip_atomic_store`, or if the discussion was centered on atomic adds.
- Another member clarified they were referring to atomic adds and are seeking smarter ways to handle heap pointers between ranks for less contention, while being advised to avoid atomics as much as possible.
- GPU Rentals Recommended for Debugging: A member inquired about recommendations for providers to rent GPUs for debugging purposes, without further context in the provided messages.
- The suggestion was made in the context of troubleshooting gemm-rs problems and potentially optimizing atomic operations.
GPU MODE ▷ #cutlass (4 messages):
TmemAllocator vs cute.arch.alloc_tmem, TMEM load/stores in cutedsl, SMEM -> TMEM copy, TMEM -> SMEM copy, Blackwell dense blockscaled GEMM example
- TmemAllocator Simplifies cute.arch.alloc_tmem: `TmemAllocator` offers utilities built around the lower-level `cute.arch.alloc_tmem`, helping reduce boilerplate code.
- The discussion stemmed from a user's inquiry about the difference between creating an instance of `TmemAllocator` and allocating from there versus using `cute.arch.alloc_tmem` directly.
- SMEM to TMEM Tiled Copy: To copy SMEM to TMEM, use `cutlass.cute.nvgpu.tcgen05.make_s2t_copy(copy_atom, tmem_tensor)` followed by `cute.copy()`, as demonstrated in the Blackwell dense blockscaled GEMM example.
- TMEM to SMEM Copy Operation: For copying TMEM to SMEM, use `tcgen05.make_tmem_copy(...)`.
- A helper function is available to determine a performant copy operation, as detailed in blackwell_helpers.py.
GPU MODE ▷ #mojo (2 messages):
Metal GPU target, custom bitcode writer, mojo assembly
- Metal GPU Target Excites Mojo Community: A member expressed excitement about the recent Metal GPU target in Mojo and inquired about the availability of code for the custom bitcode writer.
- They noted the interest in targeting certain DSLs at Metal GPUs and wondered if any of the existing work could be reused.
- Emit Mojo Assembly via Command Flag: A member advised that you can emit Mojo assembly via the `mojo -emit` flag if interested.
- This provides a means to inspect and potentially reuse the generated assembly code.
GPU MODE ▷ #low-bit-training (1 messages):
Modern QAT Papers, FP8 Training, MXFP4/NVFP4
- Modern QAT Paper Search Initiated: A member inquired about papers on modern QAT, with BitNet being the main one that comes to mind, but noted that it covers linear layers only, suggesting a desire for broader applicability.
- The member is considering FP8 training with QAT as a route to MXFP4/NVFP4 quantization.
Yannick Kilcher ▷ #general (105 messages🔥🔥):
Sinusoidal Positional Embeddings, Sine vs Cosine in Positional Encodings, Distillation performance estimates
- Sinusoidal Embeddings: Sine vs Cosine Debate: Members debated the necessity of using both sine and cosine in sinusoidal positional embeddings, with one suggesting that sine alone might suffice, sparking discussion around Fourier transforms and linear transformations. A towardsdatascience blogpost was linked to provide some context to the discussion.
- One member showed an experiment testing the maximum reconstruction error over a set of points, which came out to around 6e-12.
- Positional Encoding with Sine: A Lone Wolf?: Members discussed that using only sine for positional encoding might be viable if the position range is non-negative, as it can be linearly transformed into sine + cosine, but this becomes problematic with negative values, potentially requiring extra layers for a better representation. GPT-5 coding experiments demonstrated linear regression can approximate sine + cosine embeddings well with sine alone in the interval [0, a].
- One member shared an image showing that if they create PE using only sin and then run cosine similarity between two pairs, the scores differ; but with sin/cos pairs, no matter the positions, pairs at the same distance get the same score.
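The claim is easy to check numerically; below is a small self-contained reproduction (standard sinusoidal frequencies; the sin-only variant simply fills the same dimensionality with extra sine channels):

```python
# With sin+cos pairs the dot product between two positions depends only on
# their distance: sin(aw)sin(bw) + cos(aw)cos(bw) = cos((a-b)w).
# A sin-only embedding also picks up a cos((a+b)w) term, so it is not shift-invariant.
import numpy as np

freqs = 1.0 / (10000 ** (np.arange(32) / 32))

def pe_sincos(p):
    return np.concatenate([np.sin(p * freqs), np.cos(p * freqs)])

def pe_sin_only(p):
    return np.sin(p * np.concatenate([freqs, 10.0 * freqs]))

for a, b in [(3, 7), (103, 107)]:  # same distance, different offsets
    print(pe_sincos(a) @ pe_sincos(b), pe_sin_only(a) @ pe_sin_only(b))
# the first column is identical for both pairs; the second column differs
```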
- Paper Reading Recommendation: A member recommended a paper for reading: Synthesizing Programs for Images and Videos with Nearest Neighbor Examples.
- Guesstimating Distillation Improvement: A member asked about estimating the expected improvement from distillation before investing effort, seeking ways to predict the performance gains.
Yannick Kilcher ▷ #paper-discussion (5 messages):
Applied math, arxiv paper, preprint paper
- Arxiv Paper gets Peek: A member shared an Arxiv Paper for discussion.
- To improve the layout, they also shared a link to the same preprint.
- CS Background Surfaces: A member asked another if they had an applied math background.
- The other member responded that their background was in Computer Science, for both their bachelor's and master's degrees.
Yannick Kilcher ▷ #ml-news (6 messages):
SWE-bench verified, AlphaEvolve, Sakana AI, Yann LeCun, RL TTS
- SWE-bench Verified Denounced: A user linked to a tweet where Alexandr Wang says that people still using SWE-bench Verified is "a good indicator of brain damage".
- AlphaEvolve: Sample Efficient?: In the same thread, someone mentioned that AlphaEvolve is much more sample efficient than other models.
- A user linked to Sakana AI Labs in response.
- Yann LeCun Tweeted: A user linked to Yann LeCun's tweet.
- B200 Cloud Compute Spotted: A member noted that B200s are available for $0.94 USD on Prime Intellect.
- RL TTS Paper Discussed: A user mentioned an interesting paper and its potential efficiency gains from using a mid training technique involving a bootstrapping RL TTS.
- They noted the biggest gains were for the trace tracking benchmark.
Latent Space ▷ #ai-general-chat (49 messages🔥):
Chrome DevTools MCP, Cursor CPU Usage, Meta Code World Model, Windsurf tab completion, ChatGPT Pulse
- Chrome DevTools MCP goes Public: Google announced the public preview of Chrome DevTools MCP, a new server that lets AI coding agents control and inspect a live Chrome browser through CDP/Puppeteer via this tweet.
- Cursor's CPU Usage Concerns Users: Members reported insane CPU usage from Cursor but were unsure if it was a VSCode/extension issue; another member confirmed they were not alone, attaching a screenshot showing high CPU usage.
- Meta unveils Code World Model: Meta announced their Code World Model in this tweet.
- Windsurf prioritizes Tab Completion: Windsurf is making tab completion a priority, using a mix of context engineering work plus custom model training (new senpai post); Karpathy also comments on Windsurf.
- OpenAI launches ChatGPT Pulse, LMAO: OpenAI launched ChatGPT Pulse, causing members to comment that oai cloned huxe and linked to the launch announcement.
Latent Space ▷ #genmedia-creative-ai (1 messages):
swyxio: https://x.com/1littlecoder/status/1970624850386661766
Eleuther ▷ #general (11 messages🔥):
AI Psychology Project, Positional Embeddings, AI Future Predictions, Subtle Psychological Manipulation
- AI Psychology Project Gets Musical Intro: A member presented a new project at the intersection of AI and psychology, sharing a musical introduction based on a recently written paper.
- Another member responded that this research could develop into a framework for interpreting how prompt language affects model behavior.
- Prompt Language and Personality Traits Linked?: A member suggested seeding neutral prompts with established linguistic cues of personality traits and interpreting consequences for model performance, citing work on linking patterns in language use to personality traits.
- They noted that this could help assess to what extent personality shaping can be "subtle" while still meaningfully impacting model behavior, further informing prompt engineering practices.
- Positional Embeddings Decoded: A member asked whether positional embeddings in transformers use a matrix of sine and cosine pairs instead of a single pair because wave functions are periodic.
- Another member confirmed this intuition by explaining how hour number is enough in smaller contexts, but in larger context, one also needs the day, month, or year number to avoid ambiguity.
- AI Future will Sediment: A member inquired about thoughts on how the future would look like with respect to AI in the next 5 years.
- They believed that open-source models, ai agents, small language models, ai safety, and multi-modal will begin to sediment in the coming years, which prompted others to ask to define sediment (mature).
Eleuther ▷ #research (20 messages🔥):
CFG on Style Transfer, Knowledge Graph Completion, Evolutionary Algorithms for Kids, Super-Bias: Mask-Aware Nonlinear Combiner, LoRAs and Super Bias Combiner
- CFG on Style Transfer Research Needed!: A member inquired about research on the effect of Context-Free Grammars (CFG) on style transfer, noting anecdotal evidence suggesting models lacking CFG perform worse.
- Another member argued that style transfer and closing knowledge gaps are distinct behaviors, so excelling at one doesn't guarantee success with the other; this was refuted by another member who linked a relevant Twitter thread.
- Knowledge Graph Completion as LLM Solution?: A member suggested formulating a solution from a knowledge graph completion perspective, where style transfer becomes a type of "shallow" inference.
- They proposed that complexity could be measured by the relational depth from established information, but bridging this to practical LLMs is challenging.
- GPT-5 Can Guide Evolutionary Algo Research: A member sought recommendations for an âevolutionary algo for kidsâ paper/blog, or a survey paper on common patterns/techniques.
- Another member suggested learning the basics from GPT-5, and focusing on agentic/LLM parts rather than classical papers, recommending the AlphaEvolve paper as a starting point.
- Super-Bias: The Mask-Aware Nonlinear Combiner: A member introduced Super-Bias, a mask-aware nonlinear combiner for ensemble learning, allowing expert addition/removal with combiner-only retraining, achieving similar accuracy to full retrains with significantly reduced cost.
- The method trains a small MLP on expert outputs plus binary masks (with dropout) and can potentially hit the same (or better) performance as "proper" full fine-tuning or hard merges; see the sketch after this list.
- LoRAs Combined Via Super Bias - Genius!: A member suggests treating different LoRAs (or LoRA+base combos) as "experts", and using Super Bias as the combiner.
- This approach would allow swapping LoRAs in/out without retraining the base model, retraining just the combiner in seconds to adjust for new LoRAs.
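A hedged reconstruction of the described combiner (all names and sizes are illustrative; the actual Super-Bias implementation was not shared):

```python
# Mask-aware combiner sketch: a small MLP takes concatenated expert outputs
# plus a binary availability mask, trained with mask dropout so experts can be
# added or removed with combiner-only retraining.
import torch
import torch.nn as nn

class MaskAwareCombiner(nn.Module):
    def __init__(self, n_experts: int, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_experts * dim + n_experts, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, expert_outs: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # expert_outs: [B, n_experts, dim]; mask: [B, n_experts] (1 = available)
        x = expert_outs * mask.unsqueeze(-1)          # zero out missing experts
        x = torch.cat([x.flatten(1), mask], dim=-1)   # append the mask itself
        return self.net(x)

# during training, randomly drop experts so the combiner learns to cope:
# mask = (torch.rand(B, n_experts) > p_drop).float()
```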
Eleuther ▷ #lm-thunderdome (7 messages):
GSM8k evaluation results, flexible vs strict matching, merged models issue, reproducibility of errors
- GSM8k Evaluation: Flexible Filter Fails: A member shared GSM8k evaluation results, showing that the flexible-extract filter performed worse than the strict-match filter, with exact_match scores of 0.3594 and 0.5742 respectively.
- Another member confirmed facing this issue, particularly with merged models.
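One way to reproduce the comparison while keeping the per-sample records that were missing from the report (a sketch using lm-evaluation-harness; the model id is an illustrative placeholder):

```python
# Re-run gsm8k and keep per-example records so flexible-extract vs strict-match
# failures can be inspected later.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/your-merged-model",  # hypothetical model id
    tasks=["gsm8k"],
    log_samples=True,  # retain samples for debugging the extraction filters
)
print(results["results"]["gsm8k"])
```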
- Debugging Merged Model Mishaps: A member requested samples to debug an issue, and the original reporter lamented not saving the problematic examples to reproduce the error.
- They committed to figuring out the situation that caused the errors to reproduce it.
Eleuther ▷ #multimodal-general (1 messages):
VLM, Mech Interp, Sparse Autoencoder (SAE)
- VLM and Mech Interp Unite!: A member proposed understanding Vision Language Models (VLMs) and Mechanistic Interpretability (Mech Interp) separately before integrating Mech Interp components into VLMs.
- The suggestion involves applying a Sparse Autoencoder (SAE) at each layer of the VLM to decipher what each layer is attending to, starting simple and then increasing complexity.
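As a rough sketch of the proposed probe (sizes illustrative; real SAE training setups add details such as decoder weight normalization):

```python
# Minimal sparse autoencoder: train one per layer on that layer's residual
# activations, then inspect which latents fire for image vs. text tokens.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int, l1: float = 1e-3):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)
        self.l1 = l1

    def forward(self, h: torch.Tensor):
        z = torch.relu(self.enc(h))   # sparse latent code
        h_hat = self.dec(z)           # reconstruction of the activation
        loss = (h_hat - h).pow(2).mean() + self.l1 * z.abs().mean()
        return z, h_hat, loss
```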
Nous Research AI ▷ #general (26 messages🔥):
Meta code generation with world models, Training AI with arXiv data, Granite 4 model, RMS_NORM and NORM implementations
- Meta Crafts Code-Writing CWM: Meta introduced CWM, an open-weights LLM for research on code generation with world models.
- A member mentioned having a similar idea involving training on python interpreter traces.
- Nous taps arXiv for training data: It was discussed whether Nous could train its AI using data from arXiv, mentioning they have an API to download any amount of papers.
- Teknium confirmed that it's permissible, suggesting that it could be a viable option.
- Granite 4 model brewing full-attention: There is a possibility of a full-attention Granite 4 model and 8 private models.
- The models mentioned in the image are quite old, with Hermes 4 and 3 being the latest.
- RMS_NORM gets unified and supported by METAL: A pull request was made to unify the RMS_NORM and NORM implementations and extend support for more shapes in METAL.
- It's anticipated this will help quantized models work more closely with their transformer-based counterparts.
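For reference, RMS norm differs from LayerNorm only in normalizing by the root-mean-square with no mean subtraction or bias; a small numpy sketch of the math (the PR itself concerns llama.cpp's METAL kernels, not this Python):

```python
# RMS norm: scale by 1/RMS(x), then apply a learned per-channel weight.
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```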
Nous Research AI ▷ #research-papers (3 messages):
AlphaXiv, Paper frustrations
- AlphaXiv to the rescue: A member shared a paper link from AlphaXiv, seemingly bypassing a login wall; the frustrated member appreciated the time saved from web searching.
- Login wall frustrations shared: A member expressed frustration about papers requiring login, refusing to log in just to see the paper link.
Nous Research AI ▷ #interesting-links (3 messages):
Manifest AI, Open Source Integration
- Manifest AI Hype Check: A member shared a link to Manifest AI asking is it as groundbreaking as they are making it seem?
- Another member suggested checking out their open-source integration on their Git, sharing what a model said about integrating with their OSS repository.
- Model's Opinion on Open Source Integration: A model expressed a favorable opinion regarding Manifest AI's integration with their open-source components, sparking interest within the community.
- The discussion emphasized the importance of evaluating the actual implementation and capabilities rather than solely relying on marketing claims.
DSPy ▷ #general (19 messages🔥):
PDF Processing with LLMs, OCR vs VLM for Layout Understanding, Qwen for OCR, Gemini 2.5 Pro for PDF/Image Understanding, DSPy and Attachments for PDF processing
- LLMs Get Verbatim on PDFs: A user questioned the necessity of an initial LLM pass for processing PDFs to save text verbatim while preserving layout, even with Attachments, suggesting it might be needed to understand layouts and images better than OCR alone.
- Others suggested straight PDF OCR with Chain of Thought (CoT) for cleaner output, or using a model with DSPy for OCR like Qwen, but acknowledged the need for VLM due to layout complexity.
- Gemini 2.5 Pro Understands Layouts: Gemini 2.5 Flash seems to be pretty good for understanding layouts; the Pro version might be even better for identifying sections/columns and doing verbatim extraction, although the userâs PDFs have tricky formatting like flattened images and blurry text.
- A user pointed to a paper on directly using Gemini for this purpose, found at arxiv.org/abs/2509.17567.
- DSPy with Attachments Processes PDFs: A user struggling to use DSPy for the first pass in PDF processing found a working example with Attachments at github.com/maximerivest/Attachments.
- The user had previously encountered 429 errors, but the issue was resolved, enabling them to proceed with using DSPy.
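A hedged sketch of that first pass in DSPy (the signature, field names, and model string below are illustrative assumptions, not the linked example's code):

```python
# Minimal DSPy page-transcription pass; provider string and field names are
# illustrative assumptions.
import dspy

dspy.configure(lm=dspy.LM("gemini/gemini-2.5-flash"))

class TranscribePage(dspy.Signature):
    """Transcribe the page verbatim, preserving reading order and layout markers."""
    page: dspy.Image = dspy.InputField()
    transcript: str = dspy.OutputField(desc="verbatim text with section/column markers")

transcribe = dspy.Predict(TranscribePage)
# result = transcribe(page=dspy.Image.from_file("page_01.png"))  # hypothetical file
```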
- Boston Hosts DSPy Event: A member promoted a DSPy event in Boston on October 15th, inviting community members to attend and help spread the word.
- Another user then replied hoping that the event would come to Seattle sometime soon.
DSPy ▷ #colbert (1 messages):
Context Length Limitations, Repeating CLS Token, Performance Issues
- Longer Context Fails: CLS Repeat Not Helping: A user reported that longer context doesn't perform well, and repeating the CLS token did not resolve the problem.
- The user suspects either a limitation of the method or an implementation error on their side.
aider (Paul Gauthier) ▷ #questions-and-tips (8 messages🔥):
Aider context clearing, Internet access for Aider, LLM benchmarks for coding, Polyglot benchmark rerun
- Aider's /clear Command Clarified: The `/clear` command in aider only clears the chat history, while added files remain in the context; users can use `/context` to see token usage.
- One user initially thought `/clear` removed all context in the session, but this clarification resolved their confusion.
- Craving Web Access in Aider: A user inquired about giving aider access to Internet search, but currently this isn't available in the main branch.
- You can scrape a website using the `/web https://www.example.com/` command.
- LLM Coding Prowess: How to Stay Current: A user asked how others keep updated on which LLM is best for coding and cost.
- The consensus is that most users keep updated by simply trying the LLMs themselves.
- Polyglot Benchmark: Error Output Re-runs: A user asked if it's possible to re-run the polyglot benchmark only for the tests with `error_outputs`.
- This request stems from previous runs where the LLM server crashed, causing failures.
Manus.im Discord ▷ #general (7 messages):
Manus PDF download issues, Beta Pro Access
- Manus Stalls Downloading PDFs?: A member reported that Manus got stuck downloading a PDF while researching accounts, even after manually downloading it and providing a link.
- The user expressed frustration that Manus kept asking to upload the file despite it being a PDF already on the desktop.
- Beta Pro Access Questioned: A member asked how to get beta pro.
- The discussion included attached images, though they don't provide any context on how to acquire Beta Pro Access.
MCP Contributors (Official) ▷ #general-wg (4 messages):
ModelContextProtocol issues, ReadResourceResult contents array, Web Resource html
- ModelContextProtocol's Content Array Dilemma: Offline discussions have pinpointed that `ReadResourceResult.contents` in the ModelContextProtocol is an array, but its intended use and semantics are undocumented.
- Questions arise on whether this array is meant for scenarios like folders with multiple files or for returning the same content in different formats.
- Web Resources with HTML and Images: A member suggested that this feature is useful for Web Resources consisting of HTML and associated images.
- They added it is also useful in situations where tokenizable/renderable MIME types haven't been negotiated.
- Implicit Content Retrieval: A member inquired if `resources/read("uri": ".../index.html")` would implicitly return `style.css` and `logo.png` in the list of contents.
- This question highlights the potential for automatically including related resources when fetching a primary resource.
tinygrad (George Hotz) ▷ #general (1 messages):
Python Bindings, Direct Pip Installation
- Python Bindings Coming Soon: A member is creating Python bindings for the project.
- The goal is to allow direct installation with pip using a single command.
MLOps @Chipro ▷ #events (1 messages):
Diffusion Models, Generative Models, Paper Reading Group
- Diffusion Model Paper Reading Group forming!: A new Diffusion Model Paper Reading Group will be discussing the Understanding Diffusion Models: A Unified Perspective paper this Saturday at 12pm ET.
- The paper gives an overview of the evolution and unification of generative diffusion models like VAEs, VDMs, and SGMs.
- Beginner-Friendly GenAI Discussion: The paper reading group is beginner-friendly, requiring only curiosity and a love for GenAI.
- It aims to build a solid foundation in diffusion models without needing coding or ML background, join at luma.com/1gif2ym1.