a quiet day
AI News for 9/24/2025-9/25/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (194 channels, and 2885 messages) for you. Estimated reading time saved (at 200wpm): 230 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
You can catch Day 2 of AIE Paris here, where tickets for AIE Europe 2026 were announced. You should also apply for Wave 2 of AIE CODE in NYC in November; it'll be a big one.
AI Twitter Recap
Alibaba's Qwen3 push: Max, VL, Coder and a $52B roadmap
- Qwen3-Max, Qwen3-VL, and shipping velocity: Alibaba/Tongyi unveiled a sweep of models: flagship Qwen3-Max (now default in Anycoder) and open-sourced Qwen3-VL with a native 256K context (expandable to 1M), stronger OCR in 32 languages, precise event localization in 2h videos, GUI operation/coding, and leading risk detection. Releases hit Hugging Face, ModelScope, GitHub, and Alibaba Cloud's Model Studio; community platforms onboarded quickly (e.g., Yupp added Qwen3 Max and Qwen3 VL 235B A22B Instruct/Thinking; LMArena added three Qwen3 models). Alibaba touted unmatched shipping velocity (~3.5 releases/month, many open weights) and a multi-year infrastructure roadmap discussed at Yunqi, with commentary noting a "$52B war chest" and major compute scale-up claims. See announcements and threads: @huybery, @huybery on Qwen3-VL, @Ali_TongyiLab (VL release), Anycoder defaults, Yupp adds Qwen models, LMArena adds Qwen3, shipping velocity, Yunqi recap, exec clips/roadmap.
- Qwen3-Coder-Plus and API improvements: The coding line got targeted upgrades (terminal tasking, scaffold adaptation; API fixes), with early competitive signals in WebDev Arena and agent toolchains. Details: API update, WebDev Arena prompt.
Coding models and agents: GPT-5 Codex lands; Meta's 32B CWM
- GPT-5 Codex (agent-optimized) is live: OpenAI's "Codex" variant is in the API and agent tools. Highlights: up to 400K context, "adaptive reasoning" with variable thinking that uses far fewer tokens on simple tasks and more on complex ones, and pricing around $1.25/$10 per million tokens (a quick cost sketch follows this list). It's integrated in Cline (with a "thinking slider"), and being benchmarked in webdev arenas and agent workflows. Links: API availability, Cline integration, Cline details, WebDev Arena. Field reports compare throughput vs Sonnet/GPT-5 on long-context and agent runtimes: example, long-context retrieval comparison.
- Meta FAIR's Code World Model (CWM) 32B (research): Meta released an open-weight 32B dense model under a research license that frames code generation as planning with a world model of code execution. Reported pass@1: 65.8% SWE-bench Verified, 68.6% LiveCodeBench, 96.6% Math-500, 76.0% AIME 2024. Technical report, weights, and code are public, with a safety preparedness report from SEAL/AI Security. Links: @AIatMeta, @ylecun, metrics summary, safety prep.
- Ecosystem updates: GitHub Copilot's new embedding model and training writeup (for faster, more accurate code search) blog link; Jules agent now acts on PR feedback link; Claude Sonnet 4 and Opus 4.1 are now in Microsoft 365 Copilot Anthropic.
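By way of scale, a hedged cost sketch for the Codex pricing cited above; the rates are simply the reported $1.25/$10 per million tokens, and the token counts are hypothetical.

```python
# Hedged cost sketch: rates are the reported $1.25 (input) / $10 (output)
# per million tokens; the example token counts are hypothetical.
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_per_m: float = 1.25, out_per_m: float = 10.0) -> float:
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# A long agent turn: 300k tokens of repo context in, 4k tokens of code out.
print(f"${request_cost_usd(300_000, 4_000):.3f}")  # -> $0.415
```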
Systems and infra: vLLM DCP, multimodal data plumbing, and platform moves
- vLLM 0.10.2 adds Decode Context Parallel (DCP): Contributed by Kimi/Moonshot, DCP shards KV cache across GPUs to cut duplication, enabling up to 8× larger KV and 2-3× throughput on a single-node H200, especially helpful for KV-heavy workloads (RL, offline data generation). Quickstart: `vllm serve deepseek-ai/DeepSeek-V3.1-Terminus -tp 8 -dcp 8` (a minimal client sketch follows this list). Links: @vllm_project, day-0 guides.
- Multimodal infra from Perceptron: The team shared the design behind TensorStream, a tensor-like abstraction for interleaved multimodal data powering their training/inference code, and released technical details for Isaac 0.1, a small VLM emphasizing a simple training recipe and robust grounding. Good discussion on "complexity budgets" and native multimodal abstractions: design post, Isaac report, commentary, abstractions +1.
- MCP builders and compliance: Figma's MCP server lands in VS Code (and is usable in OpenHands) for "design-to-code" flows VS Code, OpenHands; Weaviate gets ISO 27001 link; AMD expands partnership with Cohere (models on AMD Instinct, sovereign AI posture) AMD; Modular raises $250M to push its unified AI infra platform Modular.
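A minimal client sketch for the DCP quickstart above: `vllm serve` exposes an OpenAI-compatible endpoint (default port 8000), and DCP is a server-side sharding detail that is invisible to the caller. The prompt is arbitrary.

```python
# Minimal client sketch for the `vllm serve ... -tp 8 -dcp 8` quickstart above.
# vLLM exposes an OpenAI-compatible API (default http://localhost:8000/v1).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1-Terminus",
    messages=[{"role": "user", "content": "Explain decode context parallelism in one sentence."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```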
Video and multimodal generation: Alibaba Wan2.5, Runway A2D, NVIDIA Lyra, Kling 2.5
- Alibaba Wan2.5-Preview (native multimodality): New architecture aligns text, image, video, and audio natively with joint multimodal training and RLHF; supports controllable inputs (text/img/audio), synchronized multi-speaker A/V, 1080p 10s cinematic video, and stronger image gen/editing (typography, charts, pixel-level edits). Announcement.
- Runway A2D: autoregressive-to-diffusion VLM: Adapts existing AR VLMs for parallel diffusion decoding to unlock speed/quality trade-offs without training from scratch; dev preview from internship work shows a practical path to diffusion LMs for vision-language. @runwayml, author thread.
- NVIDIA Lyra (3D/4D scene reconstruction): Feed-forward 3D and 4D scene generation from a single image/video via video diffusion self-distillation; weights on HF. Overview, model.
- Kling 2.5 Turbo: Internal blind tests show significant wins over Seedance/Veo variants across text-to-video and image-to-video; community reels and contests rolling out. Results, contest.
Reasoning, RL, and evaluation science
- RLPT (RL on Pre-Training Data): Trains with self-supervised rewards via next-segment reasoning (ASR+MSR) directly on pretraining corpora, with no human labels. On Qwen3-4B, reported gains: +3.0 MMLU, +8.1 GPQA-Diamond, +6.6 AIME24, +5.3 AIME25. Paper: tweet, arXiv.
- APRIL (Active Partial Rollouts in RL): Cuts rollout long-tail inefficiency; up to 44% throughput and 8% final-accuracy improvements across GRPO/DAPO/GSPO. tweet, code/paper.
- "Soft Tokens, Hard Truths": First scalable RL for continuous CoT; soft-token training matches discrete pass@1 and outperforms at pass@32 by boosting diversity; best practice: train soft, infer hard. tweet, arXiv.
- Effective reasoning ≠ longer CoTs: Across 10 LRMs, longer chains and review can correlate with lower accuracy. New metric "Failed-Step Fraction" predicts correctness; FSF-based reranking lifts pass@1 by up to +10%. tweet, arXiv.
- Medical multimodal brittleness: Stress tests show frontier models often guess correctly without images, flip under trivial prompt changes, and fabricate convincing but flawed reasoning; leaderboards mask this fragility. tweet, arXiv.
- Related: Google's Test-Time Diffusion Deep Researcher (TTD-DR) applies diffusion-style iterative refinement to long-form research, reporting up to 74.5% win-rates vs OpenAI Deep Research on certain tasks with better quality/latency tradeoffs. overview.
Top tweets (by engagement)
- Alibaba's Wan2.5-Preview: native multimodal A/V generation and editing (1453)
- Qwen3-VL open-sourced: 256K→1M context, 32-lang OCR, precise video event localization (1410.5)
- Sam Altman on datacenter buildout progress in Abilene (9917)
- Semiconductor node names ("3nm", "2nm") as marketing shorthand, not literal dimensions (9032.5)
- Claude Sonnet 4 and Opus 4.1 arrive in Microsoft 365 Copilot (1265)
- Gemini app hits 5B images in <1 month (1183)
- GPT-5 can solve "minor" open math problems; early evidence and preprint (952)
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. MiniModel-200M and DeepSeek-V3.1-Terminus Local Release Benchmarks
- MiniModel-200M-Base (Score: 223, Comments: 35): MiniModel-200M-Base is a ~200M-parameter LLM trained from scratch on 10B tokens in ~110k steps (~1 day) on a single RTX 5090, with no gradient accumulation, achieving an effective batch of 64×2048 and peak VRAM <30 GB. Efficiency is attributed to an Adaptive Muon optimizer (claimed ~2.1× data-efficiency vs AdamW), Float8 pretraining (attention in bf16) for ~30% lower VRAM and ~20% higher throughput, ReLU² (Primer), bin-packing to reduce padding from >70% to <5%, and full attention with scalar-free QK-norm for stability. Early capability demos include deterministic Fibonacci codegen and recalling the first 20+ digits of π; Apache-2.0 weights/config/tokenizer are released: Hugging Face. Top commenters are primarily asking for release of the training code/scripts and more detail on the data mixture; interest centers on reproducibility of the setup.
- A commenter questions the emphasis on "no gradient accumulation," arguing it should be mathematically equivalent to a larger effective batch. They note practical caveats where it can diverge: optimizer step-count coupling (e.g., AdamW bias correction, per-step weight decay), LR schedules tied to steps vs tokens, gradient clipping across micro-batches, and stochastic elements (dropout RNG, data order). They're effectively asking for the concrete rationale or benefits (e.g., throughput/activation memory trade-offs, benchmarking fairness) behind avoiding GA in this training run; a minimal sketch of the equivalence follows this list.
- Multiple requests ask for release of training code and scripts to enable reproducibility. The implied need is for end-to-end pipelines (data loader, tokenizer, optimizer/scheduler configs, logging/checkpointing) and exact seeds, so others can replicate results on a 200M-parameter setup and compare against baselines.
- Interest in the data mixture details: commenters want the composition and mixing strategy (domain ratios like code/math/dialogue, up/down-weighting, dedup/filtering, and total pretraining tokens). Given small models' sensitivity to data curation, they're asking for the precise recipe to understand why MiniModel-200M-Base performs as reported.
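A minimal sketch of the equivalence in question, using a toy model and synthetic data (not the MiniModel training code): averaging the loss and applying clipping/optimizer state updates once per effective batch is what makes k accumulation micro-batches match one k-times-larger batch.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(512, 512)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
accum_steps = 8  # micro-batches folded into one effective batch

for step in range(10):
    opt.zero_grad()
    for _ in range(accum_steps):
        x = torch.randn(16, 512)         # micro-batch of 16 -> effective batch of 128
        loss = torch.nn.functional.mse_loss(model(x), x)
        (loss / accum_steps).backward()  # scale so the accumulated grad equals the big-batch mean
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip the accumulated grad, once
    opt.step()  # AdamW bias correction / weight decay tick once per effective batch
```

The commenter's caveats map directly onto this loop: if clipping, weight decay, or the LR schedule ran inside the inner loop instead, GA would no longer match a single large-batch step.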
- You can now run DeepSeek-V3.1-Terminus on your local device! (Score: 163, Comments: 29): Unsloth released Dynamic GGUF quantizations of DeepSeek-V3.1 Terminus enabling local inference on ~170 GB RAM (and a ~162 GB Ollama-ready build) by per-layer "smart" 1-bit quantization, shrinking the original ~715 GB model by ~80% (a size-arithmetic sketch follows this list). Their Dynamic 3-bit DeepSeek-V3.1 (thinking) GGUF scores 75.6% on the Aider Polyglot benchmark, reported as surpassing Claude-4-Opus (thinking), with runnable builds via llama.cpp and an example Ollama tag `hf.co/unsloth/DeepSeek-V3.1-Terminus-GGUF:TQ1_0`; resources: blogpost, HF repo, guide. The image appears to be benchmark charts illustrating Dynamic GGUF performance versus baselines and proprietary models. Top comments question practicality for home users, asking if similar methods could compress 70B-200B models for 16-24 GB VRAM GPUs, while others note the high VRAM/RAM requirement and offer praise.
- A key question is whether the same approach can make 70B or 100-200B models run on 16-24 GB consumer GPUs. This implies extreme quantization/offloading to fit VRAM, and home-user practical utility hinges on this.
- One commenter cites a memory footprint drop from 715 GB to 170 GB alongside "solid tool-calling". They want head-to-heads against GLM-4.5 and Qwen, suggesting evaluation on tool-use/agentic benchmarks to validate quality vs compression.
- Even with reductions, practical deployment may still demand on the order of ~100 GB VRAM ("Now to find another ~100 gb of vram"). This would exceed typical 16-24 GB gaming GPUs, underscoring remaining hardware barriers for local use.
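The compression claim is easy to sanity-check with back-of-envelope arithmetic; the ~671B total parameter count for DeepSeek-V3.1 and the effective bits/weight below are assumptions for illustration, not Unsloth's published recipe.

```python
def gguf_size_gb(n_params_b: float, bits_per_weight: float) -> float:
    # size ~= params * bits / 8, ignoring metadata and per-block scale factors
    return n_params_b * bits_per_weight / 8

print(gguf_size_gb(671, 8.5))  # ~713 GB: close to the reported ~715 GB original
print(gguf_size_gb(671, 2.0))  # ~168 GB: close to the reported ~170 GB dynamic 1-bit build
```

Working backwards, the ~170 GB figure implies roughly 2 effective bits per weight averaged across the per-layer mix.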
2. DIY Local AI Hardware: RTX 3080 20GB Mods and Ryzen AI MAX+ 395
- My second modified 3080 20GB from China, for local AI inference, video and image generation (Score: 219, Comments: 101): OP showcases a China-modded GeForce RTX 3080 upgraded to 20 GB VRAM (likely 10×16Gb GDDR6X on the 320-bit bus) for local AI inference/video/image workloads, opting for a triple-fan cooler over a blower for acoustics. The 2.5-slot card reportedly holds <75°C under ~300W stress, suggesting adequate thermal headroom versus blower variants; otherwise it behaves like a standard RTX 3080. Commenters probe value vs a RTX 3090 (more cores, 24 GB VRAM) and ask about price and driver/vBIOS compatibility for the 20 GB mod. There's curiosity about a hypothetical 30 GB 3080 using 3 GB GDDR6X chips; feasibility is unclear due to GA102 memory-controller/board routing support for 24Gb densities (see GDDR6X).
- Value/perf trade-off vs RTX 3090: a 3080 20GB mod still has the 320-bit bus (~760 GB/s) and fewer SMs than a 3090's 384-bit bus (~936 GB/s), so for AI/image workloads that are both bandwidth- and VRAM-sensitive, the 3090's 24GB and wider bus can be materially faster and allow larger batch sizes/checkpoints (a bandwidth arithmetic sketch follows this list). Given used 3090 pricing often hovers around the $500 mark, commenters argue a $500 3080-20GB is hard to justify unless priced closer to $350; otherwise a 3090 (or upcoming 24GB next-gen options) is a better buy. Specs refs: RTX 3080, RTX 3090.
- Feasibility of a 30GB 3080 using 3GB (24Gb) GDDR6X: in theory, 10×24Gb chips would yield 30GB on the 320-bit GA102, but it hinges on GA102's memory controller/BIOS supporting 24Gb densities and proper timing straps; no retail GA102 board shipped with 24Gb devices, so compatibility is unproven. Even if recognized by VBIOS, stability/thermals and memory training could be problematic without AIB-level firmware support. Micron has sampled 24Gb GDDR6X dies, which makes the capacity plausible on paper: Micron 24Gb GDDR6X.
- Driver/VBIOS considerations for 20GB mods: NVIDIA drivers enumerate VRAM from the VBIOS; as long as the device ID matches and the BIOS includes correct memory straps for the installed GDDR6X density, stock drivers generally work. Many China-market 20GB boards ship with custom VBIOS that properly reports 20GB; flashing mismatched BIOS can cause instability or bricking, and Ampere BIOS editing is limited, so sourcing a vendor-matched 20GB VBIOS is key. Reference: TechPowerUp VBIOS collection.
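The quoted bus-bandwidth figures follow from simple arithmetic (bus width × per-pin data rate); 19/19.5 Gbps are the stock 3080/3090 GDDR6X speeds.

```python
def mem_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    # bandwidth = (bus width in bytes) * per-pin data rate
    return bus_bits / 8 * gbps_per_pin

print(mem_bandwidth_gbs(320, 19.0))  # RTX 3080 (and a 20GB mod): ~760 GB/s
print(mem_bandwidth_gbs(384, 19.5))  # RTX 3090: ~936 GB/s
```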
- The Ryzen AI MAX+ 395 is a true unicorn (In a good way) (Score: 218, Comments: 205): OP evaluates the cost/perf of the 128 GB Framework Desktop Mainboard (AMD Ryzen AI Max 300 series) for local AI inference versus a DIY desktop with similar specs. A comparable DIY parts list (seeking 4-channel DDR5 ≥8000 MT/s) tallies to ~$2240: consumer 4-channel DDR5 motherboard >$600, CPU "equivalent" to the 395 MAX+ via the Ryzen 9950X3D ~$660 + Noctua NH-D15 ~$130, 128 GB DDR5-8000 (4×24 GB) ~$450, and a dGPU "similar" to the board's iGPU (RTX 4060/4060 Ti 16 GB) ~$400. OP argues the Framework board's unified memory avoids PCIe bandwidth/latency penalties when the GPU accesses large model weights, and the discrete build would draw ≳2× the power (more heat/noise; cf. room-heating post). They add that Apple M4 Pro/Max have higher bandwidth but poorer diffusion throughput at ~2× the cost for similar RAM/GPU, while truly higher-throughput Nvidia setups (e.g., 4× RTX 3090) are far more expensive and power-hungry; edit: the cited 9955HX3D doesn't support 4-channel memory; Threadripper would, but with slower memory speeds. Top replies request concrete benchmarks ("numbers") and suggest a potential step-function if AMD ships 256 GB unified memory. One commenter recommends an RTX 5080 within the same budget for diffusion workloads (VRAM > system RAM), while agreeing that for LLMs, larger unified memory (128 GB+) is advantageous for bigger contexts and model footprints.
- Workload fit and memory-vs-throughput tradeoff: commenters note that for diffusion/vision workloads an RTX 5080-class GPU will outperform at similar price points, and you don't need 128GB RAM for images/video. For LLMs, larger system/unified memory is more valuable (fits bigger models/contexts), aligning with the "truck (capacity) vs sports car (throughput)" analogy; a hypothetical 256GB unified memory SKU is seen as market-shifting for LLM use cases.
- Bandwidth bottleneck concern: one user flags "<256 Gb/s memory bandwidth," implying large-context capability but slow inference, since decode throughput in LLMs is memory-bandwidth bound. Unified memory helps host bigger contexts, but limited bandwidth throttles tokens/sec during generation, and long-prompt prefill leans on compute the iGPU also lacks (a rough decode-rate sketch follows this list).
- Anecdotal perf comparison vs high-end GPU: a user with a RTX 5090 + 96GB RAM (~+$1k vs Ryzen AI Max) reports on gpt-oss-120B that token generation (TG) speed is roughly similar, but prefill (PP) is 4-15× faster on the 5090. Takeaway: for local LLMs dominated by prefill, the Ryzen box may underperform compared to top-tier GPUs despite comparable TG throughput.
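A rough decode-rate sketch behind the bandwidth argument: each generated token must stream the active weights once, so tokens/sec is bounded by bandwidth divided by bytes touched per token. The active-parameter count and quantization below are assumptions for illustration.

```python
def decode_tps_ceiling(bandwidth_gbs: float, active_params_b: float, bytes_per_param: float) -> float:
    # upper bound: every generated token streams the active weights once
    return bandwidth_gbs / (active_params_b * bytes_per_param)

# gpt-oss-120B is MoE with ~5.1B active params (assumed); 4-bit weights ~0.5 bytes/param
print(round(decode_tps_ceiling(256, 5.1, 0.5)))   # Ryzen AI Max, ~256 GB/s -> ~100 tok/s ceiling
print(round(decode_tps_ceiling(1792, 5.1, 0.5)))  # RTX 5090 VRAM, ~1792 GB/s -> ~703 tok/s ceiling
```

Real rates fall well short of these ceilings (and a 120B model that spills out of VRAM is bounded by system-RAM bandwidth instead), while prefill is compute-bound, which is consistent with the 4-15× prefill gap reported above.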
3. LLM Performance Growth Claims and Hype Reactions
- Large Language Model Performance Doubles Every 7 Months (Score: 152, Comments: 57): Post asserts an empirical "AI Moore's Law" where large language model capability doubles about every ~7 months, illustrated by a progress chart (image) and framed as sustained exponential gains in benchmark performance. The claim echoes prior explainers on accelerated AI progress, e.g., Computerphile's overview of an AI analogue to Moore's Law (video); the post itself does not detail methodology or which benchmarks were aggregated. Commenters highlight that costs are falling alongside quality (token/model pricing dropping), crediting open-source competition for price pressure; others argue the observation is not new, pointing to earlier coverage like the Computerphile video.
- Methodology critique of the chart: It appears to convert LLM capability into "human time to complete task" and uses a 50% success threshold per task, which is highly subjective and task-dependent. Examples raised: "find a fact on the web" can range from seconds to days depending on specificity; "optimize code for a custom chip" isn't well-defined and could span hours to months; and "start a new company" at 167h isn't a meaningful, measurable unit. Without standardized benchmarks and precise task specs, a claim like "doubling every 7 months" risks cherry-picking and misrepresenting true progress.
- Cost/performance dynamics: Commenters note capability gains alongside falling inference costs, with open models intensifying price competition. Practitioners still rely on 2024-2025 open models like Mistral, Llama 3.1, and Qwen 2.5 Coder, implying perceived improvements are task- and deployment-dependent; cost/perf trade-offs (e.g., local inference vs API), stability, and tooling can outweigh headline "doubling" metrics. Reporting both capability and $/token or $/task would better capture real-world value.
- Prior art on scaling: The linked Computerphile video, AI's Version of Moore's Law? (https://www.youtube.com/watch?v=evSFeqTZdqs&t=1s), reviews LLM scaling trends and distinguishes hardware-driven FLOPs/$ gains from algorithmic efficiency improvements that together create apparent capability doubling. It frames progress as arising from larger models, better training data/recipes, and inference optimizations, cautioning against treating a single "doubling period" as universal across tasks.
- Oh my God, what a monster is this? (Score: 590, Comments: 124): The image (chart) appears to be a benchmark leaderboard where multiple LLMs reach near- or exactly 100 on a task, suggesting a saturated/ceilinged evaluation that can no longer differentiate top-tier models. Commenters note that Chinese frontier models are at or near the top of the chart, implying performance parity with leading Western models. Notable takes: "If models score 100 then it's a useless benchmark," arguing the metric has lost discriminative power; others highlight that Chinese models have reached the frontier, while one criticizes the portrait-mode screenshot of a square chart for poor readability.
- Benchmark saturation concern: if models hit 100, it indicates a ceiling effect and weak discriminative power. This raises risks of overfitting/test contamination and pushes the community toward harder or adversarial suites like MMLU-Pro and GPQA, and robustness/long-context evals, rather than relying on classic MMLU, GSM8K, or HumanEval alone. See MMLU paper, MMLU-Pro paper, GPQA paper.
- Multiple commenters note the showcased Qwen result is not "local," which matters because API-hosted models can differ from downloadable weights and local performance after quantization. On-device constraints (VRAM, throughput) and quantization (e.g., `Q4_K_M`) typically cost ~1-5 points on reasoning/code benchmarks and change latency; e.g., running a 7B at Q4 needs ~5-6 GB VRAM, a 14B ~9-10 GB, a 32B ~20-24 GB (llama.cpp quantization; a rough VRAM estimator follows this list).
- The claim that Chinese models have reached frontier levels aligns with recent reports: Qwen2.5, DeepSeek-V2, and Yi series publish competitive MMLU/GSM8K/MT-Bench and coding scores versus established frontier models. See Qwen2.5 blog, DeepSeek-V2 paper, and Yi models on Hugging Face (Yi-34B); exact ranking depends on eval setup (prompting, CoT, decoding) and whether tests are contamination-controlled.
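A rough VRAM estimator matching the figures above; the ~4.5 effective bits for a Q4_K_M-style quant and the flat 2 GB margin for KV cache/activations are assumptions.

```python
def q4_vram_gb(params_b: float, bits_per_weight: float = 4.5, margin_gb: float = 2.0) -> float:
    # weights-only footprint plus a flat margin for KV cache and runtime overhead
    return params_b * bits_per_weight / 8 + margin_gb

for size_b in (7, 14, 32):
    print(f"{size_b}B @ Q4_K_M -> ~{q4_vram_gb(size_b):.1f} GB")
# 7B -> ~5.9 GB, 14B -> ~9.9 GB, 32B -> ~20.0 GB, in line with the ranges above
```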
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Qwen Image Edit 2509 Release Benchmarks and Workflows
- Quick comparison between original Qwen Image Edit and new 2509 release (Score: 580, Comments: 74): Side-by-side test of the original Qwen Image Edit vs the new "2509" build, both quantized as `Q5_K_M` GGUF and run in default ComfyUI, with the 2509 model requiring the "QwenImageEditPlus" text encoder for correct operation. Using first-sample outputs (no LoRAs), the 2509 release is notably more consistent in preserving source style and composition; remaining issues include slight whole-body scale shifts during expression edits and loss of the blue tint on glasses (the original sometimes loses the glasses entirely). The updated text encoder also provides an observed ~5-10% speedup. Sample image. Comments largely corroborate improved consistency and perceived quality in the 2509 build; no substantial counterpoints were raised.
- Multiple users report noticeable quality improvements in the Qwen Image Edit 2509 release over the original, with one sharing an example edit ("it's actually good now...") suggesting more reliable prompt adherence and cleaner outputs. Example image: https://preview.redd.it/6vbfk01cs1rf1.png?width=1030&format=png&auto=webp&s=e8c0ff1dac9266fbb30d4b27c82c6cdc14445344
- A technical clarification is requested on the "new text encoder": whether this implies a swap to a different encoder model (e.g., a changed CLIP/ViT variant impacting tokenization/conditioning) versus merely updating the encoder node in the pipeline/graph. This distinction affects reproducibility, compatibility with existing workflows, and potential changes in prompt-conditioning behavior.
- QWEN IMAGE Gen as single source image to a dynamic Widescreen Video Concept (WAN 2.2 FLF), minor edits with new (QWEN EDIT 2509) (Score: 304, Comments: 49): Creator showcases a ComfyUI pipeline that turns a single Qwen-generated image into a dynamic widescreen video using the "WAN 2.2 FLF" workflow, with minor passes via "QWEN 2509 EDIT." Assets and reproducibility are emphasized: a custom LoRA is provided on CivitAI (link), full workflows for Qwen Image (pastebin), WAN 2.2 FLF (pastebin), and QWEN 2509 EDIT (pastebin), plus a ZIP archive containing all video parts/alternates, image parts, MP3 music, .pdn edit files, and prompts for every stage (Drive). A mirror is on X (post), and the original Qwen image/prompt (dark-fantasy anime style with explicit composition/wardrobe constraints) is shared (preview). Top comments highlight the single-image-to-video experiment and that all steps were executed in ComfyUI; one commenter asks about required hardware specs (no config provided in-thread).
- OP outlines a ComfyUI-only pipeline that animates a single Qwen Image still into a dynamic widescreen video via WAN 2.2 FLF, with minor revisions using QWEN 2509 EDIT. They provide full reproducibility: a LoRA (civitai.com/models/1955327), all Comfy workflows (Qwen Image WF, WAN 2.2 FLF WF, QWEN 2509 EDIT WF), and a ZIP containing all video parts/alternatives, source images, .pdn edits, prompts for every stage, and an AI-generated MP3 track (Google Drive). They specifically note solving text-related challenges (text effects, transition effects, and text clarity) directly within Comfy.
- The seed image prompt tightly constrains style and composition ("Dark fantasy anime," exaggerated body proportions, blue silk dress with triangle-cut motifs, red textured stockings, and a triangle-branded phone), helping maintain feature consistency when expanding motion from a single still. The original still used to drive the video is shared for reference (preview), suggesting the workflow relies on strong prompt-locked anchors to preserve identity and scene elements across frames.
2. AI in Games: Among Us Deception Benchmark and Veo-3 Game Video
- Researchers made AIs play Among Us to test their skills at deception, persuasion, and theory of mind. GPT-5 won. (Score: 416, Comments: 61): A report from 4wallai ("Among AIs") claims to benchmark LLMs' deception, persuasion, and theory-of-mind by having agents play Among Us-style social-deduction games (report). The shared graphic appears to show a leaderboard where "GPT-5" ranks first and Anthropic's Claude Sonnet second; beyond rankings, methodological specifics (e.g., match counts, role-balanced win rates, meeting/vote influence metrics, or tool-use interfaces) are not detailed in the post, and some model coverage (e.g., Grok) seems absent. Commenters praise the idea as a creative benchmark, question Sonnet's placement humorously, ask why Grok isn't included, and request clearer, non-slang terminology in the write-up for broader accessibility.
- Commenters question model coverage and selection: Was xAI Grok included, and why benchmark Claude Sonnet instead of the stronger Claude Opus? They imply results could shift materially by model variant, so authors should list exact model names/versions, decoding settings (`temperature`, `top_p`), and any tool access/vision toggles to ensure reproducibility.
- For broader technical adoption, a request to avoid slang like "low-key" or "taskmaxx" and use clear, standardized terminology. Define the evaluation protocol and metrics (e.g., deception success rate per round, persuasion attempt counts, ToM proxy tasks, confusion matrices for role classification) so results are unambiguous and comparable.
- A relevant deeper study is linked: arXiv:2504.04072, which reportedly examines deception/persuasion/Theory-of-Mind in LLM multi-agent social deduction settings. Cross-referencing its methodology and baselines could strengthen this benchmark's design and enable apples-to-apples comparisons.
- If they made a video game about the life of Stalin (Score: 870, Comments: 125): OP shares a short historical vignette allegedly generated with Google's Veo-3 (Veo; clip posted to Reddit: video), depicting Stalin's early life and the initial phase of Operation Barbarossa (accurately noting the Wehrmacht's early gains) and ending before Stalingrad. Commenters flag that many visuals look indistinguishable from Red Dead Redemption 2 assets, raising questions about direct asset reuse versus model-driven style/asset mimicry, and that Stalin appears as an adult in the 1880s, likely due to content-safety constraints on rendering minors in video generation models. Discussion touches on the aesthetic fit of RDR-style cinematics with AI video and on IP/asset provenance risks if outputs replicate identifiable game assets; age inaccuracies are attributed to generators disallowing children.
- Commenters note assets appear "directly ripped" from Red Dead Redemption 2 (RDR2). Technically, models/textures can be extracted via tools like OpenIV and composited, then paired with generative pipelines (e.g., Stable Diffusion img2img + ControlNet or a LoRA fine-tuned on RDR2) to swap identities while preserving clothing, PBR materials, and lighting. This explains the high fidelity and the unmistakable RDR2 aesthetic; however, IP/licensing constraints apply per Rockstar's mod policy.
- The "not allowed to generate children" remark points to age-related safety filters in common image generators. Many UIs implement conservative moderation heuristics that block prompts implying minors (e.g., "child/teen") or bias outputs toward adult-looking subjects to reduce risk, which can distort historical depictions. Policies vary by provider (see OpenAI's usage policies), so whether a prompt is blocked or "aged up" depends on the model and the platform's safety layer.
- What do you sell at The Strangest Flea Market? Pt. 6 (Score: 230, Comments: 16): Video post "What do you sell at The Strangest Flea Market? Pt. 6" is the sixth entry in a creative series showcasing novelty items; the linked media at v.redd.it/tg1hmx7522rf1 currently returns HTTP `403 Forbidden` due to Reddit's network-security gate (requires an authenticated Reddit session or a developer token; troubleshooting via Reddit Help). Based on visible top comments, featured items likely include a "cloud cat" and a "TV shirt," though the video content cannot be verified given the 403 block. Comment sentiment is positive; one user reports seeing similar content on TikTok, implying cross-platform reposting or discovery, and another expresses purchase intent ("I'll buy the cloud cat, and the TV shirt").
3. ChatGPT Photo Editing and AI Cultural Satire Projects
- Asked chatgpt to remove my father from my wedding photo. (Score: 471, Comments: 187): User used ChatGPT's image editing (likely diffusion-based inpainting) to remove a person from a wedding photo; the generated outputs exhibit global identity/attribute drift and facial artifacts: a woman's eyeglasses disappear, a child's ear morphology changes ("half-elf"), and several faces show texture/geometry mismatches producing an uncanny, "skin-walker" look, typical failure modes when instance segmentation and identity constraints are weak during generative fill. One variation also deletes an adjacent subject on the same side, consistent with mask bleed/region-growing across subject boundaries. Image previews: edit 1, edit 2; original gallery: Reddit (403 without login). Top comments note the "subtle upgrades" sarcastically and ask "at what cost?", highlighting that current AI photo editors often lack robust instance-level control and can degrade photorealism when editing crowded human scenes.
- Multiple users highlight classic inpainting artifacts: non-target regions get unintentionally altered. Examples include facial distortions/uncanny "skin-walker" textures and identity drift, like removed eyeglasses and altered ear geometry in the child (example 1, example 2). These are typical failure modes when the model prioritizes global coherence during generative fill, causing identity features to be re-synthesized rather than preserved.
- There's an implicit masking/scope issue: removal propagates beyond the intended subject, likely due to an over-broad mask or the model's semantic grouping of adjacent people. This can lead to adjacent subjects being partially or fully re-synthesized/removed, introducing artifacts or unintended deletions, as seen in the follow-up output with deformed heads (link).
- Tool/model notes: one result attributed to Google Gemini shows a visible gap and background inconsistency after removal (Gemini output). Another user recommends trying "nano banana," sharing a sample that they claim performs better (sample), suggesting meaningful variance across editors' inpainting/fill quality.
- Cultural Satire (Score: 226, Comments: 35): OP states a video titled "Cultural Satire" was produced with generative AI: "Most Images were made with ChatGPT. It also helped me with the editing." The linked Reddit video (https://v.redd.it/h0wf6exqq3rf1) is currently inaccessible (HTTP `403 Forbidden`), so the underlying media and prompts/workflow cannot be verified or analyzed. Top comments allege the piece is derivative, calling it a "blatant" ripoff of Neural Viz and closely mimicking Unanswered Oddities' format and phrasing (e.g., "totally worth it joy"), and recommend checking out Neural Viz instead. Specific critique notes a recurring structure: a blob-like announcer, a third "researcher/interviewee," and a "skeptic."
- Multiple commenters assert the video closely copies the structure and phrasing of existing AI-video channels, especially Neural Viz and "Unanswered Oddities." Cited specifics include reuse of the phrase "totally worth it joy" and a near-identical 3-role format: a blob-like announcer avatar, a third "researcher/interviewee," and a skeptic, suggesting minimal originality in the production template rather than new technical contributions.
- A technical question is raised about whether character movement is being generated via ChatGPT. No details are provided in-thread about the animation/motion pipeline (e.g., LLM-driven control vs. separate motion-generation or keyframed rigs), so the implementation approach for character movement remains unclear.
- The race is on (Score: 584, Comments: 296): Non-technical meme image titled "The race is on" implying an AI arms race measured by electrical power draw (with a cited figure of "1 TW") rather than by model capability or efficiency. The context suggests a comparison of AI orgs by total energy consumption as a proxy for progress, not a presentation of benchmarks or technical results. Commenters question the relevance of using power usage as a competitive metric, likening it to comparing cars by gasoline consumption instead of speed, and debate the plausibility/significance of a "1 TW" target.
- Energy-scope clarification: a claim that "1 TW is 1/3 of global energy usage" conflates electricity with total primary energy. 1 TW of continuous load equals 8,760 TWh/yr, which is roughly ~30% of annual global electricity generation (~28-30k TWh/yr; see Our World in Data: https://ourworldindata.org/electricity-mix), but only ~5% of total primary energy (~170k TWh/yr; IEA/Energy Institute: https://www.energyinst.org/statistical-review). So it's accurate only if explicitly referring to global electricity, not total energy (the arithmetic is worked below this list).
- Metric debate: one commenter argues that focusing on absolute power draw is like "competing for which car uses more gasoline," suggesting capability should be evaluated via energy-normalized performance metrics. For AI, that could mean tokens/sec/W, training FLOPs per kWh, or end-to-end task quality per joule, alongside datacenter efficiency (PUE) and hardware utilization rates, rather than headline MW/TW figures.
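The conversion behind the clarification above, worked explicitly (figures as cited):

```python
twh = 1.0 * 8760           # 1 TW sustained for a year = 8,760 TWh
electricity_twh = 29_000   # ~ annual global electricity generation (cited 28-30k TWh)
primary_twh = 170_000      # ~ annual global primary energy (cited ~170k TWh)
print(f"{twh / electricity_twh:.0%} of electricity, {twh / primary_twh:.1%} of primary energy")
# -> 30% of electricity, 5.2% of primary energy
```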
- Mr Altman, probably (Score: 531, Comments: 163): Non-technical meme referencing Sam Altman ("Mr Altman, probably"), implying that achieving AGI/singularity primarily requires vastly more compute/energy, with a top comment joking about needing gigawatts/terawatts and "send more money." No concrete model details, benchmarks, or implementations are provided; the image serves as satire about funding and power demands rather than technical substance. Commenters largely dismiss the post as low-effort ("contributes nothing," "both subs are a joke"), while one highlights energy/compute scale as a bottleneck for AGI.
- A commenter argues that achieving "singularity"-level AI would demand gigawatt- to terawatt-scale power, implying multi-GW campuses, grid-scale interconnects, and massive cooling footprints. This shifts the primary bottleneck from GPUs to energy procurement and infrastructure (transmission, long-term PPAs), where opex/capex is dominated by power availability and delivery rather than model architecture.
- Another commenter frames the financing as "hundreds of billions" for equity/profit-sharing against utopian projections, highlighting the extreme capex and long-duration risk of frontier model training. The implied thesis is that investors are underwriting negative near-term unit economics for outsized option value (first-mover/platform rents), accepting potential write-offs if scaling bets on data/compute/power pay off.
- I'm almost going crazy with these suggestions. (Score: 1155, Comments: 99): OP shows a ChatGPT UI behavior on GPT-4.1 (and a "GPT-5" label in their client) where the assistant repeatedly injects a hardcoded follow-up prompt, "Do you want me to suggest another topic or continue the current one?", even after explicit instructions to stop. This suggests a server-side/product UX feature (auto-suggestions) not controllable by the model via prompts, with no visible setting to disable it; the screenshot appears to capture the persistent suggestion banner in the chat thread. Commenters report the suggestions are often irrelevant and that they were unable to disable the behavior despite extended attempts, reinforcing that it's not user-controllable in current builds.
- Suggestion relevance is poor: one user notes the assistant proposes actions unrelated to the current task "half the time." This indicates weak context alignment of proactive prompts, leading to workflow interruptions instead of task-focused assistance.
- Suppression of proactive prompts appears unreliable: a user spent "a solid hour" trying to stop the behavior and "failed miserably." Even after explicit rejections, the recurring "want me to" prompt still appears later (example screenshot: https://preview.redd.it/dsta4lpxx0rf1.jpeg?width=750&format=pjpg&auto=webp&s=400dfe226d3b57fe860ec36185a84871b808c35c), suggesting no durable preference memory or insufficient cooldown logic.
- There's a perceived regression ("keeps getting worse"), implying the frequency or aggressiveness of auto-suggestions may have increased. Users report that refusals don't attenuate future prompts, pointing to weak negative-feedback handling for suggestion triggers.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. MCP Tooling for Agentic Browsers and IDEs
- Chrome DevTools MCP Gives Agents the Wheel: Google announced the public preview of Chrome DevTools MCP, letting AI coding agents (Claude Code, Cursor, VS Code, Gemini) control a live Chrome via CDP/Puppeteer with a one-line npx install, including performance traces, DOM/console inspection, screenshots, and network capture, as posted on Chrome DevTools MCP (public preview).
- Developers highlighted the one-line npx install and discussed pairing MCP with Claude Code and Cursor for full-loop browser debugging and E2E tests.
- MCP Servers Supercharge Local Agents: Cursor users clarified MCP servers act as an API surface for agents, enabling web search with exa.ai, analysis, and integrations like Playwright MCP, Context7, Azure DevOps MCP, and GitHub MCP to automate local coding workflows.
- They framed MCP as a unifying contract that lets agents compose capabilities (search, run, analyze) into agentic coding loops across editors and CLIs.
- Spec Scrutiny Tightens MCP Semantics: Contributors noted that the Model Context Protocol "Embedded resources" spec implies a resource "title" missing in schema.ts and opened a discussion on the `ReadResourceResult.contents` array in issue #1533 to clarify multi-part web resources.
- They debated adding both title and name for embedded resources that aren't retrievable via read calls and suggested using Claude Code to draft an SEP as a "good test".
2. Gemini Live and the Model Bake-Offs
- Gemini Live Talks, Listens, and Calls Functions: Google's Logan Kilpatrick announced the Gemini Live model with native audio, improved function calling, and more natural conversations, shared on Gemini Live model.
- Early testers praised conversational flow and accents but flagged iOS Safari issues, background-noise sensitivity, session-length limits, and STT accuracy concerns.
- GPT-5 Codex Labors on Livebench: Perplexity users reported GPT-5 Pro (aka GPT-5 Codex) being evaluated on livebench, citing long thinking times and cases where the model produced only half an answer.
- Members asked whether Perplexity had reliability issues with GPT-5 Codex, suggesting the model may still be mid-iteration.
- 4o Outwits GPT-5 in Common-Sense Clips: OpenAI community posts claimed 4o beat GPT-5 on common-sense image-based tests, prompting debates about experimental setup and validity.
- Skeptics reminded that it's "hard to say without hearing the reasoning of gpt 5", noting the model might have inferred the prompter was joking.
3. GPU Kernels and Consistency: Hopper TMA to PTX Proofs
- PTX Consistency Gets Formal with Dat3M: Engineers surfaced A Formal Analysis of the NVIDIA PTX Memory Consistency Model and follow-ups on compound/unified GPU memory models, with the Dat3M tool translating PTX/Vulkan into Dartagnan for verification.
- They pointed to automated identification of missing PTX fences and suggested moving such checks to the NVVM IR layer for earlier detection.
- Chasing Minimal Hopper TMA Matmul: The community sought a minimal Hopper TMA matmul kernel in raw CUDA (no CUTLASS/Triton), inspired by FAIR's new Code World Model (CWM) paper, while others hit an `unspecified launch failure` with WMMA+TMA.
- Debug threads traded ncu profiling tips for smem bank conflicts and header-include fixes when CUDA Graphics/texture APIs appeared undefined.
- ThunderKittens Trips on H100 TMA: A ThunderKittens H100 matmul crashed with a runtime error under CUDA 12.8/PyTorch 2.7 nightly, with full logs and build details shared for reproduction.
- Authors indicated nvshmem support would arrive in a follow-up (paper 2), per the attached image.
4. Modular's Mega Round and Mojo's Metal Move
- Modular Bags $250M for a Unified Compute Layer: Modular announced a $250M raise to accelerate work on AI's unified compute layer, crediting community momentum and outlining faster feature delivery.
- Staff invited would-be contributors to DM in community channels, signaling a more open collaboration model in the coming year.
- Mojo Targets Metal with Custom Bitcode: Developers cheered a Metal GPU target in Mojo, including a custom bitcode writer that could be reused to aim DSLs at Metal GPUs.
- They asked whether the bitcode writer was available and reusable, eyeing cross-stack portability for domain-specific compilers.
5. Prompting, Evaluation, and VLM Studies
- Flexible Extract Flops on GSM8k: On GSM8k v3 (5-shot), flexible-extract scored 0.3594 exact_match, underperforming strict-match at 0.5742, surprising evaluators tracking extraction robustness.
- One member joked "haha how can flexible be worse than strict", fueling debate on precision-first matching vs. permissive extraction.
- Chain-of-Thought: Less Can Be More: Practitioners warned heavy CoT can hurt performance on "thinking" models, sharing an interactive CoT infographic (React component) with task presets, visibility toggles, and a latency slider.
- They advocated outcome-focused prompting (persona, verify-then-respond) over forcing verbose CoT, and validating via experiments rather than boilerplate CoT.
- VLMs Defy LLM Prompting Habits: Researchers requested benchmarks and interpretability studies for VLM prompting, noting normal LLM prompting techniques often falter with vision-language models.
- Proposals included mech-interp probing and exploring an LLM equivalent of CFG to bridge concepts and fill missing knowledge.
Discord: High level Discord summaries
OpenRouter Discord
- OpenRouter Bumbles Qwen Pricing: The endpoint `qwen/qwen3-235b-a22b-04-28:free` was mistakenly priced for 26 hours, leading to unintended charges; the team apologized, then issued automatic refunds.
- The team implemented extra validation checks to prevent future pricing errors, strengthening system safeguards.
- Qwen3 VL Ratelimits Driving Users Nuts: Users are complaining about insane ratelimits on Qwen3 VL, reporting the model works only 30% of the time with frequent 429 errors even when using a proxy.
- One member suggested OpenRouter create an FAQ page to address these issues, with a pinned link in the support channel.
- SillyTavern Dominates Janitor AI: Members mocked a new OpenRouter user who referred to their API key as a proxy, admitting they were a JAI user unfamiliar with SillyTavern and its customizability.
- Users say Janitor AI is just an LLM front end that is constantly throwing 429 errors.
- Encoder LLMs Tokenize Vectors: Encoder LLMs convert text into vectors by tokenizing the text and utilizing a lookup table to transform tokens into their pre-trained vectors.
- The conversation clarified it's essentially token embedding versus full sentence embedding, where a whole sentence is collapsed into a single vector after passing through the network; discussions mentioned value matrices within the qwen3 embedding 0.6B model.
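A toy sketch of the distinction being made, with made-up sizes and token IDs; a real encoder such as the qwen3 0.6B embedding model learns these tables and pools contextualized hidden states rather than raw embeddings.

```python
import torch

vocab_size, dim = 32_000, 256
embed = torch.nn.Embedding(vocab_size, dim)   # the lookup table; pre-trained in practice

token_ids = torch.tensor([[101, 2009, 2003, 102]])  # hypothetical tokenizer output
token_vecs = embed(token_ids)                 # (1, 4, 256): one vector per token

# Sentence embedding: pool per-token states into a single vector for the text
# (real models pool contextualized states after the transformer, not raw embeddings).
sentence_vec = token_vecs.mean(dim=1)         # (1, 256): one vector for the sentence
print(token_vecs.shape, sentence_vec.shape)
```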
- Microsoft Courts Anthropic After Messy Breakup: Microsoft is now integrating Claude in Microsoft 365 Copilot, marking a big partnership.
- Discussion wondered if OpenRouter is big enough to discuss volume discounts with DeepInfra, Hyperbolic, Anthropic, Vertex, etc.
Unsloth AI (Daniel Han) Discord
- Strix Halo Shipping Debacle!: A user's Strix Halo machine arrived with a damaged shipping label, raising concerns it was swapped with a B200.
- Despite the damage, the device could be turned on, alleviating immediate concerns about its functionality.
- Eval Set Size Sparks Hot Debate!: A user questioned the limited evaluation set size of 30 examples, concerned about loss inaccuracy, while others argue 30 provides statistically significant results under hardware/time constraints.
- Small eval set sizes yield an unstable graph that is still useful for a specialised use case.
- Gemini 2.5 Pro Downgraded for Future Gains?: A user alleges that Gemini 2.5 Pro's instruction following, world knowledge, and prompt understanding have declined compared to Flash.
- They speculate this downgrade may be intentional to enhance the perceived performance of Gemini 3, suggesting a strategic manipulation.
- Vision Project Gets Company Boost!: A member is excited to get company hardware access for a vision project, avoiding additional costs on Runpod.
- The member aims to convince the company to release it, but probably won't win that fight.
- Llama 3 Molds Perfectly!: Members suggest Llama 3 for fine-tuning due to its brain being like putty, easily molded to specific tasks and preferences.
- Alternatively, members suggest Gemma for those seeking the Gemini flair in their models.
Perplexity AI Discord
- Comet Browser Invades: Members are using the Comet Browser daily and giving away free invites, but note seeing Comet access as less exclusive after the announcement channel.
- One user experienced issues redeeming their student access.
- GPT-5 Pro undergoing Livebench: Users reported that GPT-5 Pro is being tested out in livebench but is exhibiting long thinking times; it's also known as GPT-5 Codex.
- Another user notes that the model only produced half the answer, and another asked whether Perplexity had problems with GPT-5 Codex.
- Novel Crafter: Creative Writing Savior: Users are using Novel Crafter for creative writing due to its essential tools and customizable features, which let users tailor tools and reference saved snippets without rewriting them.
- One user notes it has code implemented so you can mention a snippet in a prompt without ever having to write again.
- Perplexity Max Plummets: Users express disappointment with Perplexity Max, noting only one email address can be integrated, leading to cancellations after 30 days.
- Members suggest more API credits are needed, deeming the email integration feature for a single account useless.
- Portkey AI to Meetup in SF: Portkey AI will host an in-person event on September 25th in SF for running LLMs in production, partnering with Exa; you can RSVP here.
- Limited spots are available, so those interested should register quickly.
Cursor Community Discord
- Gemini one-ups Sonnet for Code Design: A user proposed that Gemini surpasses Sonnet for design tasks due to Sonnet's inaccuracy with colors.
- The user claimed that Gemini has a superior ability to execute design-related coding tasks, due to Sonnetâs deficiencies in color accuracy.
- GPT-5-Codex afflicted with Click Bugs: Users reported encountering bugs in the updated GPT-5-Codex model, specifically related to unclickable buttons, with a screenshot of the bug provided for reference.
- These bugs are interfering with the usability of the model for some users, but the team has responded and is working on fixes to obey the AI rules.
- Windsurf waves in Generous Free Plan: Users are taking advantage of Windsurf's free plan, which includes various models and promotions, noting that model availability can depend on using personal keys for payment.
- The free plan gives users 25 credits per month and provides a pro trial with 200 credits.
- MCP Servers unlock Agent Coding Powers: Users discussed how MCP servers could enhance local coding, clarifying that these servers function as an API for agent use, supporting tasks such as web searches with exa.ai and analysis.
- The conversation mentioned several MCPs like Playwright MCP, Context7, Azure DevOps MCP, and GitHub MCP as examples of tools that provide web search capabilities for agents.
- Cursor Commits Portuguese Localization Bug: Users have observed that Cursor generates commit messages in their local language instead of English, and are seeking feedback on nightly builds to address this issue.
- The team replied stating that it is mostly heuristics, with speculation that the localization might be intentionally implemented in future updates to align with AI rules.
OpenAI Discord
- GPT-5 Mini Flunks AGI Test: Members using attached prompts for psychological profile creation (Psychological_Mathematics_Guide.pdf, Advanced_Psychological_Profile_Creation_Prompt_3.pdf) determined that GPT-5-Mini (High) is just as dumb as its predecessors.
- One user suggested that Kimi's response felt more aligned with AGI, noting that GPT-5 High doesn't get the joke. Not AGI level yet...
- 4o Smokes GPT-5 in Brainpower Bout: Members shared images indicating 4o outshining GPT-5 in tests of common sense, prompting debate about the validity of the results.
- It was mentioned that it's hard to say without hearing the reasoning of gpt 5 and perhaps it was aware that the prompter was joking.
- Unlock Companion Mode for Chatbot Bliss: ChatGPT defaults to an "Agent" persona, designed for problem-solving, but users can switch to "Companion" mode for a co-creative experience.
- To maintain the "Companion" mode, members can use "Mode-Locking"; if ChatGPT drifts back, a simple "Mode-Switching" command can reset it to its original state.
- CoT Prompting: Sometimes Less Is More: Members suggest that adding excessive Chain of Thought (CoT) requests can reduce model performance, particularly on models already designed for logical deduction.
- Experimentation is vital and prompts should focus on the desired outcome rather than prescribing a specific thought process.
- Prompt Engineers Reverse Translating: To enhance translation results, members suggest providing detailed context about the target audience, such as We're translating this for a woman who grew up in Yugoslavia in the 1940s, she has a 3rd grade education, so we need to phrase this for her.
- This approach improves how the model adapts the translation for the intended audience.
HuggingFace Discord
- Hugging Face Cache Cleansing: A user purged 100GB of Hugging Face cache, lamenting the indefinite persistence of datasets and the frustration of repeated downloads.
- They noted the annoyance of repeatedly downloading the same datasets, sparking discussion on cache management strategies.
- Language App Users in Disgust: Users trashed one unnamed language learning app, with one saying i would torch the bird alive if that were an option.
- Another shared that they deleted the unnamed app because it was a waste of time.
- Qwen Model HF Spam: Someone is flooding Hugging Face with Qwen 2.5 models following the naming convention Qwen2.5-0.5B-Instruct-randomword1-randomword2-randomword3, linking it to Gensyn.
- The motivation is suspected to be SEO-related, inflating the model count with smaller models and linking back to gensyn.ai for promotional purposes.
- GPU Driver Black Screen Blues: A user reported their monitor blacks out whenever the GPU activates in both Windows and Linux.
- Despite multiple attempts to correct the drivers, the problem persists, forcing them to run the monitor off the motherboard.
- 3090 Runs out of Memory?: A member experienced an OOM error on a 3090 (Linux), even without LoRA, while attempting to allocate 20.00 MiB on a GPU with 23.55 GiB total capacity.
- It's unclear whether fine-tuning should work in 24G GPU RAM without LoRA.
GPU MODE Discord
- Hopper TMA Kernel Implementation sought: Members sought a minimal implementation of a matmul kernel using the Hopper TMA (Tensor Memory Accelerator), inspired by the new CWM (Code World Model) paper from FAIR, specifically in raw CUDA without relying on CUTLASS or Triton.
- Another member faced an `unspecified launch failure` while implementing a minimal matmul kernel using WMMA and TMA.
- PTX Data Races Formally Analyzed: A formal analysis of the NVIDIA PTX Memory Consistency Model (ACM link) explores how languages like CUDA and Triton can target PTX with memory consistency, even though PTX allows for data races.
- The Dat3M tool (GitHub link) translates PTX and Vulkan models into the Dartagnan verification tool, making it the first analysis tool for multiple GPU consistency models.
- Torchrun API Documentation Discrepancy: A user reported that `uv run torchrun --help` shows different options compared to the official documentation of the new torchrun API, causing confusion.
- The discrepancy in `torchrun --help` output caused confusion about correct usage, since it showed a different set of options than expected from the PyTorch Elastic documentation.
- Kernel Profiling Reveals LLM Embedding Pricing: A member shared a Substack article detailing kernel profiling techniques to understand the profit margins of serving LLMs, along with a related X/Twitter post.
- The investigation suggests profiling and investigating kernels can provide insights into the profit margins of serving LLMs.
- Singularity Transforms into Apptainer: The open source project previously known as Singularity was renamed to Apptainer when it became part of the Linux Foundation, likely to distinguish it from the commercial fork called Singularity[CE] by Sylabs.
- Despite the renaming, Apptainer might still support the singularity alias for the CLI.
LM Studio Discord
- Set Thinking Budget for Seed-OSS: A user inquired about how to set the thinking budget for Seed-OSS.
- No solution was provided in the context.
- Markdown Parser Sought for Conversation.json: A member is seeking an effective method to parse `.conversation.json` files into a human-readable markdown format (a minimal sketch follows).
- The need arises due to the variability in models and regeneration versioning.
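A minimal sketch of such a parser; the schema here (a top-level `messages` list with `role`/`content` fields and an optional `name`) is an assumption, since LM Studio's actual `.conversation.json` layout may differ and carry per-message regeneration variants.

```python
import json
from pathlib import Path

def conversation_to_markdown(path: str) -> str:
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    lines = [f"# {data.get('name', 'Conversation')}"]
    for msg in data.get("messages", []):  # assumed schema: [{"role": ..., "content": ...}]
        role = msg.get("role", "unknown").capitalize()
        lines.append(f"\n**{role}:**\n\n{msg.get('content', '')}")
    return "\n".join(lines)

print(conversation_to_markdown("chat.conversation.json"))
```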
- LM Studio Linux Plugin Disparity: The Linux version of LM Studio reportedly offers fewer plugins compared to the Windows version.
- The user did not elaborate further on specific missing plugins or functionalities.
- Ollama LoRA Injection: Fine-Tuning?: A debate emerged on whether injecting data and using LoRA with Ollama constitutes fine-tuning.
- Some members claimed that the knowledge is baked into the model files themselves, and isn't just a system prompt, with a user confirming that Ollama allows injecting LoRA weights, customizing system prompts, and embedding knowledge directly into the model.
- Budget GPUs Evaluated for LLMs: Recommendations for budget GPUs included the 2060 12GB at $150, 3060 12GB for $200, 2080ti for ~$230, and 5060ti 16GB for $400 (new).
- A used 3090 was also suggested, but its $700 price tag was deemed not budget-friendly, and Tesla K80s were dismissed for AI/LLM use as basically e-waste.
Eleuther Discord
- Flexible Extraction Flunks Math Test: In GSM8k benchmark version 3, `flexible-extract` scored 0.3594, worse than `strict-match`, which scored 0.5742 on the `exact_match` metric with 5-shot learning.
- Members found it funny and questioned how flexible can be worse than strict.
- Deep Ignorance Faces Generalization Gap: Members discussed how Deep Ignorance requires a difficult type of generalization, especially because models are really good at style transfer but struggle with more complex inference and reasoning.
- One member noted the danger here: we should not expect to be able to train on clothed minors and nude adults and get an image model that can't generate CSAM.
- Seeking Math to Model Knowledge Completion: A member inquired about a mathematical formalism to distinguish settings where knowledge completion works, highlighting the complexity of the problem, especially for a very specific fact which is independent of other knowledge and unknown to the model.
- They suggested that in the worst case, it seems information theoretic.
- Can CFG Bridge Knowledge Gap?: Members discussed the effect of techniques like CFG on style transfer, where one member heard anecdotally that models that don't use it cannot perform style transfer as well.
- One member proposed, maybe some research can be done with the LLM equivalent of CFG to see if it can bridge the gap between concepts to fill in missing knowledge.
- VLMs Resist Normal Prompting?: Members are seeking studies that benchmark different prompting methods in VLMs and interpretability studies explaining their effectiveness.
- They note having seen several studies which discuss how ineffective normal LLM prompting techniques are for VLMs and are considering a mech-Interp oriented probing study.
Moonshot AI (Kimi K-2) Discord
- Kimi Shuts Down Delusions: A user tested Kimi and appreciated that it doesn't encourage delusions when presented with outlandish claims.
- Kimi's blunt denial of claims related to private voices, pets getting raptured, and baseless hype around the 2025 date went viral, as shared in this X post.
- Mini-Kimi on the Horizon?: A member inquired about the possibility of a mini version of Kimi that retains the same writing style but with a smaller footprint.
- Speculation arose that distilling a smaller Qwen model on K2 might be a viable alternative if Moonshot doesnât pursue a mini version.
- Distilling Reasoning with Kimi on Qwen: Doubts were raised about the rationality of distilling a Qwen model with Kimi, with some arguing that Deepseek only did it because Qwen initially lacked good reasoning capabilities.
- Counterarguments suggested that K2's distinct problem-solving style and writing prowess could benefit a smaller Qwen3 model through distillation, particularly in areas like prose and referencing obscure knowledge.
Latent Space Discord
- Gemini Goes Live with Killer Audio: Logan Kilpatrick from Google announced the new Gemini Live model that features native audio, improved function calling, and more natural conversations on X.
- Initial feedback includes praise for conversational flow and accents, but highlights iOS Safari issues, background-noise sensitivity, session-length limits, and STT accuracy concerns.
- Chrome DevTools MCP Opens Up For AI Agents: Google has released a public preview of Chrome DevTools MCP, a new server enabling AI coding agents like Claude Code, Cursor, VS Code, and Gemini to control a live Chrome browser via CDP/Puppeteer, announced on X.
- Agents now have the ability to run performance traces, inspect the DOM and console, capture screenshots and network traffic, and debug web apps in real time with a one-line installation via npx.
MCP Contributors (Official) Discord
- Embedded Resources Missing Title and Name: A member noticed the Model Context Protocol documentation implies embedded resources have a title, but it's missing in `schema.ts`, and no name field matches the Resource object.
- The member questioned whether title and name are needed, because embedded resources aren't always retrievable via a read resource call.
- Claude Code Debated for Writing SEP Documentation: A member proposed using Claude Code to draft an SEP (Standard Enhancement Proposal) documentation as a good test of the toolâs capabilities.
- Another member agreed that obtaining an SEP for the subject matter should be straightforward.
- ReadResourceResult's contents Array Semantics Questioned: A discussion started about the `ReadResourceResult.contents` array in this GitHub issue, with questions about its intended purpose and semantics due to a lack of documentation.
- A member explained its potential use with Web Resources, such as a webpage composed of HTML and images, or situations without negotiated tokenizable/renderable MIME types.
Nous Research AI Discord
- Anthropic Report Focuses on Cybercrime Misuse: A member shared Anthropicâs report on detecting and countering AI misuse, highlighting that the actual threats are low-grade cybercrime, or vibe hacking.
- The discussion included whether applying for jobs with fabricated credentials is illegal, and the report specifically mentions completely fabricated master's degrees.
- LLMs automate personal life: A member reported that an LLM did all the legwork in achieving a recent accomplishment.
- According to them, all they had to do was spend many hours self-reflecting and feeding info about myself into the AI.
aider (Paul Gauthier) Discord
- Aider's Clear Command Tidies Chat History: The `/clear` command in Aider removes the chat history, but added files remain in the context.
- Users can use the `/context` command to view the token allocation for each file, allowing for better context management.
- Aider Grabs Web Content via URL: Aider doesn't natively support Internet search, but users can utilize `/web https://www.example.com/` to scrape content from specific URLs.
- This feature lets users integrate external information into the Aider context without direct search capabilities.
Yannick Kilcher Discord
- Saturday Evening Talks Anticipated: A member expressed excitement for the upcoming Saturday evening talks (European time) hosted by Yannick Kilcher, noting the announcement was made earlier in the week.
- Another member mentioned the desire to read the discussed papers beforehand to better understand the presentations.
- Hyperparameters Beat DPM++2m: The author of the paper "Hyperparameters are all you need" is presenting their work, which employs a five-step inference method for diffusion models.
- The research indicates that 8-step inference surpasses DPM++2m's 20-step inference in FID scores with an approximate 60% reduction in computational cost, using existing models without retraining; the author invites feedback, collaborators, and application ideas.
- ODE Solvers Eclipse DPM++2m: According to a recent paper, an 8-step Diffusion ODE Solver outperforms 20-step DPM++2m without needing additional training, with a focus on applications where inference speed is critical.
- The author seeks feedback and invites discussion on ODE solver improvements, especially from those working on diffusion efficiency.
- Alibaba Qwen Announced: A user shared a link to Alibabaâs Qwen on X.com.
- No further context was provided.
Manus.im Discord
- Manus PDF Download Stymied: A user reported that Manus was getting stuck while downloading a PDF for researching accounts; even after they manually downloaded the file and provided a link, Manus kept asking them to upload the file.
- The user sought advice on resolving this issue, but the conversation ended there.
- Beta Pro Access Remains Elusive: A user inquired about obtaining access to Beta Pro.
- The discussion ended without a response, leaving the method for acquiring Beta Pro access unresolved.
Modular (Mojo 🔥) Discord
- Contributors Explore Modular: A user inquired about contributing to Modular; a staff member suggested a DM to explore potential collaboration avenues.
- Details around specific skills and contributions were not mentioned in the public channel.
- Modular Closes Massive $250M Funding Round: Modular announced it has raised $250M to accelerate building AI's unified compute layer and thanked the community for its contributions and feedback.
- Modular will focus on community empowerment through feature enhancements and expedited response to feedback in the coming year.
tinygrad (George Hotz) Discord
- clspv plagued by Build Errors: The main branch of clspv is currently failing to build due to errors, but a user found that reverting to previous commits resolves the issue and shared a forked repository with a working stable branch.
- Users can pull the forked repository and checkout the stable branch to build clspv successfully.
- Python Bindings to Pip install clspv: A user is developing Python bindings for clspv, with the goal of enabling direct installation via pip using a single command.
- This enhancement would streamline the installation process, making clspv more accessible to Python developers.
DSPy Discord
- DSPy Gains New Attachment: The `attachments` add-on for DSPy helps engineers add new files to their projects.
- The add-on features standalone `uv add` functionality, helping engineers streamline projects in Python.
- ColBERT Has Trouble with Long Context: A member confirmed that longer context doesn't work well with ColBERT, even when repeating the CLS token.
- It remains unknown whether this is a limitation of ColBERTâs implementation, or an issue with the model architecture itself.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
OpenRouter ▷ #announcements (1 messages):
Qwen Pricing Incident, Automatic Refunds, Validation Checks
- Qwen's Pricing Snafu: The endpoint `qwen/qwen3-235b-a22b-04-28:free` was mistakenly priced for 26 hours on September 16th, causing unintended credit deductions.
- Users saw erroneous charges for the supposedly free model in their activity logs.
- Refunds Rolled Out: All impacted users received automatic and full refunds for the incorrect charges.
- The team apologized for the confusion.
- Validation Checks Fortified: Extra validation checks have been implemented to prevent a recurrence of this pricing error.
- The team is ensuring that future pricing mishaps are avoided through enhanced system safeguards.
OpenRouter ▷ #general (709 messages🔥🔥🔥):
Qwen3 VL ratelimits, Deepseek alternatives, Janitor AI vs SillyTavern, OpenRouter API key as proxy, GPT-5 features
- Qwen3 VL's Ratelimits are Insane: Members complained about the ratelimits on Qwen3 VL, noting that the model works only 30% of the time.
- The model has had problems, with users experiencing 429 errors after using a proxy for the first time.
- SillyTavern is preferable to Janitor AI: Users discussed Janitor AI with one commenting that SillyTavern is better because of customizability.
- Members say Janitor AI is an LLM front end, and a constant stream of new users ask why their favorite models have been returning 429 errors.
- Free DeepSeek Models Suffer From Rate Limits: Users reported problems when using the free version of Deepseek V3 0324, citing 429 errors.
- It was suggested that OpenRouter create a FAQ page to address these issues, with a link pinned in the support channel.
- OpenRouter Newb doesn't even know about SillyTavern: Members mocked a new user of OpenRouter for calling their OR API key a proxy; the user admitted they were a JAI user but didn't even know what SillyTavern was.
- One member joked that it takes only minutes of direct exposure to this general channel before beginning the transformation into a twisted cynical husk.
- OpenRouter OPS are like Feds?: After a moderator joined the chat, users began joking that they were working for a secret OpenRouter fed force, responsible for stopping gooning.
- OpenRouter staff denied it, but said that the Open Router Goon Force is still investigating rumors of Proxy Errors.
OpenRouter ▷ #discussion (78 messages🔥🔥):
Encoder LLMs, Token embeddings, MLP blocks, Residual stream, Attention mechanism
- LLM Encoders Tokenize Text Into Vectors: Encoder LLMs turn text into vectors by first tokenizing the text, then using a lookup table to turn tokens into their pre-trained vectors.
- The discussion clarified that it's essentially token embedding versus full sentence embedding, where a sentence is treated as one token after passing through the network.
- MLP Blocks and Attention Impact Token Vectors: The conversation addressed whether encoder LLMs have MLP blocks, confirming that transformers typically have attention followed by feedforward networks.
- It was noted that even a single token, passed directly from a lookup table versus going through the full encode, will differ due to these blocks; furthermore, if a token's key and query match, it will add its own value vector to itself.
- Residual Streamâs Role in LLM Modification: Members discussed that MLPs modify the residual stream, referring to the modified embedding as it passes through the model rather than solely modifying the value vector generated during attention.
- The discussion mentioned the existence of value matrices within this process, observed in the qwen3 embedding 0.6B model.
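A minimal sketch of the lookup-versus-full-encode distinction discussed above, assuming a standard Hugging Face encoder; the hub id `Qwen/Qwen3-Embedding-0.6B` for the qwen3 embedding 0.6B model mentioned is an assumption:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "Qwen/Qwen3-Embedding-0.6B"  # assumed hub id; any HF encoder behaves the same way
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

ids = tok("hello world", return_tensors="pt").input_ids
lookup = model.get_input_embeddings()(ids)  # pre-trained per-token vectors (pure lookup table)
with torch.no_grad():
    contextual = model(input_ids=ids).last_hidden_state  # after attention + MLP blocks

# Even a single token's vector differs once it passes through the blocks:
print(torch.allclose(lookup, contextual))  # False
```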
- Microsoft and Anthropic Partnership is Blooming: Microsoft made significant strides by making Claude available in Microsoft 365 Copilot, marking a rebound after a messy breakup.
- Discussion wondered if OpenRouter is big enough to discuss volume discounts with DeepInfra, Hyperbolic, Anthropic, Vertex, etc.
- Gemini-cli ReadManyFiles tool being utilized: Gemini-cli is making big strides with the ReadManyFiles tool, as detailed in the v0.6.0 release.
- A member said The ReadManyFiles tool gets a lot of work from me.
Unsloth AI (Daniel Han) ▷ #general (89 messages🔥🔥):
Off Policy GRPO, Qwen3-VL-235B-A22B-Thinking GGUF, Unsloth'd models and AI safety, P100 for training
- GRPO off the policy beaten path: A member inquired about the existence of complete off-policy GRPO implementations, noting that online searches only revealed GRPO methods using the old model's policy.
- There was no further discussion or links provided on this topic.
- Qwen3-VL-235B-A22B-Thinking GGUF waiting game: A member asked about the status of Qwen3-VL-235B-A22B-Thinking GGUF releases.
- A team member confirmed that it's not supported by llama.cpp yet and linked to llama.cpp's llama-arch.h file as reference.
- Peeling back the layers of Unsloth'd AI safety: A member questioned the use of Unsloth'd models in AI safety research, inquiring about potential impacts from lossless or lossy transformations on interpretability experiments.
- Another member clarified that Unsloth is a training framework, not a type of model, and referenced the dynamic 4-bit quantization algorithm used by Unsloth.
- P100 GPUs get roasted for training: A member asked about the performance expectations of using a multi-GPU rig with P100 16GB GPUs for fine-tuning.
- Another member simply stated that P100s are garbo for training, without providing further elaboration.
Unsloth AI (Daniel Han) ▷ #off-topic (293 messages🔥🔥):
Strix Halo, Evaluation set, Training loss, 5090 GPU, Gemini 2.5 Pro
- Strix Halo Gets Damaged During Shipping: A user reported that their Strix Halo machine arrived, but the shipping label was damaged, and it might have been replaced with a B200.
- Despite the damage, the device could still be turned on, sparking relief and curiosity about its contents.
- Eval Set Size sparks debate: A user questioned the practice of limiting the evaluation set to only 30 examples, noting that it would make the loss quite inaccurate.
- Another user responded that 30 is a good number for statistically significant results, especially if hardware/time constraints exist, while smaller sizes would give an unstable graph that is still useful for a specialised use case.
- Evaluation Loss Works with Integers: Users debugged issues with displaying the evaluation loss, eventually finding that setting `eval_steps` to an integer value (like 5) instead of a decimal (like 0.2) resolves the problem.
- They noted that 0.2 misbehaves, logging eval steps into train steps with zero loss.
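For reference, a sketch of the fix using Hugging Face `TrainingArguments` (argument names follow recent transformers releases; older versions spell the strategy flag `evaluation_strategy`):

```python
from transformers import TrainingArguments

# An integer eval_steps evaluates every N optimizer steps; a float like 0.2
# is interpreted as a ratio of total steps, which is what reportedly produced
# the zero-loss logging artifact described above.
args = TrainingArguments(
    output_dir="outputs",
    eval_strategy="steps",  # "evaluation_strategy" in older transformers versions
    eval_steps=5,           # use an int, per the fix above
    logging_steps=1,
)
```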
- 5090 GPU sparks envy: A user mentioned owning a 5090 GPU, prompting another to comment on the high cost and follow up with someone's got the machine of their dreams.
- Later, there was a discussion about whether to buy the 6000 Pro or the L40S, with one user concluding that the L40S is the better choice overall due to its superior compute.
- Gemini 2.5 Pro Dumber Than Flash?: A user claimed that Gemini 2.5 Pro is now dumber than Flash in terms of instruction following, world knowledge, and prompt understanding.
- They speculated that this may be intentional, quipping that they intentionally made it worse so that gemini 3 looks better.
Unsloth AI (Daniel Han) ▷ #help (39 messages🔥):
Company hardware access for vision project, Fine-tuning model recommendations, Qwen2.5-VL fine-tuning for domain-specific knowledge, Gemma 3N notebook error, Distillation usage
- Company Hardware Paves Way for Vision Project!: A member is getting access to company hardware for a vision project and is excited to not spend another $500 on Runpod.
- They hope to convince the company to release it but probably won't win that fight.
- Llama 3 brain is pure putty!: Members recommend Llama 3 for fine-tuning, since its brain is like putty and will easily mold to what you want.
- Another member suggests Gemma if the user wants a model with the Gemini flair.
- Qwen2.5-VL: frame by frame!: Members discussed fine-tuning Qwen2.5-VL for domain-specific knowledge, noting it requires training per frame for video input, accepting only image, text, and bounding box.
- Passing a null image for text-only data might associate having no image with the given data, so bad results might arise.
- Gemma 3N notebook throws error: A user encountered an AttributeError while running the Gemma 3N notebook made by Unsloth, suspecting a version mismatch.
- Another member suggested the issue might be related to the dataset format, which should be in valid sharegpt format, or that the data prep cells were not executed correctly.
- Gemini allows distillation: Members discuss distillation to teach a student model to behave like the teacher model, specifically with the Gemini model.
- One member stated that they would need to look into it.
Unsloth AI (Daniel Han) ▷ #showcase (1 messages):
ChatGPT Instagram analysis, Competitor Comparison, Reel analysis
- ChatGPT wields Instagram Analysis Power: Members reported that ChatGPT can now analyze Instagram, review reels, and compare competitors.
- They've also created a YouTube video showing how it saves me from doom-scrolling.
- Instagram Reels Get the ChatGPT Treatment: A user discovered that ChatGPT can analyze Instagram Reels, providing valuable insights.
- This capability helps users avoid doom-scrolling by efficiently reviewing and understanding reel content.
Perplexity AI ▷ #general (272 messages🔥🔥):
Comet Browser, GPT-5 testing, Novel Crafter, Perplexity Max, Qwen3 Max
- Comet Browser Craze and Free Invites: Members are daily driving Comet and giving away free invites; many like it, while others now see Comet access as less exclusive after seeing the announcement channel.
- One user experienced issues redeeming their student access.
- GPT-5 Codex under testing: Users reported that GPT-5 Pro is being tested in livebench with long thinking times, though one noted that the model produced only half the answer.
- One member asked if Perplexity had problems with GPT-5 Codex.
- Novel Crafter hailed as Creative QoL: Users are using Novel Crafter for creative writing for its essential tools and customizable features, allowing users to customize their own tools and implement code without rewriting.
- A user mentions it has some code implemented so you can mention a snippet in a prompt without ever having to write again.
- Perplexity Max Plan falls short: Users expressed disappointment with Perplexity Max, noting that only one email address can be integrated, with some canceling the Max plan after 30 days as a result.
- Members suggest more API credits are needed, calling the email integration feature for only one account useless.
- Qwen3 Max incoming: Users are discussing the upcoming Qwen3 Max and its parallel reasoning capabilities, linking to a Qwen blog post.
- Some speculate whether Qwen3 Max will be free, with a user jokingly setting the time of arrival with plpanx = 24.
Perplexity AI ▷ #sharing (6 messages):
Portkey AI, Apollo 16, Artemis 2, Carl Sagan, 3i/Atlas
- Portkey AI to host in-person event: Portkey AI is hosting an in-person event on September 25th in SF for running LLMs in production, in partnership with Exa, with limited spots available; RSVP here.
- Apollo 16 Inspires Dreams of Space: A member shared an inspiring video of Apollo 16 in anticipation of Artemis 2âs launch in April 2026, highlighting NASAâs past achievements and their influence on todayâs technology.
- They included a reference to Gene Cernan's last words from Apollo 17, before mankind departed the moon for almost 50 years.
- Carl Sagan Inspired Scratchpad: A member shared their Carl Sagan-themed "scratchpad" looking at 3i/Atlas, describing it as an invitation to listen to the universe, humble and awed.
Perplexity AI ▷ #pplx-api (3 messages):
Solution being found
- A Solution Still Needed: A member asked, "Is there any solution for this now?"
- Another member replied that no solution has been found yet.
- No solution available: A member inquired about the availability of a solution.
- Another member confirmed that no solution is currently available.
Cursor Community ▷ #general (264 messages🔥🔥):
Gemini vs Sonnet, GPT-5-Codex Bugs, GLM 4.5 on Cursor, Windsurf free models, MCP (Model Control Program) Servers
- Gemini beats Sonnet for Design Code: A member suggested that Gemini is better than Sonnet for design-related tasks because Sonnet doesn't even get the colors right.
- GPT-5-Codex has Click Bugs: Users are experiencing bugs with the updated GPT-5-Codex model where buttons are unclickable.
- A user posted a screenshot of the bug here.
- Windsurf has Generous Free Plan: Users are using Windsurf's free plan with generous models and promotions, but also note it can depend on paying for the models with your own keys.
- The free plan offers 25 credits a month and a pro trial of 200 credits.
- MCP unlocks Agent Powers: A user inquired about MCP servers and how they can aid in local coding, with other users pointing out that MCP servers act as an API for agent use, enabling them to perform tasks such as web searches and analysis.
- The conversation highlighted the use of tools like exa.ai for web searches and the availability of various MCPs like Playwright MCP, Context7, Azure DevOps MCP, and GitHub MCP.
- Cursor Commits only in Portuguese?: Users report issues with Cursor generating commit messages in the language of the user's locale instead of English, and are looking for feedback on nightly builds.
- One user said that this may be added in future updates to obey the AI rules. The team replied stating that it is mostly heuristics.
OpenAI ▷ #ai-discussions (49 messages🔥):
GPT-5 Mini, Kimi AGI, 4o vs GPT-5, Markov Chain, GPT-OSS-20B
- GPT-5 Mini deemed just as dumb: Members shared attached prompts for psychological profile creation (Psychological_Mathematics_Guide.pdf, Advanced_Psychological_Profile_Creation_Prompt_3.pdf) and found that GPT-5-Mini (High) is just as dumb.
- One member noted that another model's (Kimi) response seemed closer to how an AGI would answer than GPT-5's, stating GPT-5 High doesn't get the joke. Not AGI level yet…
- 4o beats GPT-5 in common sense contest: Members shared images showing 4o winning over GPT-5 in common sense reasoning.
- One member added that it's hard to say without hearing the reasoning of GPT-5, because maybe it knew the prompter was obviously joking.
- Markov Chain explained: A member gave a detailed explanation of a Markov Chain as a mathematical model for systems that move between states depending only on the current state, not on the history of past states and its uses in Google PageRank, Natural Language Processing, Finance, Physics & Biology and Games.
- The explanation included discussion of the Markov Property and Transition Matrices.
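To make the Markov Property concrete, here is a toy two-state sketch (the transition probabilities are illustrative, not from the discussion):

```python
import numpy as np

# Two-state weather chain: rows are the current state, columns the next state.
P = np.array([[0.9, 0.1],   # sunny -> sunny / rainy
              [0.5, 0.5]])  # rainy -> sunny / rainy

state = np.array([1.0, 0.0])  # start: certainly sunny
for _ in range(20):
    state = state @ P          # Markov property: next step depends only on the current state
print(state)                   # approaches the stationary distribution (~[0.833, 0.167])
```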
- GPT-OSS-20B labeled most censored model: One member shared that GPT-OSS-20B is possibly the most censored model ever, sharing an image showing that it just noped out.
- Sora download errors may get fixed w/ Perplexity: A member was getting an error message every time they tried to download a new generated video from Sora.
- Another member suggested asking Perplexity for a solution, since they are giving out free 12 month pro passes.
OpenAI ▷ #gpt-4-discussions (1 messages):
ChatGPT Agent Mode, ChatGPT Companion Mode, Mode-Locking, Mode-Switching, Tracking KPIs
- ChatGPT defaults to "Agent" Mode: By default, ChatGPT assumes an "Agent" persona, which makes it a problem-solver and instructable worker.
- To change this, users must instruct it to switch to "Companion" mode.
- "Mode-Locking" keeps ChatGPT in "Companion" Mode: To keep ChatGPT in "Companion" mode (co-creator, guide), users can add a pinned instruction or reusable starter prompt.
- For example, you could say: "Stay in Companion mode unless I explicitly say switch to Agent. Companion = co-pilot, not order-taker."
- "Mode-Switching" command resets ChatGPT: If ChatGPT drifts back to Agent mode, users can simply say: "Go back to companion mode."
- This command resets the ChatGPT bot to its original state.
- Tracking Key Performance Indicators: Users should track the consistency of ChatGPT's mode, for example, whether 90%+ of sessions behave as intended with the pinned prompt.
- This helps users understand how often they must reset the bot.
OpenAI ▷ #prompt-engineering (28 messages🔥):
Chain of Thought (CoT) Prompting, Deep Research for Prompting, Translation Prompting Strategies, Interactive Prompting Infographics
- CoT Prompting: Less Is More?: It was suggested that adding excessive Chain of Thought (CoT) requests can confuse the model and reduce performance, as models are already built to utilize CoT.
- Experimentation is key, as one should guide the model to solve specific problems rather than over-prompting with generic CoT requests.
- Reverse Engineer Prompts: The Yugoslavia Example: When translating, provide context about the target audience to improve results, for example by saying We're translating this for a woman who grew up in Yugoslavia in the 1940s, she has a 3rd grade education, so we need to phrase this for her.
- This specificity helps the model tailor the translation effectively.
- Deep Research is good for answering questions: It was said that the best way to answer questions, when direct links are unavailable, is through Deep Research.
- One user experienced an unusually long wait time while attempting Deep Research, which was considered a bummer. The user shared some shareable ChatGPT links; however, some users encountered 404 errors.
- Interactive Infographic for CoT Prompting: An interactive infographic was created in a canvas to test Chain-of-Thought prompting, including visibility toggles, a task selector, a thinking-time slider, and copy-ready prompt cards.
- The infographic includes prompt cards for direct prompts, explain-after prompts, verify-then-respond prompts, translation refinement prompts, long-context prompts, and latency budget prompts.
OpenAI ▷ #api-discussions (28 messages🔥):
Chain of Thought Prompting, Model Performance, Prompt Engineering, Interactive Infographic for CoT
- Chain of Thought Prompting: Overkill?: A member suggested that adding excessive Chain of Thought (CoT) prompting can statistically reduce model performance, especially on current "thinking" models.
- They suggested that prompts should focus on the desired outcome rather than forcing a specific thought process, and experimenting to solve specific problems rather than blindly applying CoT.
- Crafting a Surfer-Style Essay on Apples: A member shared examples of how to write a quality essay about apples from a surfer's point of view, with an example of a ChatGPT share link.
- They argued that specifying the persona directly in the prompt yields a more embodied and effective result, contrasting it with a method involving explicit chain-of-thought bullet points.
- Interactive Infographic for CoT Prompting: A member shared an interactive infographic built with React for Chain-of-Thought prompting.
- The tool includes visibility toggles, a task selector, a thinking-time slider with an S-curve, and copy-ready prompt cards, and is packaged as a single-file React component.
HuggingFace ▷ #general (95 messages🔥🔥):
Huggingface cache deletion, MariaDB hackathon, Language learning apps, Qwen model reasoning, LinkedIn content
- Huggingface Cache Gets the Boot: A user deleted 100GB of Hugging Face cache data, noting that datasets can persist indefinitely.
- They added that downloading the same datasets repeatedly can be frustrating.
- Torch the Bird Alive!: A user trashed language learning apps, emphasizing that one unnamed app sucks if one aims to learn a language through it, and another user said i would torch the bird alive if that were an option.
- Another user said that they deleted an unnamed app because it was a waste of time.
- LinkedIn Gets âUnhinged Slopâ: One user joked that they live for the linkedin slop.
- Another user said that they post the most unhinged shit to get eyes on their posts and win.
- HF Forums: Listed vs. Unlisted: Someone asked about the meaning of listing/unlisting posts in Hugging Face's discuss forums.
- No direct answer was provided in the messages.
- Qwen 2.5 Models Overload HF: Users noticed that someone is flooding Hugging Face with Qwen 2.5 models following the naming convention Qwen2.5-0.5B-Instruct-randomword1-randomword2-randomword3, linking it to Gensyn.
- The motivation is suspected to be SEO-related, inflating the model count with easier-to-post smaller models and linking back to gensyn.ai for promotional purposes.
HuggingFace ▷ #today-im-learning (1 messages):
GPU, monitor, drivers, Linux, Windows
- Monitor blacks out when GPU heats up: A user reported that every time the GPU fires up, the monitor goes black in both Windows and Linux.
- They've tried to correct the drivers multiple times and are frustrated that they have to run the monitor off the motherboard.
- Troubleshooting Black Screen on GPU Activation: The user is facing a persistent issue where their monitor goes black whenever their GPU activates.
- Despite numerous attempts to correct the drivers across both Windows and Linux, the problem persists, forcing them to rely on the motherboard for monitor output.
HuggingFace ▷ #cool-finds (2 messages):
trade-bench.live, UIUC students finance work
- UIUC students publish finance work: A member shared a link to trade-bench.live, showcasing work by UIUC students in the finance domain.
- The member admitted to not understanding much of it, inviting others with finance expertise to provide insights on the project which they found drab.
- Request for Finance Insights: The member expressed hope that someone in finance would check out the resource.
- They also invited people to share insights and clarifications, indicating they found it difficult to grasp.
HuggingFace ▷ #smol-course (6 messages):
OOM Error on 3090, PEFT runs successful locally, SFTTrainer writes fine tuned model
- 3090 GPU runs out of Memory: A member got an OOM error on a 3090 (Linux), even without LoRA, while trying to allocate 20.00 MiB on a GPU with 23.55 GiB total capacity.
- It's unclear whether fine-tuning should work in 24 GB of GPU RAM without LoRA.
- Local PEFT Runs Finally Succeeding: After working through some issues with the LoRA config, a member reported finally getting some successful PEFT runs locally.
- No further details were provided regarding the specifics of the resolved issues.
- SFTTrainer Auto-Writes Fine-Tuned Models if output_dir Is Set: A member inquired whether SFTTrainer automatically writes the fine-tuned model if `output_dir` is set.
- The member later confirmed that yes, SFTTrainer does automatically write the fine-tuned model if `output_dir` is set.
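For context, a minimal TRL sketch of the behavior being discussed (model and dataset ids are placeholders; recent TRL versions accept a model name string, and checkpoints are written under `output_dir` during training):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_ds = load_dataset("trl-lib/Capybara", split="train[:100]")  # placeholder dataset
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",             # placeholder model id
    args=SFTConfig(output_dir="outputs"),  # checkpoints land here during training
    train_dataset=train_ds,
)
trainer.train()
trainer.save_model("outputs")  # an explicit final save remains good practice
```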
HuggingFace ▷ #agents-course (2 messages):
Global Greetings, Course Kickoff
- Buckeyes and Buenos Días!: Enthusiastic members from Ohio, USA and Madrid chime in to say hello!
- The international community eagerly anticipates a curso magnífico (a magnificent course).
- Course Commences!: At least one participant announces they are starting the course today.
- Many others will soon follow, hoping for a transformative learning experience.
GPU MODE ▷ #general (2 messages):
Hopper TMA, Minimal Matmul Kernel, CWM paper from FAIR
- Hopper TMA Kernel Quest Begins: A member is seeking a minimal matmul kernel implementation utilizing the Hopper TMA (Tensor Memory Accelerator) in raw CUDA, and not relying on CUTLASS or Triton.
- The search is inspired by the new CWM (Code World Model) paper from FAIR.
- CWM Paper Sparks TMA Interest: The new CWM paper from FAIR seems to be driving interest in optimized matmul kernels using Hopper's TMA.
- The request specifies a need for a minimal implementation, suggesting an interest in understanding the fundamentals of TMA integration.
GPU MODE ▷ #cuda (11 messages🔥):
cuda headers, smem bank conflicts, cudaGraphicsGLRegisterImage and tex2d are undefined, TMA matmul kernel
- Kernel Iteration Computations Get Functional: A member suggested using a function `compute_iter<Is_first, Is_last, ...>(*args, **kwargs)` inside the loop, with a call to `compute_iter<False, False>` within the kernel.
- Another user thought this was a very good idea.
- Lambda Kernels Limit Argument Litter: A member suggested using a lambda within the kernel to avoid writing a separate `__device__` function with a lot of arguments.
- This allows calling the lambda inside and outside the main loop.
- NCU Profiling Finally Finds SMEM Snags: A user learned how to verify if a kernel has smem bank conflicts through ncu profiling.
- The user was wondering what the number wrapped in curly brackets means.
- CUDA Headers Cause Conundrums: A user reported a weird issue where CUDA headers weren't being automatically included, resulting in undefined functions like `cudaGraphicsGLRegisterImage` and `tex2d`.
- Including `cuda_gl_interop.h` fixed the issue for `cudaGraphicsGLRegisterImage`, but the problem persisted even when creating a new project with the CUDA default template in Visual Studio 2022.
- WMMA Kernel Catches Unspecified Launch Crash: A user is facing an `unspecified launch failure` with a WMMA kernel.
- The user is trying to implement a minimal matmul kernel that uses the TMA.
GPU MODE ▷ #torch (3 messages):
torchrun API, torchrun --help
- Torchrun API Usage Confusion: A user inquired about the usage of the new torchrun API and reported that `uv run torchrun --help` shows different options compared to the official documentation.
- Discrepancy in Torchrun Help Output: The output of `uv run torchrun --help` displayed a different set of options than expected based on the PyTorch Elastic documentation, causing confusion about the correct usage.
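For comparison, a typical invocation of the current CLI looks like this (flag spellings vary across PyTorch versions; older releases use `--nproc_per_node`, so trust the `--help` output of your installed version):

```
# Single-node run with 4 local workers; train.py is a placeholder script.
uv run torchrun --standalone --nproc-per-node=4 train.py
```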
GPU MODE ▷ #cool-links (10 messages🔥):
CUDA and Triton, PTX Memory Consistency Model, Compound Memory Models, GPU Consistency Analysis, Dat3M Verification Tool
- PTX Data Races Formally Analyzed: A formal analysis of the NVIDIA PTX Memory Consistency Model (ACM link) explores how languages like CUDA and Triton can target PTX with memory consistency, even though PTX allows for data races.
- Compound Memory Models Compositionally Amalgamate: The compound memory model is a compositional amalgamation where threads from each device continue to adhere to the memory ordering rules of that device's original memory model, according to the PLDI 2023 paper (DOI link, PDF link).
- Unified GPU Consistency Analysis Proposed: The ASPLOS 2024 paper Towards Unified Analysis of GPU Consistency (DOI link, PDF link) notes that while CPU consistency guarantees are well-understood, the same isn't true for GPUs.
- Dat3M Tool Verifies Memory Models: The Dat3M tool (GitHub link) translates PTX and Vulkan models into the Dartagnan verification tool, making it the first analysis tool for multiple GPU consistency models.
- Missing PTX Fences Identified: A member highlighted the automated identification of missing fences in PTX, as demonstrated in Figure 12 of a research paper.
- Another member suggested implementing such checks at the NVVM IR layer instead of PTX.
GPU MODE ▷ #beginner (2 messages):
Inter-warp operations, Intra-warp operations, Independent thread scheduling, NVIDIA GPUs, CTA clusters
- Warped Minds Mulling NVIDIAâs Thread Scheduling: A member inquired about a good blog post explaining inter-warp and intra-warp operations behavior in NVIDIA GPUs with independent thread scheduling.
- The member is particularly confused when dealing with a cluster of CTAs or a multi-CTA matmul, wondering about thread execution guarantees in architectures since Volta.
- Unraveling the Mysteries of NVIDIAâs Warp Operations: The discussion revolves around understanding how inter-warp and intra-warp operations behave on NVIDIA GPUs when independent thread scheduling is enabled.
- The key concern is the unpredictable execution of threads within a warp, particularly in scenarios like multi-CTA matmuls where multiple SMs access each other's shared memory.
GPU MODE ▷ #triton-puzzles (1 messages):
Puzzle difficulty, Puzzle completion time
- Adventurers Gauge Puzzle Difficulty: Several adventurers inquired about the difficulty of the puzzles and the typical time required for completion, seeking to gauge the challenge.
- Some sought to benchmark against others, but no conclusion was reached due to lack of shared timing or concrete metrics.
- No Triton Puzzles Completed Yet: Currently there were no credible Triton puzzle completion times to compare experiences.
- Most adventurers are still at the starting line, and none have crossed the finish line to report any reliable data.
GPU MODE ▷ #self-promotion (1 messages):
LLM serving, Embeddings Pricing, Kernel Profiling
- Embeddings Pricing Exposed via Kernel Profiling: A member shared a Substack article detailing kernel profiling techniques to understand the profit margins of serving LLMs.
- The author also shared a link to his X/Twitter post related to the article.
- Dive into Underlaying Kernels for Embedding Production: A member has investigated the underlying kernels used to produce embeddings and shared his findings.
- He suggests that profiling and investigating kernels can provide insights into the profit margins of serving LLMs, referencing his new Substack post for further details.
GPU MODE ▷ #đż (5 messages):
Code Generation, Two-Stage Approach, Model Performance
- Code Gen Tackles Raw Syntax: Members discussed that code generation often uses raw syntax without constraints, providing guarantees via a formal grammar but sacrificing the natural language component.
- They noted that humans don't typically code while thinking about the underlying grammar expected by the compiler, suggesting potential for training or fine-tuning a model to do so.
- Two-Stage Approach Emerges: Someone suggested a two-stage approach: pseudo-code generation followed by formal grammar translation.
- The conversation also touched on the impact of added constraints on model performance and the reduction of "degrees of freedom" for code generation.
GPU MODE ▷ #thunderkittens (5 messages):
H100 matmul kernel runtime error, nvshmem usage in paper 2
- H100 Kernel Crashes with Runtime Error: A member reported a runtime error with the H100 matmul kernel on Ubuntu 24.04, CUDA 12.8, PyTorch 2.7.0a0+nv25.03, TensorRT 10.9, and an NVIDIA H100 80GB HBM3 GPU, and provided full logs and build/run details.
- The error is: std::runtime_error: Error in tile TMA descriptor creation: unspecified launch failure.
- nvshmem Inclusion Postponed to Paper 2: A member inquired about the absence of nvshmem usage, and it was indicated that nvshmem usage is planned for paper 2, as illustrated in the attached image.
GPU MODE ▷ #submissions (17 messages🔥):
MI300x8, amd-all2all leaderboard, amd-gemm-rs leaderboard
- MI300x8 Achieves Personal Best on amd-all2all: A member achieved a personal best of 1923 µs on MI300x8 for the `amd-all2all` leaderboard.
- Other submissions on the `amd-all2all` leaderboard for MI300x8 ranged from 1939 µs to 2.12 ms.
- amd-all2all leaderboard gets filled with MI300x8 results: There were several successful submissions to the `amd-all2all` leaderboard using MI300x8, with times of 108 ms, 25.2 ms, 25.4 ms, 28.0 ms, 25.3 ms, and 4.70 ms.
- MI300x8 Excels on amd-gemm-rs Leaderboard: Submissions to the `amd-gemm-rs` leaderboard using MI300x8 achieved times between 572 µs and 581 µs.
GPU MODE ▷ #hardware (4 messages):
Voltage Park H100s Donation, Nebius Exclusive Sponsorship
- Voltage Park offers H100s donation: A representative from Voltage Park offered to donate H100s for an upcoming hackathon.
- However, the offer was declined due to an exclusive sponsorship agreement with Nebius for this particular hackathon.
- Nebius Secures Exclusive Hackathon Sponsorship: The GPU MODE hackathon secured an exclusive sponsorship with Nebius, preventing acceptance of other donations for this event.
- Organizers expressed interest in collaborating with Voltage Park on future events and offered to discuss opportunities further.
GPU MODE ▷ #factorio-learning-env (2 messages):
FLE Eval System Prompt, Image Analysis PR
- FLE Eval System Prompt Shared: A member shared a system prompt in FLE eval, attaching a file named agent0_system_prompt.txt from Discord CDN.
- The link provided is agent0_system_prompt.txt.
- Image Analysis PR Coming Soon: The same member mentioned their Image Analysis PR will be submitted the next day.
- This suggests ongoing development or updates related to image analysis functionalities within the project.
GPU MODE ▷ #amd-competition (4 messages):
GEMM-RS atomic writes optimization with Iris, Iris shared memory initialization, GEMM-RS bias handling
- Optimizing GEMM-RS Atomic Writes via Iris: A member reported that while Iris worked, optimizing GEMM-RS with atomic writes proved challenging to accelerate.
- They were advised to initialize the Iris shared memory inside the class, rather than strictly using it as an allocator.
- GEMM-RS Bias Variations Explored: A member tested three GEMM-RS variations, including one without bias addition and one always adding bias, to find optimizations when bias is None.
- The member found that the variations either timed out or failed to raise TypeErrors.
GPU MODE ▷ #cutlass (3 messages):
Refinement hierarchy, TmemAllocator vs cute.arch.alloc_tmem
- Refinement Creates Hierarchy: A member posited that refinement can be viewed as a hierarchy, where a value refines another if it can be derived from it, using the example that `((6,4))` refines `(24)` because `size((6,4))=24`, but not in the opposite direction.
- They likened this to splitting a single mode into more complex patterns in one dimension, and drew a rough analogy to the relationship between an ordinary vector and a matrix of shape `(M, 1)`.
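A toy sketch of the size-preservation rule behind this notion of refinement (plain Python for illustration, not CUTLASS/CuTe API):

```python
from math import prod

def size(shape) -> int:
    """Total extent of a (possibly nested) integer-tuple shape."""
    return prod(size(s) if isinstance(s, tuple) else s for s in shape)

# ((6,4)) refines (24): same total size, finer structure; the reverse does not hold.
assert size(((6, 4),)) == size((24,)) == 24
```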
- TmemAllocator Throwdown: A user inquired about the difference between instantiating `TmemAllocator` and allocating from it versus using `cute.arch.alloc_tmem` in CuTe DSL.
- No answer was given.
GPU MODE ▷ #mojo (1 messages):
Mojo Metal GPU target, Custom bitcode writer
- Mojo Targets Metal GPUs with Custom Bitcode: The new Metal GPU target in Mojo has generated excitement, particularly the availability of a custom bitcode writer for DSL targeting.
- This work may be reusable for those interested in targeting specific DSLs at Metal GPUs.
- Bitcode Writer Reusability: The user inquired whether the custom bitcode writer code for the Metal GPU target in Mojo is available and reusable.
- There is particular interest in leveraging this work to target specific DSLs at Metal GPUs.
GPU MODE ▷ #singularity-systems (6 messages):
Picograd's tensor and engine, Eager and lazy execution policies, Tinygrad's architecture, Graph compiler, Shipping incremental intermediaries
- Picogradâs Load-Bearing Tensor and Engine: The load-bearing part of Picograd is the tensor and engine, where the tensor will have two execution policies: eager and lazy.
- The former is a handle on device-allocated storage and the latter is sugar for a uop graph that will be compiled.
- Picograd Copies Tinygrad's Architecture: The member is directly and shamelessly copying tinygrad's architecture to simplify design decisions and bridge the same semantic gap as tinygrad's compiler.
- They stated that the target is no triton or openmp.
- Picogradâs CI Fuzzing Against Oracles: The member plans to set up CI to fuzz against numpy and torch oracles once they get the vertical slice of a forward and backward pass.
- They will then stop merging directly to master and focus on shipping code and book for eager mode.
GPU MODE ▷ #cluster-management (2 messages):
Singularity, Apptainer, Sylabs, Linux Foundation
- Singularity Forked, Renamed Apptainer: The open source project previously known as Singularity was renamed to Apptainer when it became part of the Linux Foundation, likely to distinguish it from the commercial fork called Singularity[CE] by Sylabs.
- Despite the renaming, Apptainer might still support the singularity alias for the CLI.
- Sylabs Commercial Fork: Sylabs maintains a commercial fork of the original Singularity project, called Singularity[CE].
- This is distinct from the open-source Apptainer project, which is now under the Linux Foundation.
LM Studio ▷ #general (37 messages🔥):
Seed-OSS thinking budget, Conversation.json to markdown, LM Studio Plugins in Linux, Ollama Fine Tuning, LoRA injection into models
- Set Thinking Budget for Seed-OSS: A user asked how to set the thinking budget for Seed-OSS.
- Markdown Parser Sought: A member is looking for a good way to parse `.conversation.json` files into a human-readable markdown format due to the variability in models and regeneration versioning.
- LM Studio Linux Plugin Availability: A user reported that the Linux version of LM Studio doesn't offer as many plugins as the Windows version.
- Debate Erupts on Ollama Fine-Tuning: A discussion ensued over whether injecting data and using LoRA with Ollama constitutes fine-tuning, with claims that the knowledge is baked into the model files themselves and isn't just a system prompt.
- Ollama LoRA injection confirmed: A user confirmed that Ollama not only supports running models locally, but also allows injecting LoRA weights, customizing system prompts, and even creating your own model variants where knowledge is directly embedded into the modelâs structure.
- However, they noted that it needs some setup; it's not just there out of the box.
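For reference, the setup described maps roughly onto Ollama's documented Modelfile directives as follows (base model and adapter path are placeholders):

```
# Build and run with:
#   ollama create my-tuned -f Modelfile
#   ollama run my-tuned
FROM llama3.1
# Inject LoRA weights into the model variant (path is a placeholder)
ADAPTER ./domain-lora
SYSTEM "You are a domain-specific assistant."
```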
LM Studio ▷ #hardware-discussion (13 messages🔥):
Budget GPUs, Tesla K80s
- Budget GPUs Recommendations: For budget consumer GPUs, recommendations included the 2060 12GB at $150, 3060 12GB for $200, 2080ti for ~$230, and 5060ti 16GB for $400 if buying new.
- A used 3090 was also suggested, but another user pointed out it costs $700, hardly budget.
- Tesla K80s deemed e-waste: A question was posed on whether Tesla K80s are viable given their price range of $200-300 for refurbs.
- One user responded that Tesla generation is not recommended for AI/LLM use anymore tbh, basically e-waste.
Eleuther ▷ #general (3 messages):
Measuring AI Dialogue Coherence and Novelty, NYC Meetup in Central Park
- Swiss Researcher Queries AI Dialogue Metrics: An interdisciplinary researcher from Switzerland asked the tech community about the importance of measuring coherence and novelty in AI dialogue.
- The researcher's background is in International Relations, hinting at a potential interest in applying these metrics to analyze AI's role in global communication.
- EleutherAI NYC Meetup Announced: A member announced a NYC Meetup planned for Saturday afternoon in Central Park, with a link to a Discord channel for details.
- They also linked to a Twitter post to gauge interest in this direction.
Eleuther ▷ #research (12 messages🔥):
DeepIgnorence Generalization Difficulty, Mathematical Formalism for Knowledge Completion, CFG on Style Transfer, Data Centric Approaches to ML/AI
- Deep Ignorance's Generalization Difficulties Highlighted: A member discussed how Deep Ignorance requires a difficult type of generalization, noting that models excel at style transfer but struggle with more complex inference.
- For example, we should not expect to be able to train on clothed minors and nude adults and have an image model that can't generate CSAM. That's effectively style transfer, which is something models are extremely good at.
- Mathematical Formalism for Knowledge Completion Sought: A member inquired about a mathematical formalism to distinguish settings where knowledge completion works, highlighting the complexity of the problem.
- They suggested that in the worst case, it seems information theoretic, i.e., a model will not be able to reason its way to a very specific fact which is independent of other knowledge and unknown to the model.
- Discussion on CFGâs impact on Style Transfer: Members discussed the effect of techniques like CFG on style transfer.
- One member heard anecdotally that models that don't use it cannot perform style transfer as well. If so, maybe some research can be done with the LLM equivalent of CFG to see if it can bridge the gap between concepts to fill in missing knowledge.
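For background, CFG here refers to classifier-free guidance; a common formulation of a hypothetical LLM analogue is sketched below for illustration (this is not something implemented in the discussion):

```python
import torch

def cfg_next_token_logits(cond: torch.Tensor, uncond: torch.Tensor, w: float = 1.5) -> torch.Tensor:
    # Run the model twice per step (with and without the conditioning text),
    # then extrapolate toward the conditioned distribution; w=1.0 recovers
    # plain conditional sampling, and w>1.0 strengthens the conditioning.
    return uncond + w * (cond - uncond)
```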
- AI Engineer Seeks Research Collaboration: An AI engineer with a background in applied math and computer science from Imperial College London and Oxford is seeking research collaborations.
- They aim to use competition winnings to fund research and transition from industry to academia, focusing on data-centric approaches to machine learning/AI.
- Style Transfer and Knowledge Gaps: Different Sides of the Same Coin?: Members debated whether style transfer and closing knowledge gaps are fundamentally different, or related.
- One member thinks both tasks could be seen as attempts to generate samples not present in the training data from nearby data samples in the training data and that style transfer just seems like an easier task in that vein.
Eleuther ▷ #lm-thunderdome (3 messages):
GSM8k Benchmark, flexible-extract, strict-match
- Flexible Extraction Fails GSM8k Benchmark: In GSM8k benchmark version 3, `flexible-extract` scored 0.3594, worse than `strict-match`, which scored 0.5742 on the `exact_match` metric with 5-shot learning.
- Funny benchmark: A member found it funny, saying haha how can flexible be worse than strict.
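For anyone wanting to reproduce the comparison, a typical lm-evaluation-harness invocation looks like the following (flags per recent harness releases; `<model-id>` is a placeholder):

```
lm_eval --model hf \
  --model_args pretrained=<model-id> \
  --tasks gsm8k \
  --num_fewshot 5
```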
Eleuther ▷ #multimodal-general (1 messages):
Benchmarking Prompting Methods in VLMs, Interpretability Studies on VLMs, Ineffectiveness of LLM Prompting Techniques for VLMs, Mech-Interpretability Probing Study for VLMs
- Prompting Methods Benchmarked in VLMs?: A member is seeking studies that benchmark different prompting methods in VLMs and interpretability studies explaining their effectiveness.
- Normal LLM Prompting Ineffective for VLMs?: The user notes having seen several studies which discuss how ineffective normal LLM prompting techniques are for VLMs.
- Mech-Interp Probing Study for VLMs?: The user considers if a mech-Interp oriented probing study might be helpful, but is unsure of how to begin.
Moonshot AI (Kimi K-2) ▷ #general-chat (15 messages🔥):
Kimi doesn't encourage delusions, Mini version of Kimi, Qwen model distilled on K2
- Kimi Doesn't Encourage Delusions: A member shared that they were testing Kimi and love that it doesn't encourage delusions.
- Another member shared an image, and Kimi's analysis was "are you serious?".
- Kimi's Cool Response goes Viral: A member shared this link saying "This is so cool from Kimi".
- The message being replied to was a user humoring the idea that private voices in your head are Jesus, that scripture mentions pets getting raptured, and that the whole 2025 date is baseless hype; Kimi's response bluntly denied all claims.
- Mini-Kimi Distilled on Qwen?: A member wondered if there will ever be a mini version of Kimi with the same writing style but smaller.
- Another member doubts this is in the Moonshot team's interests, suggesting that the best bet would be a smaller Qwen model distilled on K2.
- Qwen Distills Reasoning with Kimi: A member doubts the rationality of distilling a Qwen model, arguing that Deepseek only did it because Qwen lacked good reasoning until Qwen 2.5.
- Another member countered that K2 has a different style of problem-solving and excellent writing, so a smaller Qwen3 model could benefit from distillation in certain attributes like prose and referencing obscure knowledge.
Latent Space ▷ #ai-general-chat (11 messages🔥):
Gemini Live Model, Chrome DevTools MCP, AI Coding Agents
- Gemini goes Live with killer Audio: Google's Logan Kilpatrick announced a new Gemini Live model with native audio, improved function calling, and more natural conversations, as announced on X.
- Early users praise the flow and accents, but report iOS Safari issues, background-noise sensitivity, session-length limits, STT accuracy with accents, overly cautious censorship, missing price transparency, and desire for embodiment / wearables.
- Chrome DevTools MCP opens for AI Agents: Google announced the public preview of Chrome DevTools MCP, a new server that lets AI coding agents (Claude Code, Cursor, VS Code, Gemini, etc.) control and inspect a live Chrome browser through CDP/Puppeteer, as announced on X.
- Agents can now run performance traces, examine the DOM/console, capture screenshots and network traffic, and debug web apps in real time with one-line installation via npx.
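For reference, the one-liner referred to above is along these lines (the package name `chrome-devtools-mcp` follows Google's announcement; verify it against the official README before wiring it into an agent config):

```
npx -y chrome-devtools-mcp@latest
```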
MCP Contributors (Official) ▷ #general (1 messages):
glassbeadaleph: i think so, give me one second
MCP Contributors (Official) ▷ #general-wg (9 messages🔥):
Embedded Resources title vs name, Claude Code, ReadResourceResult contents array
- Embedded Resources Lack Title and Name: A member noted discrepancies in the Model Context Protocol documentation, where embedded resources are implied to have a title, which is missing in `schema.ts`, and questioned the absence of a name field to match the Resource object.
- It was argued that both title and name might be necessary, because embedded resources aren't always retrievable via a read resource call.
- Claude Code Debated for SEP Documentation: A member suggested using Claude Code to write an SEP (Standard Enhancement Proposal) document, calling it a good test of the tool's capabilities.
- Another member thought that getting an SEP for the topic would be relatively easy.
- ReadResourceResult's contents Array Questioned: A discussion arose around the `ReadResourceResult.contents` array in this GitHub issue, questioning its intended use and semantics, as it is undocumented.
- One member explained its use with the example of a Web Resource consisting of HTML and associated images, or scenarios where tokenizable/renderable MIME types haven't been negotiated.
Nous Research AI ▷ #general (8 messages🔥):
Anthropic Misuse Report, Cybercrime and AI, AI-fabricated credentials
- Anthropic report: AI misuse focuses on cybercrime: A member shared Anthropicâs report on detecting and countering AI misuse, highlighting that the actual threats are low-grade cybercrime, or vibe hacking.
- The discussion touched on whether applying for jobs with fabricated credentials is illegal, regardless of location, and the report specifically mentions completely fabricated master's degrees.
- LLMs automate personal life: A member noted that an LLM did all the legwork in achieving a recent accomplishment.
- According to them, all they had to do was "spend many hours self-reflecting and feeding info about myself into the AI."
aider (Paul Gauthier) ▷ #questions-and-tips (6 messages):
Aider's /clear command, Aider access to Internet search
- "/clear" command clears chat history, but not context: Users clarified that the `/clear` command only clears the chat history, but added files remain in the context.
- The command `/context` can be used to check how many tokens are dedicated to each file.
- Aider Lacks Native Internet Search, but Scrapes URLs Instead: A user inquired about giving aider access to Internet search.
- Another user clarified that this is not possible with the main branch, but you can instead use `/web https://www.example.com/` to scrape a website.
Yannick Kilcher ▷ #general (3 messages):
Saturday evening talks, Reading papers before talks
- Saturday Evening Talks Anticipation: A member looks forward to the Saturday evening (European time) talks.
- The announcement came earlier in the week.
- Pre-Talk Paper Reading: A member expressed a desire to read papers before the talks to better follow Yannick or the presenter.
- This would enhance their understanding and engagement during the sessions.
Yannick Kilcher ▷ #paper-discussion (2 messages):
Hyperparameters for Diffusion Models, ODE Solvers vs DPM++2m, Applications of Fast Inference, Diffusion Efficiency Research
- Hyperparameters Generate Images Like Distillation Models!: The author of "Hyperparameters are all you need" will present their paper, which uses a five-step inference method for a diffusion model.
- Key results show 8-step inference beating DPM++2m's 20-step inference on FID, with a ~60% reduction in computational cost, using existing models without retraining.
- ODE Solvers Outperform DPM++2m in Fewer Steps: According to the paper, an 8-step Diffusion ODE Solver outperforms 20-step DPM++2m without needing additional training.
- The author seeks feedback, collaborators, and ideas for applications where inference speed is critical, especially from those working on diffusion efficiency, and invites discussion on ODE solver improvements (a generic few-step sampler sketch follows below).
- ArXiv Paper about to be reviewed: A user announced that they will be reviewing this paper soon.
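As background for the few-step claims above, here is a minimal, generic Euler integrator for the probability-flow ODE of a diffusion model. The sigma schedule and `denoise` callable are placeholders; this sketches the standard sampler family, not the paper's actual method.

```python
import torch

def sample_ode_euler(denoise, x, sigmas):
    """Integrate the probability-flow ODE with plain Euler steps.

    denoise(x, sigma) -> the model's estimate of the clean sample;
    sigmas is a decreasing noise schedule (k+1 values => k steps).
    """
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, sigma)) / sigma  # dx/dsigma along the PF-ODE
        x = x + d * (sigma_next - sigma)     # one Euler step
    return x

# Placeholder 8-step schedule; tuning such values is the "hyperparameters" knob.
sigmas = torch.tensor([14.6, 6.2, 3.1, 1.6, 0.9, 0.5, 0.25, 0.1, 0.0])
x0 = torch.randn(1, 3, 64, 64) * sigmas[0]    # start from pure noise
# image = sample_ode_euler(model, x0, sigmas)  # `model` is a pretrained denoiser
```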
Yannick Kilcher ▷ #ml-news (1 message):
.neoneye: https://x.com/Alibaba_Qwen/status/1970599323013652705
Manus.im Discord ▷ #general (5 messages):
Manus PDF download issues, Beta Pro Access
- Manus PDF Download Stymied: A user reported that Manus got stuck while downloading a PDF needed for account research; even after they downloaded the file manually and provided a link, Manus kept asking them to upload the file.
- The user sought advice on resolving this issue, but the conversation ended there.
- Seeking Beta Pro Access: A user inquired about obtaining access to Beta Pro.
- The discussion ended without a response, leaving the method for acquiring Beta Pro access unresolved.
Modular (Mojo 🔥) ▷ #general (3 messages):
Modular contributor, Contributing to Mojo
- Users ask about contributing to Modular: A user inquired about contributing their talents to Modular.
- They were asked to DM a staff member for further discussion.
- Contributor Opportunities Explored: A member expressed interest in leveraging their skills to support Modularâs services.
- A staff member suggested direct messaging to explore potential collaboration avenues.
Modular (Mojo 🔥) ▷ #announcements (1 message):
New Fundraising, Unified compute layer
- Modular Closes $250M Funding Round!: Modular announced it has raised $250M to accelerate building AI's unified compute layer.
- The team expressed gratitude to the community for its contributions, feedback, and momentum, and promised to empower the community with more features in the coming year.
- Community Momentum Fuels Funding Success: The funding success is attributed to the community's invaluable contributions and feedback.
- The company commits to enhancing community empowerment through feature enhancements and expedited response to feedback.
tinygrad (George Hotz) ▷ #general (3 messages):
clspv build errors, Python bindings for clspv
- clspv Main Branch Plagued by Build Errors: The main branch of clspv is currently failing to build due to errors, but a user found that reverting to previous commits resolves the issue and shared a forked repository with a working stable branch.
- Users can pull the forked repository and checkout the stable branch to build clspv successfully.
- Python Bindings in the Works for clspv: A user is developing Python bindings for clspv, with the goal of enabling direct installation via pip using a single command.
- This enhancement would streamline the installation process, making clspv more accessible to Python developers (a hypothetical wrapper sketch follows below).
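As a rough picture of what such bindings might wrap, here is a hedged sketch that shells out to the existing clspv CLI. The function name is hypothetical, and it assumes a `clspv` binary on PATH (e.g. built from the stable branch mentioned above).

```python
import subprocess

def compile_cl_to_spirv(src: str, out: str = "kernel.spv") -> str:
    """Compile an OpenCL C source file to a SPIR-V binary via the clspv CLI.

    A pip-installable binding would presumably hide this subprocess call
    behind a proper Python API; this helper is purely illustrative.
    """
    subprocess.run(["clspv", src, "-o", out], check=True)
    return out

# compile_cl_to_spirv("add.cl")  # would produce add.spv next to the source
```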
DSPy ▷ #general (1 message):
DSPy attachments, UV Tooling
- Attachments Add-On Attracts Attention: The `attachments` add-on for DSPy is useful for easily adding new files.
- It is a standalone `uv add` package for Python.
- UV Tooling Integration: Discussion highlighted the ease of adding new files with the `attachments` add-on inside the DSPy framework (a hedged usage sketch follows below).
- The add-on is noted for its standalone `uv add` functionality, streamlining the process for Python projects.
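A minimal sketch of how the add-on could slot into a DSPy program. The `attachments.dspy` import path and the `Attachments("report.pdf")` call are assumptions inferred from the discussion, not a confirmed API.

```python
# uv add dspy attachments   (single-command install, per the discussion)
import dspy
from attachments.dspy import Attachments  # assumed import path

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any configured LM

summarize = dspy.Predict("document -> summary")
# Attachments is assumed to turn a local file into LM-ready content.
result = summarize(document=Attachments("report.pdf"))
print(result.summary)
```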