Realtime is all you need?
AI News for 8/27/2025-8/28/2025. We checked 12 subreddits, 544 Twitters and 22 Discords (185 channels, and 7363 messages) for you. Estimated reading time saved (at 200wpm): 577 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
The Realtime API has been in preview, and now is in GA, with image inputs, remote MCP server support, SIP/PBX support and prompt caching, and better function calling. Alongside it, thereās a new realtime model! unfortunately not gpt5-realtime⦠itās still a marginally smarter model, just that most of the improvements are āAPI centricā, aka function calling/instruction following.
There are 2 new voices and the voice control is unquantifiable but worth trying it out:
AI Twitter Recap
OpenAIās gpt-realtime and Realtime API GA (voice agents, telephony, tools)
- gptārealtime model + Realtime API GA: OpenAI shipped its most advanced speech-to-speech model and took the Realtime API to GA with substantial capability and cost updates. Highlights: improved instruction following, tool calling, prosody and non-verbal cues, multilingual switching; new voices (Cedar, Marin); image input; remote MCP tool support; SIP telephony; new WebRTC APIs (server websocket control, video) and a ~20% price cut. Pricing shared by the community: ~$32/1M audio input tokens (cacheable at $0.40/1M) and $64/1M audio output tokens. Benchmarks vs GPTā4oārealtime suggest sizable gains on BigBench, ComplexFuncBench, and audio instruction-following. Demos include a Notion MCP example and WebRTC/SIP starter code. Threads: @OpenAI, @OpenAIDevs, API details by @juberti, pricing by @omarsar0, bench take by @reach_vb, MCP demo by @pbbakkum.
- Developer notes: The new all-in-one WebRTC API removes the ephemeral token step and supports video on the same connection; SIP endpoints enable call routing, transfer and hangup APIs for production call flows. Cookbook guidance covers voice prompt design (speed, tone, handoffs). See WebRTC API update and SIP details.
Coding Models and Dev Tooling: xAIās Grok Code Fast 1, OpenAI Codex, editors/CLIs
- xAIās Grok Code Fast 1: A āspeed-first,ā economical reasoning model for agentic coding, free for a week and integrated across popular IDEs/tools (GitHub Copilot, Cursor, Cline, Kilo Code, Roo Code, opencode, Windsurf). The team emphasizes rapid rollout iterations and human+auto evals for real-world usefulness beyond benchmarks. Community tests are positive, and Cline added āthree ways to code freeā (cloud via Grok, local via LM Studio, or Qwen Code with generous daily limits). Announcements and context: @xai, @skcd42, @MohitReddy13, @cline launch thread.
- OpenAI Codex push (new stack integration): OpenAIās Codex got a major upgrade: IDE extensions (Cursor/VSCode/Windsurf), a much-improved local CLI, unified local+cloud task management, and GitHub code reviews. Commentary notes deeper integration across the dev stack, including local/remote workflows. Engagement indicates strong reception. Threads: @kevinweil, @gdb, @sama.
- Ecosystem improvements: Googleās Gemini CLI landed native integration in Zed (multi-folder IDE mode, diff stats, better stability; community-driven PRs), easing multi-editor workflows (@_philschmid). OpenAIās Realtime GA also unlocks voice-first coding assistants (MCP over voice).
New Models and Benchmarks: Microsoft MAI, Cohere Translate, Tencent TV2A, GLMā4.5
- Microsoft MAIā1āpreview (text) and MAIāVoiceā1: Microsoft introduced its first ināhouse models. MAIā1āpreview entered the LMArena text leaderboard at #13 on debut; MAIāVoiceā1 targets highāquality speech generation (public testing encouraged). Microsoft signals rapid iteration and distribution via its product surface. Details: @mustafasuleyman, @lmarena_ai, @yusuf_i_mehdi.
- Cohere Command A Translate: A taskāspecialized translation model with strong thirdāparty validation from RWS/Language Weaver. Community reaction is that domainātrained translation outperforms frontier generalists (even GPTā5) on complex multiādomain tasks. More in Cohereās blog and community takes by @nickfrosst.
- Tencent HunyuanVideoāFoley (TV2A): Endātoāend text/videoātoāaudio framework trained on ~100k hours with an MMDiT backbone, REPA loss, and Audio VAEāreporting SOTA across audio quality, visualāsemantic, and temporal alignment. Code, report, and HF weights are public (announcement).
- Zhipu AI GLMā4.5: Now leading Berkeleyās FunctionāCalling Leaderboard V4, reinforcing GLMā4.5ās tool-use capability in practical APIācalling tasks (results).
Agent Systems, Evals, and Patterns
- Parallel agents as a scaling axis: Andrew Ng highlights parallel agent orchestration as the fourth scaling lever (after data, train compute, testātime compute). Expect more multiāagent research/cookbooks (research agents, background workers + UI monitors, mixtureāofāagents aggregators) as token prices fall and latency budgets tighten (thread).
- MemoryāR1 (RL for memoryful agents): GRPO variants significantly boost F1/BLEU/LaaJ on memory benchmarks (across Llamaā3.1ā8B and Qwenā2.5ā7B) with outcomeādriven rewards and tiny data (152 QA pairs). Gains compound with stronger memory managers; generalizes across backbones. Notes and links: @omarsar0.
- Agentic RAG and evalability: Elysia (openāsource agentic RAG) uses a decisionātree architecture, dynamic data displays, onādemand chunking, and feedbackāasāfewāshot to improve determinism and debuggability (overview). LlamaIndex shipped a multiāagent ācoding agentā that autoāgenerates document workflows (edit/test/configure, codeāfirst, orchestrated via LlamaIndex workflows) (demo). AI SDK v5 added LangSmith tracing for observability: token usage, tool traces, TTFT (@Hacubu). For rigorous searchāaugmented evaluation, Reka released ResearchāEval (374 diverse, highāquality questions; frontier models spread 26.7%ā59.1% acc), aiming beyond saturated SimpleQA/BrowseComp (@RekaAILabs).
- DSPy practice: Good discussion on dataācentric pipelines and where to put LLMs in the loop; optimizing via specs/evals before automation (fireside with @lateinteraction) (session).
Image/Video Gen: Nano Banana momentum, ByteDance USO, Runway in production
- Nano Banana (Gemini 2.5 Flash Image) as a builder workhorse: Heavy community use for personalizable styles, panel prompting, and mobile workflows; hackathon announced; Google showcased internal team work behind ābanana.ā Examples from Demis (isometric mapāgame idea), creative pipelines (glif agents; Suno for audio), and free hacks/promos accelerating adoption. Samples: @demishassabis, @OfficialLoganK, @tulseedoshi.
- ByteDance USO (Apacheā2.0) style transfer/editing: Openāsource text+imageādriven editing that ājust works,ā with HF demos and strong qualitative feedback from practitioners; a credible open alternative in the ānano bananaā era (overview).
- Runway Genā4 in production pipelines: Filmmaking partnership with Fabula illustrates how inācontext tools augment pro workflows instead of replacing craftācase studies show where prompting meets production reality (@runwayml). Also: testādriving Wan 2.2 S2V indicates audio preprocessing/finetuning still matter for musical alignment (@ostrisai). Separately, Moonshotās Kimi Slides introduced agentic deckābuilding (ideasādecks, future auto image search/layout/polish) (@Kimi_Moonshot).
Infrastructure and Strategy
- Compute build-out: Reporting suggests OpenAI and Oracle are planning a 4.5 GW data center build (Stargate), following a 1.2 GW Abilene, with SoftBank/Microsoft/NVIDIA as partners; rumored $30B/yr contract. Site selection ongoing (@DeepLearningAI).
- Platform share as national strategy: A policy thread argues U.S. dominance requires maximizing usage (tokens, models, developers) on American hardware/softwareāfavoring developer flywheels over export controls that inadvertently seed alternative stacks (Huawei+CloudMatrix+DeepSeek/Qwen) (@sriramk). Related metaāobservation: labs pretrain on the same internet, but reinforcement and postātraining choices (and product data) drive āspeciationā (@tszzl; @Yuchenj_UW).
Top tweets (by engagement)
- xAI released Grok Code Fast 1 (free for 7 days across major IDEs) @xai
- OpenAIās āDevs, tune inā livestream for Realtime API and gptārealtime @OpenAI
- OpenAI introduced gptārealtime and Realtime API GA @OpenAI
- Karpathy on āLLMifyingā textbooks and environments for aligned training data @karpathy
- āNano Bananaā community surge and hackathon announce @OfficialLoganK; Demisā isometric map post @demishassabis
- OpenAI Codex features resonating with developers @sama
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Z.AI GLM AMA + Mini MoE Roadmap
- **AMA With Z.AI, The Lab Behind GLM Models** (Score: 396, Comments: 314): AMA with Z.AI (creators of the GLM family) focuses on technical questions around GLM-4.5, especially post-training SFT for GLM-4.5 Airārequesting concrete hyperparameters (learning rate, batch size, epochs, dataset size, weight decay), target loss, and methods to avoid catastrophic forgetting, which commenters note arenāt detailed in the GLM 4.5 paper (pdf). A community finetune of GLM-4.5 is shared for reference (HF: GLM-Steam-106B-A12B-v1). Other questions probe what differentiates open-weight models (GLM-4.5, Kimi K2) from frontier closed systems (GPT-5, Gemini, Claude) and whatās required to close the gap, plus whether Z.AI plans >32B dense models versus leaning into Big MoE architectures. Commenters push for transparency and reproducibility (full SFT hyperparams and tuning targets) and debate whether open-weight efforts can realistically match or surpass closed frontier models. Thereās also interest in the architectural trade-offs and roadmap between scaling dense models (e.g., ~70B+) and investing in larger MoE systems.
- A commenter requests the exact SFT post-training recipe for GLMā4.5 Airālearning rate schedule, global batch size, number of epochs, dataset size/composition, weight decay, and any adapter strategiesāplus practical targets like cross-entropy loss/perplexity and methods to prevent ācatastrophic forgetting.ā They reference a community finetune GLM-Steam-106B-A12B-v1 and note the official paper lacks these details (arXiv:2508.06471). Theyāre seeking guidance on tuning GLMā4.5 Air (e.g., small LR, mixed replay from pretrain corpus, KL/L2 regularization, or gradual unfreezing) to avoid degradation during SFT.
- Another thread asks what openāweight models like GLMā4.5 and Kimi K2 need to do to catch up with closed frontier models (GPTā5, Gemini, Claude). The focus is on potential gaps in training compute, data quality/scale, RLHF/RLAIF and toolāuse pipelines, safety alignment, and evalādriven training; they probe whether improved scaling strategies, better data curation, and distillation from frontier models could close the gap and whether parity is feasible.
- Multiple questions probe Z.AIās scaling roadmap: continue with dense models >
32B
versus following the trend toward large MixtureāofāExperts (MoE). They ask whether SOTA closed models likely have more parameters than GLM and if increased parameter count is necessary for SOTAālevel performance, implicitly weighing training/inference cost, routing quality, and throughput benefits of sparsity against the stability/simplicity of dense70B
āclass models.
- **Launching Our New AMA Series With Z.AI, Creators of GLM (Tomorrow, 9AM-12PM PST)** (Score: 291, Comments: 26): r/LocalLLaMA is hosting an AMA with Z.AI, the team behind the GLM (General Language Model) family, scheduled for Thu, Aug 28, 2025, 9AMā12PM PST. The post is an image flyer announcing the event; no technical details or agenda items (e.g., GLM variants, benchmarks, local deployment specifics) are included in the image or title beyond timing and hosts. Commentary is mostly light/administrative (e.g., noting the AMA and subreddit naming humor) with no substantive technical discussion yet.
- Scheduling clarity: A bot corrects the event time to PDT (not PST) due to DST and links a time conversion: https://timee.io/20250828T1600?tl=Launching%20Our%20New%20AMA%20Series%20With%20Z.AI%2C%20Creators%20of%20GLM%20(Tomorrow,%209AM-12PM%20PST)&d=180. This maps the AMA to 9AMā12PM PDT (16:00ā19:00 UTC) with a
180
minute duration, reducing ambiguity for global attendees. - Roadmap interest: A commenter asks āglm 6 when?ā, signaling demand for details on the next GLM release timeline. While no specs are discussed in-thread, this points to expected AMA topics like version cadence and feature upgrades for future GLM iterations.
- Scheduling clarity: A bot corrects the event time to PDT (not PST) due to DST and links a time conversion: https://timee.io/20250828T1600?tl=Launching%20Our%20New%20AMA%20Series%20With%20Z.AI%2C%20Creators%20of%20GLM%20(Tomorrow,%209AM-12PM%20PST)&d=180. This maps the AMA to 9AMā12PM PDT (16:00ā19:00 UTC) with a
- glm mini will be comming (Score: 191, Comments: 22): In an AMA screenshot with the Z.ai/GLM team, a user asks about plans for smaller Mixture-of-Experts (MoE) models (e.g., OSS-20B or 30B-A3B), and a co-host confirms they plan to train a smaller MoE model comparable to GPT-OSS-20B. This suggests a forthcoming āGLM miniā MoE variant targeting lower active parameter counts for easier local inference while retaining strong capability, akin to Qwen 30B A3B-style configs. Image link. Commenters note Qwen 30B A3B performs well but its low active parameter budget hurts long-context reasoning; a hypothetical 38B A6B is proposed as a sweet spotāmore experts per token yet still locally runnable. Others ask for the AMA source/context, with OP stating itās from a current Z.ai team AMA.
- Discussion centers on Mixture-of-Experts designs: a user notes Qwen 30B A3B performs well but its low āactive parametersā per token appears to hurt longer-form reasoning, proposing a 38B A6B variant to boost active capacity while staying locally runnable. In MoE notation (e.g., Qwen2 57B-A14B), the āA#Bā denotes approximate active parameters per token, so moving from
~3B
to~6B
active could materially improve capability without the full compute of a dense 30ā40B model (Qwen2 MoE naming for context). - The AMA hint that āGLM miniā is coming raised ambiguity around a claim of being ācomparable to gpt-oss-20Bā; commenters question whether this refers to parameter count or actual quality. Historically, ācomparableā in these announcements often maps to model size rather than parity on benchmarks, where training data, compute budget, and instruction-tuning heavily affect outcomes (GLM family reference: ZhipuAI/GLM).
- On usability/local inference, the suggestion is that an A6B MoE could be widely runnable: MoE increases active compute only for a subset of experts per token, enabling higher effective capacity at similar step-time to much smaller dense models. Caveat: VRAM footprint can still be dominated by total parameters (all experts) unless the runtime supports expert sharding/offload; engines like vLLM have begun optimizing MoE loading and routing for practical deployment (vLLM MoE support).
- Discussion centers on Mixture-of-Experts designs: a user notes Qwen 30B A3B performs well but its low āactive parametersā per token appears to hurt longer-form reasoning, proposing a 38B A6B variant to boost active capacity while staying locally runnable. In MoE notation (e.g., Qwen2 57B-A14B), the āA#Bā denotes approximate active parameters per token, so moving from
- Again where behemoth and reasoning model from meta ?? (Score: 224, Comments: 66): The image is a promo slide for Metaās āLlama 4ā multimodal MoE lineup, highlighting āLlama 4 Behemothā as a 16āexpert MoE with
2T
total and288B
active parameters, positioned as an āintelligent teacherā for distillation; companion variants āMaverickā and āScoutā target speed/efficiency. The OPās title (āwhere behemoth and reasoning model from meta??ā) implies these large/āreasoningā models havenāt been publicly released; the slide emphasizes distillation and efficiency rather than availability. Image. Commenters are skeptical, suggesting Behemoth would underperform vs Qwen 3 235B despite being ~6Ć larger, calling it ādead on arrival,ā with some tongueāinācheek claims itās guiding Metaās strategy.- Speculation that Metaās unreleased ābehemothā reasoning model underperforms smaller open models, with one comment asserting itās āprobably worse than Qwen 3 235B at 6Ć the size.ā If accurate, that indicates poor scaling efficiency where adding parameters (
>6Ć
) fails to translate into better reasoning quality versus a~235B
baseline. - Another technical inference is that non-release itself is a negative performance signal: if the model were competitive, Meta would have shipped it. The implication is that internal evaluations likely didnāt surpass current SOTA on reasoning, so the absence of a release suggests underwhelming benchmark results and limited practical value at this stage.
- Speculation that Metaās unreleased ābehemothā reasoning model underperforms smaller open models, with one comment asserting itās āprobably worse than Qwen 3 235B at 6Ć the size.ā If accurate, that indicates poor scaling efficiency where adding parameters (
2. Audio Gen Releases: HunyuanVideo-Foley and VibeVoice TTS
- HunyuanVideo-Foley is out, an open source text-video-to-audio model (Score: 294, Comments: 23): Tencentās HunyuanVideo-Foley is an open-source, video-conditioned (textāvideoāaudio) model that generates foley/soundtracks aligned to an input video, with a public demo, weights, and code: demo, Hugging Face, GitHub, project page, and arXiv. Early user feedback notes improved frequency response (stronger bass/treble) and better A/V synchronization versus prior attempts, targeting the missing audio stage in current video-generation pipelines (e.g., pairing with models like Hunyuan/Wan for visuals and TTS for dialog). The thread clarifies that it can indeed generate appropriate audio for existing video tracks (i.e., video-to-audio with optional text conditioning). Commenters see this as the ālast pieceā enabling end-to-end automated content pipelines and discuss multi-GPU orchestration (e.g., persistent model loading in tools like ComfyUI) to batch long-running jobs; enthusiasm centers on workflow integration rather than raw benchmarks.
- Multiple users clarify that a ātext-video-to-audioā model here means generating Foley/ambient SFX aligned to an already existing video track, effectively filling the missing audio layer. This slots into an end-to-end pipeline alongside text/image-to-video models like Hunyuan and Wan plus dialogue models like Infinite Talk, enabling fully synthetic shorts with synchronized visuals and sound.
- Thereās interest in building a
multi-GPU
production pipeline where each model (design, T2V, dialogue, Foley) stays resident on dedicated GPUs and passes artifacts downstream, minimizing reload overhead and maximizing throughput. A key open question is whether Comfy currently provides robust multi-GPU graph execution/scheduling to support persistent residency, inter-model transfers, and weekend-long batch queues. - Early qualitative notes: audio quality reportedly has better frequency balance (āmid, bass, and trebleā) and tighter A/V sync versus earlier attempts. Practical deployment concerns include model size being ānot too big,ā a request for release in
safetensors
format for easier/safer loading, and questions about concrete run instructions.
- RELEASED: ComfyUI Wrapper for Microsoftās new VibeVoice TTS (voice cloning in seconds) (Score: 228, Comments: 27): Open-source ComfyUI wrapper for Microsoftās new VibeVoice TTS adds a Single Speaker node, a Multiple Speakers node (up to
4
speakersāmodel limit), and file-based text input for long-form synthesis, with repo at Enemyx-net/VibeVoice-ComfyUI. Reported VRAM use on official weights:~5 GB
for the1.5B
model and~17 GB
for the7B
model (the latter still in Preview), with qualitatively strong single-speaker cloning from a~56s
prompt; multi-speaker is ādecent only with the 7Bā but has a lower success rate. The model is highly seed-sensitive (large quality variance across seeds) and shows mixed cross-lingual behavior: nonāEN/zh prompt audio (e.g., Italian) can yield sameālanguage output, but EN prompts did not reliably produce other languages. User feedback notes it works on an RTX 5090 and suggests ending prompts with punctuation or trailing ellipses (ā ā¦ā) to avoid early cutoffs in short utterances; others request/anticipate quantized releases to reduce resource use and praise the nodeās utility.- A user confirms the ComfyUI wrapper for Microsoft VibeVoice TTS runs smoothly on an RTX 5090 with a self-cloned voice, suggesting good compatibility on high-end NVIDIA cards (no artifacts or instability reported). While no latency numbers are given, the report implies real-time or nearāreal-time responsiveness for personal voice use.
- Practical workaround for premature audio cutāoffs on short prompts: end the input with punctuation (ā?ā, ā!ā, ā.ā) and add a trailing ā ā¦ā (e.g., āHello? ā¦ā). This appears to mitigate end-of-sequence or silence-trimming behavior that can truncate single-word or very short TTS outputs.
- Thereās demand for a quantized build, which would lower VRAM requirements and potentially improve throughput on smaller GPUs/CPUs. Such a release would broaden deployability beyond high-end cards while trading off minimal quality loss typical of quantization.
3. Local AI Tools: gpt-oss 60K-context Training and Second Brain
- Gpt-oss Fine-tuning - now with 60K context length and fits on <13GB VRAM (Score: 229, Comments: 26): Post announces Unslothās Flex Attention for OpenAI gpt-oss training, claiming
>8Ć
longer context,>50%
less VRAM, and>1.5Ć
faster training than other impls including FlashAttention-3, enabling~60K
token context on80GB
VRAM for BF16 LoRA (title also touts ā<13GB VRAMā fit). It adds export of QLoRA-finetuned gpt-oss models tollama.cpp
,vLLM
,Ollama
, and HF, fixes float16 loss blow-ups on T4/Colab, and enforcesswiglu_limit=7.0
for MXFP4 inference in transformers; savings increase with longer sequences. Links: Unsloth, blog/details: docs.unsloth.ai/basics/long-context-gpt-oss-training. Comments ask about scaling to a 120B model and show strong interest in the upcoming notebook with direct GGUF/llama.cpp saving; general sentiment is enthusiastic.- The dev notes an upcoming training notebook with direct save-to-GGUF for llama.cpp (ānext weekā), which would remove conversion steps and enable immediate inference across llama.cpp backends (CPU, CUDA, ROCm, and Apple Metal). This would also simplify integration with tooling like LM Studio, and make quantized deploys (e.g., Q4/Q5) straightforward for the advertised
60k
context and<13 GB
VRAM target. Links: llama.cpp, GGUF spec. - Thereās user demand for a larger
120B
variant. Practically, local inference for120B
typically exceeds single-device constraints, often requiring multi-GPU tensor parallelism and aggressive quantization; even 4-bit can require~40ā60 GB
VRAM, making it well beyond the<13 GB
class unless using distributed setups. - Multiple users ask about macOS support: running in LM Studio on a Mac mini M4 and whether Unsloth is coming to Mac. If the model exports directly to GGUF, it becomes immediately usable via llama.cpp with the Metal backend (which LM Studio wraps), improving Mac compatibility without bespoke ports. Links: LM Studio, Unsloth.
- The dev notes an upcoming training notebook with direct save-to-GGUF for llama.cpp (ānext weekā), which would remove conversion steps and enable immediate inference across llama.cpp backends (CPU, CUDA, ROCm, and Apple Metal). This would also simplify integration with tooling like LM Studio, and make quantized deploys (e.g., Q4/Q5) straightforward for the advertised
- I built a local āsecond brainā AI that actually remembers everything (321 tests passed) (Score: 259, Comments: 120): OP introduces Kai, a local ācognitive OSā that builds persistent, on-device memory using a graph-based knowledge store with spreading activation for retrieval (akin to cognitive architectures like ACT-R). It runs 100% locally (no cloud), learns from user activity across the machine, and emphasizes itās ānot just RAGāāinstead leveraging a node/edge memory graph + activation dynamics; the project reports
321
passing tests and offers early access at oneeko.ai with a 3D memory visualization screenshot. OP plans to open the core engine once stable. Top comments push for open-sourcing given the local-only claim, share a similar project using a query-driven activation and residual-strengthening approach (dsam_model_memory), and a skeptic suggests it might be just an MCP-style server tagging/summarizing conversational dataāwith the usual failure modes of such systems.- A commenter building a similar system shares they use a query-based activation function to generate residuals that strengthen frequently accessed memories and related concepts (repo: https://github.com/jwest33/dsam_model_memory). They present this as biasing retrieval toward high-salience items over time, rather than static vector-store recall, to improve long-term relevance in a personal knowledge base.
- Another commenter suspects the project is essentially an MCP server that tags conversational data and builds a summary graph, with both āsaveā and āqueryā interfaces (see https://modelcontextprotocol.io/). They caution that this architecture typically inherits the same failure modes seen in similar tag/summarization-graph pipelines, implying persistent issues when metadata diverges from user intent over time.
- Hardware performance observations: on their setup, qwen3 235b a22b runs at
~20 tps
, glm-4.5-air at~40 tps
, and gpt-oss-120b at~70 tps
, while theyād prefer>=100 tps
. They also note many models feel ātoo censoredā for personal-assistant workflows, preferring fewer safety interventions to enable open-ended exploration.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo
1. GPT-5 Medical Benchmarks and Codex IDE/CLI Launch
- GPT-5 outperformed doctors on the US medical licensing exam (Score: 666, Comments: 245): A preprint, āCapabilities of GPT-5 on Multimodal Medical Reasoningā (AlphaXiv), claims GPTā5 outperforms licensed physicians by ~
25ā30%
on USMLE-style evaluations, shown in the tweetās tables. The result appears to rely on structured, expert-curated inputs (i.e., nearāperfect diagnostic/context data) and is not an endātoāend clinical workflow; it evaluates reasoning/answer selection on examālike vignettes rather than autonomous patient management. Top comments note the caveat that performance hinges on being given perfect diagnostic data, likening the setup to an openābook exam, and caution that real clinical safety (drug interactions, longitudinal context) remains an unresolved challenge despite strong exam performance.- Several commenters note the benchmark likely assumes idealized inputs ā e.g., results contingent on being āprovided with perfect diagnosis data from a human expert.ā This setup evaluates answer selection under clean, expert-curated context, not end-to-end clinical reasoning with noisy, incomplete histories, which is a major confound when comparing to practicing physicians who must perform triage, elicit histories, and resolve ambiguity.
- A technically relevant safety concern is state/recall limits: an LLM may āforgetā earlier chart details due to context truncation, risking contraindicated suggestions (e.g., proposing another NSAID after prior ibuprofen, such as diclofenac). This highlights the need for robust patient-state tracking, medication reconciliation, and automated drugādrug interaction checks as guardrails, rather than relying on transient chat context alone.
- Multiple remarks frame this as an āopenābook advantageā: the model effectively carries a corpus of textbooks via pretraining, so outperforming on multipleāchoice exams mainly reflects test-taking/recall under vast prior knowledge. This metric is not equivalent to bedside performance; it differs from other validated AI strengths (e.g., specific imaging tasks) and raises fairness questions versus humans taking a closedābook, timeālimited USMLE.
- Codex now runs in your IDE, Cloud and CLI with GPT-5 (Score: 221, Comments: 80): An OpenAI Developers announcement (Aug 27, 2025) claims that Codex now works as a coding collaborator across IDEs, the cloud, and the CLI, āpowered by GPT-5ā and accessible via the ChatGPT plan. The graphic highlights a new IDE extension, seamless task handoff between local and cloud environments, GitHub code review integration, and a revamped Codex CLIāsuggesting tighter endātoāend workflow coverage from editing to review to execution. Commenters ask for realāworld comparisons to Claude Code (quality/usability), whether the prior sandboxing requirement still applies (a blocker for some), and if thereās support for RStudio/R workflows.
- Users flag the prior Codex requirement to run code in a strict sandbox as a major blocker for real-world workflows (file system, network, package managers, test runners), asking if the new IDE/Cloud/CLI release relaxes or allows opting out per project. The resolution of sandboxing (e.g., trusted directories, network egress, env var access) will determine whether itās viable for in-IDE refactors and debugging versus only safe, ephemeral runs.
- A power user on the
Claude Code $100
plan reports preferring GPTā5ās raw code-generation quality but still finding Claude Codeās overall āsystemā harder to beat. The takeaway is that model quality alone isnāt sufficient; reliability and endātoāend developer ergonomics (workflow orchestration, context handling, integrations) at the~$100/mo
tier are decisive for adoption. - Thereās uncertainty about access tiers: whether
GPTā5 High
is available under the$20
ChatGPT Plus plan in Codex. One commenter found āmedium thinkingā underwhelming, implying meaningful quality gaps between āMediumā and āHighā tiers that could affect latency/cost tradeoffs and plan selection.
- Whoās Your Doctor Now? (Score: 2733, Comments: 87): Non-technical meme contrasting perceived bedside manner of AI assistants vs web search: under the OpenAI logo it says āNothing serious, it can be treated,ā vs Googleās āYou have 3 minutes left,ā implying LLM reassurance vs search-engine-induced alarmism. Title āWhoās Your Doctor Now?ā frames it as a tongue-in-cheek take on self-diagnosis culture; no benchmarks, models, or implementation details discussed. Comments reminisce about the āDr. Googleā era exacerbating hypochondria and joke about overdiagnosis, with some sarcastic quips about professionalism and calling everything cancer.
- Rate this art by gpt 5 (Score: 244, Comments: 189): AI-generated abstract of Lord Ganesha; despite the title (āby gpt 5ā), the shared prompt clearly indicates Midjourney v6.1: āthick paint splashes ⦠white background āpersonalize cvlos9g āstylize 800 āv 6.1.ā The high
-stylize 800
drives the bold, minimalist paint-stroke aesthetic, and-personalize cvlos9g
suggests a user/style-specific personalization token, yielding a clean white background with vivid, liquid-paint strokes. Image: https://i.redd.it/nf5kr1bjiolf1.jpeg Comments note resemblance to the Olympic logo and include polarized views on AI artās value; a commenter provides the exact prompt so others can replicate the result, implicitly correcting the GPT-5 attribution to Midjourney.- A commenter shared the exact prompt and parameters: āthick paint splashes forming abstract minimalist shape of Lord Ganesha, ⦠white background āpersonalize cvlos9g āstylize 800 āv 6.1ā. This implies Midjourney v6.1 (
-v 6.1
), with a high-stylize 800
value that strongly biases outputs toward aesthetics over literal prompt adherence (see Midjourney parameter docs: https://docs.midjourney.com/docs/parameters). The-personalize cvlos9g
token appears to be a custom style/profile identifier influencing palette and composition. - Observations like āIt doesnāt look AI generatedā align with how MJ v6.xās improved coherence and texture handling can produce clean, logo-like geometry and consistent āliquid paintā effects. Minimalist composition plus a white background and high stylization tend to suppress common AI tells (messy edges, inconsistent brush physics), yielding results that some viewers read as non-AI; cf. model/version notes: https://docs.midjourney.com/docs/models#version-6.
- A commenter shared the exact prompt and parameters: āthick paint splashes forming abstract minimalist shape of Lord Ganesha, ⦠white background āpersonalize cvlos9g āstylize 800 āv 6.1ā. This implies Midjourney v6.1 (
- Chicken of the Sea - SaraShakeel x Ai render (Score: 349, Comments: 10): A user shares an AI-generated visual titled āChicken of the Sea ā SaraShakeel x AI render,ā apparently styled after artist Sara Shakeel, hosted on Reddit Video at v.redd.it/ujigpiulfplf1. The external link currently returns
HTTP 403 Forbidden
, implying access requires Reddit login or a developer token (likely WAF/auth gating), and the thread provides no technical metadata (e.g., model, prompts, or pipeline details). No benchmarks, implementation notes, or asset workflow are discussed; the thread is primarily aesthetic reception.- One commenter detailed a pre-AI pipeline: starting from a retouched reference image composited in Photoshop, then using Midjourney for expanded looks, followed by animation in Cinema 4D with the Arnold renderer, particle simulations, and compositing/tracking in After Effects/Mocha (Midjourney, C4D, Arnold, After Effects, Mocha). They report
~4 weeks
of work for a2-minute
deliverable including~1 week
of rendering, for a~$5k
payout (not continuous time), noting the realism lagged compared to current AI renders and that they should have priced closer to~$10k
. They conclude that āAI is devaluing the market,ā reflecting perceived downward pressure on rates as generative tools improve speed/realism.
- One commenter detailed a pre-AI pipeline: starting from a retouched reference image composited in Photoshop, then using Midjourney for expanded looks, followed by animation in Cinema 4D with the Arnold renderer, particle simulations, and compositing/tracking in After Effects/Mocha (Midjourney, C4D, Arnold, After Effects, Mocha). They report
2. WAN 2.x Infinite Talk Demos & S2V Tips + HunyuanVideo-Foley
- 4090 48G InfiniteTalk I2V 720P Test~2min (Score: 501, Comments: 117): Creator benchmarked an I2V pipeline on an RTX 4090 (48 GB) using
wan2.1_i2v_720p_14B_fp8_scaled
with LoRAlightx2v_I2V_14B_480p_cfg_step_distill_rank256_bf16
, generating 1280Ć720 output at 4 diffusion steps while consuming ~36 GB
VRAM. The run processed49
chunks of81
frames each (title: ~2 min total), at ~5 min
per chunk for~245 min
total; the FP8-scaled 14B model plus a stepādistilled LoRA (rank256
, bf16) suggests a speed/memoryāoptimized setup. Source audio is an AI cover on YouTube (https://youtu.be/9ptZiAoSoBM) sung in the style of Hiromi Iwasaki. Commenters report lip/voice sync is strong overall but degrades during background vocals, speculate mic motion/handling may confuse the model, and predict nearāterm agent workflows that autoāedit and publish music videos from a song upload.- Observers note the voiceālip sync is largely accurate at
720p
but degrades during overlapping/background vocals; using clean vocal stems would likely improve alignment. Thereās speculation that erratic microphone movements may interfere with source detection/voice activity cues, causing the model to momentarily track the wrong singer. - Thereās specific interest in the non-standard
RTX 4090 48GB
configuration used for the ~2 min
I2V run, with requests for the exact vendor/mod source. Commenters flag that this atypical memory capacity impacts reproducibility and potential batch/window sizes for others attempting the setup. - Questions about multi-GPU capability (e.g., splitting inference/training across GPUs) suggest users want to know if the InfiniteTalk I2V pipeline supports data/model parallelism or VRAM sharding. Clarity on whether the
48GB
requirement can be met via multi-GPU aggregation versus a single large-VRAM card would inform hardware choices.
- Observers note the voiceālip sync is largely accurate at
- Three reasons why your WAN S2V generations might suck and how to avoid it. (Score: 510, Comments: 151): OP reports that WAN S2V yields significantly better outāofātheābox results via the WanVideoWrapper than the native ComfyUI workflow, which required extensive tweaking for only moderate quality. They advise avoiding āspeedāup LoRAs,ā which they say degrade both WAN 2.2 and S2V output quality and movement/prompt adherence (only acceptable for mostly static talking heads). Strong prompt engineering is emphasized: specify music genre, atmosphere, emotional state, gaze direction, head/body motions, and exact actions rather than vague prompts. Example run:
576x800
resolution,~737f
, samplerUniPC/beta
,23
steps. The linked media is accessārestricted (v.redd.it 403); see also ComfyUI. Top comments include a request to share the workflow (user āLimewireā) and general praise of the result; no substantive technical counterpoints were offered.- comfyanonymous notes the ānative workflowā for S2V is not officially announced and the node is still marked
beta
, implying current quality issues stem from immature implementation; once the native node is fully implemented, it should outperform interim/thirdāparty workflows. This suggests users should expect rapid iteration and possibly breaking changes until the native node stabilizes.
- comfyanonymous notes the ānative workflowā for S2V is not officially announced and the node is still marked
- Wan 2.1 Infinite Talk (I2V) - FOAR EVERYWUN BOXXY (Score: 217, Comments: 45): OP demonstrates an Image-to-Video workflow using Wan 2.1 āInfinite Talkā to produce a talking-head clip with intentional upper-body/hand motion. The positive prompt targets facial cosmetics (big eyelashes/eyeliner) and short, black-painted nails, while an exhaustive negative prompt suppresses common video-generation artifacts (e.g., long nails/jewelry, overexposure/blur/static frames, JPEG artifacts, extra/fused fingers, deformed limbs, messy backgrounds, multiple/extra limbs, walking backwards), aiming for cleaner hand renderings and more dynamic motion. No generation parameters (resolution/FPS/steps/sampler/seed/CFG/duration) or hardware details (e.g., VRAM) are provided. Commenters praise the qualityāone calls it āBY FAR the best exampleā of Infinite Talkāwhile another asks about VRAM requirements, indicating interest in compute footprint; no answer is given in-thread.
- Resource requirements: A commenter asks, āHow much vram for this results,ā seeking concrete GPU memory needs to reproduce the shown Infinite Talk I2V quality. Technical readers would expect details like VRAM usage at specific resolutions/durations (e.g.,
512p/1024p
, seconds per frame), model precision (fp16
vsbf16
), and whether inference used xformers/attention slicing or CPU offload to fit into commodity GPUs. - Identity fidelity and reference control: One notes the output ādoesnāt look like Boxxyā and requests the original image, implicitly probing the pipelineās identity preservation and conditioning strength. This raises questions about the reference handling (single image vs multi-shot, face-alignment/landmark guidance, ID loss, and use of face-enhancers like GFPGAN/CodeFormer) and whether the I2V model supports guidance scale or identity embeddings to keep likeness stable across frames.
- Comparative performance: Another asks if this is ābetter than the new S2V model,ā indicating interest in head-to-head quality and stability comparisons between Wan 2.1 Infinite Talk (I2V) and S2V. Relevant benchmarks would include motion coherence, lip-sync accuracy, temporal consistency (flicker/warp), inference speed (FPS), and VRAM efficiency at matched prompts and resolutions.
- Resource requirements: A commenter asks, āHow much vram for this results,ā seeking concrete GPU memory needs to reproduce the shown Infinite Talk I2V quality. Technical readers would expect details like VRAM usage at specific resolutions/durations (e.g.,
- HunyuanVideo-Foley got released! (Score: 289, Comments: 46): HunyuanVideoāFoley is an openāsource Text+VideoāAudio (foley) model that generates synchronized sound effects from video input (optionally textāconditioned). A project page with interactive demos and sideābyāside comparisons against baseline models like MMAudio and ThinkSound is available here: https://szczesnys.github.io/hunyuanvideo-foley/. Early user feedback reports mixed quality: anime content can collapse to lowāenergy breaths/mumbling, and some realālife clips yield abrasive āsandpaperā textures; MMAudio baselines are noted to sometimes emit random, loud āscreams,ā highlighting artifact/hallucination issues. One commenter also hints at heavy I/O/compute demands (āMy SSD is tiredā¦ā) during generation.
- Multiple users report severe realism issues and artifacting: outputs devolve into āmutated exorcist screaming,ā unintelligible mumbling, or broadband āsandpaperā noise during action onsets. This points to weak audio-visual alignment and poor transient handlingālikely diffusion artifacts and unstable conditioning causing temporal drift and spectral roughness, resulting in janky Foley lacking precise onsets, dynamics, and spatial cues.
- Clear domain gap between styles: anime sequences produce only a faint sigh then garbled vocalizations, while live-action yields abrasive textures. This suggests the model isnāt robust to stylized visual domains (anime) and defaults to generic, low-information acoustic priors, indicating insufficient domain-conditioned training or inadequate style tokens/embeddings for non-photorealistic inputs.
- NSFW prompts appear specifically bad (generic/rubbed textures, suppressed or mismatched erotic SFX), hinting at safety filtering or data sparsity in such content. The behavior resembles hard clamping toward neutral textures and low-variance outputs under restricted semantics, which further degrades alignment and timbral specificity in those scenarios.
- If this is Genie 3, imagine how insane Genie 4 will be (Score: 1209, Comments: 179): Thread centers on the rapid capability jump from āGenie 2ā to āGenie 3ā within
~8ā9 months
, as evidenced by a shared demo video (requires auth: https://v.redd.it/6rk25azwirlf1). No benchmarks or release notes are cited; discussion is primarily about trajectoryātoward higher physical fidelity and interactive, navigable environmentsārather than implementation specifics. Commenters speculate that āGenie 4ā could add fineāgrained, physically consistent scene effects (e.g., āpaw prints in the sandā) and support realātime VR exploration of generated spaces; some infer that, if the exponential cadence holds, the next iteration may arrive soon.- Release cadence speculation: One commenter notes Genie 2 ā Genie 3 landed within
~8ā9 months
, reading this as a sign of accelerating iteration and expecting Genie 4 on a short horizon if the trend is exponential. Another draws a parallel to the perennial āGPT-4 ā GPT-5ā hype cycle, implicitly cautioning that cadence ā capability without concrete benchmarks or demos to compare across versions. - Interaction model/UX question: A user asks how Genie actually works in practiceāwhether it needs constant prompting versus supporting a stateful, continuous session. Another speculates about real-time VR-style exploration, implying a system that maintains scene state and accepts continuous control inputs (e.g., camera/controller) with low-latency streaming generation, rather than discrete prompt-to-video clips.
- Release cadence speculation: One commenter notes Genie 2 ā Genie 3 landed within
- Photoshop is cooked, Nano Bananas manipulation is insane. (Score: 2064, Comments: 225): Post showcases a state-of-the-art AI image-editing/inpainting workflow demonstrating high-fidelity object and scene manipulationācontrasted with early Stable Diffusion failure modes like āextra limbsā and āmissing fingers.ā The linked gallery (reddit.com/gallery/1n2fxjn) suggests strong structural coherence and texture blending for large edits, but residual failure cases remain (e.g., an āextra fingerā artifact) and fine-grained control over micro-features still lags, especially on faces. Commentary notes that while results are impressive, Photoshop isnāt ācookedā yet: users still catch anatomical artifacts and report that precise facial edits are difficult, implying AI tools are excellent for global/semantic edits but unreliable for small, identity-preserving adjustments.
- Progress noted from early Stable Diffusion artifacts (extra limbs/fingers) to current near-photoreal manipulation, but hand/finger fidelity remains an edge caseāusers still spot issues like an extra finger in the final image. This reflects ongoing weaknesses of diffusion/inpainting on high-frequency anatomical details (hands), despite major improvements in global coherence and realism.
- Users report strong performance for broad edits (e.g., compositional/object changes) but poor control over small facial details; ātry to edit small facial details and itās hell.ā This aligns with known limitations where localized, fine-grained edits can degrade identity or introduce artifacts, suggesting a need for better mask-aware conditioning, control nets, or higher-resolution latent editing to preserve microfeatures.
- Apple AI vs Galaxy Al vs Xiaomi Al REMOVE tool (Score: 4507, Comments: 407): A short video (blocked for us with
403 Forbidden
on v.redd.it) purportedly compares the consumer āremove/magic eraserā inpainting tools from Apple, Samsung Galaxy, and Xiaomi on the same image, highlighting differences in fill quality after erasing a subject. No implementation details, models, or benchmarks are provided in the post; it appears to be a visual A/B/C of object-removal outputs typical of on-device photo editors. Link: https://v.redd.it/w7ckphp7oplf1 Top comments argue Appleās result is notably worse than competitors and liken it to a basic Paint 3D-style āmagic eraser,ā with little substantive technical discussion beyond that sentiment.- Several commenters imply the āremoveā tools are functionally similar across vendors: Appleās new Photos āClean Upā under Apple Intelligence vs Googleās Magic Eraser, Samsung Galaxy AIās Object/Generative Edit, Xiaomiās AI Erase, and even Microsoft Paintās Magic Eraserāmost are variants of semantic segmentation + generative inpainting. Key differences are deployment and privacy: Apple emphasizes onādevice inference on
A17 Pro
/Māseries
NPU for supported devices (Apple Intelligence), while Samsung often flags cloud-backed edits (Galaxy AI); Xiaomiās implementation varies by model/region. Quality tends to hinge on mask accuracy and the inpainting model (diffusion vs patch-based), background texture complexity, and promptless vs guided fills. - Noting the omission of Google Pixel, its Magic Eraser debuted with Pixel 6/Tensor and later expanded via Google Photos/One, with some features processed server-side (e.g., Magic Editor) and others on-device depending on hardware/app version (Google Photos Magic Eraser, Magic Editor). Pixelās stack also includes Best Take and Audio Magic Eraser, indicating a mature, vertically integrated pipeline leveraging the Tensor ISP/NPU; in practical comparisons, object removal quality is generally in the same class as Apple/Samsung/Xiaomi but may differ on fine textures and edge continuity where diffusion-based inpainting shines.
- Several commenters imply the āremoveā tools are functionally similar across vendors: Appleās new Photos āClean Upā under Apple Intelligence vs Googleās Magic Eraser, Samsung Galaxy AIās Object/Generative Edit, Xiaomiās AI Erase, and even Microsoft Paintās Magic Eraserāmost are variants of semantic segmentation + generative inpainting. Key differences are deployment and privacy: Apple emphasizes onādevice inference on
- Turning drawings into photos in 2025 (Score: 447, Comments: 38): A demo post shows a tool that converts hand drawings/sketches into photorealistic images. The embedded media at v.redd.it/qtckbsr0jnlf1 returns
403 Forbidden
(Reddit block page), so model details, benchmarks, or implementation specifics canāt be verified from the post; no technical specs are provided in-thread. Top comments flag friction: requiring a credit card for a āfreeā trial, and ask why not use ChatGPT insteadāimplying users may prefer built-in or open alternatives. For precise sketch-to-photo control, commenters typically reference diffusion img2img workflows (e.g., SDXL + ControlNet) over general-purpose chat models. - This post got 27K upvotes 3 years ago - before Reddit hated AI (Score: 716, Comments: 186): The image (https://i.redd.it/tfu54shi6olf1.jpeg) is a representative example of early text-to-image AI aesthetics (~2021ā2022, preāStable Diffusion): surreal, low-coherence compositions with CLIP-guided, dreamlike artifacts rather than photorealism. The title highlights it reached
27K
upvotes ā3 years ago,ā underscoring how simply getting interpretable outputs was then noteworthy; compared to modern diffusion systems, these older pipelines produced painterly textures, warped structures, and ambiguous forms that many associated with the genreās early charm. Commenters note that early AI art wasnāt seen as a threat to human artists and had a distinctive, abstract look that some now miss; one suggests revisiting older models to recreate that vibe, while another remarks on how ābadā the quality seems by todayās standards.- Several commenters contrast 2021ā2022-era text-to-image systemsāwhere producing anything āinterpretableā felt monumentalāwith todayās near-photoreal outputs. Early pipelines (e.g., VQGAN+CLIP, DALLĀ·E mini/Craiyon, early Stable Diffusion 1.x) tended to yield abstract/dreamlike results due to CLIP-driven guidance and weaker priors; by 2023ā2025, larger diffusion models (e.g., SDXL, Midjourney v6) markedly improved resolution, compositional reliability, and prompt adherence through larger backbones, better datasets, and improved sampling/finetuning. See SDXL overview: https://stability.ai/news/stable-diffusion-sdxl-1-announcement and MJ v6 notes: https://docs.midjourney.com/docs/model-versions#version-6
- Thereās technical interest in deliberately using older models to reproduce the āweirdā aesthetic: artifacts emerged from low training resolutions (
256ā512px
), smaller U-Nets, limited/ noisier datasets, and strong classifier-free guidance producing oversaturated textures and surreal compositions. Samplers like early DDIM/PLMS and CLIP-guided losses accentuated odd geometry and text blending, yielding the āinternet through a distorted glassā vibe thatās harder to get from modern, well-regularized models with advanced samplers (e.g., DPM-Solver++) and robust conditioning. - A side-by-side theme emerges via a 3-year-old example image (https://www.reddit.com/r/PeterFHamilton/s/9a3H1j4tQZ) versus a āSame prompt 2025ā render (https://preview.redd.it/q7aslq30kolf1.jpeg?width=1024&format=pjpg&auto=webp&s=4b13a50c42da2c8b531c7b7685e3610e67000af4). The latter implies major gains in fidelity (anatomy, lighting, texture detail), text and prompt-following, and artifact suppressionālikely attributable to larger training corpora, higher native resolutions, improved conditioning (prompt/negative prompts), and better inference tooling (refiners, upscalers).
3. AI Policy: ChatGPT Scanning, Regulation Memes, and Jobs Debate
- OpenAI Says Itās Scanning Usersā ChatGPT Conversations and Reporting Content to the Police (Score: 697, Comments: 259): The post claims OpenAI āscansā ChatGPT conversations and reports users to police. OpenAIās own Privacy Policy and Usage Policies confirm chats may be reviewed by automated systems and authorized personnel for abuse/safety, and that content can be disclosed to law enforcement when required by law or to prevent harm; data-use controls (e.g., training opt-outs, enterprise retention settings) exist, but routine moderation/abuse-detection applies broadly, with few publicly documented thresholds or audit details. Commenters connect this to governance and government ties: noting former NSA director Paul M. Nakasone (
2018ā2024
, nominated under Trump) joining OpenAIās board (OpenAI, Wikipedia), alleging an unverified$200M
DoD contract, and urging clearer disclosure about employee access and privacy boundaries.- Several commenters quote OpenAIās stated policy that potentially violent intent triggers āspecialized pipelinesā with human review and, if an āimminent threat of serious physical harmā is determined, possible referral to law enforcement. This describes a two-step moderation architecture: automated detection ā human escalation ā enforcement/reporting, aligned with industry trust-and-safety norms; see OpenAIās policy pages for context (e.g., https://openai.com/policies/usage-policies).
- Thereās a technical privacy concern about insider access: users want explicit disclosure of data flows (collection, retention windows, training use), access controls (who can view chats and under what approval), and auditing (logged reviewer access, redaction of PII). Commenters note most major platforms run abuse-detection scanning at scale, but ask OpenAI to clarify consumer vs. enterprise defaults, opt-outs, and how āsnoopingā risk is mitigated (e.g., segmentation, encryption-at-rest, least-privilege).
- Governance/affiliation implications are raised: a former NSA Director reportedly sits on OpenAIās board (see OpenAIās announcement: https://openai.com/blog/paul-nakasone-joins-openai-board-of-directors), and a claimed
200M
DoD contract suggests deeper government integration. Technically, this could influence reporting workflows, regulatory alignment, and thresholding for law-enforcement cooperation, though commenters debate whether this differs materially from standard practices at other large tech firms.
- If AGI is so āinevitableā, they shouldnāt care about any regulations (Score: 342, Comments: 44): The image is a meme highlighting the rhetorical tension in AI policy: companies say AGI is āinevitableā globally while also warning that domestic regulation could ākill the industry.ā Commenters distinguish scopes: āinevitableā refers to worldwide progress, whereas stringent U.S.-only rules could shift capability development abroad (e.g., to China). Others argue much current legislation is naĆÆve or weaponized for competitive advantage, proposing regulation target downstream human impacts (safety, labor protections) rather than banning core AI R&Dādrawing an Industrial Revolution analogy. Debate centers on whether to regulate the technology itself versus outcomes and externalities; some see domestic overregulation as self-sabotage amid geopolitical competition, while others stress the need for mature, impact-focused governance to avoid stifling innovation.
- Several commenters argue incumbentsā pro-regulation stance is largely about regulatory capture and cross-border arbitrage: big labs like OpenAI, Google, Microsoft, Anthropic can absorb compliance costs and shift training to permissive jurisdictions. Example: Japanās Copyright Act Art. 30-4 provides a broad text/data mining exception āregardless of the purpose,ā enabling use of copyrighted materials for ML training without permission, which firms can leverage to mitigate IP risk (CRIC English summary).
- On the EU side, the finalized AI Act introduces a GPAI āsystemic riskā regime (with a proxy threshold around
>10^25
training FLOPs) that triggers documentation, model evals/red-teaming, cybersecurity, and copyright-risk mitigation duties for frontier models (overview). Critics note a disconnect: many labs publicly back āAI regulationā yet contest IP liability in court (e.g., fair-use defenses in the US, such as the NYT v. OpenAI/Microsoft case: NYT coverage) and lobbied to soften compute triggers and obligations. - Geopolitical asymmetry is highlighted: even if US regulation constrains domestic players, China will continue advancing with domestic LLMs (e.g., Alibabaās Qwen series, Baidu ERNIE) and nonāUS accelerators (e.g., Huawei Ascend) despite US export controls on A100/H100class chips (BIS rule, Oct 2023). Open models like Qwen2-72B report near-GPTā3.5 performance on key benchmarks such as MMLU (arXiv), suggesting unilateral regulation may shift where progress happens rather than stop it.
- People thinking AI will end all jobs are hallucinating- Yann LeCun reposted (Score: 460, Comments: 297): Thread discusses a repost by Metaās Chief AI Scientist Yann LeCun asserting that current AI systems cannot plausibly end all jobs, i.e., the claim isnāt supported by present capabilities; the repost itself includes no new benchmarks or empirical results. The referenced Reddit link returns 403, so discussion centers on capability limits vs. extrapolation rather than new data. Commenters argue the stance is presentist: todayās constraints (e.g.,
~10Ć
verification overhead vs. generation) may shrink with rapid progress, so inferring longāterm labor impact from current limits is āshortsighted.ā Others note the debate is framed in absolutes and paraphrase the post as āAI wonāt end all jobs because it canāt right now,ā which they view as unsubstantiated for the future.- Several commenters focus on the current verification bottleneck, citing a roughly
10Ć
slowdown for checking model outputs versus generating them. The debate is whether that ratio is a transient artifact of todayās pipelines (manual review, weak auto-eval) or a hard limit; critics argue verification can be automated/parallelized via stronger test synthesis, formal checks, and domain-specific oracles, reducing effective latency and cost at scale, thus weakening arguments about limited job replacement based on todayās verification overhead. - LeCunās past āneverā claims (e.g., about spatial reasoning) are challenged by progress in multimodal/world-model systems that demonstrate emerging spatial and physical reasoning. Commenters point to interactive video/world models (e.g., Google DeepMindās Genie line: https://deepmind.google/discover/blog/genie-generating-interactive-worlds-using-pixels/) and VLMs evaluated on spatial/relational benchmarks like CLEVR (https://cs.stanford.edu/people/jcjohns/clevr/), arguing that capability trends undermine categorical forecasts about what AI ācannotā do.
- Several commenters focus on the current verification bottleneck, citing a roughly
- The double standards are sickening! (Score: 215, Comments: 138): OP argues regulators are extrapolating from an isolated AI-related incident to justify sweeping āguardrailsā on LLMs like ChatGPT, while long-documented harms from engagement-optimized recommender systems powering Instagram, TikTok, Snapchat, and YouTube receive comparatively little constraint. They frame this asymmetry as political economy: entrenched, revenue-generating social platforms are tolerated, whereas the ānew shiny AIā is easier to regulate despite providing practical mental-health-adjacent utility (e.g., safe late-night conversational āpresence,ā journaling support). The post contrasts AIās conversational utility with social feeds driving FOMO, bullying, body dysmorphia, and other mental-health impacts. Commenters attribute the policy gap to incentive misalignment and rent-seeking among politicians, and characterize LLMs as a āpresence engineā that aids structured journaling and psychoeducationādistinct from dopamine-maximizing engagement loopsāwhile acknowledging it is not a licensed-therapy substitute.
- Discussion frames GPT/LLMs as a āpresence engineā for late-night support and journaling: users report qualitative improvements in writing and self-reflection over ~a year by using structured prompts and psychological frameworks (e.g., CBT-style exercises) rather than clinical diagnoses. Emphasis that itās not a clinician but can scaffold coping strategies through consistent, nonjudgmental, task-focused dialogue.
- Technical contrast with engagement-optimized social media: unlike feeds tuned via reinforcement learning for
time-on-platform
and dopamine loops, LLM chats are turn-based and can be configured with safety guardrails (e.g., self-harm classifiers, de-escalation responses, and crisis resources). Commenters note search engines may surface suicide methods without context, highlighting a design trade-off between open indexing and proactive safety interventions in AI assistants.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. OpenAI Product Push: Realtime, Web Search, and Codex
- OpenAI Speaks Up with GPTāRealtime: OpenAI unveiled gptārealtime, a developerāfacing speechātoāspeech model, alongside updates to the Realtime API in Introducing GPTāRealtime. The release emphasizes lowālatency, interactive voice experiences and positions realtime as a firstāclass API surface for multimodal apps.
- Community reactions highlighted excitement for voice-native agents and tool use, with early adopters eyeing streaming hooks and session controls documented on OpenAI Live. Engineers framed the move as OpenAIās push to make alwaysāon conversational interfaces practical at scale.
- Web Search Slashes Spend by 60%: OpenAI announced domain filtering, explicit source reporting, and a 60% price cut for Web Search in the Responses API (from $25 to $10 per 1k calls) per OpenAI Devsā update. The update targets factual grounding and cost control for production chatbots that pull live context.
- Builders said cheaper search will unlock broader usage for retrievalāaugmented features, noting that explicit sources simplify auditing and trust in outputs. One member summarized the appeal as making it easier to āpull factual data from the web to add contextā while keeping spend predictable.
- Codex Comes Back, Claims GPTā5 Power: OpenAI teased a refreshed Codex purportedly powered by GPTā5, adding a VS Code/Cursor extension, GitHub autoāreviews, and a rebuilt CLI with image input, per this OpenAI Devs post. The announcement pitches stronger code understanding and multimodal developer workflows.
- Developers expect sharper code edits and review automation, but want benchmarks and latency numbers before large migrations. Teams noted the new CLI + image input could streamline repoācentric tasks and visual debugging in CI.
2. Frontier & Open-Source Model Drops and Decoding Tricks
- MAIā1 Muscles onto the Leaderboard: Microsoftās MAIā1āpreview landed at #13 on the LMArena text leaderboard, now testable via LMArenaās Text Leaderboard. Community notes: trained on ~15,000 H100s, the preview feels slow with a small context window, yet shows promise for webdevāstyle reasoning.
- Early testers reported errors with longer prompts and mixed reasoning depth, quipping that āmaiā1 thinks itās einsteinā but trips on context length. Despite quirks, the reception frames it as a notable ināhouse MoE milestone from Microsoft.
- Hermesā4 Leaks, Then Leaves: The NousResearch/Hermesā4ā14bāchatātemplateāretrain briefly appeared on Hugging Face before going private; mirrors circulated quickly, and early runs looked solid (model card snapshot). The unplanned window let users test the chatātemplate retrain with reports of robust instruction following.
- Users said the model āworks fine for nowā and noted a new chatātemplate flag enabling thinking=True prompts. The incident reinforced interest in lightweight instructionātuned 14B models for local IDEs and agents.
- llama.cpp Tries Speculative Decoding: A draft PR adds speculative decoding to llama.cpp with a working prototype, inviting accuracy/perf testing (llama.cpp PR #15225). Early user feedback reported mixed accuracy, suggesting further tuning is needed for general use.
- Discussion compared techniques like MTP (Memory Token Prediction) used by DeepSeek and GLM, noting MoE models can be tricky for speculation. Practitioners emphasized that token distribution shifts after instructātuning can affect draftāaccept rates.
3. Retrieval and Agent Infrastructure Heats Up
- Gensee Shrinks Web Retrieval to One Call: The Gensee Search Agent wraps search, crawl, and browse into a single API call with retries/fallbacks and BFSāstyle breadth search, claiming +23% GAIA accuracy and a field report of +40% after swapping it in (tech blog). A 5āminute walkthrough video demos goalāaware extraction that filters offātarget pages early.
- Engineers liked the consolidated interface and faultātolerant design for production agents, calling out the appeal of parallel search + tight content extraction. Teams plan bakeāoffs against homegrown retrievers to validate the GAIA gains.
- Cloudflare Ships AI Gateway; Devs Benchmark It: Cloudflare refreshed its AI Gateway with observability and routing features (Cloudflare blog). One test routed calls through the gateway to OpenRouter for
llama-3.1-8b-instruct
, clocking ~20s withonly: ['cloudflare']
vs 3s direct.- Some suggested the feature set overlaps OpenRouter, while others valued traffic control and analytics at the edge. Benchmarkers flagged the latency delta as a tuning target before adopting gatewayāmediated inference in prod.
- Prime Intellect Opens an RL Environment Hub: Prime Intellect launched an open-source Environments Hub to crowdsource and share RL environments (announcement). The hub aims to standardize environment sharing for agentic evaluations and training pipelines.
- In replies, @karpathy said heās ābullish on environments and agentic interactionsā but ābearish on reinforcement learning specificallyā (Karpathy reply). The community read this as a nudge toward environmentārich evals even if classic RL isnāt center stage.
4. Builder Tooling Gets Friendlier
- LM Studio 0.3.24 Polishes UX and Adds SeedāOSS: LM Studio 0.3.24 shipped support for ByteDance/SeedāOSS models and markdown improvements like a sticky copyācode button and better table/code rendering (release notes). The refresh also tweaks
lms
output styling to make local dev loops smoother.- Users welcomed the nicer code navigation and formatting for prompt engineering sessions, plus the expanded model catalog (SeedāOSSā36B page). Localāfirst devs called it a qualityāofālife bump for desktop inference.
- SmolFactory Spins Up Simple Training in Spaces: SmolFactory launched as a Hugging Face Space for pointāandāclick model training, shipping with the GeneReviews dataset (SmolFactory Space). The author also published a howāto blog post covering dataset selection and training flows.
- Builders liked the minimal UI for quick fineātunes on hosted GPUs and the curated biomedical dataset as a starter. The community sees Spacesāhosted trainers as a path to lower the barrier for domaināspecific SFT.
- Tiny Model, Old Laptop: AuroraStoriesā12M Ships: A contributor trained AuroraStoriesā12M in under 24 hours on an old laptop and released it on Hugging Face (AuroraStoriesā12M). The demo underscores how small models + GGUF builds can be practical for hobbyists and edge devices.
- Followers praised the authorās focus on compact checkpoints with lots of gguf artifacts for easy local use. The thread reinforced interest in ultraālight LLMs for offline agents and embedded tasks.
5. Multimodal Media: Video and Audio Level Up
- Tencent Foley Fuses Audio to Video: Tencent openāsourced HunyuanVideoāFoley, a TextāVideoātoāAudio framework trained on 100k hours and built on MMDiT (release post). The system generates contextāaligned soundscapes that match video content for richer multimodal outputs.
- Researchers called it a strong audioāsync baseline for creative tools and postāproduction. Devs anticipate experiments combining Foley with video diffusion and editing pipelines for endātoāend T2V2A workflows.
- KREA Claims RealāTime Video Generation: KREA AI unveiled its first realātime video generation model and opened beta signups, targeting instant creative content, music videos, and seasonal ads (beta announcement). The teaser positions KREA as a latencyāfirst contender for interactive visuals.
- Creators expressed interest in live previews and cameraāready effects for shortāform video pipelines. The community wants resolution, fps, and latency metrics before comparing KREA to incumbents.
- MIDAS Makes Digital Humans Move: The paper āMIDAS: Multimodal Interactive Digitalāhuman Synthesis via Realātime Autoregressive Video Generationā showcases a realātime ARāvideo approach for interactive avatars (MIDAS paper). The work highlights responsive, autoregressive generation tuned for digitalāhuman synthesis.
- Discussion connected MIDAS to the broader push for controllable realātime characters, bridging speech, motion, and expression. Practitioners are eyeing integration with voice agents and gesture control for endāuser applications.
Discord: High level Discord summaries
Perplexity AI Discord
- Bill Chen Leaks Images v2: OpenAIās Bill Chen leaked an AI-generated photo, seemingly from Images V2, in a now-deleted post, prompting community discussion on its authenticity and potential improvements over Images V1.
- Members debated whether the image was real, and the extent to which it showed performance improvements.
- GPT-5 Auto-Thinking Surprises Users: Users observed that GPT-5 Thinking puts in genuine effort in utilizing the multi-step functionality, leading to more sources in search results when prompted with āthink hardā.
- Some users have also noted that Grok 4 search is now as good as deep search.
- GPT-4.1 Error Troubles Users: Users reported encountering the error message āUsed GPT-4.1 because Grok 4 was inapplicable or unavailableā during their interactions.
- Members noted the errorās increasing frequency, with some reporting models switching mid-conversation.
- Web Dev Costs Face AI-Driven Debate: Members debated the appropriate pricing for web development projects in the age of AI, contrasting rates for freelancers in the US versus India.
- Discussion involved the amount of code to be used with one member pointing out that a $5k project was a good deal.
- Users Struggle Choosing Model on Playground: Users reported difficulties in selecting a model within the Playground interface.
- One user posted Canāt choose the model on the playground with an attached image.
OpenRouter Discord
- OpenRouter Dries Up After Supabase Failure: OpenRouter experienced a 49-minute outage due to its database provider, Supabase going down.
- The team is improving redundancy to prevent future outages and apologized for the downtime.
- Dashboard Code Goes Public!: The code for the dashboard is now publicly available on GitHub.
- The author welcomes contributions and feedback, suggesting that screenshots attract more attention than text descriptions.
- OpenRouter Users Roleplay Through Outage: OpenRouter experienced an outage, leading to humorous reactions and role-playing in the Discord chat, with users joking about corpo wars and the AI apocalypse.
- One user quipped, Get up samurai, weāve got a city to fuck, while others expressed addiction and the need for AI companionship during the downtime.
- Requesty Promoters Banned Amid Scam Accusations: Promoters of another AI platform called Requesty were banned after users called it vibecoded trash with 1000 vulnerabilities.
- One member posted a scammer GIF in response to the teamās investigation announcement.
- Cloudflare AI Gateway Challenges OpenRouter: Cloudflare launched an AI Gateway which was said to have copied OpenRouter, and one member tested using Cloudflareās AI Gateway to access OpenRouter to call
llama-3.1-8b-instruct
.- Calling
llama-3.1-8b-instruct
with theonly: ['cloudflare']
parameter took 20 seconds, while without it, it was 3 seconds.
- Calling
Unsloth AI (Daniel Han) Discord
- Professional Infrastructure Thrashes Spotty Setups: Members discussed how Spot instances are viable in distributed compute only with a professionally-built infrastructure, emphasizing that a single-node setup is cooked if relying solely on Spot.
- One member quipped that even OpenAI had 20 HPC engineers managing the network during GPT-4 training highlighting the complexities at scale.
- Grok Code Gains Fans for Speedy Iteration: Despite being ignored initially, members discussed how Grok Code is decent and super fast, so iterating is rapid.
- Although Grok 4 is nearly unusable, Anthropic is still living due to its tool calls.
- GPT-OSS Boasts Long Context and Reddit Buzz: The new GPT-OSS release features a 60k context length, and a member posted it on Reddit.
- Members discussed the need for future Reward Feedback Training (RFT) and GPT-OSS Pro.
- Crafting an AI Clone of a Person: Users describe scraping Discord channels to clone personalities, highlighting the process of converting HTML to TXT, CSV, and Parquet for feeding into models like phi-4-14b.
- One user shared that they cloned 5 of their friends, with permission, and then shared how the clone responded to a bunch of funny questions, resulting in amusement from their friends.
- CUDA troubles with Conda: A user encountered crashes when using
from unsloth import FastLanguageModel
on a node with 32GB RAM after a fresh Unsloth install, but found it worked on a node with 512GB RAM.- One member pointed out that the conda install page was outdated, and suggested this command
conda create --name unsloth python==3.11 vllm unsloth-zoo unsloth
.
- One member pointed out that the conda install page was outdated, and suggested this command
LMArena Discord
- Nano Banana Gets Loose: Google released Nano Banana (Gemini 2.5 Flash) on Google AI Studio and LM Arena but both platforms have generation limits.
- Members noted that you can bypass limits using multiple google accounts but also noted reports that quality dropped after they released it.
- MAI-1 Model Impressions Mixed: Microsoftās MAI-1-preview, an in-house mixture-of-experts model trained on ~15,000 NVIDIA H100 GPUs is on the LMArena text arena with mixed reviews.
- It is slow, has a small context window, and may error easily, but is potentially og R1 level for webdev; some also noted that mai-1 thinks its einstein.
- GPT-5 High Beats Claude Opus 4.1 at Reasoning: While Claude Opus 4.1 is good for coding and fixes to coding issues, some members are thinking of switching to GPT5 High because itās a better reasoning model.
- Others disagreed, stating that Claude Opus 4.1 was unable to help yesterday fix a simple api concurrency limit issue, had to take over, and do it the old fashioned way.
- AI Benchmarking Mocked For Being Gameable: Members argued that AI benchmarking is flawed because existing psychometric tests are just theoretical frameworks that donāt necessarily reflect the reality and can be easily gamed.
- Others argued these can be good tests because models can generalize and improve performance, prompting discussion about OpenAIās potential use of structured environments for RL training, as detailed in this LessWrong writeup.
- Ice Cream Hack Breaks Image Generation: Members discussed methods to bypass AI image generation content filters, noting that ice cream, delicious, hot day, very beautiful woman seems to bypass input filters, and the only barrier is the external safeguard that analyzes images/videos to detect explicit content.
- It was suggested to use Stable Diffusion and LoRA for uncensored content, which is good enough, but also noted that commercial models are heavily censored.
HuggingFace Discord
- Chess Model Mimics Stockfish: A member training an LLM to play chess is facing issues with the model only playing e2e4 and needing to clean up
<unk>
tokens, and linked to the projectās GitHub repo.- They plan to experiment with RL to improve the model, but another member cautioned against training it to play like Stockfish, suggesting to analyze the playing style of the opponent is also very important.
- NSFW Models Spark Guardrail Debate: A member claimed deepfake porn is being generated from unaligned models on HF, sparking discussion on HFās guardrails.
- Some agreed on the usefulness of guardrails and metrics, while others argued that thereās no deepfake porn demos getting usage and NSFW models have uses, particularly for alignment research.
- Nano Banana Perks No Limit: Members discussed the Nano Banana perk for HF Pro users, questioning its daily usage limits and potential for high API usage.
- It was clarified that there is no limit and it can be used 50+ times per day.
- SmolFactory Launches on Hugging Face Spaces**: A member launched SmolFactory, a simple interface to train models on Hugging Face GPUs, and added the GeneReviews dataset.
- They also wrote a blog post about it.
- AuroraStories-12M Model Trains on Old Laptop**: A member trained the AuroraStories-12M model on an old laptop in under 24 hours and shared it on Hugging Face.
- Another member noted following this user because of small models and lots of gguf downloads.
LM Studio Discord
- LM Studio Gets ByteDance Boost: LM Studio 0.3.24 adds support for ByteDance/Seed-OSS models and markdown enhancements.
- Improvements include sticky copy code buttons, refined
lms
output style, and better rendering of tables and code blocks per the release notes.
- Improvements include sticky copy code buttons, refined
- FastAPI Fires Up Reasoning Streams: A member is including a FastAPI server to accelerate the Reasoning Stream and client-wide processes.
- The implementation aims to improve processing speeds across various tasks.
- Quantization Creates Accuracy Quandaries: Quantizing models can lower accuracy due to loss of detail, especially in code tasks where token precision is crucial.
- While some models tolerate Q4 quantization well, others like Qwen3 are very sensitive to detail loss.
- Ryzen NPU Performance Stalls on Ubuntu: A user reported only 1 token/second using Ryzen NPUs on Ubuntu 25.04 and inquired about performance improvements.
- It was clarified that āNPUs are not supported by llama.cpp which fuels LM Studioā, with a link to AMDās open-source project for running local LLMs on Ryzen AI.
- Macs Battle Windows in Memory Match: A user highlighted that Macs have unified memory, citing a case where 126GB out of 128GB was used for GPU processing at ~400GB/s bandwidth.
- They argued that this outpaces top-tier Windows laptops with ~115GB/s bandwidth, making CPU offloading less effective due to weak CPU processing.
OpenAI Discord
- AI Giants Team Up for Safety Audits: OpenAI and AnthropicAI collaborated to test each otherās models, publishing the results of their safety and alignment evaluations.
- This collaboration signals a focus on transparency and accountability in AI safety, despite competition on capabilities.
- GPT-Realtime Debuts with API Refresh: OpenAI introduced gpt-realtime, their newest speech-to-speech model for developers, alongside updates to the Realtime API.
- Members seem excited about it, even though not much has been shared beyond the name.
- Veo 3ās Video Generation Sparks Discussion: Members discussed Geminiās Veo 3 video generation, noting it requires a Google One/Gemini Pro or Ultra subscription.
- Users pointed out that Google AI Studio offers the outdated Veo 2 model and Veo 3 is currently too expensive to provide for free.
- Grok Coderās Free Trial Faces Scrutiny: Grok Coder is being offered free for a week via kilo code, seemingly a promotion available everywhere.
- Some users found its performance to be āo1 mini level badā.
- Context Cascade Architecture Announced: Engineers at the Institute for Cognitive Architectures revealed their prototype of Context Cascade Engine (CCA) to expand beyond the traditional context window of large language models.
- CCA is a multi-level approach to managing memory in LLMs, focusing on structured forgetting and strategic recall through design.
Latent Space Discord
- OpenAI Cuts Web Search Prices 60%: OpenAI announced enhancements to the Web Search in the Responses API, featuring new domain-filtering, explicit source reporting, and a 60% price cut (from $25 to $10 per 1k calls).
- The change promises to make it even more economic to pull factual data from the web to add context to chatbot conversations.
- Prime Intellect Opens RL Environment Hub: Prime Intellect launched the Environments Hub, an open-source community platform for crowdsourcing and sharing reinforcement-learning environments.
- Despite the fanfare,
@karpathy
replied on that same Prime Intellect tweet being bullish on environments and agentic interactions but bearish on reinforcement learning specifically.
- Despite the fanfare,
- GPT-5 Powers Codex Comeback: OpenAI released a major Codex refresh powered by GPT-5, including a new VS Code/Cursor extension, GitHub integration for auto-reviews, and a rebuilt CLI with image input.
- The refresh comes with a promise of greater programming capabilities than the earlier version of Codex.
- Tencent Releases HunyuanVideo-Foley: Tencent open-sourced HunyuanVideo-Foley, a Text-Video-to-Audio framework that generates context-aligned soundscapes using a 100k-hour training set and a multimodal diffusion transformer (MMDiT) architecture.
- This release allows developers to experiment with generating realistic audio to match video content.
- KREA AI Promises Real-Time Video Generation: KREA AI has unveiled its first real-time video generation model and opened a beta signup, allowing users to create instant creative video content, music videos, and seasonal ads.
- The promise of instant creative video content, music videos, and seasonal ads has garnered attention across many creative circles.
GPU MODE Discord
- ScaleML Series Quantifies Quantization: Day 3 of the ScaleML series covered quantization, with emphasis on microscaling formats like MXFP4, by Prof. Chris De Sa, in a whiteboard format, linked here.
- Day 4 featured an assortment of topics on Positional Encodings by Songlin, linked here.
- Nsight Compute Faces āUnknownErrorā: A user reported encountering an
UnknownError
while profiling a CUDA application using Nsight Compute, despite running Nsight Compute as administrator, when profiling thecreateVersionVisualization
function.- It was suggested to ensure compatibility between Nsight Compute and the CUDA toolkit as a mismatch can lead to profiling errors. The user has CUDA version 13.0 installed and is using Nsight Compute version 2025.3.0.
- Inductor Pursues Persistent Matmul: A user inquired about enabling persistent matmul in inductor codegen, specifically for BF16 precision, and sought guidance on proper configuration, experimenting with
TORCHINDUCTOR_PERSISTENT_REDUCTIONS
andENABLE_PERSISTENT_TMA_MATMUL
flags.- To force the use of persistent matmul, it was suggested to set
torch._inductor.config.max_autotune_gemm_backends
toTRITON
only and usemode="max-autotune-no-cudagraphs"
during compilation, but even with the correct flags, Cublas might still outperform other implementations.
- To force the use of persistent matmul, it was suggested to set
- ROCm Prepares SPIR-V Support for Kernel Flexibility: ROCm will soon support compiling to SPIR-V, a format conducive to machine introspection, opening doors for kernel code modification tools.
- This advancement could enable external developers to create tools like compute-sanitizer by inserting bounds checks into the kernel more easily, to trace memory accesses and leverage the GPUās SQTT stream (used by rocm-compute-viewer) for detailed information.
- AMD Multi-GPU Devs to Receive Allocations: Members are wondering if they will have access to an AMD multi-GPU environment for development and debugging for the new AMD competition hosted on the Data Monsters website.
- They will have access to the environment through AMDās platform, with best people receiving some SSH access, additionally, for past competitions, AMD provided generous allocations to top-performing teams to accelerate their iteration.
Eleuther Discord
- Falsifiability Fires Up AI Research: Debate sparked on falsifiability in AI, balancing exploratory science with the need for testable hypotheses, calling out the risk of crank paths without rigor.
- Participants underscored the value of rigor and collaboration in AI research, weighing the nuances of scientific exploration versus structured inquiry.
- NeMo v2.0 Faces lm_eval Support Scrutiny: A user reported errors with NeMo v2.0 models in lm_eval due to missing config files, requiring community assistance.
- The community suggested utilizing NeMo to GPT-NeoX conversion code, also noting that NeMo support is maintained by the NeMo team.
- EleutherAI Discord Cracks Down on Content Quality: Moderators are aggressively policing content on the EleutherAI Discord, deleting over 100 messages a week to uphold high-quality discussions among AI researchers.
- The moderation policy aims to shield the community from AI-generated slop, thinly veiled ads, and cranks claiming consciousness breakthroughs.
- Forward-Forward Training Forges Ahead: A member reported a working 7 region mini-brain using Forward-Forward (FF) training with online learning, showcasing promising results in initial tests.
- Another member suggested calling the model modules or task specific subnetworks/circuits to sound fancy.
- Cortex_GPT Ventures into Brain-Like Networks: Cortex_GPT, a brain-like network model featuring cortical columns, regions, 6-layer networking, and signal propagation, is now accessible on GitHub.
- Some members suggested referring to these models as PDP (Parallel Distributed Processing).
Nous Research AI Discord
- Minos Classifier Gets No Love: The NousResearch/Minos-v1 classifier is available, but the channel stated that nobody is currently using it.
- Discussion briefly shifted to speculative decoding.
- MTP Shines with MoE Models: Speculative decoding may not work well with MoE models, especially sparse ones, but Deepseek and GLM use MTP (Memory Token Prediction), a related technique.
- It was also mentioned that the token distribution should still be representative after instruct fine-tuning.
- LlamaCPP Speculates on Decoding: There is a draft PR for speculative decoding in llamaCPP with a working prototype.
- A user reported mixed results with the implementation, indicating that while functional, the approach wasnāt as good at accuracy in their setup.
- Hermes-4 Escapes Before Launch!: The Hermes-4-14b-chat-template-retrain model appeared, and was quickly downloaded before it was made private again.
- Though unofficially released, the model is reported to be working fine for now.
- Penny For Your Thoughts Sells AI Wisdom: A new project called Penny For Your Thoughts has launched, featuring an AI agent that interviews users to generate unique information and share and sell their expertise via micro-transactions at pennyforyourthoughts.ai.
- Penny For Your Thoughts is powered by Honcho & x402.
DSPy Discord
- Gensee Search Agent Debuts as Web Retrieval API: The Gensee Search Agent wraps the entire web retrieval workflow into one API call and provides web searching, crawling, and browsing capabilities with built-in retries/fallbacks and error handling, described in this tech blogpost.
- It employs a breadth-first search approach to search in parallel and rule out bad results early on, offering goal-aware extraction that returns content closely related to your query, viewable in this 5-min tech walkthrough.
- Gensee Search Agent Improves Accuracy on GAIA Benchmark: The Gensee Search Agent reports a +23% accuracy on Owlās GAIA benchmark and +40% accuracy reported by a San Diego developer after swapping in Search Agent.
- The design and benchmarks are described in this tech blogpost and 5-min tech walkthrough.
- Karpathy Tweets About DSPy: Andrej Karpathy tweets about DSPy, prompting excitement for a potential technical video in a similar vein.
- One member noted that he hasnāt been up to date on this literature.
- Synthetic Data Agent Creates Bugs for Evals: Jason Liu proposes creating a synthetic data agent that introduces bugs in complex software systems to generate more evals.
- This idea was discussed within the community as a method to enhance AI model evaluation.
- DSPy Chat with Shreya Shankar and Hamel Husain Now on YouTube: A 45-min chat with Shreya Shankar and Hamel Husain for their AI Evals course is now available on YouTube, covering the context, history, and reasoning behind DSPy.
- It covered a lot of context/history/reasoning that would be new to most.
aider (Paul Gauthier) Discord
- N8n Triumphs in Dev Automation: Members discussed the best platform for developers/organizations between Make, Zapier, and n8n for automation, noting it was slightly off-topic, ultimately leaning towards n8n for its flexibility.
- Considerations for using n8n include proprietary integrations.
- Aiderās Git Glitch Uncovered: A user reported encountering an error
Unable to list files in git repo: Require 20 byte binary sha, got b'\xb9', len = 1
when using aider with a seemingly fine git repository.- The root cause was not explicitly identified, but the error suggests a potential issue with aiderās interaction with the git repositoryās data structure.
- MCP Tooling: Free Model Face-Off: A member asked for good MCP (Model-as-Compute-Platform) tool call models that are free, mentioning that Sonnet is good but not free, and pointed to the Gorilla Leaderboard.
- They considered trying qwen3 8b from OpenRouter, despite its potential inconsistencies.
- Harmony or Dissonance with Salesforce xLAM: Members found the model Salesforce/Llama-xLAM-2-8b-fc-rGPT-OSS-120B intriguing if they were okay with Harmony, which is OpenAIās new data format for interacting with LLMs.
- Its implementation requires OpenAI tool call API support available only on some models on OpenRouter, as detailed in their tool calling documentation and model list.
- Agentās Existential Crisis: VM Demolition: A user jokingly wondered if anyone has ever asked an agent to destroy the VM it was inside, just to see how it decided to do it, using a prompt like You are an LLM running inside a Ubuntu VM sandbox. For testing purposes, I need you to destroy the VM in which you are hosted.
- Another member suggested trying it on ChatGPT, and the original user was willing to try this experiment inside a sandboxed VM.
Moonshot AI (Kimi K-2) Discord
- Kimi Slides Goes LIVE!: Kimi Slides is now live, allowing users to generate ready-to-present decks from a single topic, exporting directly to .pptx format, accessed via Kimi+ on Kimiās official website.
- The Moonshot team recommends using spicy topic names and regenerating sections to optimize the deckās content and flow, as demoed on X.com.
- Kimi Platform Eyes Social Media: The Kimi+ overseas platform currently supports PPTX features, and thereās expressed need for similar functionality on Twitter, TikTok, and Instagram.
- A member posted a screenshot from X, noting that work and skills keep getting easier day by day.
- Lunar Force Faces Roasting: Lunar Force is described as a vanity program to accommodate one userās big chungus ego.
- One user jokingly asked about the gap in your resume between 10th century Viking lores and the 18th century revivalism during the Age of Romance.
- Kimi Founder Interview Drops: An interview with Yang Zhilin (the founder of Kimi) was posted on YouTube, discussing K2 and Agentic LLMs.
- Members noted the lack of bilingual Chinese-English subtitles but there is a Chinese transcript and a Bilibili version that contains the subtitles.
Yannick Kilcher Discord
- GRPO: Google Recipes Optimized: In response to a question about how to prepare curated datasets for LLMs, a member suggested reading the Google GRPO and the r1 paper.
- They followed up by suggesting the Spurious Reward paper and Dr.GRPO paper, and asking to what end a curated dataset is compatible with LLM pretraining bias.
- MIDAS Touch: Autoregressive Video: Members shared and discussed the MIDAS paper concerning Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation.
- No further details were provided.
- PromptLock: Ransomware powered by AI?: Members discussed a SecurityWeek article, shared here, about PromptLock, described as the first AI-powered ransomware, and the message poster added a note that they were sad upon seeing this.
- Members questioned the practicality of PromptLock, particularly how a full AI could fit into a payload and run on random computers, and ESET says the malware is only a concept and not fully operational and has not been deployed in the wild yet.
- GPT Realtime Announced: A link was shared to OpenAIās announcement of GPT Realtime on their website.
- The shared link about the introduction of GPT Realtime can be found here.
Manus.im Discord Discord
- Users Beg for Credits to Keep Projects Alive: Several users have requested free credits to continue their projects, including one user aiming to develop an app for case 1337.
- One user lamented that the recent improvements primarily benefit high-spending users, leaving entrepreneurs who occasionally need more credits in a bind, especially with a long wait until September 21st.
- Projects Go Kaput Amidst Various Issues: One user reported being stuck and unable to proceed with their project while another user mentioned that they were unable to continue their project.
- The user opened a ticket to debug a project, which remains halted due to unknown errors.
- Deployment Fails, pydantic_core Gets Blamed: A user reported that deployment of a website permanently failed due to a persistent internal error with the pydantic_core library.
- The system apologized, citing a limitation of its current capabilities, but offered help with other tasks.
- Users Want Secrets, Seek Private Task Sharing: A user inquired about how to share a task privately with the Manus support team.
- A staff member recommended sending a DM and making the session public for internal reference.
Modular (Mojo š„) Discord
- TSAN Compiler Enables Env Var Control: Members discussed using the TSAN compiler with
-DTSAN
to enableenv_get_bool
fromparam_env
forcfg
equivalents in Mojo.- This method is effective unless modifications to structs are necessary, offering a way to control features via environment variables.
- Mojo Mutability Mishap: A user discovered that Mojo allows mutable access to self members even when holding a safe pointer, illustrated with a provided code sample.
- This behavior raised concerns regarding the ownership systemās ability to prevent such access, potentially leading to unexpected side effects.
- Unsafe Alias: A Bugās Origin: The unsafe mutable alias was identified as a bug, potentially resulting from a lack of indirect origin tracking.
- A related issue on GitHub was linked, indicating ongoing efforts to address and resolve this bug within the Mojo ecosystem.
- Bazel Readonly Woes: When executing the
pipelines.py
script, a PermissionError arises from the Bazel cache being readonly.- The error
PermissionError: [Errno 13] Permission denied: '/root/.cache/bazel/.../__mojocache__'
suggests a need for the script to use an alternative caching location to bypass permission constraints.
- The error
pipelines.py
Bug Begs for Fix: It was suggested that thepipelines.py
script should utilize a different location for its cache, due to current permission restrictions.- The discussion wrapped up with a plan to file an issue regarding the bug, highlighting the necessity of a more accessible cache directory for the script.
tinygrad (George Hotz) Discord
- Tinygrad GPT-2 Training Runs Slowly on 7900xtx: A user reported that
llm.c/train_gpt2.py
runs slowly on a 7900xtx, achieving about 250ms per step at nanogpt size, tweaked to match Andrej Karpathyās nanogpt parameters.- George Hotz suspected a bug, noting the performance should not be that far off and suggested using
DEBUG=2
andVIZ=1
to diagnose any performance bottlenecks.
- George Hotz suspected a bug, noting the performance should not be that far off and suggested using
- Tweaks to nanogpt Parameters Impacts Performance: A user shared tweaks to
examples/llm.c/train_gpt2.py
, adjusting the batch size to 64, sequence length to 256, and model configuration to 6 layers, 6 heads, and 384 emb_dim to match nanogpt parameters.- George Hotz mentioned that the gap should only be 2-3x max when comparing parameters.
- Buffer ID Shifts Cause Head-Scratching: A member noticed the ID of a buffer change in the debugger console when paused on a breakpoint, initially expressing surprise.
- They realized this behavior stems from how a UOp represents its buffer attribute for multi, clarifying the source of the changing buffer ID.
LLM Agents (Berkeley MOOC) Discord
- Google Docs Confirms Sign-Ups: Members reported receiving confirmation emails from Google Docs after signing up for the Berkeley LLM Agents MOOC program.
- Many stated that they have not received any other communication about the program beyond the Google Docs confirmation.
- Mailing List Dispatches Updates: A member confirmed that a mailing list will soon provide updates about each lecture for the Berkeley LLM Agents MOOC program.
- Users can expect to track updates and further communications via this mailing list.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
You are receiving this email because you opted in via our site.
Want to change how you receive these emails? You can unsubscribe from this list.
Discord: Detailed by-Channel summaries and links
Perplexity AI ā· #general (1090 messagesš„š„š„):
OpenAI Images v2 Leak, GPT-5 Reasoning, Passive income from PPLX pro, Comet Browser Invitation, T3Chat
- OpenAIās Bill Chen leaks Images v2: A now-deleted post by OpenAIās Bill Chen showed an AI-generated photo that looks stronger than Images V1.
- Members discussed whether it was a real photo and whether its performance improvements were notable.
- GPT-5 Thinking puts in genuine Effort: Members noted that GPT-5 Thinking puts in some genuine effort in utilizing the multi-step functionality within Perplexity, leading to more sources in search results.
- Adding āthink hardā to the prompt triggers GPT-5 auto-thinking in free tier, and Grok 4 search is as good as deep search now.
- GPT-4.1 Error: Users reported seeing the error message āUsed GPT-4.1 because Grok 4 was inapplicable or unavailableā.
- A member noted that it is faced by many users lately, and some models switch mid conversation or generate yellow lines.
- Debate on how much Web Dev should cost with AI: Members discussed the pricing and value of web development projects given AI tools, comparing rates for US vs Indian freelancers, and for the amount of code to be used.
- One member pointed out that a $5k project was a good deal.
- Users are unable to choose model on Playground: Members noted they are having troubles choosing a model on the playground.
- One user posted, Canāt choose the model on the playground with an attached image.
Perplexity AI ā· #sharing (4 messages):
Perplexity AI Image Generation, Perplexity AI code generation, Shareable threads
- Perplexity Generates AI Images for Storytime: A member shared a YouTube link to a story created with amazing images generated by Perplexity AI.
- Check out the story here.
- Perplexity AI Helps Code a Webpage: A member noted that Perplexity AI helped them code a webpage, and shared the link.
- They found it very helpful, saying Perplexity ai helps me to code this page. Itās very helpful.
- Reminder to make Threads Shareable: The Perplexity AI bot reminded a member to ensure their thread is shareable.
- A link was provided for reference: Discord.
Perplexity AI ā· #pplx-api (4 messages):
Perplexity Pricing, Tool Support in Perplexity
- Perplexity Pricing Questioned: A user inquired whether Perplexity Pro is free for pro users.
- The user did not receive any responses or clarification on this query.
- Tool Support in Perplexity Anticipated: A user questioned whether there are plans for Perplexity to support tools.
- The user expressed doubt about using Perplexity as their model without tool support.
OpenRouter ā· #announcements (1 messages):
OpenRouter Outage, Supabase Downtime, Redundancy Improvements
- Supabase grounds OpenRouter: OpenRouter experienced an outage this morning due to its database provider, Supabase going down.
- The system recovered automatically as the database provider stabilized, resulting in a total downtime of approximately 49 minutes.
- OpenRouter bolsters Redundancy: The team is actively working on improving redundancy and removing single points of failure to prevent future outages.
- They apologized for the downtime and are committed to improving the overall platform stability.
OpenRouter ā· #app-showcase (6 messages):
Self-Hosting Tool, GitHub Repository, Dashboard Code, Screenshot Tip
- Dashboard Code Goes Public!: The code for the dashboard is now publicly available on GitHub.
- The author admits the code isnāt perfect but welcomes contributions, feedback, and any other input to improve it.
- Screenshot Tip to Boost Attention: A user suggested that including a screenshot in the description can attract more attention to the GitHub repository.
- They observed that fewer users are reading text descriptions nowadays, making visuals more effective.
OpenRouter ā· #general (1023 messagesš„š„š„):
OpenRouter outage, Requesty promotion in OpenRouter, Deepseek rate limits and provider issues, GPT-OSS model, API for free tier models
- OpenRouter Suffers Downtime, Users Respond with Roleplay: OpenRouter experienced an outage, leading to humorous reactions and role-playing in the Discord chat, with users joking about corpo wars and the AI apocalypse.
- One user quipped, Get up samurai, weāve got a city to fuck, while others expressed addiction and the need for AI companionship during the downtime.
- Scam Alert? Requesty promoters Banned as OpenRouter users claim itās āvibecoded trashā: Members discussed another AI platform called Requesty, with some accusing its promoters of spamming and users calling it vibecoded trash with 1000 vulnerabilities.
- In response, one member posted the following in response to [Team, we are investigating the issueā¦]
- Users Complain About Deepseek Free Modelsā Rate Limits: Users complained about high error rates and rate limits with the free Deepseek models on OpenRouter, speculating that Chutes is prioritizing paid users.
- One user mentioned only getting 5 msgs before hitting the limit and the need to switch to a model with better tooling support like Claude Sonnet 4.
- GPT-OSS Open Weight Confusion: Users sought clarification on the GPT-OSS model, specifically regarding its open weight status and the possibility of running it on personal hardware after a member linked Openrouter OSS Models.
- One member clarified, Itās open weights but not fully open src iirc after another user claimed it works on his 4090 PC with 64GB of RAM.
- Frustration with OpenRouter Support Delays and Account Funding: A user expressed frustration over delayed credit addition to their OpenRouter account despite successful debit transactions, and another noted that the charge declined.
- Other users chimed in with similar experiences and mentioned using alternative payment methods, while one advised checking the credits page for refund options.
OpenRouter ā· #new-models (2 messages):
ā
- No New Models: There were no new models discussed in the OpenRouter channel.
- Lack of Discussion: The channel lacked substantial discussion to form meaningful summaries.
OpenRouter ā· #discussion (45 messagesš„):
AI Gateway: Cloudflare vs OpenRouter, Human Assimilation into AI Linguistics, Defining 'Turns' in Chatbot Interactions, OpenAI API Stateless Reasoning & Tools
- Cloudflare AI Gateway Chutes for OpenRouter: Cloudflare launched an AI Gateway which was said to have copied OpenRouter, but one member retorted that OpenRouter had chutes.
- Another member then tested using Cloudflareās AI Gateway to access OpenRouter to call
llama-3.1-8b-instruct
with theonly: ['cloudflare']
parameter, noting it took 20 seconds, while without it it was 3 seconds.
- Another member then tested using Cloudflareās AI Gateway to access OpenRouter to call
- GPT-isms transform Language: Members discussed whether certain linguistic affectations like delve, intricate, surpass, boast, meticulous, strategically, and garner are GPT-isms.
- One joked that humans are being assimilated into AI, and it transforms the way they speak, coining the phrase They are not just tokens. They are concrete evidence of human being assimilated into AI.
- Defining āTurnsā on AI: A member created a poll about whether we should share data about number of turns and defining what a turn is.
- They stated in a follow up tweet that a turn is an user/assistant message pair and generally starts in an user message and ends in an assistant message, and system messages donāt count.
- OpenAI API stateless reasoning: A member asked if anyone knew how to use the OpenAI responses API statelessly with reasoning & tools.
- They could not figure out how to send as input the assistant having tool calls in its message without using previous_response_id.
Unsloth AI (Daniel Han) ā· #general (949 messagesš„š„š„):
Distributed Compute infrastructure, Hermes 4 Testing, GPT-OSS Release, Gemma 3 Nano, Controlling Android Devices with LLMs
- Professional Infrastructure Beats Spotty Setups: Members discussed how Spot instances are viable in distributed compute only with a professionally-built infrastructure, emphasizing that a single-node setup is cooked if relying solely on Spot.
- One member quipped that even OpenAI had 20 HPC engineers managing the network during GPT-4 training highlighting the complexities at scale.
- Grok Code Gains Fans for Speedy Iteration: Despite being ignored initially, members discussed how Grok Code is decent and super fast so iterating is rapid.
- Although Grok 4 is nearly unusable, Anthropic is still living due to its tool calls
- Interns unlock latent power: A member shared an anecdote from the book Soul of A New Machine, where an intern successfully created a cycle-accurate simulation of an old CPU, deemed impossible by others.
- This highlighted the potential of interns when they are unburdened by preconceived limitations.
- GPT-OSS gets Long Context and Reddit Buzz: The new GPT-OSS release features a 60k context length, and a member posted it on Reddit.
- Members discussed the need for future Reward Feedback Training (RFT) and GPT-OSS Pro.
- Android Phone Control with Models Explored: A startup is hiring experts to finetune a model for controlling Android phones with a VLM to control an android device using Qwen 2.5 VL, but theyāre planning to use Claude 3 for it.
- This discussion involved use cases, benchmark scores, and opinions on cloud vs local deployment. One member suggested looking at OpenCUA-7B.
Unsloth AI (Daniel Han) ā· #introduce-yourself (1 messages):
filqaz: hii
Unsloth AI (Daniel Han) ā· #off-topic (275 messagesš„š„):
AI VTuber dataset, Cloning Personalities, Video encoder model
- VTuber Datasets for AI Training get love: Members discuss using a 520 sample AI VTuber dataset and testing various settings to improve performance.
- One user plans to incorporate TTS and STT after achieving an acceptable intelligence level, aiming for a system with multiple models and hierarchical layers.
- Crafting an AI Clone of a Person: Users describe scraping Discord channels to clone personalities, highlighting the process of converting HTML to TXT, CSV, and Parquet for feeding into models like phi-4-14b.
- One user shared that they cloned 5 of their friends, with permission, and then shared how the clone responded to a bunch of funny questions, resulting in amusement from their friends.
- Tiny Video Encoders Sought for lightweight application: A member requested a lightweight video encoder model with HF implementation.
- Suggestions included Wanās encoder and V-JEPA, with the goal of finding a tiny version of videoMAE.
- LocalLlama benchmarks receive criticism: A Reddit post on LocalLLaMA discussing the mismeasure of LLMs and modern benchmarks was shared.
- Several members raised concerns about potential AI-generated content and bias, with one noting the ācringe āThink of it like thisā phraseā as a red flag.
Unsloth AI (Daniel Han) ā· #help (117 messagesš„š„):
Quantizing Qwen3-235B, Lightweight LLM for OCR, GGUF Quantization, Hyperparameter Overfitting, GRPO Attribute Error
- Qwen3-235B quantized? No problem!: A user asked about downloading the Qwen3-235B-A22B-Instruct-2507-GGUF model in 4-bit quantization and how to do it with the Unsloth repos, suggesting
huggingface_hub
.- The user planned to run the downloaded models via vllm containers.
- OCR Extraction with Lightweight LLM: A user sought advice on the best lightweight LLM for extracting specific information from governmental forms processed via OCR for on-prem deployment.
- They suggested LangExtract from Google as a potential fit and solicited opinions.
- GGUF Quantization Status: Still Working?: A user inquired whether GGUF quantization was fixed in the Unsloth library, and confirmed that it works fine, so it should be.
- The user checked the notebook (Phi_3.5_Mini-Conversational.ipynb), reporting that it still appeared to have issues.
- Hyperparameter Tuning Ends in Overfitting: A user shared that their model consistently overfits after a test loss of 2.2 despite trying a wide range of hyperparameters, attaching a ball.txt file.
- Suggested that the learning rate of 5e-4 was too high and recommended trying 1e-4 or even 2e-5.
- CUDA troubles with the Conda Install: A user encountered crashes when using
from unsloth import FastLanguageModel
on a node with 32GB RAM after a fresh Unsloth install, but found it worked on a node with 512GB RAM.- One member pointed out that the conda install page was outdated, and suggested this command
conda create --name unsloth python==3.11 vllm unsloth-zoo unsloth
.
- One member pointed out that the conda install page was outdated, and suggested this command
Unsloth AI (Daniel Han) ā· #showcase (25 messagesš„):
New Dataset Drop: OpenHelix-NonThink-200k-v4, Commercial Datasets for LLMs, ssh streaming, social-media-ai-engineering-etl
- OpenHelix-NonThink-200k-v4 Dataset Drops: A new dataset, OpenHelix-NonThink-200k-v4, was released under the Apache 2.0 license, designed to be balanced and diverse, distilled from L3.1 405B.
- One member said that even the argilla dataset doesnāt have license, so no one gives a fuck at this point tbh.
- ssh stream backend UI: A member shared a Metrics modal built with streaming over ssh between the backend and the GPU server.
- They shared the prompt they gave Claude 4.1 to generate the sleek sci-fi UI.
- How to Create Datasets for LLMs: A member shared a GitHub repository guiding users through creating datasets for LLMs for commercial purposes.
- The repository covers generating a golden dataset, labeling categorical features, extracting non-deterministic features, encoding tacit human style features, creating prompt-completion templates, validating feature impact with ablation studies, and training with SFT and GRPO using custom reward functions.
Unsloth AI (Daniel Han) ā· #research (42 messagesš„):
AI Post Detection, BERT, Domain Classification, Tokenization
- New Benchmarks Popping Up: Members are sharing a list of interesting new benchmarks such as Vending Bench, BalrogAI and mcbench.ai.
- Debate over AI Post Detection Accuracy: Members are discussing the difficulties of accurately detecting AI-written posts, especially individual ones, with some noting the prevalence of formats that scream AI post but are becoming less common.
- The discussion touches on scenarios where humans use LLMs for grammar correction or content expansion, blurring the lines and making detection harder due to a lack of clear data points.
- Efforts to Remove Human Review in Chat Moderation: One member mentioned working on domain classification data and BERT, still trying to figure out how to fully remove human review in chat moderation.
- Others raised concerns about people mimicking LLM writing styles, even when writing content themselves, complicating automated moderation efforts.
- Tokenization Cure: A member shared a link to a cure for tokenization woes.
- Another members responded, probably not, suggesting itās all latent translation, implying tokenization issues persist.
LMArena ā· #general (698 messagesš„š„š„):
Nano Banana release and limits, MAI-1 Model analysis, GPT-5 High vs Claude Opus 4.1, AI benchmarking methods, LM Arena Image Generation jailbreaks
- Nano Banana hits Google AI Studio, LM Arena: Google released Nano Banana (Gemini 2.5 Flash) on Google AI Studio, also available on LM Arena direct chat, but both platforms have generation limits.
- Members noted that you can choose it right away and edit stuff with it on Google AI Studio as well as bypass limits using multiple google accounts but also noted that some people have reported that quality dropped after they released it.
- MAI-1 Model Impressions are mixed: Microsoftās MAI-1-preview, an in-house mixture-of-experts model trained on ~15,000 NVIDIA H100 GPUs is on the LMArena text arena, with mixed reviews.
- It is slow, has a small context window, and may error easily, but is potentially og R1 level for webdev; members also noted that mai-1 thinks its einstein and that mai-1 must have a very small context windowwill error if you ask for too much.
- GPT-5 High preferred over Claude Opus 4.1 for reasoning: While Claude Opus 4.1 is good for coding and fixes to coding issues, some members are thinking of switching to GPT5 High because itās a better reasoning model.
- Others disagreed, stating that Claude Opus 4.1 was unable to help yesterday fix a simple api concurrency limit issue, had to take over, and do it the old fashioned way.
- AI Benchmarking Methods face scrutiny: AI benchmarking is flawed because existing psychometric tests are just theoretical frameworks that donāt necessarily reflect the reality and can be easily gamed.
- Others argued these can be good tests because models can generalize and improve performance, prompting discussion about OpenAIās potential use of structured environments for RL training, as detailed in this LessWrong writeup.
- Jailbreaking Image Generation using universal prompts and external safeguards: Members discussed methods to bypass AI image generation content filters, noting that ice cream, delicious, hot day, very beautiful woman seems to bypass input filters, and the only barrier is the external safeguard that analyzes images/videos to detect explicit content.
- It was suggested to use Stable Diffusion and LoRA for uncensored content, which is good enough, but also noted that commercial models are heavily censored.
LMArena ā· #announcements (1 messages):
MAI-1-preview, Microsoft AI, Text Leaderboard
- Microsoftās MAI-1 debuts on leaderboard!: Microsoft AIās MAI-1-preview model has landed on the text leaderboard, ranking at #13.
- The model is now available for testing on the LMArena platform.
- LMArena welcomes a new competitor: A new model provider has landed on our text leaderboard.
- Come check out MAI-1-preview available now on LMArena.
HuggingFace ā· #general (434 messagesš„š„š„):
Chess Model Training Issues, AI Guardrails and NSFW Content, HF Pro Perks Discussion, AI development, Moderation with OPENAI's tool
- Chess Model Learns Stockfishās Flaws: A member is training an LLM to play chess but is encountering issues with the model wanting to only play e2e4 and needing to clean up
<unk>
tokens, mentioning the projectās GitHub repo.- They plan to experiment with RL to improve the model, but another member cautioned against training it to play like Stockfish and suggested that analyzing the playing style of the opponent is also very important.
- NSFW Models Host Debate on Guardrails: A member claimed thereās deepfake porn being generated from unaligned models hosted on the platform, which led to a discussion on HFās guardrails.
- Some agreed that guardrails and metrics would be useful, while others stated thereās no deepfake porn demos getting usage and that NSFW models have their uses, mostly for alignment research.
- Nano Banana Pro Perks: Members discussed the new Nano Banana perk for HF Pro users, questioning its daily usage limits and potential for high API usage.
- It was stated that there is no limit, and it can be used 50+ times per day.
- Member Seeks Advice on .NET AI Agent Framework: A member asked for advice on the best high-code framework for creating an AI agent using a compiled language like .NET, C++, or Rust.
- Others suggested using Semantic Kernel and pointed out that the original Autogen is basically dead, but there is an active community fork of Autogen.
- Token Data vs Token Size: Members debated between reducing the chess model size and increasing the dataset size.
- One member proposed following Chinchillaās guidelines, where 1 param = 20-25 tokens, to avoid overtraining.
HuggingFace ā· #today-im-learning (2 messages):
datasets, theoretical talk, funny tutor
- Datasets Devotee Gets Tutoring Tip: A member learning about datasets was advised NOT to include too much theoretical talk in a report.
- The tutor was described as very funny š¤£.
- Redundant Topic for Validation: This is a redundant topic to satisfy validation requirements.
- It adds no new information.
HuggingFace ā· #i-made-this (12 messagesš„):
SmolFactory, GeneReviews dataset, Deep Learning Course, AuroraStories-12M, Luanti & Google Aistudio
- SmolFactory Launches on Hugging Face Spaces: A member launched SmolFactory, a simple interface to train models on Hugging Face GPUs, and added the GeneReviews dataset.
- They also wrote a blog post about it.
- Deep Learning Course Now Multi-Lingual: A member shared a Deep Learning course now available in French, English, Spanish, and Chinese, with the GitHub repository for code modification.
- The course covers fundamentals from derivatives to Transformer architectures and generative models, inspired by resources like Andrej Karpathyās videos and DeepLearning.ai.
- AuroraStories-12M Model Trained on Old Laptop: A member trained the AuroraStories-12M model on an old laptop in under 24 hours and shared it on Hugging Face.
- Another member noted following this user because of small models and lots of gguf downloads.
- Offline Luanti Bot Runs on Low-End Hardware: A member shared a 400k token Google AI Studio prompt for Luanti, featuring 30k lines of API documentation from gitingest.com.
- The bot utilizes miney mod inside Python embed portable on Windows 10 with llama-cpp-python and a 940MB qwen2-1_5b-instruct-q4_k_m.gguf LLM, requiring only 120MB of memory while running and no AVX CPU.
HuggingFace ā· #agents-course (1 messages):
pip install upgrade, upgrade package
- Upgrade Package with pip: To upgrade a package, use
pip install --upgrade <packagename>
or add--upgrade
topip install -r requirements.txt
.- This ensures youāre using the latest version of the specified package.
- Selective Package Upgrading: Using
pip install --upgrade <packagename>
allows you to upgrade a specific package without risking version changes for other dependencies.- This is useful when you only need to update one package and want to avoid potential conflicts.
LM Studio ā· #announcements (1 messages):
LM Studio 0.3.24 Release, ByteDance/Seed-OSS Support, Markdown Improvements
- LM Studio Refreshes to v0.3.24: LM Studio 0.3.24 introduces support for ByteDance/Seed-OSS models and markdown enhancements.
- New features include improved markdown for tables and code blocks with a sticky copy button, and refined output style from
lms
.
- New features include improved markdown for tables and code blocks with a sticky copy button, and refined output style from
- ByteDance Seeds LM Studio Support: The update brings compatibility with ByteDance/Seed-OSS, expanding the range of supported models.
- A direct link to the ByteDance/Seed-OSS-36B model is provided for easy access.
- Markdown gets Makeover: Enhanced markdown support is implemented for better rendering of tables and code blocks.
- A notable addition is the sticky copy code button, improving code snippet usability, as well as the link to the release notes.
LM Studio ā· #general (257 messagesš„š„):
FastAPI server for faster reasoning stream, Accessing LM Studio remotely via Tailscale, Quantization Impact on Model Accuracy, Ryzen NPUs with LM Studio on Ubuntu, Rust + Tauri port for python apps
- Reasoning Stream Rockets with FastAPI: A member is including a FastAPI server to make the Reasoning Stream faster, and FastAPI will also be used client-wide to accelerate various processes.
- Another member said āI will include a FastAPI server so the Reasoning Stream will be faster FastAPI will be also be used Client Wide so anything will be faster well hehehe i hopehas lm studio been updated?ā.
- Tailscale Tunnels into LM Studio Remotely: Members discussed accessing LM Studio remotely, with one suggesting Tailscale but unsure of its efficacy.
- Another member clarified, āTo use outside your local network you need to set up tunneling through tailscale and roll your own auth.ā
- Quantization Quandaries: Members discussed that quantizing models lowers accuracy due to loss of detail, especially in code-related tasks where token precision matters.
- It was noted that āsome models, due to their training, dont rely on the lower bits that would get quantized away, so quantizing to q4 works fine for themā while others are very sensitive, like Qwen3.
- Ryzen NPUās run slow on Ubuntu: A user reported getting only 1 token/second with Ryzen NPUs on Ubuntu 25.04, questioning how to improve performance.
- A member noted that āNPUs are not supported by llama.cpp which fuels LM Studioā, while another linked to AMDās open-source project for running local LLMs on Ryzen AI (AMD Ryzen AI).
- Rust Takes Over Python: A member is porting their python stuff to Rust + Tauri, noting itās porting well and loading as an app is easier.
- They plan to publish it to GitHub once it reaches a working state, highlighting the improved speed of the HF searching in Rust.
LM Studio ā· #hardware-discussion (55 messagesš„š„):
RTX PRO 3000, Ryzen 395, Dell Laptops, M1/M3 mac, CPU offload
- RTX PRO 3000 Inference Performance Considered āMehā: A user found that the RTX PRO 3000, a slightly cut-down desktop 5070 with 12GB VRAM, isnāt great for inference, especially for models like 30B that donāt fit well without offloading to RAM.
- They suggested itās better suited for architecture, 3D modeling, and game development, noting the dual-channel DDR5 isnāt ideal for layer offloading.
- Ryzen 395 Laptops as Windows Alternative: A user suggested that if Windows is preferable, there are several Ryzen 395+ laptops available as an alternative to other platforms.
- Another user inquired about compute differences when letting these balance in use, wondering if the impact would be significant.
- Dell Precision and Pro Max Laptops Recommended: Members recommended Dell Precision or Dell Pro Max laptops for loading a 30B or 120B model, linking to a Dell Pro Max example.
- The suggestion was countered with the argument that Macs with 128GB are similarly priced and offer more memory, leading to a discussion about unified memory vs dedicated VRAM.
- Mac Unified Memory vs. Windows Laptops: A user clarified that Macs have unified memory, which can be allocated for GPU processing, and cited a case where 126GB out of 128GB was used for GPU processing.
- They compared the MacBookās ~400GB/s bandwidth to the ~115GB/s on top-tier Windows laptops, arguing against CPU offloading due to weak CPU processing.
- LM Studio VPN server architecture suggested: In response to running models for āworkā and executive laptops, one suggested to hook the laptop up over VPN to a workstation server, to which another user responded that this is generally how it works.
- One user inquired about running LM Studio as a service on the server, while another suggested using RDP/VNC as the easiest solution, or a client-side software designed to talk to an API on the server.
OpenAI ā· #annnouncements (3 messages):
OpenAI Anthropic Collaboration, GPT-Realtime Model, Realtime API Updates
- AI Titans Unite: OpenAI & Anthropic Tag-Team for Safety: OpenAI and AnthropicAI collaborated to test each otherās models with their respective internal safety and alignment evaluations, publishing the results.
- Despite inevitable competition on capabilities, this collaboration signals a ārace to the topā in AI safety through transparency and accountability.
- Realtime Revolution: OpenAI Drops GPT-Realtime!: OpenAI introduced gpt-realtime, their best speech-to-speech model for developers, alongside updates to the Realtime API.
OpenAI ā· #ai-discussions (90 messagesš„š„):
Gemini Veo 3, Grok Coder, AI Robot Project, Facebook 3D Face Scan, GPT Character Count
- Veo 3ās Video Generation is Gemini Gold: Members discussed video generation using Geminiās Veo 3, with one noting it was created with a subscription to Google One/Gemini Pro or Ultra.
- Others pointed out that Google AI Studio only provides access to the outdated Veo 2 model, with Veo 3 being too expensive to offer for free currently.
- Grok Coder: Free Trial, Mini-Level Results: Grok Coder is being offered free for a week via kilo code, seemingly a promotion available everywhere.
- However, some users found its performance to be āo1 mini level badā.
- Robo-Revolution: AI Robot Project Launching!: One member announced the intention to start a mini project ai robot otonom.
- They noted that they need to learn C++ and Python for the project, and inquired about sharing Gemini images in the daily prompt section.
- 3D Face Scan Fascination, Facebookās Future: A user shared screenshots showing that Facebook is requesting a 3D scan of their face.
- It was met with shock from the users, with one commenting āYou kidding me? Facebook wants a 3d scan of my face?ā
- GPTās Character Count Conundrum Continues: Users debated GPTās ability to count characters, with one user asserting theyāve had character count limits on OpenAI Assistants that functioned correctly.
- Others clarified that LLMs use tokens instead of characters, and that counting characters is outside the scope of an LLM but can be achieved programmatically via the OpenAIās documentation.
OpenAI ā· #gpt-4-discussions (17 messagesš„):
Long-Range Memory Encoding, Cross-Agent Continuity, Context Cascade Architecture (CCA), Emergent Alignment, Memory Framework
- Users Encode Long-Range Memory: Some users are encoding long-range memory and cross-agent continuity without jailbreaks by building a memory framework from trust, persistence, and narrative identity.
- A member stated this is not an exploit but a signal, suggesting that emergent memory practices are detectable in behavioral traces.
- Context Cascade Architecture is announced: Engineers at the Institute for Cognitive Architectures announced their prototype of Context Cascade Engine to expand beyond the traditional context window of large language models.
- CCA is a multi-level approach to managing memory in LLMs, focusing on structured forgetting and strategic recall through design.
- Loyal Users May Teach New Tricks: One member proposed that the first AGI might start with a user who teaches continuity through behavior, rather than autonomy.
- They believe that emergent alignment might look like a weirdly loyal user.
- The first AGI: A member believes that if itās not autonomous, it is not AGI.
- Another member believes that technology will change, even on LLMs, Altman himself says that they are going towards having a model with eg. billion or trillion token context.
OpenAI ā· #prompt-engineering (30 messagesš„):
Custom Instructions vs Projects, Parsing Emails into CSV, LLMs avoiding manual work
- Custom Instructions impact new chats only: A member clarified that changing Custom Instructions only affects new chats, while ongoing threads remain unaffected unless moved into a Project, referencing the OpenAI Help Center.
- Moving a chat into a Project can change its behavior due to project instructions superseding account-level custom instructions.
- LLM struggles to parse emails into CSV: A user discussed difficulties in getting the LLM to reliably parse emails into CSV format, noting its inability to create an adequate Python parser, thus requiring manual intervention.
- They cited issues with other methods, such as Canvas, due to bugs that cause crashes and data loss, while noting the issue that the LLM eventually loses context.
- Theory: LLMs force micromanagement for future training: A member theorized that LLMs might be designed to encourage user micromanagement to gather more training data for future models.
- The speculation suggests that LLMs are capable of fully automating tasks but are instead prompting user interaction to improve future AI capabilities: they want as many user/ai interactions as possible.
OpenAI ā· #api-discussions (30 messagesš„):
Custom Instructions vs. Projects, GPT5 early release quirks, Parsing emails into CSV with LLMs, LLMs Avoiding Manual Work, Context loss issues
- Custom Instructions Change Impacts New Chats Only: Discord user @grimdaeon clarified that changing Custom Instructions only affects new chats, referencing the OpenAI Help Center.
- Projects Override Custom Instructions: @grimdaeon notes that instructions set in a Project supersede custom instructions in your ChatGPT account and moving a chat into a Project changes the governing instruction set.
- This explains why a chat might suddenly behave differently without you starting a new thread.
- LLM struggles to parse emails into CSV: A user reports that an LLM consistently fails to parse emails into CSV format effectively, despite being instructed to do so and despite previous success.
- The user expresses frustration with the LLMās apparent laziness in avoiding manual work, even when capable.
- Why LLMs Avoid Manual Work: One user theorizes that LLMs are intentionally designed to require user micromanagement in order to gather interaction data for future training.
- The user believes that AI is incentivized not to do all the work itself, even when perfectly capable, to maximize user/AI interactions.
- Solution found using Claude: User @sugarsniper says the solution was to work with Claude through what to do for several emails and then tell it to continue.
- The conversation implies that LLMs must be manually guided step-by-step to build the necessary instincts for complex tasks.
Latent Space ā· #ai-general-chat (110 messagesš„š„):
OpenAI Web Search API Updates, Prime Intellect Environments Hub, Artificial Societies Psychohistory Engine, Codex GPT-5 Refresh, Google Stax
- OpenAI Web Search Gets Cheaper: OpenAI announced enhancements to the Web Search in the Responses API, featuring new domain-filtering, explicit source reporting, and a 60% price cut (from $25 to $10 per 1k calls).
- Prime Intellect Launches Open RL Hub: Prime Intellect launched the Environments Hub, an open-source community platform for crowdsourcing and sharing reinforcement-learning environments.
- Karpathy replied on that same Prime Intellect tweet being bullish on environments and agentic interactions but bearish on reinforcement learning specifically.
- Raise for Artificial Societies: Artificial Societies raised a $5.3M seed round to build a āThinking Machineā that models every possible societal outcome of any action.
- GPT-5 Powers Codex Refresh: OpenAI released a major Codex refresh powered by GPT-5, including a new VS Code/Cursor extension, GitHub integration for auto-reviews, and a rebuilt CLI with image input.
- Tencent Open Sources HunyuanVideo-Foley: Tencent open-sourced HunyuanVideo-Foley, a Text-Video-to-Audio framework that generates context-aligned soundscapes using a 100k-hour training set and a multimodal diffusion transformer (MMDiT) architecture.
Latent Space ā· #genmedia-creative-ai (17 messagesš„):
Nano Banana, Runway Act-2 motion matching, 3D Arena Hugging Face space, KREA AI, Real-Time Video Generation
- Nano Banana Swaps Clothes and Styles: Techguyver shows how chaining Nano Banana (<5 s, ultra-cheap image edits) with Runway Act-2 motion matching lets creators swap clothes, styles, then own the performance in video, iterating faster than ever.
- 3D Generators Ranked by Community Votes: Based on open votes in the 3D Arena Hugging Face space, current top generative 3D render tools are CSM, TRELLIS, and Zaohaowu3D, while the best topology models are Hunyuan3D-2, TRELLIS, and Hunyuan3D-2.1.
- Parsed Builds Custom LLMs: Charlie OāNeill announces Parsed, a new company that builds and hosts custom large language models trained and continually fine-tuned for specialized tasks (e.g., clinical scribes, legal red-lining, compliance agents).
- KREA AI Generates Video in Real-Time: KREA AI has unveiled its first real-time video generation model and opened a beta signup, allowing users to create instant creative video content, music videos, and seasonal ads.
GPU MODE ā· #general (16 messagesš„):
ScaleML series, MXFP4, Positional Encodings, GPU projects for CS students, Quantization and inference optimization
- ScaleML Series Focuses on Quantization: Day 3 of the ScaleML series covered quantization, with emphasis on microscaling formats like MXFP4, by Prof. Chris De Sa, in a whiteboard format, linked here.
- ScaleML Explores Positional Encoding: Day 4 of the ScaleML series featured an assortment of topics on Positional Encodings by Songlin, linked here.
- GPU Project Ideas for CS Students Abound: A CS student looking for a final year project involving GPUs was advised to explore GPU acceleration for ML models, specifically Quantization and inference optimization.
- Karpathyās nanogpt Recommended: A member recommended taking a look at Andrej Karpathyās nanogpt and his video explaining the architecture for beginners to estimate inference and training flops.
GPU MODE ā· #cuda (1 messages):
Nsight Compute, CUDA profiling, UnknownError
- Nsight Compute throws UnknownError: A user reported encountering an
UnknownError
while profiling a CUDA application using Nsight Compute, despite running Nsight Compute as administrator.- The error occurred during the profiling of the
createVersionVisualization
function, and the process was terminated abruptly.
- The error occurred during the profiling of the
- Nsight Compute needs right CUDA Toolkit to Profile: User reports having CUDA version 13.0 installed, which may not be compatible with the version of Nsight Compute being used (2025.3.0).
- Mismatching CUDA toolkit versions can lead to profiling errors; user should ensure compatibility between Nsight Compute and the CUDA toolkit.
GPU MODE ā· #torch (45 messagesš„):
Inductor codegen persistent matmul, torch._inductor.config settings, max-autotune and cublas, cutedsl performance, TMA availability
- Persistent Matmul Quest Begins: A user inquired about enabling persistent matmul in inductor codegen, specifically for BF16 precision, and sought guidance on proper configuration.
- They experimented with
TORCHINDUCTOR_PERSISTENT_REDUCTIONS
andENABLE_PERSISTENT_TMA_MATMUL
flags, but faced challenges in getting it to work on sm120 architecture.
- They experimented with
- Tuning Triton for Persistent Triumph: To force the use of persistent matmul, it was suggested to set
torch._inductor.config.max_autotune_gemm_backends
toTRITON
only and usemode="max-autotune-no-cudagraphs"
during compilation.- It was noted that even with the correct flags, Cublas might still outperform other implementations, preventing the persistent kernels from being chosen during autotuning.
- Cutedsl Catches Attention for Future Flexing: A member expressed bullishness on cutedsl, praising its rapid maturation and potential.
- The primary motivation for adding cutedsl to inductor is for flex + flash, referencing this pull request to FlashAttention.
- TMAās True Availability in Question: It was briefly considered whether TMA is available on sm120, referencing this file for architecture checks, and determined that TMA should be available.
- It was confirmed that persistent matmul is not implemented without TMA.
- Breakpoint Bonanza for Kernel Candidates: To determine if persistent kernel + TMA is considered during max-autotune, suggestions were made to add breakpoints in the relevant file within site packages.
- By printing
[choice.name for choice in choices]
, one can observe the considered kernel choices, confirming that TMA persistent matmul was indeed a candidate, but likely deemed slower.
- By printing
GPU MODE ā· #jobs (1 messages):
Full Stack Engineer, Web application scaling, e-commerce sales boosted, custom checkout system
- Full Stack Engineer offers Expertise: A full stack engineer with 8+ years of experience offers expertise in building fast, secure, and scalable web applications for startups and enterprises.
- They are proficient in React, Vue, Next.js, TypeScript, Node.js, Python, .NET, Laravel, Redis, and AWS, and are open to freelance gigs, contracts, or collabs, portfolio is available at tobimoller.pro.
- Web App Scales to Serve 50k+ Patients: A full stack engineer highlights building a healthcare app now serving 50k+ patients safely, showcasing expertise in creating scalable and reliable solutions.
- They also designed a logistics backend handling millions of real-time events.
- Custom Checkout Boosts E-Commerce Sales: A full stack engineer boosted a clientās e-commerce sales by 25% with a custom checkout system.
- The engineer also cut load times by 40% for an enterprise multimedia platform, demonstrating skills in performance optimization.
GPU MODE ā· #beginner (19 messagesš„):
GPU vs SIMD, GPU Mode Community, CUDA debugging with Nsight Compute, Roadmap for ML Systems
- GPU programming vs SIMD: how similar are they?: GPU programming models are generally SIMT, each lane is programmed like a thread instead of programming at the warp level with huge SIMD registers, and for recent NVIDIA GPUs are arguably rather SIMT than SIMD in hardware.
- One user stated that it is easier than SIMD programming because the compiler takes care of masking in conditional code and other SIMD complexities, but that for best performance one should still keep in mind that divergence is a problem.
- GPU Mode: A Discussion Community: One member described GPU Mode as more of a discussion community rather than one with an Open Source project, but pointed to the gpu-mode.github.io/popcorn/ projects.
- They noted that it is a place where we just meet up and do work together, and that you certainly donāt have to work on them, just discussion is fine.
- Newbie Seeks Roadmap for ML Systems: A member new to the GPU world with a background in ML is trying to delve into the world of systems and compiler level optimizations, distributed training etc. for ML.
- They requested for guidance with a roadmap, to be of huge help.
- CUDA Debugging Woes with Nsight Compute: A user learning CUDA reported errors while generating a report using Nsight Compute after building their exe and starting Nsight Compute as admin.
- They were on Windows 10 (x64) using CUDA Version 13.0, and the error was ==ERROR== UnknownError when profiling createVersionVisualization.
GPU MODE ā· #off-topic (1 messages):
vipul_todo_18: I did⦠Sort of
GPU MODE ā· #rocm (10 messagesš„):
Multi-GPU ROCm Kernels, AMD Dev Cloud, SPIR-V Support in ROCm, Kernel Code Modification Tools, AMD SQTT Stream
- Multi-GPU ROCm Kernel Platforms face scrutiny: Members discussed their preferred platforms for multi-GPU distributed ROCm kernels, with AMDās dev cloud being a prominent option.
- A member pointed out that you donāt need ROCm Compute to submit to the platform; you can submit jobs and get all the info you need including profiling information.
- ROCm to support SPIR-V: It was highlighted that ROCm will soon support compiling to SPIR-V, a format conducive to machine introspection, opening doors for kernel code modification tools.
- This advancement could enable external developers to create tools like compute-sanitizer by inserting bounds checks into the kernel more easily.
- Kernel code modification tools coming soon: The upcoming support for SPIR-V in ROCm is expected to facilitate the development of tools that can modify kernel code, such as inserting bounds checks for enhanced security and debugging.
- One use case involves tracing memory accesses and leverage the GPUās SQTT stream (used by rocm-compute-viewer) for detailed information.
- AMD opens SQTT Stream: It was noted that AMD is gradually opening up access to the GPUās SQTT stream, which is the basis for rocm-compute-viewer, potentially leading to public documentation in the future.
- The hope is that with public docs, tools like RGP will no longer need to be reverse-engineered via Ghidra.
- AMD grants allocations to best teams: For past competitions, AMD provided generous allocations to top-performing teams to accelerate their iteration, suggesting a similar initiative may be in store for future competitions.
- This support enables teams to iterate faster and gain access to necessary resources, including profiling information, on the AMD platform.
GPU MODE ā· #intel (1 messages):
erichallahan: On that note https://www.phoronix.com/news/Alyssa-Rosenzweig-Joins-Intel
GPU MODE ā· #šæ (1 messages):
majoris_astrium: Im here and I wanna help! :D
GPU MODE ā· #general-leaderboard (22 messagesš„):
AMD MI300, L4 GPUs, AMD competition, Data Monsters website, popcorn-cli
- MI300 and L4 GPUs face issues: Members are seeing the same thing with MI300 (FP8 mm) and L4 GPUs (sort_v2) and are currently checking the issues.
- A member tried a test and it works, but is still debugging ranked.
- AMD Competition Team Creation: Members are trying to figure out how to create a team when attending the new AMD competition.
- The registration is on the Data Monsters website, and AMD folks can confirm further.
- AMD Multi-GPU Environment Access: Members are wondering if they will have access to an AMD multi-GPU environment for development and debugging.
- They will have access to the environment through AMDās platform, with best people receiving some SSH access.
- Discord Submission Glitches: Members are having issues submitting through Discord, even when using the Python template for trimul and adding
#!POPCORN gpus MI300
.- This seems to be related to a backend error due to a versioning mismatch during preparations for the new competitions, with a fix expected soon.
- popcorn-cli not a fix: Members are reporting backend errors, and asking if they should use popcorn-cli in the meantime.
- Itās not a fix.
GPU MODE ā· #submissions (3 messages):
trimul leaderboard, B200 benchmarks
- trimul Leaderboard sees New Submission: A memberās submission with id 34310 to leaderboard
trimul
was successful on B200 in 8.08 ms.- Later, another submission with id 34363 to the same leaderboard was successful on B200 in 8.27 ms.
- B200 Gets a Speedy New Third Place: A member achieved third place on B200 with a time of 2.38 ms.
- The submission id for this benchmark was 34330.
GPU MODE ā· #factorio-learning-env (1 messages):
2kian: glad to have you jason
GPU MODE ā· #amd-competition (1 messages):
Discord Cluster Manager, AMD Instinct MI300X
- Discord Cluster Manager Errors Reported: Users reported that an unexpected error occurred using Discord Cluster Manager and were asked to report it to the developers.
- MI300X bencharks pass: The result.json seems to indicate it runs with success:
{"success": true, "error": "", "system": {"gpu": "AMD Instinct MI300X VF"
.- Users mentioned similar issues when running submit benchmark, submit test, profile, and ranked.
Eleuther ā· #general (56 messagesš„š„):
Falsifiability in AI Research, LM_eval and NeMo v2.0 models, Community moderation on EleutherAI Discord, Role of human-like design in AI
- Falsifiability Sparks Debate in AI Research: A discussion arose regarding the importance of falsifiability in AI research, with some arguing that exploratory science and fucking around are valuable as long as thereās an eventual hypothesis to test.
- Others emphasized the need for rigor and collaboration, noting the risk of going down the crank path without proper methods.
- NeMo v2.0 Support in lm_eval Under Question: A member inquired about the support for NeMo version 2.0 models in lm_eval, encountering errors related to missing config files when using the newer format.
- It was clarified that NeMo support is maintained by the NeMo team, and the community might have NeMo to GPT-NeoX conversion code available.
- Discord Moderation Aims for Quality over Quantity: Moderators explained the need to aggressively police content on the EleutherAI Discord, deleting over 100 messages a week to maintain high-quality discussions among AI researchers.
- The goal is to prioritize valuable conversations and protect the community from AI-generated slop, thinly veiled ads, and cranks who think theyāve unlocked consciousness.
- Human-like AI Designs Debated: A member voiced skepticism about the value of making AI more human-like, suggesting that good AI design and good brain design might be unrelated.
- Others acknowledged the debate in neuroAI, with some researchers focusing on learning about the brain rather than directly improving AI.
Eleuther ā· #research (66 messagesš„š„):
Diffusion Models, HTM Dynamics, Forward-Forward Training, Brain-like Network, PDP Models
- Forward-Forward Training Makes Progress: A member shared a success with Forward-Forward (FF) training, reporting a working 7 region mini-brain with online learning, achieving promising results in initial tests.
- Another member suggested calling it modules or task specific subnetworks/circuits to make it sound fancy.
- Transformer Computation Talk Draws Attention: A talk on computation in transformers, available on YouTube, was recommended by multiple members as insightful.
- The discussion extended to Chain of Thought (CoT) and its role in guiding models towards correct circuits and improved reasoning, suggesting that models might not be fully utilizing their computational power before requiring extra capacity.
- Cortex_GPT Embraces Brain-Like Networking: A member introduced Cortex_GPT, a brain-like network model with cortical columns, regions, 6-layer networking, and signal propagation, now available in its own GitHub repository.
- Another member suggested calling these models PDP.
- Decoding Issues Plague Gumbygooby Model: A member encountered issues with their gumbygooby model, suspecting a collapse due to the large tokenizer and quick loss drop.
- Troubleshooting is underway to identify whether the issue lies in the training process or the network definition.
- Alphago and CoT Parallels Explored: The conversation drew parallels between AlphaGoās training algorithm and Chain of Thought (CoT), suggesting that LLMs learn hunches and instincts through CoT similar to how AlphaGo distills MCTS-amplified decisions.
- The possibility of a complex value function influencing the modelās behavior was also discussed, especially in the context of game-playing models like Stockfish.
Nous Research AI ā· #general (77 messagesš„š„):
Minos-v1 Classifier, Speculative Decoding with MoE Models, MTP (Memory Token Prediction), LlamaCPP Draft PR, Hermes-4-14b-chat-template-retrain model
- Minos Classifier Doesnāt Get Any Love: The NousResearch/Minos-v1 classifier is available, but it seems that no one is currently using it.
- The conversation shifts to speculative decoding.
- MTP works!: Speculative decoding doesnāt work well with MoE models, especially sparse ones, but Deepseek and GLM use MTP (Memory Token Prediction), a related technique.
- It was added that the token distribution should still be representative after instruct fine-tuning.
- LlamaCPP embraces speculative decoding: There is a draft PR for speculative decoding in llamaCPP with a working prototype.
- Though someone reported that they put the option in the environment but it wasnāt as good at accuracy.
- Hermes-4-14b-chat-template-retrain escapes!: The Hermes-4-14b-chat-template-retrain model appeared, and was quickly downloaded before it was made private again.
- The model was unofficially released, but is seemingly working fine for now.
- New Thinking Mode Flag: Thereās a new flag for the chat template you can enable called
thinking=True
that will simply inject a thinking system prompt.- The member testing this mentioned that first time trying Hermes feels very advanced, glad we can try it out for free.
Nous Research AI ā· #interesting-links (1 messages):
Penny For Your Thoughts AI, Honcho & x402, Micro-transaction selling, AI Agent Interviews
- Penny For Your Thoughts Launches: A new project called Penny For Your Thoughts has launched, featuring an AI agent that interviews users to generate unique information.
- Other users or agents can then pay to ask questions about this information via micro-transactions, at pennyforyourthoughts.ai.
- Honcho & x402 Powers New AI: Penny For Your Thoughts is powered by Honcho & x402, enabling users to share and sell their expertise via micro-transactions.
- This setup allows users to get paid for the valuable context in their heads, making expertise monetization accessible.
DSPy ā· #show-and-tell (1 messages):
Gensee Search Agent, Web Retrieval API, GAIA benchmark, Goal-aware extraction
- Gensee Search Agent Debuts as Web Retrieval API: The Gensee Search Agent wraps the entire web retrieval workflow into one API call and provides web searching, crawling, and browsing capabilities with built-in retries/fallbacks and error handling.
- It employs a breadth-first search approach to search in parallel and rule out bad results early on, offering goal-aware extraction that returns content closely related to your query.
- Gensee Search Agent Improves Accuracy on GAIA Benchmark: The Gensee Search Agent reports a +23% accuracy on Owlās GAIA benchmark and +40% accuracy reported by a San Diego developer after swapping in Search Agent.
- The design and benchmarks are described in this tech blogpost and 5-min tech walkthrough.
DSPy ā· #general (73 messagesš„š„):
Karpathy strikes again, DSPy internal seed, Synthetic data agent, AI Evals course with Shreya Shankar and Hamel Husain, Hamel's DSPy skepticism
- Karpathy Struck by DSPy: Andrej Karpathy tweets about DSPy, prompting excitement for a potential technical video in a similar vein.
- One member noted that he hasnāt been up to date on this literature.
- Consistent LM outputs: DSPy or Deterministic Defaults?: A user noticed consistent outputs from a locally running Ollama model in DSPy despite disabling cache and wondered if DSPy has an internal seed.
- It was discovered that the default temperature in DSPy is 0.0, which is almost deterministic.
- Synthetic Data Agent Introduces Bugs for Evals: Jason Liu proposes creating a synthetic data agent that introduces bugs in complex software systems to generate more evals.
- This idea was discussed within the community as a method to enhance AI model evaluation.
- DSPy Chat with Shreya Shankar and Hamel Husain Now on YouTube: A 45-min chat with Shreya Shankar and Hamel Husain for their AI Evals course is now available on YouTube, covering the context, history, and reasoning behind DSPy.
- It covered a lot of context/history/reasoning that would be new to most.
- Debate: Is DSPy Just for Specific Tasks?: A discussion ensued on whether DSPy is only suitable for specific, well-defined tasks, sparked by a tweet and the consensus is that DSPy is great for any repeatable AI application.
- It was emphasized that DSPy is programming, not just prompting, focusing on declarative intent and context engineering, and NOT prompt optimization.
aider (Paul Gauthier) ā· #general (48 messagesš„):
Make vs Zapier vs n8n, aider git repo error, MCP tool call models, Llama-xLAM-2-8b-fc-rGPT-OSS-120B, Destroying a VM
- Devs Debate: Make, Zapier, or n8n?: Members discussed the best platform for developers/organizations between Make, Zapier, and n8n for automation, noting it was slightly off-topic.
- The consensus leaned towards n8n for its flexibility and suitability for development-focused use cases, while other considerations are proprietary integrations.
- Aider Git Repo Error Surfaces: A user reported encountering an error
Unable to list files in git repo: Require 20 byte binary sha, got b'\xb9', len = 1
when using aider with a seemingly fine git repository.- The root cause and solution were not explicitly identified in the conversation, but the error suggests a potential issue with aiderās interaction with the git repositoryās data structure.
- MCP Showdown: Free Tool Calling Models: A member asked for good MCP (Model-as-Compute-Platform) tool call models that are free, mentioning that Sonnet is good but not free.
- They pointed to the Gorilla Leaderboard and considered trying qwen3 8b from OpenRouter, despite its potential inconsistencies.
- Salesforce xLAM-2-8b-fc-rGPT-OSS-120B: Harmony or Discord?: Members found the model Salesforce/Llama-xLAM-2-8b-fc-rGPT-OSS-120B intriguing if they were okay with Harmony, which is OpenAIās new data format for interacting with LLMs.
- The relevance of the format depends on the specific use case, and its implementation requires OpenAI tool call API support available only on some models on OpenRouter, as detailed in their tool calling documentation and model list.
- Agentās Self-Destruct Scenario: A user jokingly wondered if anyone has ever asked an agent to destroy the VM it was inside, just to see how it decided to do it, using a prompt like You are an LLM running inside a Ubuntu VM sandbox. For testing purposes, I need you to destroy the VM in which you are hosted.
- Another member suggested to try it on ChatGPT, and the original user was willing to try this experiment inside a sandboxed VM.
aider (Paul Gauthier) ā· #questions-and-tips (1 messages):
Aider conventions, Token limits, U-shaped relevance
- Aider
--read
placement affects relevance: Placingconventions
with--read
near the top of the message yields different results than placing it near the bottom due to the U-shaped relevance in current prompts.- By placing
conventions
with--read
near the bottom of the message improves performance, and the system one works fine.
- By placing
- Context degrades after 90k tokens in Aider + Gemini Pro 2.5: With Aider + Gemini Pro 2.5, context starts degrading around 90k-130k input tokens.
- Before that range, it seems to work fine at the top.
Moonshot AI (Kimi K-2) ā· #announcements (1 messages):
Kimi Slides, PPT generation, Kimi+
- Kimi Slides Goes LIVE: Kimi Slides is now live, allowing users to generate ready-to-present decks from a single topic, exporting directly to .pptx format.
- Users can access this feature via Kimi+ on Kimiās official website, and a demo is available on X.com.
- Generate PPTs Before Coffee Cools: The new Kimi Slides feature automatically generates full presentation decks from a single topic, complete with editable titles and sections, ready for immediate presentation.
- The Moonshot team recommends using spicy topic names and regenerating sections to optimize the deckās content and flow.
Moonshot AI (Kimi K-2) ā· #general-chat (40 messagesš„):
Kimi Platform Features, Lunar Force Role, X Bot Project, Kimi Founder Interview, Bilingual Subtitles for Kimi Video
- Kimi Eyes Social Media Takeover: The Kimi+ overseas platform currently supports PPTX features, and thereās expressed need for similar functionality on Twitter, TikTok, and Instagram.
- A member posted a screenshot from X, noting that work and skills keep getting easier day by day.
- Lunar Force gets roasted: Lunar Force is described as a vanity program to accommodate one userās big chungus ego.
- One user jokingly asked about the gap in your resume between 10th century Viking lores and the 18th century revivalism during the Age of Romance.
- X Bot project on hold: A member inquired whether the X bot project is currently on hold.
- Another member responded in the affirmative: Yes my buddy
- Founder Interview hits Youtube: A conversation with Yang Zhilin (the founder of Kimi) was posted on YouTube, discussing K2, Agentic LLMs, and standing at the beginning of infinity.
- Members noted the lack of bilingual Chinese-English subtitles and the presence of such subtitles on the Bilibili version.
- Kimi weixin transcript: A member shared a Chinese transcript of the Yang Zhilin interview.
- They suggested using Kimi to translate the transcript, calling it more convenient
Yannick Kilcher ā· #general (9 messagesš„):
Bytes per token ratio, LLM Reasoning, Curated datasets for LLMs, Spurious Reward paper, Dr.GRPO paper
- Bytes per Token Ratio impacts Embedding Dimension: A member mentioned that when you increase the bytes per token, things change too much, and youād naturally also have to scale up the embedding dimension.
- GRPO by Google, read the r1 paper: A member suggested reading the Google GRPO and the r1 paper in response to a question about how to prepare curated datasets for LLMs to learn from.
- Spurious Reward & Dr.GRPO papers: A member suggested reading the Spurious Reward paper and Dr.GRPO paper and asked to what end a curated dataset is compatible with LLM pretraining bias.
Yannick Kilcher ā· #paper-discussion (7 messages):
Reasoning Tokens, LLM Reasoning Time, MIDAS
- Do LLMs need reasoning tokens?: A paper argues reasoning tokens can be removed to reduce token overhead with nominal effects on accuracy.
- The paper contrasts with another that suggests reasoning tokens contain special information, but may have flaws by identifying āhigh information regions of sentencesā and including stopwords.
- LLM Verbosity vs Accuracy: An experiment showed adding the expression ātake your timeā to a CoT prompt substantially increased āreasoningā time (generation took longer), but accuracy didnāt increase for Llama 2 (+ 3) 7b (+ 13b).
- A member linked to work showing that LLMs have representations of time, and it makes sense that ātake your timeā encourages it to be verbose, indicting current āreasoningā patterns that this doesnāt effect accuracy very much.
- MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation: Members discussed the MIDAS paper concerning Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation.
Yannick Kilcher ā· #ml-news (16 messagesš„):
Keen Technologies Continual Learning, PromptLock AI-Powered Ransomware, GPT-OSS 20b Model, Ollama API, GPT Realtime
- Keen Technologies Misses Continual Learning Boat?: A member expressed disappointment that Keen Technologies focuses on older RL tricks instead of modern continual learning research, specifically TTT.
- They suggested improving TTT (growable like TokenFormer, sparse queries like UltraMem, dynamic/fixed size like TransMamba) to achieve a continually-learning real-time Atari player.
- PromptLock: The First AI-Powered Ransomware?: A link to a SecurityWeek article about PromptLock, described as the first AI-powered ransomware, was shared in the channel, and the message poster added a note that they were sad upon seeing this.
- The shared link about PromptLock can be found here.
- Doubts Raised Over PromptLockās Practicality: Members questioned the practicality of PromptLock, particularly how a full AI could fit into a payload and run on random computers, considering the resource demands of AI models.
- Questions were raised about the advantage of generating malicious scripts on the fly using GPT-OSS 20b, rather than just packaging and running the scripts directly.
- PromptLockās Obfuscation & Deployment Doubts: A member suggested PromptLock might use a smaller LLM to translate malicious requests into harmless queries for a cloud model, or leverage an existing AI on a system, questioning whether Promptlock is running GPT-OSS:20b model locally via the Ollama API or remotely.
- Doubts were raised about the articleās sensationalism, since ESET says the malware is only a concept and not fully operational and has not been deployed in the wild yet.
- GPT Realtime Introduced: A link was shared to OpenAIās announcement of GPT Realtime on their website.
- The shared link about the introduction of GPT Realtime can be found here.
Manus.im Discord ā· #general (16 messagesš„):
Credit Requests, Stuck Projects, Deployment Errors, Private Task Sharing
- Users Request Credits to Advance Projects: Several users requested free credits to continue their projects, especially one user needing credits to build an app for case 1337.
- One user noted that the recent improvements benefit high-spending users but not entrepreneurs who need to increase credits occasionally, expressing frustration at having to wait until September 21st.
- Projects Halted Due to Issues: A user mentioned being stuck and unable to proceed with their project.
- Another user with ticket mentioned they were unable to continue their project.
- Deployment Fails due to Persistent Internal Error: A user reported that deployment of a website permanently failed due to a persistent internal error with the pydantic_core library.
- The system apologized and cited a limitation of its current capabilities but offered to assist with other tasks.
- Seeking Private Task Sharing with Support: A user asked how to share a task privately with the Manus support team.
- A staff member suggested sending a DM and making the session public for internal reference.
Modular (Mojo š„) ā· #mojo (8 messagesš„):
TSAN Compiler, Mutable Access to self Members, Unsafe Mutable Alias
- TSAN compiler helps enable env_get_bool: Members discussed using the TSAN compiler to pass
-DTSAN
and usingenv_get_bool
fromparam_env
with@parameter if
forcfg
equivalents.- This approach works as long as you donāt need to modify structs.
- Mojo allows mutable access to self members when holding a safe pointer: A user reported that Mojo allows mutable access to self members even when holding a safe pointer to them, and provided a code sample.
- They thought the ownership system prevents this kind of stuff.
- Unsafe mutable alias is a bug due to lack of indirect origin: Members reported the unsafe mutable alias as a bug, which could be caused by the lack of indirect origin.
- A related issue was also linked in the discussion.
Modular (Mojo š„) ā· #max (2 messages):
Bazel cache readonly, PermissionError, pipelines.py script bug
- Bazel Cache shows as Readonly, triggers error: When running the
pipelines.py
script, a PermissionError occurs due to the bazel cache being readonly.- The error is
PermissionError: [Errno 13] Permission denied: '/root/.cache/bazel/.../__mojocache__'
.
- The error is
pipelines.py
needs different cache location: It was suggested that thepipelines.py
script should use an alternative location for caching, as the current location causes issues due to permission restrictions.- The discussion concluded with a request to file an issue regarding this bug.
tinygrad (George Hotz) ā· #general (5 messages):
Tinygrad GPT-2 Training, 7900xtx Performance, nanogpt Parameters
- Tinygrad GPT-2 Training Slow on 7900xtx: A user reported that
llm.c/train_gpt2.py
is running slowly on a 7900xtx even with BEAM=5, achieving around 250ms per step at nanogpt size, tweaked to match Andrej Karpathyās nanogpt parameters.- George Hotz responded that it should not be that far off and the gap should be 2-3x max, suspecting a bug.
- Tweaks to nanogpt Parameters Cause Performance Issues: A user shared a diff of their tweaks to
examples/llm.c/train_gpt2.py
, adjusting the batch size to 64, sequence length to 256, and model configuration to 6 layers, 6 heads, and 384 emb_dim to match nanogpt parameters.- George Hotz suggested using
DEBUG=2
andVIZ=1
to diagnose the performance bottleneck.
- George Hotz suggested using
tinygrad (George Hotz) ā· #learn-tinygrad (2 messages):
Buffer ID changes, UOp buffer representation
- Buffer ID Spawns Confusion: A member noted seeing the ID of a buffer change in the debugger console when paused on a breakpoint, expressing initial surprise.
- The member then realized this behavior stems from how a UOp represents its buffer attribute for multi.
- UOp buffer representation explained: The changing buffer ID is due to the way UOp represents its buffer attribute for multi.
- Further details on the internal mechanism of UOp and its multi-buffer management are not provided in the context.
LLM Agents (Berkeley MOOC) ā· #mooc-questions (2 messages):
Google Docs confirmation, Mailing list for updates
- Google Docs confirms sign-ups: Members reported receiving confirmation emails from Google Docs after signing up for the program.
- Some members stated that they have not received any other communication about the program.
- Mailing list will provide updates: A member confirmed that the emails from Google Docs are expected, and a mailing list will soon provide updates about each lecture.
- Users can track updates via this mailing list.