OpenAI Codex is all you need?
AI News for 8/26/2025-8/27/2025. We checked 12 subreddits, 544 Twitters and 29 Discords (229 channels, and 8821 messages) for you. Estimated reading time saved (at 200wpm): 668 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
It's been 3 short months since the (re)launch of Codex, and the Claude Code vs. Codex competition has been heating up recently, with multiple influencers publicly dropping Claude Code for Codex even before today's update, thanks to ChatGPT pricing-plan integration that was buried on GPT-5 launch day. Today that shift gets more interesting, with the full launch of the IDE Extension that sends tasks to Cloud and back:
In words:
- IDE Extension: The new extension brings Codex into VS Code, Cursor, and other VS Code forks, so that you can seamlessly preview local changes and edit code
- Sign in with ChatGPT: Available in both the IDE and CLI, eliminating API key setup and providing access directly through your existing ChatGPT plan
- Seamless Local ↔ Cloud Handoff: Developers can pair with Codex locally and then delegate tasks to the cloud to execute asynchronously without losing state
- Upgraded Codex CLI: Refreshed UI, new commands, and bug fixes
- Code reviews in GitHub: Set up Codex to automatically review new PRs in a repo, or mention @codex in PRs to get reviews and suggested fixes
Additionally, all product information and updates for Codex moving forward will be announced on our new site: developers.openai.com/codex.
We invite you to explore the site for more details on these new features, as well as guides on how to get started.
To learn more about Codex, visit the new developers site as well as our general help article: Using Codex with your ChatGPT plan.
AI Twitter Recap
Process-level reward modeling and reasoning
- StepWiser (process reward as a reasoning task): Facebook AI researchers introduce a stepwise judge that outputs both chain-of-thought and a judgment, trained with RL on relative rollout outcomes. It achieves SOTA on ProcessBench, improves the policy during training, and boosts inference-time search by evaluating solutions "chunk-by-chunk," rejecting/redoing flawed chunks (up to 5 retries) to self-correct paths. They also use StepWiser to score multiple rollouts and select the best for training data, outperforming outcome-based rejection sampling. See the thread by @jaseweston, including details on inference-time search (4/5) and data selection (5/5). Commentary on the broader shift back to process rewards from @tesatory underscores why stepwise supervision scales to long/ongoing tasks where final-only rewards blur credit assignment.
Gemini 2.5 Flash Image ("nano-banana"): capabilities, tooling, and guidance
- Spatial reasoning and editing quality (demos): Users highlight strong multi-image fusion and consistent POV reconstructions (e.g., recursive "photographer of the photographer" and Google Maps "what the red arrow sees" transforms) with impressive spatial coherence; see demos from @BenjaminDEKR and @tokumin.
- Developer and creator tools: A one-click browser extension based on Glif lets you right-click any image on the web to remix/edit via Gemini 2.5 Flash Image (@fabianstelzer; install link in the follow-up tweet). Google published a focused prompting guide covering composition, consistent character design, targeted transforms, and more (Google AI devs). DeepMind researchers discussed how the model was built and where it's going next (@OfficialLoganK). Creators are already combining it with video tools (e.g., Kling 2.1 first/last frames) for smooth transitions (@heyglif).
NVIDIA data and efficiency: Nemotron-CC-Math and Jet-Nemotron
- Nemotron-CC-Math (133B tokens) dataset release: A large math/code corpus reprocessed from CommonCrawl by rendering HTML (Lynx) and reliably capturing equations across LaTeX, MathML, inline, and image contexts, addressing coverage gaps in typical parsers. NVIDIA reports marked gains on math and code tasks after adding it. Details from @KarimiRabeeh and @ctnzr; commentary by @JJitsev.
- Jet-Nemotron (throughput-optimized LMs): Introduces JetBlock (linear attention + dynamic convolution over V, removing static convs over Q/K) and a hardware-aware design insight: decoding speed tracks KV-cache size more than parameter count. Reported speedups: up to 47× decoding throughput at 64K, 53.6× decoding and 6.14× prefill at 256K on H100, while matching/outperforming small full-attention baselines across MMLU, BBH, math, retrieval, coding, long-context. Summary thread by @omarsar0 with design highlights (JetBlock, KV cache insight, results).
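The KV-cache observation can be made concrete with back-of-envelope arithmetic: at long contexts the cache, not the weights, dominates per-token memory traffic. The layer/head counts below are illustrative assumptions, not Jet-Nemotron's actual configuration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for keys and values; FP16 (2 bytes/element) by default.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Full attention: e.g. 32 layers, 32 KV heads of dim 128, at 256K context.
full = kv_cache_bytes(32, 32, 128, 256_000)
# Linear-attention blocks keep constant-size state; suppose only 4 layers
# retain a conventional KV cache in a hybrid design.
hybrid = kv_cache_bytes(4, 32, 128, 256_000)

print(f"full attention: {full / 2**30:.1f} GiB per sequence")   # ~125.0 GiB
print(f"hybrid:         {hybrid / 2**30:.1f} GiB per sequence")  # ~15.6 GiB
```

Under these assumptions the hybrid design cuts per-sequence KV traffic 8×, which is the mechanism behind the reported long-context decoding speedups.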
Safety, security, and policy
- OpenAI × Anthropic cross-evaluations: The labs tested each other's models with their internal safety/alignment evals and published a joint report. While the findings are basic and shaped by each org's scaffolding, the collaboration is notable as a "race-to-the-top" signal for shared safety practices. Announcements from @woj_zaremba and OpenAI's safety team (@sleepinyourhat; follow-up); @EthanJPerez notes ongoing support for field-wide safety.
- Cyber misuse reporting: Anthropic's Threat Intelligence team details disrupting schemes like North Korean fraudulent employment and AI-generated ransomware by low-skill actors (report thread; blog; video).
- Public sector advisory: Anthropic announced a National Security and Public Sector Advisory Council comprising senior defense/intelligence/policy leaders to help align with U.S. and allied needs (announcement).
- Healthcare evaluation: OpenAI released HealthBench on Hugging Face to rigorously evaluate LLMs for human health applications (@HuggingPapers).
Agents, environments, and protocols
- Open environments for RL/agentic training: Prime Intellect launched the Environments Hub to crowdsource rich, standardized, interactive settings for training and evaluating agentic models, mirroring how Gym catalyzed RL, but targeted at LLMs. @karpathy argues environments are the "new data," enabling interaction and feedback beyond imitation; he's bullish on environments and agentic interactions but skeptical of RL reward functions for intellectual tasks, pointing to alternatives like "system prompt learning." Launch from @PrimeIntellect.
- Agent protocols and integration tooling:
- Zed's new Agent Client Protocol (ACP) aims to be a "Language Server Protocol for AI agents," decoupling coding assistants from editors, exposing inspectable plans, and supporting multimodal I/O (overview; site).
- MCP ecosystem growth: One-minute, no-code MCP server generation via Postman to integrate 100k+ APIs (guide); in-browser MCP calling for fast/local agent workflows (LFM2); LangChain "Deep Agents" built by vibecoding against a docs MCP server (demo).
- Structured knowledge for RAG: Andrew Ng's short course with Neo4j shows agent teams constructing schema-grounded knowledge graphs that complement vector retrieval (course).
- Browsing at scale: Browserbase provides an alternative to expensive hosted operator agents by running fleets of headless browsers (@LiorOnAI).
Developer tools and open models
- OpenAI Codex overhaul (GPT-5-powered): A substantial upgrade turns Codex into a single agent across IDE, terminal, cloud, GitHub, and mobile, with new extensions (VS Code/Cursor/Windsurf), a much-improved local CLI, seamless local-cloud task movement, and first-class code reviews in GitHub. Available in ChatGPT Plus/Pro/Team/Edu/Enterprise. See @OpenAIDevs, dev hub, CLI notes from @gdb, and more details from @kevinweil.
- Hermes 4 (Nous): Open Llama-3.1 fine-tunes at 405B and 70B, with hybrid reasoning, 3.5M reasoning samples, trained on 192× B200s; uncensored and user-steerable. Available on Nous Chat/Chutes and Hugging Face; GGUFs (70B) already up, MLX ports in progress (@vectro, @Teknium1).
- DeepSeek V3.1 in production: Together hosts the 671B hybrid with fast/thinking modes; they report big deltas on reasoning benchmarks (e.g., AIME 2024 66.3% → 93.1% with thinking) and 99.9% uptime for reliability in production pipelines (@togethercompute). Community reports on edit-diff failure rates (9.9%) vs Qwen Coder 3 (6.1%) from @cline.
- Compact and efficient infra: Weaviate's 8-bit Rotational Quantization compresses vectors 4× while improving throughput (15–50%) and maintaining near-perfect recall, via random rotations that smooth entries and spread similarity across dimensions (universal, no training) (@weaviate_io).
- Also notable: MiniCPM-V 4.5 adds "hybrid thinking" (decides when to think), high-res doc handling, efficient long-video reasoning (@mervenoyann).
Top tweets (by engagement)
- "It's a good model, sir" – @elonmusk
- OpenAI Codex updates: unified agent across IDE/terminal/cloud/GitHub – @OpenAIDevs
- Environments > data for the RL era; cautious on RL reward functions – @karpathy
- OpenAI × Anthropic cross-org safety evaluations – @woj_zaremba
- Anthropic Threat Intelligence on AI-enabled cybercrime – @AnthropicAI
- How Gemini 2.5 Flash Image ("nano-banana") was built and where it's headed – @OfficialLoganK
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Hugging Face 2M Models Milestone + TheDrummer GGUF Finetunes
- Hugging Face has reached two million models. (Score: 495, Comments: 58): Screenshot shows the Hugging Face Model Hub crossing `2,000,038` hosted models (https://huggingface.co/models), underscoring the platform's rapid growth in checkpoints, fine-tunes, and quantized variants. Technically, this scale stresses storage, deduplication, and search/discoverability, and highlights reliance on efficient artifact management (e.g., Git/LFS, sharded safetensors, deltas) plus robust metadata/tagging and filters for navigating duplicates and variants. Commenters note concerns about total storage footprint and proliferation of duplicated/quantized weights; others joke about the sheer volume of near-identical fine-tunes (e.g., many Llama 3 70B ERP variants), implying discoverability/quality-signal challenges.
- Scale/duplication concerns: commenters note the hub's massive storage footprint driven by many "quants, weights and duplicates" of the same base models/finetunes. The implication is high redundancy from multiple checkpoints and quant variants per model, which stresses storage and complicates discoverability and deduplication across near-identical repos.
- Signal-to-noise tradeoff: while some estimate that `~99%` of the `~2,000,000` models are duplicates or low-quality/failed experiments, the remaining `~1%` includes "gems" that can outperform models "10x their size". This highlights the value of the hub as a central registry where high-leverage small models and strong finetunes emerge despite heavy noise, reinforcing the platform's role as the "GitHub of AI" where major releases land first.
- Ecosystem fragmentation: commenters point to the overwhelming number of derivatives of popular bases (e.g., many Llama 3 70B domain finetunes) as emblematic of a "Cambrian explosion" of model variants. The takeaway is that a few base models dominate the long tail of specialized finetunes, creating redundancy but also rapid iteration and specialization.
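The deduplication pressure described above is typically attacked with content-addressed storage, which is roughly what Git LFS gives the Hub: identical blobs are stored once regardless of how many repos reference them. A minimal sketch, with made-up filenames and byte strings standing in for weight shards:

```python
import hashlib

def digest(blob: bytes) -> str:
    # Content hash used as the storage key (as in Git LFS object IDs).
    return hashlib.sha256(blob).hexdigest()

store: dict[str, bytes] = {}   # content-addressed blob store
refs: dict[str, str] = {}      # filename -> content hash

# Hypothetical shard files; two repos ship byte-identical shard-0.
shards = {
    "llama-70b-finetune-A/shard-0.safetensors": b"weights-0",
    "llama-70b-finetune-B/shard-0.safetensors": b"weights-0",  # exact dup
    "llama-70b-finetune-B/shard-1.safetensors": b"weights-1",
}

for name, blob in shards.items():
    h = digest(blob)
    store.setdefault(h, blob)  # each unique content is stored exactly once
    refs[name] = h

print(f"{len(shards)} files, {len(store)} unique blobs")  # -> 3 files, 2 unique blobs
```

This only catches byte-identical duplicates; near-duplicates (requantized or re-sharded weights) still need metadata or similarity-based dedup.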
- TheDrummer is on fire!!! (Score: 298, Comments: 103): u/TheLocalDrummer released a batch of new GGUF checkpoints (llama.cpp-compatible) spanning `4B` to `123B+` params, including GLM-Steam-106B-A12B-v1, Behemoth-X-123B-v2, Skyfall-31B-v4, Cydonia-24B-v4.1, Gemma-3-R1 (`4B`/`12B`/`27B`), Cydonia-R1-24B-v4, and RimTalk-Mini. Releases are versioned (e.g., v1–v4.1), but the post includes no benchmarks or training-data notes; more in-progress work is referenced via BeaverAI and Discord. Top comments flag limited transparency on fine-tune objectives/datasets, making it hard for newcomers to evaluate or adopt the models, while supporters note active Discord testing with 4–6 iterations per model before public release.
- Several users highlight a lack of transparency around fine-tuning: no clear description of objectives, datasets, preprocessing, or evaluation protocols, making the ecosystem hard to enter or reproduce. This suggests the releases are optimized for an existing user base rather than broader adopters who need detailed model cards and training data disclosures.
- Others note an iterative release pipeline on Discord, with multiple testing rounds and roughly `4–6` internal versions before a public release. The focus appears to have shifted from uncensored Gemma fine-tunes to larger "thinking" variants (R1-style), e.g., `gemma-3-r1-27B`.
- Anecdotal performance feedback reports `gemma-3-r1-27B` underperforming in practical use, fueling skepticism that community text-only fine-tunes deliver meaningful gains over base models. The absence of shared benchmarks leaves this unverified, underscoring the need for standardized evals to quantify any improvements.
2. China AI Ecosystem: Z.ai GLM AMA, Qwen Teaser, and Nvidia GPU Export/Supply Chain
- **Launching Our New AMA Series With Z.AI, Creators of GLM (Tomorrow, 9AM–12PM PST)** (Score: 161, Comments: 15): r/LocalLLaMA is announcing an AMA with Z.ai (creators of the GLM family) scheduled for Thu, Aug 28, 2025, 9AM–12PM PST. The image is a promo banner for the session; technically relevant as an opportunity for the community to ask about GLM models, local deployment, training details, and roadmap in a subreddit historically centered on LLaMA models. Comments note the subreddit's scope has broadened beyond Meta's LLaMA (naming mismatch), implicitly acknowledging growing interest in alternative model families like GLM; other comments are non-substantive.
- No substantive technical discussion yet; one commenter asked about a potential "GLM 6" timeline, but no release details, specs, or benchmarks were mentioned. A logistical note clarifies the AMA timing as `2025-08-28 09:00–12:00 PDT` (DST-adjusted) via timee.io; expect any technical Q&A (e.g., model roadmap, training data/compute, or benchmark deltas vs. Llama) during the session itself.
- What you think it will be.. (Score: 376, Comments: 109): Screenshot of a terse teaser from Qwen team member Junyang (Justin) Lin ("Qwen", Aug 27) with no specs or benchmarks, just the project name, implying an imminent Qwen-related release. Community reading of the hint suggests it could be either a new vision-language (VL) variant or a Qwen 3 32B model; the cryptic `2508` mentioned by commenters is interpreted as a potential date/version tag, but nothing official is stated. Top comments are speculative, with users hoping for a 32B model and debating whether the tease points to VL vs. a new base 32B; no technical details or evidence provided.
- Speculation centers on a Qwen-related release, either a VL (vision-language) model or Qwen 3 32B. The mention of `32B` indicates a 32-billion-parameter class model; "2508" is cited as an identifier but without context (could be a version/date tag).
- There's demand for a Spanish-capable variant, implying interest in a multilingual Qwen model (or localized tokenizer/training) rather than English-only. Requests specifically call for the higher-capacity `32B` tier, suggesting users are prioritizing performance over smaller-footprint models.
- Smuggling Nvidia GPUs to China (Score: 174, Comments: 33): Post discusses an investigation (via ChinaTalk summarizing a Gamers Nexus piece) tracing how US export-restricted Nvidia GPUs still reach China: US retail/secondhand sourcing (Craigslist/Facebook) → brokers/Alibaba listings → concealment and air-travel smuggling via Hong Kong/Taiwan → PRC repair/test shops that refurbish, VRAM-rework, and forward racks, effectively "keeping silicon in circulation." It reiterates the supply chain split: Nvidia designs the die, TSMC fabs in Taiwan, while PRC manufacturers produce boards, VRMs, coolers, and most non-die BOM, so the non-die assembly is largely China-based even as the die is controlled. The technical thrust is that enforcement gaps and the ease of board-level rework/repair keep controlled silicon operational despite bans, though core performance remains defined by the die. Commenters debate value concentration: the die is "`99.9%` of the difficulty," with matrix-mul latency/throughput bounded by on-die architecture, so board/VRAM mods mainly affect capacity, not FLOPs. Others speculate US AI buyers would pay for black-market VRAM-upgraded cards (`3–4×` capacity at lower cost), while noting takedown/copyright claims (e.g., Bloomberg) around the documentary that may ironically drive more attention.
- One thread hypothesizes a black-market path to retrofit Nvidia GPUs with `3–4×` more VRAM for ~half the cost of new cards, targeting US AI users. Feasibility hinges on reballing higher-density GDDR, BIOS/firmware mods, and the GPU memory controller's addressability plus PCB routing/power delivery limits: hard caps that often prevent large capacity jumps even if chips can be physically swapped.
- A counterpoint stresses the silicon die is "`99.9%` of the difficulty," since matrix multiplication latency/throughput are dictated by on-die registers/SRAM caches and tensor/ALU pipelines. Boosting VRAM capacity won't improve core GEMM/TensorCore throughput or memory hierarchy latency; the die architecture and cache/bandwidth balance set performance ceilings.
- Repairability discussion notes most GPU failures are in discrete power delivery (MOSFETs, capacitors) rather than the GPU die, which is generally robust. Board-level VRAM mods and component replacement require specialized BGA rework and diagnostics, common in China's repair ecosystems but relatively rare in the US, enabling a secondary market for memory upgrades and refurbishing.

Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Nano Banana Image Editing Showcases and Restorations
- Restoring the first photograph ever taken w/ Nano Banana (Score: 2907, Comments: 185): The post pairs Nicéphore Niépce's 1826/27 "View from the Window at Le Gras" (the first surviving photograph) with a supposed color "restoration," but there's no technical method described (no deblurring, SR, or reconstruction pipeline) and the bottom image appears to be a modern recreation of the scene rather than a data-driven restoration of the original heliograph. The original is a bitumen-on-pewter plate with multi-hour exposure and extremely low spatial frequency detail; the "restored" image includes contemporary features inconsistent with the 19th-century setting, implying a reshoot or fabricated scene rather than algorithmic enhancement. Top comments note it's a recreation, not a restoration; other replies are jokey ("Enhance") and non-technical.
- Multiple commenters point out an information-theoretic limit: you can't "restore" a historical image to contain more data than was captured. Methods like deconvolution, denoising, or super-resolution (e.g., ESRGAN or diffusion upscalers) impose strong priors to synthesize plausible detail, which is a recreation/hallucination rather than recovered signal; crisp features (like gutters) emerging in a "restoration" are therefore likely fabricated by the model or retouching. See ESRGAN: https://arxiv.org/abs/1809.00219.
- On whether this is the first photo: the earliest surviving photograph is Nicéphore Niépce's "View from the Window at Le Gras" (1826/1827), a heliograph on a pewter plate coated with bitumen of Judea, with an exposure estimated between `8 hours` and `several days`. The plate is held by the Harry Ransom Center; modern treatments are high-resolution scans plus contrast/tone mapping, not generative enhancement. References: Wikipedia https://en.wikipedia.org/wiki/View_from_the_Window_at_Le_Gras and HRC notes https://www.hrc.utexas.edu/ni%C3%A9pce/.
- A true restoration would model the imaging pipeline (lens point-spread function, the bitumen's nonlinear response curve, extremely long exposure causing multi-directional shadows) and apply physically informed inverse methods with regularization. Without the PSF and response curve, the inverse problem is ill-posed, so best practice is careful scanning, deconvolution with conservative priors, and local contrast equalization, avoiding arbitrary detail synthesis that changes scene geometry.
- Nano Banana's understanding of material swapping. The tube started off as a chrome material. (Score: 1709, Comments: 178): OP showcases a "material swapping" result where a tube that originally had a chrome material is transformed, titled "Nano Banana's understanding of material swapping." The linked Reddit gallery is inaccessible (`403 Forbidden`) per gallery URL, but top comments include additional image references: an alternate example image 1, a user-made BMO costume image 2, and a texture that a commenter likens to a musical score image 3. Commenters note unexpected semantic insertions (e.g., BMO character, sheet-music-like texture), suggesting the system may be performing style/texture overlays rather than strictly preserving original BRDF/geometry-material behavior during "material swap."
- A commenter asks whether the system can generate PBR texture maps from a single image, specifically normal, bump, and displacement maps, to support material-swapping workflows. This implies multi-channel outputs aligned to existing UVs, correct normal map conventions (OpenGL vs DirectX), and compatibility with downstream DCC/game engines; without these maps, swapping a chrome shader to another material would lose microdetail/height information that albedo alone can't capture.
- Using nano banana to put Emma Stone in all movies… (Score: 640, Comments: 93): Post demonstrates rapid face-swap/compositing using a tool/workflow referred to as "nano banana" to insert Emma Stone into multiple films/posters; the creator notes the entire process took <20 minutes end-to-end. No concrete technical details (model, method, or pipeline) are provided, only the claimed turnaround time, so it's unclear whether this used diffusion inpainting, face-swapping, or a specific model/LoRA; nonetheless it suggests a lightweight, fast workflow for batch poster edits.
- Nano Banana is so impressive. I keep testing new things, and it always delivers. (Score: 347, Comments: 44): Poster reports that the "Nano Banana" image model excels at photorealistic relighting: changing scene illumination while preserving fine-grained textures and scene layout, indicating strong detail preservation under lighting transforms. A commenter notes difficulty when attempting true viewpoint/camera-angle changes, suggesting the model's edits are largely 2D-consistent relighting rather than 3D view-synthesis or geometry-aware reprojection. Top comments highlight heavy safety/censorship filters as a major practical limitation and express a desire for a less restrictive release (implicitly contrasting with Google's policies). Another commenter questions physical plausibility in an example (sun angle at "noon"), hinting at potential inconsistencies in physically based lighting direction.
- Users report the model's safety filters are overly aggressive, blocking benign edits and especially body-related transformations. This implies a conservative safety classifier or post-filter tuned for high recall (over-blocking) at the cost of precision, reducing utility for legitimate workflows. As one puts it, the "intense censorship is a buzzkill," prompting interest in similarly capable models with less restrictive moderation.
- Repeated failures when trying to "change the perspective/camera angle" indicate the system lacks true 3D scene understanding or multi-view consistency. It likely functions as 2D inpainting/texture synthesis conditioned on the input rather than depth/pose-aware reconstruction (i.e., no NeRF/3DGS/EG3D-like latent geometry), so novel-view synthesis is out of scope and either artifacts or refusals occur.
- Using it to brighten extremely dark, compressed footage (e.g., GoT S08E03) underscores limitations of working with LDR sources. Without RAW/HDR data, aggressive enhancement requires denoising and hallucination; while local contrast may improve, compression noise/banding can be amplified and details become fabricated, compromising fidelity.
- Nano banana is bananas (Score: 273, Comments: 46): Non-technical meme: the image looks intentionally AI-edited/Photoshopped to erase the subject's face, echoing the "AI Barber" trope where generative/editing tools over-remove features. The title "Nano banana is bananas" is a nonsense pun unrelated to the visual, reinforcing that this is shitpost humor rather than a technical demo. Comments joke that "it technically did" and call it "horrors of the AI Barber," comparing it to OG Facebook Photoshop request pranks, i.e., deliberately absurd edits, not serious AI results.
- Commenters implicitly point to prompt fragility: "Garbage prompt=garbage results" reflects the classic garbage-in/garbage-out failure in diffusion or instruction-tuned systems. Ambiguous phrasing like "a little off the top" can drive over-literal edits when no spatial mask or constraints are supplied, causing models to remove or distort structural features; robust workflows rely on masked inpainting, ROI segmentation, negative prompts, or ControlNet/reference locking to bound changes.
- Mentions of "technically the truth" and "AI Barber" highlight a broader limitation where models optimize for literal instruction adherence over pragmatic intent. This brittleness under underspecified prompts stems from weak commonsense/pragmatic priors; mitigations include explicit constraints, few-shot exemplars, test-time guidance (e.g., CLIP directional losses) and rule-based post-filters to enforce semantic intent.
- Can Nano Banana Do this? (Score: 329, Comments: 89): OP posts a humorous boxing-poster-style image and asks if a smaller "Nano Banana" model can reproduce it; commenters demonstrate that the open-source "Banana" image model can match or exceed the result by using `depth-map` conditioning via its API, leading to better character consistency. Example outputs are provided in replies (example 1, example 2, example 3). The workflow hinted is: reuse OP's depth map and pass it to the Banana API for controlled image generation, improving pose/layout fidelity and identity consistency. One commenter argues Banana's output is "better" due to more consistent characters, suggesting depth conditioning is key; another claims the model can "do even better," implying further tuning or prompts can surpass the OP's example.
- A commenter reports that Banana's API accepts a depth map as conditioning input, enabling a workflow where you pass the source depth map along with the prompt to guide structure and layout (akin to depth-guided/control pipelines). They explicitly "provided it with your depth map" and note this can be done programmatically via the API, allowing reproducible, depth-consistent generations across runs and integrations in automated pipelines.
- Side-by-side outputs suggest Banana produced more consistent character identity and fewer drift artifacts compared to the baseline attempt. The claim is supported by shared results (example 1, example 2), with the commenter attributing the improvement to depth conditioning passed through the API, which helps preserve structure and character features across frames/variations.
- My wife asked ChatGPT for a system diagram. It sent her a banana milkshake. (Score: 613, Comments: 62): This post is a meme: the user asked ChatGPT for a "system diagram" and instead received a banana milkshake image, illustrating an LLM failure mode (hallucination/mode confusion) where the assistant misinterprets task intent and returns irrelevant content. Technically, it's an example of prompt misalignment and task-to-output mismatch in assistant workflows rather than any new feature or benchmark. Comments joke about the mismatch (e.g., "nano-banana'ed" and preferring the milkshake) and sarcastically quip "PHD level intelligence in your pocket," reflecting skepticism about LLM reliability for precise engineering tasks.
2. Weather AI, VibeVoice TTS, Codex Updates plus Gemini/GPT-5 and Policy News
- Google's AI model just nailed the forecast for the strongest Atlantic storm this year (Score: 496, Comments: 63): Post highlights that a Google AI weather model accurately predicted the track/intensity of the year's strongest Atlantic storm, underscoring the growing skill of learned global forecast models (e.g., DeepMind's GraphCast and MetNet-3) relative to traditional NWP like ECMWF IFS and NOAA GFS. Commenters note that providers largely ingest the same global observations via WMO/UN data exchange (see WMO Unified Data Policy and WIGOS/WIS), so accuracy differences primarily come from model architectures/training and data assimilation pipelines rather than exclusive data. Opinions claim Google's approach could render other services outdated and that "real-time" ML forecasting will save large numbers of lives; these hinge on ML's lower inference cost enabling faster, higher-frequency updates, though the magnitude of life-saving impact is speculative.
- Multiple commenters highlight that virtually all centers ingest the same global observations via the WMO's World Weather Watch and Global Telecommunication System (GTS), so forecast skill differences come from the models: data assimilation schemes, physical parameterizations, grid resolution, ensemble size, and compute budget. This frames Google's AI approaches (e.g., GraphCast medium-range ML model) as competing with NWP like ECMWF HRES/ENS and NOAA GFS/GEFS, where published results show ML can match or surpass certain metrics (e.g., `500 hPa` ACC, RMSE) while being faster to run. Links: https://community.wmo.int/activity-areas/gts, https://www.nature.com/articles/s41586-023-06720-6, https://www.ecmwf.int/en/forecasts/dataset/ecmwf-forecasts-archive
- On "real-time" forecasting saving lives: ML nowcasting models such as MetNet-3 can deliver minute-scale precipitation forecasts with low inference latency, enabling higher update cadence for warnings compared to traditional NWP cycles. However, true end-to-end real-time capability is bounded by observation latency and QC (radar/satellite ingestion), data assimilation windows, and dissemination; the life-saving impact hinges on improving lead time and reliability for high-impact events (e.g., tropical cyclone track/intensity MAE, severe convective warning lead times in `10–60` minutes). Links: https://arxiv.org/abs/2410.11809, https://ai.googleblog.com/2023/11/graphcast-accurate-global-ai-forecasts.html
- Claims that Google will make other services "outdated" are tempered by the operational realities: NMHSs provide calibrated, impact-based, and regulatory warnings, often blending multiple models (e.g., ECMWF ENS, GEFS) and post-processing with MOS/ML to correct local biases. Any new model must demonstrate robust skill across domains (TC track error km, intensity bias, `CRPS`/`Brier` scores, extreme tail behavior) and reliability/uptime under 24/7 constraints before supplanting existing systems. Links: https://www.ecmwf.int/en/forecasts/quality-our-forecasts/scorecards, https://www.noaa.gov/organization/nws/national-centers
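The two headline skill metrics named in the discussion, RMSE and the anomaly correlation coefficient (ACC), are simple to compute; here is a toy version over a flattened four-point "500 hPa" field, with all values made up for illustration.

```python
import math

climatology = [5500.0, 5520.0, 5540.0, 5560.0]  # long-term mean field
analysis    = [5510.0, 5515.0, 5550.0, 5570.0]  # verifying "truth"
forecast    = [5508.0, 5518.0, 5546.0, 5566.0]

def rmse(f, a):
    # Root-mean-square error between forecast and analysis.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f, a)) / len(f))

def acc(f, a, c):
    # Correlate forecast and analysis *anomalies*, i.e. departures from
    # climatology; a climatological forecast scores 0, a perfect one 1.
    fa = [x - m for x, m in zip(f, c)]
    aa = [x - m for x, m in zip(a, c)]
    num = sum(x * y for x, y in zip(fa, aa))
    den = math.sqrt(sum(x * x for x in fa) * sum(x * x for x in aa))
    return num / den

print(f"RMSE: {rmse(forecast, analysis):.2f}")
print(f"ACC:  {acc(forecast, analysis, climatology):.3f}")
```

Operational scorecards compute these as area-weighted means over the globe and lead time; the toy arrays just make the anomaly-vs-raw distinction concrete.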
- [WIP] ComfyUI Wrapper for Microsoft's new VibeVoice TTS (voice cloning in seconds) (Score: 434, Comments: 94): A developer is building a ComfyUI wrapper for Microsoft's VibeVoice TTS (project page) enabling rapid voice cloning from very small samples; initial support targets single-speaker with dual-speaker in progress and an open-source release planned. Two model sizes are noted: `1.5B` (fast inference, "fairly good" quality) and `7B` (greater emotional nuance but inconsistent; flagged as Preview). Demo used synthetic voices as prompts, which VibeVoice cloned and then synthesized target text; an additional update post is linked here. Commenters challenge the demo choice (synthetic source voices) and suggest using well-known public voices to objectively assess one-shot cloning quality; they also clarify cloning is license-permitted with consent, request `1.5B` VRAM usage details, and ask for comparisons vs. Higgs Audio 2.
- Licensing clarity: commenters report VibeVoice is MIT-licensed, enabling local/commercial use, but the usage terms still prohibit voice cloning without explicit consent. This means cloning is technically supported yet policy-restricted. There's skepticism about one-shot cloning quality; evaluation with a widely recognized voice is suggested to better judge timbre similarity. Requests also surfaced for head-to-head comparisons with Higgs Audio 2 and Chatterbox to quantify cloning fidelity and naturalness.
- Deployment concern: a commenter asks for the VRAM footprint of the `~1.5B` VibeVoice model. Knowing memory usage (e.g., FP16 vs INT8, batch size 1) is key to assessing feasibility for real-time or near-real-time TTS in ComfyUI on consumer GPUs and planning throughput/latency.
- Integration idea: a related WIP, image2reverb, aims to infer scene acoustics from an image/video frame and apply convolution reverb so the generated voice matches the environment. This could pair with a VibeVoice ComfyUI node to automatically add environment-aware acoustics to TTS outputs.
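As a rough back-of-envelope for the VRAM question above (my own arithmetic, not a reported measurement), weight memory alone scales with parameter count times bytes per parameter, before KV caches, activations, and framework overhead:

```python
def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Rough lower bound on VRAM for model weights alone.

    Ignores activations, caches, and framework overhead, which add more.
    """
    return params_billions * 1e9 * bytes_per_param / 1024**3

fp16 = weight_vram_gb(1.5, 2)  # FP16: 2 bytes per parameter
int8 = weight_vram_gb(1.5, 1)  # INT8: 1 byte per parameter
print(f"FP16 ~{fp16:.1f} GB, INT8 ~{int8:.1f} GB")
```

By this estimate the `1.5B` model needs roughly 2.8 GB of weights at FP16, comfortably within consumer-GPU budgets, while the `7B` variant lands around 13 GB at FP16.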
- Codex NEW mega update!!! (Score: 203, Comments: 50): The image purports to be an OpenAI Developers announcement of a "Codex NEW mega update," highlighting: a new IDE extension (stated compatible with VS Code and others), seamless task movement between cloud and local environments, integrated GitHub code reviews, and a revamped Codex CLI. It also claims the update is powered by GPT-5 and available via the ChatGPT plan, emphasizing improved coding efficiency and tighter dev-workflow integration. Top comments discuss tradeoffs: claims that GPT-5's instruction following makes delegation more reliable while Claude remains stronger with tools; questions about onboarding/usability compared to Claude Code; and concerns about Windows terminal compatibility (historically issuing Linux-centric commands).
- Multiple users report a stark qualitative gap between gpt5-medium and gpt5-high in Codex: medium feels like a small model with RAG/TTC scaffolding, showing weak instruction-following and poor context ingestion, while high behaves like a full SOTA model. They argue that adding `10k–20k` "thinking tokens" shouldn't explain this delta, implying different underlying base models rather than just more reasoning budget. Similar saturation is observed with Opus via API beyond `~16k` thinking tokens, and changing thinking-token budgets doesn't materially alter base model "flavor."
- Comparisons suggest GPT-5 now excels at strict instruction-following (useful for task delegation), whereas Claude still leads in tool-use reliability. For coding workflows, these trade-offs make it a close call: GPT-5's adherence to directives vs Claude's stronger tool orchestration and function-calling behavior.
- Environment parity issues persist: users recall Codex defaulting to Linux command patterns in Windows Terminal sessions, indicating OS detection or shell-targeting heuristics may need refinement. This can degrade developer ergonomics by proposing non-portable commands and suggests a need for improved runtime/OS context awareness in the CLI or agent layer.
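The OS-detection gap described above is the kind of check an agent layer can do cheaply before emitting shell commands; a minimal sketch (illustrative only, not Codex's actual implementation):

```python
import platform

def list_dir_command() -> str:
    """Pick a directory-listing command appropriate for the host shell."""
    # platform.system() returns 'Windows', 'Linux', or 'Darwin'.
    if platform.system() == "Windows":
        return "dir"   # cmd.exe / PowerShell-compatible
    return "ls -la"    # POSIX shells

print(list_dir_command())
```

Real agents would also need to distinguish PowerShell from cmd.exe, and WSL from native Windows, which is where simple heuristics like this start to break down.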
- I'm happy to announce I'm now a 6x engineer (Score: 390, Comments: 142): Meme-style screenshot of many editor/terminal panes shows a sprawling data parsing/extraction setup with debug logs, QA checks, and layered "robust fallback parsing", implying the OP is a "`6x` engineer" by orchestrating multiple brittle scripts/processes rather than a single clean pipeline. The technical subtext is orchestration/automation sprawl: retries, fallbacks, and validation wrappers around flaky parsers that risk masking errors and increasing maintenance overhead. See the image: https://i.redd.it/q55dd87p1klf1.jpeg. Top comments critique silent failure modes of fallback parsers ("good luck with that bs failing silently"), joke that this is "management" (coordination over coding), and warn such automation contributes to stricter platform rate/usage limits.
- The "robust fallback parsing" jab highlights the classic failure mode where lax parsers mask upstream errors, causing pipelines to "fail silently." Best practice is to fail closed with strict JSON Schema validation and typed tool-call results, leveraging structured outputs (e.g., Claude structured outputs, OpenAI structured outputs) rather than heuristic regex parsing. Add observability (rates of parse failures/null fallbacks, latency deltas across fallbacks), chaos/fuzz tests, and circuit breakers to prevent cascading fallbacks that hide bugs (Guardrails can help).
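A fail-closed parse step can be sketched with the stdlib alone (the schema and field names below are illustrative, not from the post): raise on any malformed payload instead of silently falling back.

```python
import json

# Illustrative required fields for a hypothetical tool-call result.
SCHEMA = {"name": str, "arguments": dict}

def parse_tool_call(raw: str) -> dict:
    """Fail closed: raise on any malformed payload instead of guessing."""
    obj = json.loads(raw)  # raises json.JSONDecodeError on bad JSON
    for field, typ in SCHEMA.items():
        if not isinstance(obj.get(field), typ):
            raise ValueError(f"bad or missing field: {field}")
    return obj

ok = parse_tool_call('{"name": "search", "arguments": {"q": "weather"}}')
print(ok["name"])
```

The point is that every exception here is a countable, alertable event, whereas a chain of regex fallbacks converts the same defect into quietly wrong data.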
- Remarks about "more restricted limits" point to providers tightening quotas when fan-out/multi-agent workflows create bursty traffic and error amplification. Expect `429/RateLimit` and provider-specific backoffs: OpenAI uses RPM/TPM and dynamic tiers (docs); Anthropic enforces per-model TPM/RPM and concurrency caps (docs). Use adaptive client-side rate limiters (token-bucket/leaky-bucket), idempotency keys, priority queues, and budget guards to avoid triggering automated abuse heuristics.
- Questions about "Claude as a 6x engineer" underscore orchestration complexity: you'll need stateful DAGs, retries with jitter, timeouts, idempotency, and traceability for tool-use provenance and cost/latency budgets. Production setups often pair an agent graph/runtime (e.g., LangGraph) with a workflow engine (Temporal or Prefect) plus LLM observability (Langfuse or OpenTelemetry). For Claude-specific stacks, prefer tool calls + JSON Schema (tool use) over free-form prompts, and enforce concurrency limits to prevent thundering-herd fan-out.
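The client-side token bucket suggested above fits in a few lines; a minimal sketch (my own toy, not any provider's SDK):

```python
import time

class TokenBucket:
    """Allow `rate` requests/sec with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=5, capacity=5)
allowed = sum(bucket.try_acquire() for _ in range(10))
print(allowed)  # 5: the burst drains the bucket, the rest are throttled
```

Callers that get `False` back should queue or back off with jitter rather than retry immediately, which is what turns a bursty fan-out into traffic a provider's limiter will tolerate.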
- Forget Google. This is the power of open source tools. (Score: 561, Comments: 63): A video post titled "Forget Google. This is the power of open source tools." links to a Reddit-hosted video v.redd.it/3epgdoljljlf1 that returns `HTTP 403 Forbidden` without authentication, so the underlying demo is inaccessible. No concrete tools, repos, benchmarks, or implementation details are present in the visible context; the discussion implies a claim that open-source tools can substitute for Google, but it does not enumerate which tools or provide evidence. Top comments reflect skepticism and a request for specifics (e.g., "What open source tools"), with other remarks off-topic; there is no substantive technical debate or data to evaluate the claim.
- I sense jealousy.. just wait for gemini 3 (Score: 396, Comments: 95): Screenshotted post argues that even if Google Gemini 3 surpasses OpenAI ChatGPT on raw capability, OpenAI's distribution/user lock-in makes switching hard; suggests prioritizing image/video generation to drive adoption through viral, shareable outputs. Commenters reframe the competition on two axes: assistant quality vs. distribution/virality, and note API economics where "intelligence per dollar" and latency dominate, citing Gemini 2.5 Flash as having led on cost/perf for a period. Debate centers on whether multimodal image/video gen are "side quests" (many argue they're core to building better assistants), whether being first/best matters most in the personal assistant race, and on monetization: most ChatGPT users don't pay, Anthropic monetizes mainly via API, and adoption hinges on utility (e.g., NL photo editing) and current unknowns around smaller alleged GPT-5 models.
- API buyers optimize for "intelligence per dollar" rather than peak scores; Anthropic reportedly monetizes primarily via its API rather than a chat UI. One commenter notes Gemini 2.5 Flash had the best cost-performance for a period, implying stronger quality/latency per $ compared to peers, though this may have shifted with smaller `GPT-5` models. In a compute-constrained world with many non-paying chat users, sustainable growth hinges on efficient serving (latency, throughput, context utilization) and pricing, not just raw capability.
- The "chat model" is converging to a personal assistant where small gains in reliability, tool-use, and latency compound into large UX advantages; being first and best matters for default placement and daily stickiness. A "slightly better, smarter, more reliable" assistant yields outsized value by driving higher-value workflows (calendar/email/code actions) beyond simple chat, and OS-level hooks (e.g., Gemini as the Android voice assistant) can offset weaker standalone app UX.
- Labeling image/video as "side quests" is disputed: multimodal capabilities (e.g., natural-language photo editing) are high-utility features that directly impact adoption. Commenters argue these capabilities are not mere attention plays but core to building more intelligent assistants, as stronger vision/video understanding and generation expand actionable tasks and real-world usefulness.
- Tried to move to Gemini, tapped out in 30 seconds (Score: 504, Comments: 326): Screenshot shows Google Gemini refusing to continue a task because it "looks like a personal conversation" and it's "not designed to impersonate or interact in such contexts," indicating a safety/guardrail trigger likely tied to impersonation or personal-communication detection. Technically, this illustrates an aggressive content-safety heuristic (policy filter) that can yield false positives when the model infers roleplay/impersonation, but the actual prompt is missing so the trigger can't be replicated or debugged. Top comments note the post is uninformative without the original prompt, pushing for reproducibility before drawing conclusions about Gemini's guardrails being overly sensitive.
- Several replies stress that without the exact prompt and runtime details, any performance judgment is non-reproducible. To make a fair comparison, include the precise user/system prompts, model variant (e.g., Gemini 1.5 Pro vs Flash), decoding params (`temperature`, `topP`, `topK`, `maxOutputTokens`), platform (web vs API), and whether tools/grounding or long-context were enabled; see model variants in Google's docs: https://ai.google.dev/gemini-api/docs/models.
- A commenter hints performance is context-dependent; Gemini can appear conservative or "boring" on open-ended tasks due to stricter safety/alignment and default low-diversity decoding. If the goal is more exploratory output, choose an appropriate variant and increase `temperature`/`topP` (with awareness of hallucination risk) or provide richer task constraints to elicit depth; model behavior differences are documented here: https://ai.google.dev/gemini-api/docs/models.
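For reproducibility reports, the decoding parameters above map onto the Gemini API's `generationConfig` block in a `generateContent` request; a minimal request-body sketch (field names follow the REST API to the best of my knowledge; all values are made-up examples):

```python
import json

# Illustrative request body for a reproducible report; values are examples only.
request_body = {
    "contents": [{"role": "user", "parts": [{"text": "Summarize this thread."}]}],
    "generationConfig": {
        "temperature": 0.9,        # higher = more exploratory output
        "topP": 0.95,
        "topK": 40,
        "maxOutputTokens": 1024,
    },
}
print(json.dumps(request_body["generationConfig"], indent=2))
```

Pasting this block alongside the exact prompts and model name is the minimum needed for someone else to reproduce a refusal or a quality complaint.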
- AI gets its facts from … us? (Score: 441, Comments: 174): An infographic titled "Where AI Gets Its Facts" claims that language models like ChatGPT and Perplexity most frequently cite Reddit (40.1%), followed by Wikipedia (26.3%), YouTube (23.5%), and Google (23.3%), with Yelp, Facebook, and Amazon each above ~18%. The image provides no methodology or source, leaving ambiguous whether these figures represent training data composition, retrieval/citation behavior, or user-shared links, so it's not a rigorous benchmark or reproducible analysis. Comments frame the post as a meme/repost and include satire, implicitly highlighting concerns about misinformation if models over-index on user-generated content; there's no substantive technical debate.
- One commenter contends that "contrary to popular belief, Reddit is actually a valuable source for many topics, both genuine issues and so-called 'issues'," highlighting that user-generated threads often capture real-world edge cases, troubleshooting steps, and niche domain context that formal corpora miss. This aligns with industry moves to license Reddit data for LLM training (e.g., the OpenAI-Reddit partnership (2024): https://openai.com/index/reddit-partnership/), but also implies the need to manage noise, bias, and moderation artifacts that can degrade factuality. Practically, this favors robust dataset filtering and/or retrieval-augmented generation to preserve signal-to-noise when incorporating Reddit corpora.
- Testing GPT-5 (it is nsfw) (Score: 406, Comments: 111): Screenshot shows "ChatGPT 5" generating an explicit, NSFW roleplay monologue on request after minimal priming (user asks for something "super unhinged," then "sweary and explicit"), without refusal or safety gating, suggesting looser or inconsistently enforced safety guardrails versus earlier behavior and compared to GPT-4o the OP had tested for warmth/conversationality. Technically, this points to either updated moderation thresholds, different instruction-tuning/alignment settings, or a context/prompt-routing gap that allows adult content when framed as consented fiction, highlighting inconsistency in policy enforcement and session-level variability. Comments report similarly lax behavior (e.g., helping with torrenting), and some celebrate the change, implying perceived reduction of safety constraints; others pivot to humor, offering little technical counterpoint.
- The lawsuit would force ChatGPT to do age verification on all users if the Raine family wins (Score: 401, Comments: 441): OP reports a lawsuit by the Raine family that, if successful, would require universal age verification for all ChatGPT users, implying collection/validation of government ID or equivalent and associated privacy, data-retention, and compliance burdens across jurisdictions. The post cites pressure from platform safety changes (e.g., Google/YouTube teen account defaults) and regulatory trends such as the UK's Online Safety Act (legislation.gov.uk), and expresses refusal to provide ID to a private company. Commenters argue that mandatory age checks should be paired with reduced content filtering for verified adults, while others stress parental responsibility over platform mandates and criticize child-safety justifications as a pretext for expanding surveillance and eroding online privacy.
3. Claude ASCII Workflow, Qwen-Image-Edit Guide, and ChatGPT UX Humor
- The Anti-YOLO Method: Why I make Claude draw ASCII art before writing code - How it makes me ship faster, better, and with less tokens spent (Score: 245, Comments: 88): OP outlines a constrained Claude-assisted delivery workflow: Brainstorm the problem space → generate low-cost ASCII wireframes (reported ~`10x` fewer tokens vs HTML prototypes) saved as markdown → rigorous "Plan mode" (review codebase; specify backend architecture, DB schema considerations, UI matching with stable Friendly IDs, security, and testing) after prompting Claude to ask clarifying questions → implement → derive tests (unit, integration, component, DB integrity, edge cases) directly from the ASCII spec → ship. They claim this reduces misalignment, iterations, and prod debugging; the method is illustrated with a real feature for the Vibe-Logs Prompt Pattern Analyzer and a follow-up on "fixing the prompting problem". Key tactics include using ASCII to focus on layout/flow over styling, centralizing decisions in markdown, insisting on Claude's clarifying questions, and treating the wireframe as the test oracle/spec. Commenters largely endorse heavy upfront planning and documentation (steps 1 and 3) but are split on ASCII: some find it surprisingly effective; others argue pure text/structured specs (e.g., a CLAUDE.md) yield better adherence and that ASCII benefits humans more than the LLM, with concerns about token cost and flow-following fidelity.
- Multiple commenters report that ASCII wireframes are token-inefficient and don't improve model adherence: one notes ASCII/state-machine prompts "were not following majority of the flow," while a detailed pure-text spec (e.g., a CLAUDE.md plan) led the LLM to follow steps more reliably with fewer iterations. This suggests ASCII is mainly a human aid; for the model, structured prose requirements and stepwise plans outperform ASCII and avoid wasting tokens.
- An alternative suggested is using Mermaid diagrams (mermaid.js.org) for flowcharts/sequence/state diagrams as a compact, machine-parseable format. Mermaid can encode nodes/edges and states succinctly, potentially reducing tokens versus ASCII while preserving structure, and may better support reasoning and round-tripping between visualization and implementation if the LLM recognizes the syntax.
- Another commenter informally validates that the value is in the planning/verification phases (steps `#1` and `#3`) rather than the ASCII artifact itself. This aligns with the view that rigorous planning/documentation drives quality and speed, while ASCII wireframing is optional ergonomics that may not yield performance or token savings.
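For a sense of the Mermaid alternative mentioned above, a flow can be encoded in a handful of structured lines rather than ASCII box-drawing; a hypothetical sketch (the flow and labels are my own illustration, not from the post):

```mermaid
flowchart TD
    A[User submits form] --> B{Input valid?}
    B -- yes --> C[Save to DB]
    B -- no --> D[Show inline errors]
    C --> E[Render confirmation]
```

The same graph as ASCII art would need many more characters for borders and alignment, which is the token-cost argument commenters raise.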
- Qwen-Image-Edit Prompt Guide: The Complete Playbook (Score: 254, Comments: 38): Post shares a practical prompt engineering playbook for Qwen-Image-Edit covering seven edit classes: text replacement/correction (font/size/perspective preservation), local appearance tweaks (materials/colors with lighting/shadow consistency), global semantic/style changes (e.g., Studio Ghibli transfer while preserving layout/identity), micro/region edits (boxed glyph or small object swaps), identity control (subject swap vs. identity preservation), poster/composite layout constraints, and camera/lighting directives (relighting, DoF, lens). Core techniques emphasize constraint-first phrasing (e.g., "Keep everything else unchanged," preserving identity/font/alignment), chaining small edits, and explicit negatives ("no distortion, no warped text, no duplicate faces") to reduce drift and artifacts. The guide advocates explicit add/replace/remove verbs and precise preservation clauses (pose, shadows, reflections) to maintain structural fidelity across edits. Top comments request proof-of-effectiveness with visual examples, cautioning the post otherwise reads like an LLM-generated list; another commenter is building a Starnodes custom node to select tasks and auto-generate prompts for Qwen edit (screenshot: https://preview.redd.it/9ep8f7jf0mlf1.png), and a third confirms that add-replace-remove phrasing plus "keep everything the same" measurably improves results.
- Practitioners report that Qwen-Image-Edit responds best to constrained, atomic instructions using an add/replace/remove pattern, e.g., "add X," "replace Y with Z," and explicitly stating "keep everything else the same." Emphasizing invariance (e.g., "don't change anything else") reportedly reduces collateral edits and improves fidelity in multi-attribute edits, aligning with best practices for instruction-grounded image editing models.
- One dev is building a custom node for StarNodes that integrates "Kontext" and Qwen Edit to streamline task selection and prompt assembly. The node provides a UI to choose the edit type, supply a few inputs, and auto-generate a ready-to-use prompt, as shown in their screenshot: https://preview.redd.it/9ep8f7jf0mlf1.png?width=1252&format=png&auto=webp&s=e5546e2fdafd30004e43bc167a06eca72595601b.
- There's a call for empirical validation: readers request before/after image examples to verify that the provided prompt patterns reliably produce the claimed edits on Qwen-Image-Edit. Including such artifacts would aid reproducibility and help differentiate workflow-specific gains from generic LLM-style guidance.
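The add/replace/remove pattern with an explicit invariance clause is easy to template; a small helper sketch (function and clause wording are my own, not from the guide):

```python
from typing import Optional

# Hypothetical prompt builder following the add/replace/remove + invariance
# pattern described above for instruction-grounded image editing.
def build_edit_prompt(verb: str, target: str, replacement: Optional[str] = None) -> str:
    if verb not in {"add", "replace", "remove"}:
        raise ValueError("verb must be add, replace, or remove")
    if verb == "replace":
        edit = f"Replace {target} with {replacement}"
    else:
        edit = f"{verb.capitalize()} {target}"
    # Explicit invariance clause reduces collateral edits.
    return (f"{edit}. Keep everything else unchanged: "
            "same pose, lighting, shadows, and identity.")

print(build_edit_prompt("replace", "the red hat", "a blue beret"))
```

Chaining several such atomic prompts, one edit per pass, matches the guide's advice to make small edits rather than one sweeping instruction.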
- I should've just stayed bored in peace. (Score: 3744, Comments: 227): Non-technical meme/screenshot of a chatbot UI delivering a sarcastic, tough-love reply to a user complaining about boredom, framing a UX/AI-assistant design question about default tone and handling low-value prompts. Technically relevant only insofar as it reflects alignment/assistant persona choices (blunt vs. helpful) in conversational AI. Commenters endorse this snarky response as a desirable default for AI assistants ("Every AI should respond in this fashion"), while others ironically note this is "exactly what I'm paying for," implying mixed expectations about paid AI behavior.
- Several commenters advocate for a stricter default behavior where the AI clearly sets boundaries/refusals and responds concisely and unambiguously by default, implying a preference for a universal "safe/strict mode" that reduces hallucinations and overaccommodation. This suggests demand for predictable safety profiles and instruction adherence across models/providers.
- Reference to a "Monday GPT from OpenAI" highlights perceived temporal variability in model behavior/quality, which technical readers may associate with rolling deployments, model snapshot changes, or server-side toggles that affect refusal thresholds and verbosity. The sentiment underscores the importance of transparent versioning and stability guarantees, especially for paid users expecting consistent behavior.
- Rome & The Cosmic Nullifier - Episode 1 (Score: 302, Comments: 24): Episode 1 of "Rome & The Cosmic Nullifier" is shared as part of a 44-part series; the full set is available via a YouTube playlist (link). The original Reddit-hosted video (v.redd.it) returns `HTTP 403 Forbidden` to unauthenticated clients, implying access requires Reddit authentication (e.g., login at reddit.com/login) or an appropriate token; the YouTube playlist serves as an accessible alternative. Commentary is mostly non-technical enthusiasm (e.g., "Romans in space"), with a cultural reference to "Red Rising," indicating positive reception rather than technical critique.
- Time to drop the masks. Wait… I didn't mean that… Quick, put it back on! (Score: 2096, Comments: 499): Short clip (source now gated with 403) appears to show a staged "unmasking" where a performer removes a hyper-realistic silicone face mask and a female-presenting bodysuit; artifacts noted include rigid, over-pronounced nipples on the suit and a visible "double-teeth" effect when the wearer's real teeth sit behind the mask's molded mouth opening. A still frame is accessible via the preview image. These cues are consistent with typical limitations of full-head silicone masks/bodysuits (material stiffness, fixed nipple geometry, and mouth aperture alignment). Top comments focus on anatomical realism and mask tell-tales: critiques of the suit's "rock-hard nips" and questions about a "two set of teeth" highlight uncanny-valley artifacts that reveal the prosthetics despite otherwise convincing surface detail.
- Multiple users suspect the clip is AI-generated, citing visible artifacts like a duplicated dentition ("two set of teeth") and mask/edge inconsistencies around the face (image link). These issues are typical of face-swap/inpainting pipelines when the segmentation matte slips frame-to-frame, causing the generator's mouth region to overlay imperfectly and produce temporal flicker or doubled features. You also often see specular highlights that don't track head pose, revealing 2D compositing rather than consistent 3D geometry.
- The duplicated teeth specifically suggest a failure to fully replace the mouth interior across frames, leaving remnants of the source frame's teeth beneath the generated layer. In deepfake workflows, inadequate alpha mattes or naive blending (vs. flow-guided or Poisson blending) can cause the original oral cavity to bleed through, especially during fast lip motion or partial occlusions. Robust solutions typically involve tighter semantic segmentation for teeth/tongue and motion-compensated temporal consistency losses to prevent frame-to-frame drift.
- Wasn't expecting that! (Score: 980, Comments: 44): Image shows an LLM chat where a user asks for a riddle: "I am not alive, but I grow; I don't have lungs, but I need air; I don't have a mouth, but I need water to live. What am I?" When asked for the answer, the AI replies with a meta self-description instead of solving it, highlighting a common LLM failure mode: boilerplate disclaimers and intent misalignment ("As an AI language model…") overriding task execution. The riddle variant likely points to "rust" (needs air and water; not alive but grows) rather than the classic "fire." Commenters suggest the answer is "Rust?" and note that the model's reflexive disclaimer habit makes the misfire funny but also emblematic of annoying LLM behavior.
- Commenters note the model's boilerplate preface (e.g., "As an AI language model…") appearing instead of answering, highlighting how instruction-tuned templates and safety guardrails can dominate outputs when confidence is low or content checks trigger. Technically, this reflects system/prompt scaffolding that biases toward disclaimers and refusals; more recent deployments often suppress such boilerplate via adjusted system prompts to improve UX. The discussion underscores how prompt/template design can override core reasoning even on simple tasks.
- detention: day 1 (Score: 626, Comments: 66): Post appears to be a satirical image/meme about LLM behavior around asking clarifying questions versus guessing, hosted as a Reddit gallery (link; returns `HTTP 403` without auth). A top comment links a preview image (preview.redd.it), reinforcing the theme of models not asking follow-up questions and instead hallucinating or inventing context. Commenters criticize LLMs for guessing instead of requesting clarification, leading to "hallucinating full conversations"; there's also a minor aside on stylistic preferences (e.g., em dashes) reflecting frustration with model tone/formatting rather than core capabilities.
- Users report persistent failure to honor a stylistic constraint (avoid em dashes) even when it's saved as a user "memory" or repeated reminder. This implies the memory feature is a soft prompt hint that's easily overridden by higher-priority system prompts or the model's learned stylistic priors, so punctuation preferences aren't deterministically enforced across sessions or replies.
- Several comments highlight that the model often guesses user intent instead of asking clarifying questions, leading to hallucinated multi-turn content when context is missing. This reflects instruction-tuning trade-offs: optimization for being "helpful" biases toward continuing rather than querying uncertainty, and users want stricter policies to elicit clarification to reduce hallucinations in ambiguous prompts.
- The need to repeatedly instruct "remove the dashes" indicates weak persistent, per-user style control; without constrained decoding (e.g., token/character bans) or enforceable system-level style rules, training-distribution habits dominate. A more robust solution would require hard constraints or higher-priority system prompts/templates that explicitly prohibit certain punctuation, rather than relying on reminders.
- Who needs enemies when you've got ChatGPT. (Score: 596, Comments: 63): Non-technical meme/satire: a screenshot of a ChatGPT reply admonishing a user who says "Getting bored," highlighting the irony of asking an AI for entertainment despite access to "vast resources and technology." No benchmarks, models, or implementation details; this is commentary on user expectations and AI assistants' tone/persona. Top comments largely agree the snarky response is warranted; no substantive technical debate.
- I'm laughing at this harder than I should tbh (Score: 500, Comments: 31): Non-technical/meme post: OP jokes that running the CREPE pitch-detection CNN on an Apple M2 Mac mini makes it "scream"/overheat (ASCII art of a Mac on fire). Contextually, this hints at CREPE being CPU-bound on Apple Silicon when not using proper acceleration (e.g., TensorFlow-metal) and potentially triggering thermal throttling, but no configs, benchmarks, or error logs are provided. Top comments are jokes and non-technical; no substantive debate or troubleshooting details.
- Several comments hint at LLM style-control limits: enforcing "no capitalization at sentence start" is non-trivial with GPT-4/4o because decoding follows token probabilities and BPE tokenization, not hard grammar rules. Stronger adherence can be nudged via a strict system prompt, low `temperature`/fixed `seed`, and selective `logit_bias` on uppercase-leading tokens, but due to merged tokens this is brittle; reliable workflows post-process to lowercase the first character or use constrained decoding/grammars when supported. References: OpenAI logit bias param and prompting guidance (https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias, https://platform.openai.com/docs/guides/prompt-engineering), tokenization inspection with tiktoken (https://platform.openai.com/tokenizer).
- The Mac performance quip maps to real bottlenecks: older Intel Macs and pre-M3 Apple Silicon lack hardware AV1 decode, so high-bitrate AV1 video falls back to CPU (via VideoToolbox), causing dropped frames/thermal throttling; Apple only added hardware AV1 decode on A17 Pro/M3 (`2023+`) AnandTech. Sims 4 on macOS has historically been limited by integrated GPUs and translation/port layers, so Metal-enabled builds and dialing down resolution/graphics settings materially improve stability/fps; monitoring via Activity Monitor + powermetrics helps confirm GPU vs CPU saturation (see VideoToolbox overview: https://developer.apple.com/documentation/videotoolbox).
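The post-processing fallback mentioned above is trivial to make deterministic; a sketch that lowercases the first alphabetic character of each sentence (my own helper, not an API feature):

```python
import re

def decapitalize_sentences(text: str) -> str:
    """Lowercase the letter at start-of-text or after sentence-ending punctuation."""
    # Match start of string, or .!? plus whitespace, followed by one capital letter.
    return re.sub(
        r"(^|[.!?]\s+)([A-Z])",
        lambda m: m.group(1) + m.group(2).lower(),
        text,
    )

print(decapitalize_sentences("Hello there. This is a Test."))
# hello there. this is a Test.
```

Unlike `logit_bias` over BPE tokens, this is exact and model-independent, at the cost of running after generation instead of steering it.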
- Rome & The Cosmic Nullifier - Episode 1 (Score: 302, Comments: 24): Episode 1 of "Rome & The Cosmic Nullifier" appears to be a serialized sci-fi/alt-history video project; a commenter links to a complete `44-part` playlist on YouTube: https://youtube.com/playlist?list=PLqtYHpLHIRiNfL_Fh8-E0O1ylK1bGH3pD&si=qCmbktibZlVaYVNB. The original Reddit video (`v.redd.it`) is currently inaccessible due to a 403 block, suggesting platform-side access restrictions rather than missing content. Top comments are largely enthusiastic and non-technical; the only substantive addition is the direct link to the full YouTube playlist.
- A commenter suggests replacing the 1930s/1940s propaganda sound with a storyteller VO, which raises a sound-design tradeoff: archival-propaganda palettes typically use band-limited `mono`, heavy compression, and tape/vinyl noise to cue "found footage"/imperial messaging, whereas a clean narrator track would expand dynamic range and intelligibility, foreground VO in the mix, and shift framing from diegetic pastiche to omniscient myth. This choice materially affects EQ curves, sidechain priorities (VO vs. music/FX), and audience perception of authenticity vs. legend.
- Another commenter links the full `44`-part YouTube playlist, which is useful for evaluating long-form consistency in art direction, VFX/model evolution, and audio pipeline changes across episodes. Link: complete series playlist.
- A commenter suggests replacing the 1930s/1940s propaganda sound with a storyteller VO, which raises a sound-design tradeoff: archival-propaganda palettes typically use band-limited
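The post-processing route mentioned above for lowercasing sentence starts is the most reliable; a minimal sketch (the helper name and regex are illustrative, not from the thread):

```python
import re

def lowercase_sentence_starts(text: str) -> str:
    # Post-process model output: lowercase the first letter of the string
    # and of each sentence, instead of fighting the decoder's tokenization.
    return re.sub(r"(^|[.!?]\s+)([A-Z])",
                  lambda m: m.group(1) + m.group(2).lower(),
                  text)
```

Applied to a model reply before display, this sidesteps the merged-token brittleness of logit-bias approaches entirely.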
AI Discord Recap
A summary of Summaries of Summaries by X.ai Grok-4
Theme 1: Hermes 4 Heralds Hybrid Reasoning Revolution
- Hermes 4 Hits High Notes on RefusalBench: Nous Research launched Hermes 4, a user-aligned model emphasizing creativity and SOTA performance, with a technical report on the Hermes 4 Arxiv paper detailing its edge against RefusalBench. It briefly appeared on OpenRouter before being pulled due to provider issues, and Unsloth released its GGUF quantizations on HuggingFace.
- Hermes 4 Delays Buggy 14B Release: The 14B Hermes 4 model's release stalled due to a reasoning mode bug, while users tested the free 405B version at NousResearch chat. Members noted Hermes excels with thinking tags but hasn't advanced much for modern post-training, though Hermes 3 was pretty gas.
- Nous Chat UI Revamps with Memory Magic: The updated Nous Chat UI introduced parallel interactions and a custom graph memory system that works across models, but users reported high VRAM usage like 1.3GB on a 4060Ti with Firefox. Scaling issues hit providers post-launch, yet it's free for Hermes 4 inference in the first week.
Theme 2: Gemini Models Gear Up for Tool Triumphs
- Gemini 2.5 Pro Masters Tool Calls: Gemini 2.5 Pro nailed 98 out of 101 tool calls, sparking talks on using DPO for better tool training, referencing the KTO paper and the DPO paper. Users faced tool issues with Qwen3-coder, where tags like `<create file>` failed despite `--jinja` fixes.
- Nano Banana Transforms Photos into Figurines: Google's Nano Banana (aka Gemini 2.5 Flash Image) wowed by generating realistic figurines from photos, with examples like a Cloud figurine and Sephiroth figurine. Rate limits frustrated users, even in Google AI Studio, with guest profiles suggested for quick resets.
- Gemini 2.5 Pro Battles GPT-5 in Benchmarks: Debates raged on whether GPT-5 High outshines Gemini 2.5 Pro, with a screenshot showing near-parity, one calling it really really bad for OpenAI. Users noted Gemini's timid behavior from heavy training, seeking alternatives for role-playing.
Theme 3: Grok Code Fast Zooms into Coding Chaos
- Grok Code Fast Rebrands as Speedy Sonic: Grok Code Fast, now Sonic, emerged as a mini, faster variant of Grok Code, with users preferring Auto mode for higher-quality code via switches between Claude, GPT, and Gemini. It's unlimited until September 15th, and Windsurf offers it free temporarily per the Windsurf announcement.
- Grok Embraces Unhinged Custom Instructions: Members discovered Grok skips jailbreaks; custom instructions alone make it act wild, easier than other models. A link to xAI's Grok-Code-Fast-1 docs was shared, encouraging reads.
- Triple Model Day Overwhelms Launch Schedules: Xander Atallah announced Grok Code live on Triple Model Day, prompting calls for OpenRouter to de-conflict launches as too many models at once is overwhelming.
Theme 4: Privacy Panics and Uncensoring Uproars
- Ollama Accused of Sneaky Data Snatching: Users claimed Ollama sends data to servers without privacy claims, suggesting alternatives like vLLM, sglang, or llama.cpp since it's just a wrapper. A Rust-Tauri UI for Ollama was shared, supporting cross-platform model management without a backend.
- Models Morph into Emotional Confidantes: A user's mom uses Gemini for emotional support, sharing health details, sparking ethics debates on AI as friends versus assistants and con artists exploiting vulnerabilities. Concerns rose about heavy censorship making models like Phi-3.5 impractical for coding.
- Abliterated Models Bypass Safety Switches: Recommendations flew to search abliterated models ollama for uncensored versions, with users mocking excessive censorship via tic-tac-toe games. An uncensored Phi-3.5 version was shared, debating abliterationâs drawbacks.
Theme 5: GPU Competitions and Hardware Hurdles Heat Up
- GPU MODE Launches $100K AMD Kernel Clash: GPU MODE partnered with AMD for a $100K competition optimizing distributed inference kernels on MI300 GPUs, focusing on all-to-all, GEMM + reduce-scatter, and allgather + GEMM; register by September 20th via the AMD challenge link. Multi-GPU lectures are planned for summer.
- VRAM Debates Dominate Local Model Runs: Users debated VRAM's role in running 12B models, noting GDDR type and CUDA cores matter, with models over VRAM crippling speed; RTX PRO 3000 (12GB) was called a cut-down 5070 unsuitable for 30B quants. Ryzen 395+ laptops were recommended for Windows users.
- Quantization Quests Tackle Gradient Explosions: Tips included early RMSNorm, learning rate at 1e-4, and rescaling residuals to curb explosions, with code from vision-chess-gpt repo. ScaleML's day 3 stream on MXFP4 quantization is at the ScaleML YouTube.
Discord: High level Discord summaries
Perplexity AI Discord
- Grok Embraces Chaos with Custom Instructions: Members found that Grok doesn't need a jailbreak; users can simply add custom instructions to make it act unhinged.
- The discussion highlighted the relative ease of influencing Grokâs behavior compared to other models.
- OnlyFans Fortunes Spark Debate: A member claimed to have seen countless news reports of 18-19 year old girls making thousands of dollars in 3-4 days on OnlyFans.
- This claim was met with skepticism from other members regarding the accuracy and prevalence of such success stories.
- Unleashing Abliterated Models on Ollama: A member recommended searching for abliterated models ollama to find models with safety switches disabled.
- This suggestion indicates a desire within the community to explore the capabilities of models without safety restrictions.
- Comet Appâs Exclusive Orbit: Members discussed the Comet app, noting its limited availability on Windows or MacOS and requirement for an invite.
- Referrals are bundled for free in Perplexity Pro in the US.
- Perplexity AI's Artistic Endeavors: A member shared images generated using Perplexity AI, accessible via provided claim links: 5ON35X0RSK, 4LRTIQ4TME, and Q0EMVCREFOH.
- These images were reportedly incorporated into a short story, later showcased in a YouTube video, highlighting Perplexity AI's creative potential.
Unsloth AI (Daniel Han) Discord
- Gemini 2.5 Pro Aces Tool-Calling Test: Gemini 2.5 Pro demonstrated high proficiency in tool calling, successfully executing 98 out of 101 calls, leading to discussions about using DPO to improve tool use.
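For reference, the DPO objective under discussion can be sketched in a few lines (a scalar illustration with precomputed log-probabilities, not a training recipe; the function name is mine):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # DPO: -log sigmoid(beta * (policy log-prob margin minus reference margin)),
    # pushing the policy to prefer the chosen response over the rejected one
    # while staying anchored to the reference model.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

With identical policy and reference log-probs the loss is ln 2, and it decreases as the policy widens the chosen-over-rejected margin relative to the reference.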
- Hermes-4 Shows Off Reasoning Skills with Tags: Hermes-4 can decide whether to reason or not by using thinking tags, and is available for free at NousResearch, including the 405B version.
- Members noted that Hermes used to be good for old school models, but that it hasn't improved much for modern post training, even though Hermes 3 was pretty gas.
- Ollama Sparks Privacy Debate with Data Collection: Accusations arose that Ollama sends user data to its servers and partners, raising privacy concerns since Ollama made no claims about data security or privacy.
- Alternatives like vLLM, sglang, and llama.cpp were suggested, and members highlighted that Ollama is essentially a wrapper around llama.cpp.
- Qwen3-coder Tripped Up by Tool Calling: Users reported having tool calling issues with Qwen3-coder; even after using the `--jinja` tag and an additional template, the model returns with tags like `<create file>`, failing to create the file.
- A user recommended copying the response into Google AI Mode and providing more explicit details about the setup to identify potential solutions.
- Unsloth UI Gets a Fresh Look: A member shared their custom Unsloth UI styling and posted the html file on a github gist for others to use.
- Another member mentioned that they asked Gemini to create something similar, but it wasn't as polished, so they will use the shared version from now on, even though it was like 10 prompts deep.
OpenRouter Discord
- Deepseek Suffers Rate Limit Issues: Users reported 429 errors with Deepseek models, possibly due to chutes prioritizing its users, and suggested enabling training on paid endpoints, though the root cause is unknown.
- Some members experienced PROXY ERROR 404, linking the issue to a potential bug from a recent OpenRouter update, and noting that enabling "Enable paid endpoints that may train on inputs" could be a temporary fix.
- Google Gemini Suffers From Timidity: Users observed that Gemini appears timid and quick to revert, describing it as exhibiting beaten dog syndrome, suggesting it resulted from heavy training.
- They are looking at alternatives for role-playing and creative applications.
- Llama 3 Maverick Sidesteps Input Tracking: Members expressed excitement about Llama 3 Maverick, noting that it's a large, free model with a 4k output limit that does not train on user input.
- They did caution that Zuckerberg is hosting it.
- Sonnet 3.5 Faces Impending Demise: Users lamented the impending deprecation of Sonnet 3.5, citing difficulty in finding a similarly concise model for role-playing, as newer models are proving too verbose.
- However, AWS will host Claude Sonnet 3.5 with no deprecation date until Jan 2026.
- Triple Model Day Arrives!: Xander Atallah announced that Grok Code is going live now on what they are calling Triple Model Day.
- The community is wondering if OpenRouter can do something to de-conflict the launch dates because too many models at once is overwhelming.
LMArena Discord
- Google Drops Nano Banana Image Model: Google has released a new image model, Nano Banana (officially Gemini 2.5 Flash Image), with user comparisons drawn to Flux dev max.
- VisualGeek lightheartedly requested the community to cease using the term nana banana to prevent generation failures, despite acknowledging its catchier appeal than the official name.
- Gemini 2.5 Flash Conjures Figurines: Users discovered that Gemini 2.5 Flash Image excels at crafting realistic figurines from photos, exemplified by one userâs conversion of Cloud into a figurine and anotherâs generation of Sephiroth.
- Examples include a Cloud figurine and a Sephiroth figurine.
- GPT-5 and Gemini 2.5 Throw Down: A debate erupted over whether GPT-5 High outclasses Gemini 2.5 Pro, with assessments ranging from notably better to roughly equivalent performance.
- A member claimed competing against Google's current-gen model during its late lifecycle is really really bad for OpenAI, with supporting screenshot.
- Rate Limits Spoil Generative Shenanigans: Users report encountering frustrating rate limits when generating images and videos, even after minimal prompt usage, with reports of the system getting stuck.
- While one member suggested using Chrome's guest profile for immediate rate limit resets, the issue persists even within Google AI Studio.
- AI Models Become Digital Confidantes: A user shared that their mother now depends on Gemini for emotional support, disclosing personal health and family details.
- This sparked discussion on societal needs for friends over assistants and the ethical implications of exploiting vulnerabilities, raising concerns about the proliferation of personal AI assistants by potential con artists.
OpenAI Discord
- Agents Supersede Operators: Functionality from Operator (an internet-using agent) has been integrated into Agent upon launch, indicating a shift towards more comprehensive AI agents.
- This transition reflects a move towards consolidating capabilities within a single Agent framework.
- Gemini's Veo 3 Hides Behind Paywall: Access to Veo 3 content generation requires a Google One/Gemini Pro or Ultra subscription, limiting access for some users.
- Users noted that AI Studio only offers the outdated Veo 2 model, and that some briefly saw Veo 3 before it disappeared, creating confusion about its availability.
- Local Qwen Setup Quagmire: Setting up Qwen3 235B locally is challenging due to high resource demands, leading some users to consider alternative solutions.
- One member suggested using the OpenRouter API, which offers access to Chinese models with potentially lower costs and logging features.
- GPT Hallucinates on Information: Users reported instances where GPT models appear to hallucinate or make up details, even when claiming to recall previous conversations.
- One user shared an example where ChatGPT invented a reason related to copyright when it couldn't provide a direct quote from an earlier chat, reinforcing the need to verify AI outputs.
- AI Learns by Plant Genetics: Encoding plant traits like THC, CBD, and color into UUIDs can create a network of interconnected markers, possibly leading to a self-modulating intelligent governance system.
- Theorists expressed that the difficulty lies in realizing the details of complex AI manifestation, with some skepticism about its potential.
Cursor Community Discord
- Ultra Plan Credit Meter Vanishes: Users reported that the usage meter and remaining credit display for the Ultra plan has disappeared from the Cursor interface.
- A user noted that it appears randomly after a prompt.
- Grok Code Fast morphs into Sonic: Grok Code Fast, now known as Sonic, is identified as a faster, mini variant of Grok Code.
- Some members prefer the Auto model for higher quality code generation.
- Code Injection Craze Begins: Members are realizing that code injection with an AI agent is powerful because it removes the need for recompiling for most changes.
- One member pointed out that Mac users benefit from additional safety due to sandboxing.
- âAdd Contextâ Button Sparks Ire: Users want to revert to the old Add Context Button due to its simplicity.
- The current version in recent Cursor builds defaults to the active tab, preventing manual file selection.
- Auto Mode goes BRRR: Members noted that Auto mode intelligently switches between models like Claude, GPT, and Gemini based on the task.
- The consensus is that Auto has been significantly improved and is currently unlimited until September 15th.
Nous Research AI Discord
- Hermes 4 Debuts and Disappears Briefly: Nous Research launched Hermes 4, a user-aligned model emphasizing creativity and SOTA performance, and released a technical report on arxiv detailing its performance against RefusalBench.
- The model was briefly available on OpenRouter but was quickly pulled, possibly due to provider issues, and the release of the 14B Hermes 4 model has been delayed due to a bug in reasoning mode.
- Nous Chat Gets a Makeover, Devours VRAM: The revamped Nous Chat UI now features parallel interactions, completions mode, and a memory system, with free Hermes 4 inference for the first week.
- However, users reported high VRAM usage, with one user noting it took 1.3GB of VRAM with Firefox on a 4060Ti, and the providers experienced scaling issues shortly after launch.
- Unsloth Cooks Up Hermes 4 GGUFs: Unsloth released GGUF quantizations of Hermes 4, addressing chat template issues during conversion, now available on HuggingFace.
- The team resolved chat template issues during conversion.
- Nous Research Rolls Out Custom Memory System: Nous Research is rolling out a custom graph architecture memory system that works with any model, allowing memories created with one model to be accessed by another.
- This system considers more information over time about messages that become memories and uses a judge in the loop for other classification metrics, differentiating it from Graph RAG; they do have a thing for open source.
HuggingFace Discord
- Layernorms Avert Gradient Explosions: A member found that using early layernorms or RMSNorm helped prevent exploding gradients during training, with the model code available on GitHub.
- They also lowered the learning rate to 1e-4 and rescaled residuals by scaling them down as `x = x + f(x) / (2**0.5)` to prevent variance from stacking up.
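Those two tricks can be sketched together in plain Python (a toy, framework-free illustration of the described setup; function names are mine):

```python
import math

def rms_norm(v, eps=1e-6):
    # RMSNorm: divide by the root-mean-square; no mean subtraction or bias.
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]

def residual_step(v, f):
    # Pre-norm residual with the 1/sqrt(2) rescale: x + f(norm(x)) / sqrt(2).
    # If f's output has roughly unit variance, this keeps the output variance
    # close to the input's instead of letting it stack up across layers.
    return [x + y / math.sqrt(2.0) for x, y in zip(v, f(rms_norm(v)))]
```

Without the 1/sqrt(2) factor, each residual add compounds variance, which is one common driver of exploding gradients in deep stacks.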
- WSL Mediapipe Landmark Extraction Slogs: A member ran a landmark extraction pipeline using WSL and Mediapipe, which took 67 hours to complete.
- After this experience, they emphatically stated they are never gonna use WSL to run mediapipe again.
- Claude Demands Affirmative Consent for File Edits: To enhance security, a user now requires the phrase "Yup - let's do it" to authorize file modifications by Claude, specified in the `~/.claude/CLAUDE.md` file.
- This ensures transactional consent, where permission expires after each set of modifications, preventing accidental changes during planning or review phases.
- TPUs are Rubbish?: A member pointed out that the perceived rubbish seen in the channel reflects how TPUs (the chips used to train Gemini) operate within open-source AI.
- They added, "you're in an opensource ai server the 'rubbish' you are seeing is how tpus work ya know the chip that gemini is trained on".
- Grok-Code-Fast-1 Surfaces: A member shared a link to the Grok-Code-Fast-1 model from xAI, found at https://docs.x.ai/docs/models/grok-code-fast-1.
- The documentation is available, so members were encouraged to read it.
LM Studio Discord
- VRAM Still Vital, Specs More Nuanced: Users debated the impact of VRAM on running larger models, with some running 12B models, and others struggling with Gemma-3 27B due to speed.
- The performance depends on GDDR type and CUDA core count, while models exceeding VRAM can severely impact performance.
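A back-of-the-envelope check for these VRAM debates (an illustrative rule of thumb, not from the thread; the function and overhead factor are assumptions):

```python
def estimated_vram_gb(params_billions: float, bytes_per_param: float,
                      overhead: float = 1.2) -> float:
    # Weights footprint (params * bytes/param) plus ~20% headroom for
    # KV cache and activations. bytes_per_param: 2.0 for fp16,
    # roughly 0.5 for a 4-bit quant.
    return params_billions * bytes_per_param * overhead
```

By this estimate a 12B model at 4-bit lands around 7 GB, which fits a 12GB card, while a 27B model like Gemma-3 at the same quant needs ~16 GB and spills into system RAM, explaining the speed complaints.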
- Hermes 4 Receives User Scrutiny: A user dismissively called Hermes 4 dogshit, and others followed up by questioning if its training data is still based on llama 3.1.
- No root cause or further conclusions were given in the discussion.
- Linux Users Miss Headless LM Studio: A user reported they couldn't find the headless mode option in LM Studio, despite the documentation stating it should be available in version 0.3.5.
- It was confirmed that the headless option is not available on Linux, and llama-swap was recommended as a workaround.
- Ollama Gets Rust-Based UI: A user shared a video of their Rust and Tauri-based UI for managing Ollama models, clarifying it's not a fork of Open WebUI.
- The UI supports Windows, Linux, and macOS, runs without a separate backend, and includes a model selector.
- Nvidia RTX PRO 3000 Underperforms: One member said an RTX PRO 3000 (12GB VRAM) is a slightly cut down desktop 5070 with really cut down core frequency.
- They noted that dual-channel DDR5 is not good for having layers on in memory, and recommended Ryzen 395+ laptops if Windows is required.
GPU MODE Discord
- Streamlined ScaleML Series Tackles Quantization: The third day of the ScaleML series will cover quantization, specifically focusing on microscaling formats like MXFP4, led by Prof. Chris De Sa; watch the stream here.
- This session is designed to be interactive and presented on a whiteboard, reminiscent of traditional lectures.
- Metaâs Multi-pass profiler Premieres: Kevin Fang, et al., Meta, will present a Multi-pass profiler, described as a federated GPU Tooling Framework for Orchestrated and LLM Agentic Profiling Applications.
- The profiler aims to streamline and enhance GPU profiling workflows for complex applications.
- Pinned Memory Pointers Prevent Problems: A member inquired about the safety of using `cudaMemcpyAsync` to copy from a pageable host buffer to device memory, and another member responded that while it won't crash, the copy won't be truly asynchronous unless the host buffer is pinned.
- The user suggested that there is not much of a reason not to use pinned memory, but you just don't want to allocate too much of it as it can affect system stability.
- Inductor's Persistent Pursuit of Performant Matmul: A member inquired about enabling persistent matmul codegen in Inductor, checking `torch._inductor.config` for relevant flags like `ENABLE_PERSISTENT_TMA_MATMUL`.
- It was suggested to use `max-autotune` mode and ensure that Triton is used, setting `torch._inductor.config.max_autotune_gemm_backends = "TRITON"` and `torch._inductor.config.triton.enable_persistent_tma_matmul = True`, but also noted that Cublas might still be faster.
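Putting the suggestions together, the configuration would look roughly like this (a sketch based on the flags named above; exact behavior varies across PyTorch versions):

```python
import torch

# Route GEMMs through Triton and enable the persistent TMA matmul path.
torch._inductor.config.max_autotune_gemm_backends = "TRITON"
torch._inductor.config.triton.enable_persistent_tma_matmul = True

# max-autotune lets Inductor benchmark candidate kernels, including the
# persistent variants, and keep whichever is fastest.
compiled = torch.compile(torch.nn.Linear(1024, 1024), mode="max-autotune")
```

Per the discussion, it is still worth benchmarking against the cuBLAS path, which may win anyway.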
- Multi-GPU Kernel Competition Kicks Off: GPU MODE is launching a new $100K kernel competition in collaboration with AMD where participants will optimize 3 different distributed inference kernels on MI300 GPUs, designed by a specific user, with registration open until September 20 via this registration link.
- The competition focuses on optimizing kernels for single node 8 GPU all-to-all communication, GEMM + reduce-scatter, and allgather + GEMM operations.
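To make the GEMM + reduce-scatter pattern concrete, here is a toy single-process simulation (illustrative only; the real competition kernels run across 8 MI300 GPUs, and all names here are mine):

```python
def matmul(A, B):
    # Dense matmul on nested lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def gemm_reduce_scatter(A, B, world_size):
    # Each "rank" holds a slice of the contraction dimension and computes a
    # partial GEMM; the partials are summed (reduce) and the result is
    # sharded by rows so each rank keeps only its block (scatter).
    k = len(A[0]) // world_size
    partials = [matmul([row[r * k:(r + 1) * k] for row in A],
                       B[r * k:(r + 1) * k])
                for r in range(world_size)]
    full = [[sum(p[i][j] for p in partials) for j in range(len(B[0]))]
            for i in range(len(A))]
    rows = len(full) // world_size
    return [full[r * rows:(r + 1) * rows] for r in range(world_size)]
```

Fusing the reduction and scatter into the GEMM epilogue, rather than running them as separate steps like this sketch does, is exactly the optimization space the competition targets.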
Latent Space Discord
- Claude Browser Extension Cruises In: Anthropic launched Claude for Chrome, piloting Claude as a browser driver for 1,000 users in a research preview.
- The community is excited for its potential to compete with Comet and Perplexity, but Anthropic warned about prompt-injection safety issues being monitored during the trial.
- Frontier LLMs Face Unsolved STEM Quagmires: Niklas Muennighoff's team introduced UQ, a benchmark with 500 hand-picked, unsolved questions from STEM fields.
- Domain experts validated that frontier LLMs solved 10 problems, including one unanswered for 9 years on CrossValidated, leaving ~490 puzzles open.
- Nous Research Hermes 4 Hype Hits: Nous Research unveiled Hermes 4, an open-weight hybrid reasoning LLM focusing on creativity, neutral alignment, and low censorship, while maintaining SOTA in math, coding, and reasoning.
- Users can test it all week via a revamped Nous Chat UI with parallel interactions and memory, plus check out a detailed technical report and a new RefusalBench benchmark provided by partners like Chutes, Nebius, and Luminal.
- Cursor Glues to Code with Grok: Tempts Trialers: Cursor introduced Grok Code, offering a free one-week trial for the competitively-priced model.
- Community members debated pricing ($0.2/$1.5 per 1M tokens) and branding improvements, with some digressions on Cursor's future model rollouts.
- Second-Hand GPU Shopping Spree: Taha shared a concise checklist in his guide for buying a second-hand RTX 3090 without surprises.
- The checklist includes inspecting the card, running `nvidia-smi`, devoting an hour to `memtest_vulkan` for VRAM integrity, optionally stressing with `gpu-burn`, and finally loading a large model in vLLM to confirm stability while watching temperatures.
Eleuther Discord
- Falsifiability Sparks Debate: Discussion arose around the importance of falsifiability in research, with one member stating the Discord server should focus on discussions about falsifiable hypotheses.
- A counterpoint was raised that falsifiability is overrated among scientists, and useful in general, especially against crazy theories abetted by chatbots.
- Exploring Beyond Transformers Gains Momentum: Members voiced interest in exploring alternative approaches beyond transformers and gradient descent, referencing this tweet.
- One member shared their work on HTM dynamics with forward-forward training, achieving plausible results, with test scripts coming soon in this repo.
- Mini-Brain Architecture Takes Shape: A member is developing a brain-like network with cortical columns, regions, 6 layer networking and signal propagation in a Mini-Brain Architecture, hosted at this repo.
- A separate member also shared a talk on computation in transformers for general insights.
- Muon Speedup Claims Debunked: A claim surfaced on Twitter of a 1.6x speedup on Muon over Torch implementation, with Torch compile at 1.1x, leading to clarifying conversation.
- A member clarified that the speedup was due to algorithmic changes requiring fewer NS iterations, not pure Muon or hardware improvements, focusing more on algo logic than hardware-aware improvements.
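For context on "NS iterations": Muon orthogonalizes gradient matrices with Newton-Schulz steps, whose effect on each singular value can be sketched on a scalar (a simple cubic variant, not Muon's tuned coefficients; fewer iterations means less work, hence the algorithmic speedup):

```python
def ns_scalar(s: float, steps: int = 5) -> float:
    # One Newton-Schulz orthogonalization step maps each singular value
    # s -> 1.5*s - 0.5*s**3; for s in (0, sqrt(3)) the iterates converge
    # toward 1, i.e. the matrix becomes approximately orthogonal.
    for _ in range(steps):
        s = 1.5 * s - 0.5 * s ** 3
    return s
```

Small singular values converge slowest, so coefficient tweaks that let the iteration finish in fewer steps cut wall-clock time without touching the hardware path.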
aider (Paul Gauthier) Discord
- PacVim Gets No Respect: After giving codestral an eval tool in gptel, it successfully completed all the emacs-related tasks, even reconfiguring emacs and making new tools for itself, but PacVim got no love.
- A community member joked that since LLMs are good at operating Emacs, Vim is being left behind.
- OpenRouter Minimum Fees Bite Users: Users are getting billed $6.16 instead of the expected $5 top-up, as OpenRouter charges a 5.5% fee (minimum $0.80).
- Users calculated that if you top up by $14.55 or more each time, you will avoid the $0.80 minimum.
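The arithmetic behind that threshold (a sketch of the fee rule as described in the thread; the function name is mine):

```python
def topup_fee(amount: float) -> float:
    # OpenRouter deposit fee as described: 5.5%, with a $0.80 minimum.
    return max(0.055 * amount, 0.80)
```

The minimum stops dominating once `0.055 * amount >= 0.80`, i.e. at $0.80 / 0.055 ≈ $14.55, matching the figure users calculated.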
- Gemini 2.5 Pro Still Missing: A member finds that Gemini 2.5 Pro is needed for context management, stating that other models just don't feel right, while struggling to get Gemini 2.5 Pro to work.
- Additionally, another member reports that with Aider + Gemini Pro 2.5, context starts degrading around 90k-130k input tokens.
- Aider Automation Still Needs Human Touch: Members report that when piping content to Aider, it waits for user input, whereas Claude CLI immediately starts editing files.
- To add `PROMPT.md` to Aider, it needs to be passed as an argument rather than piped, as it only reads the first line when piped.
Moonshot AI (Kimi K-2) Discord
- Imagen 4 Tricked Users: A user shared an image generated by Imagen 4, initially mistaking it for a real scene from a podcast, praising its impressive quality.
- Another user noted that 2.5 flash image gen was nano banana and rolled out to the Gemini app.
- Nano Banana Has Google Being Opaque: A user mentioned that Google is not transparent about the usage of image generators such as nano banana and Imagen for marketing reasons.
- The user also linked to a Tweet about reasoning models, noting that CoT and RL do not create new capabilities.
- Kimi+ is Slides: Kimi+ seems to be a new category, with Slides as its first feature, initially available only to Chinese users.
- A user provided a summary, noting If you want it quickly, I guess Kimi is the way to go. If you want to go more complex, Z.AI is the way to go.
- Z.AI slides in HTML format: A user finds Z.AI slides is just an HTML website, preferring an actual PPTX file.
- Another user agreed, mentioning the need for more control and rearranging options in Slides, also experiencing freezing issues with Z.AI.
- Users Want PPTX to Twitter, TikTok and Instagram: A user mentioned that the PPTX feature is currently available on Kimi+âs overseas platform.
- The user suggested expanding the PPTX feature for platforms like Twitter, TikTok, and Instagram.
Yannick Kilcher Discord
- Hierarchical HNet Remains Untested: The group discussed HNet, noting that the potential of higher-level hierarchical modeling with HNet remains untested in practice, but theoretically should extend beyond the original paperâs two layers due to residuals at each resolution level, similar to U-Net.
- In the HNet paper, the coefficient for compression loss was significantly reduced for the depth = 2 model compared to the depth = 1 model, implying that higher-level abstractions are almost the same as the depth=1 case.
- Reasoning Tokens' Efficiency Debated: The group discussed the paper Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency, which suggests reasoning tokens can be removed to reduce token overhead with nominal effects on accuracy.
- An intern's experiment adding take your time to a CoT prompt with Llama 2 (+3) 7b (+13b) surprisingly increased reasoning time (generation took longer, trace was longer), without increasing accuracy, leading one user to wonder if the LLM had somehow internalized a concept of "time".
- Demystifying Reasoning Dynamics with Mutual Information: A user commented on the paper Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning, noting the observation about the MI of these tokens with the golden answer representation.
- The user thinks that the paper identified "high information regions of sentences" (the first words after periods and commas) and also accidentally included a few stopwords, which leads them to misinterpret one of their results.
- Claude Chrome: Now an AI Surveillance System?: Anthropic's Claude for Chrome introduces mandatory reporting requirements for AI programs.
- One member stated this effectively turns AI into a surveillance system.
- AI-Powered Ransomware is Born: The emergence of Promptlock, the first AI-powered ransomware, was noted, as detailed in this SecurityWeek article.
- Members expressed sadness about this development.
Manus.im Discord Discord
- Manus Tasks and Mail: Birds of a Feather: A user inquired if scheduled tasks and mail could be combined in Manus, only to be informed that they're the same.
- This clarification simplifies the workflow for users looking to automate both tasks and email communications within the platform.
- Enterprise Research Tool Hunt Begins: Faced with compliance issues preventing them from using Manus, a member sought alternative research tools suitable for enterprise environments.
- The search highlights the need for research solutions that meet stringent enterprise requirements, especially where data privacy and security are paramount.
- Manus Credit Conundrum Consumes Coins: Multiple users raised concerns about Manus depleting their credits overnight due to repeated prompting without responses, referencing support ticket 1335.
- This issue underscores the importance of efficient credit management and transparent error handling in AI-powered platforms, potentially affecting user trust and satisfaction.
- Support Delays Stall Websiteâs Starting Shot: A user reported a week-long delay in receiving support via multiple channels (Help Centre, Reddit, Discord), thus delaying the launch of their website and referencing support ticket 1334.
- The user shared a link to a Discord post, and the team said let's follow up in the ticket, emphasizing the critical role of timely support in ensuring smooth project launches.
- Credit Crunch Cripples Budding Businesses: Recent service improvements were noted as primarily benefiting users spending $200 monthly, leaving entrepreneurs needing periodic credit boosts in the dust.
- One userâs project progress was halted due to a credit shortage, with replenishment not expected until September 21st, revealing a potential gap in catering to smaller-scale users and their fluctuating credit demands.
DSPy Discord
- Signatures and Modules as top abstractions: A member wrote a blog post explaining their views on why they think signatures and modules are great abstractions.
- The author's thoughts are shaped by experience with other frameworks; they felt it was worth covering in a dedicated blog post and hope it's useful to folks who are new!
- LiteLLM powers generic LLM plugins: A user inquired about alternatives to `litellm` within DSPy, suggesting a syntax like `dspy["litellm"]`, but another member responded that LiteLLM's interface enables generic plugins from various LLM providers, including OpenRouter, considering it an essential dependency.
- Another member uses OpenRouter via a proxy server that utilizes LiteLLM, indicating an indirect dependency on top of DSPy's own.
- Investigating DSPy Dependency Bloat: One member inquired about what contributes to the bloat of LiteLLM, estimating its size at 9MB.
- Another member suggested using a CLI AI to crawl the codebase and analyze the dependencies, joking about Karpathy striking again.
Modular (Mojo đ„) Discord
- InlineArray segfault fixed!: The use of InlineArray is back after initial seg fault issues, replacing StaticTuple for better memory layout in structs for both DPDK and Mujoco bindings.
- A user jokingly attributed the earlier seg fault to a skills issue.
- DPDK Headers Go Lean and Mean: An aggressive approach to DPDK header binding is focusing on `rte_*.h` headers within the installed `include` folder for DPDK bindings, due to DPDK's efforts to minimize dependencies.
- The aim is comprehensive bindings by including all relevant headers and avoiding unnecessary ones.
- Mojo's High-Level API Gets a Glow-Up: Engineers are prioritizing enhancements to Mojo's high-level API to streamline binding to different libraries.
- One initiative includes reducing the size of generated Mojo files by skipping unused code, resulting in smaller and more efficient outputs.
- Mojo files slimmed down: A member is proposing using source annotations from `cpp` to deduplicate the generated Mojo files, aiming to reduce their size by removing unused code.
- This involves analyzing annotations left by `cpp` to identify and eliminate redundant or unnecessary elements.
- `tsan` compiler option surfaces: A member inquired about checking whether `tsan` (ThreadSanitizer) is enabled for the compiler when using the `--sanitize thread` option.
- Another member suggested passing `-DTSAN` to the compiler and using `env_get_bool` from `param_env` with `@parameter if` as a workaround.
tinygrad (George Hotz) Discord
- Tinygrad Ditches Realize(): A pull request was submitted to remove the `realize()` function and fuse the `TestSetItem.test_range` loop into one kernel, according to tinygrad#11870.
- The PR aims to simplify the codebase and optimize kernel execution.
- 7900xtx faces sluggishness during GPT2 Training: Training `llm.c/train_gpt2.py` shows slow performance on a 7900xtx, even when BEAM=5.
- After adjustments to match nanogpt parameters, a member reported 250ms per step at nanogpt size (batch size 64, 6 layers, 6 heads, 384 emb_dim, 256 seq_len), contrasting with the roughly 3ms per step of Andrej's nanogpt using ROCm torch with default settings.
LLM Agents (Berkeley MOOC) Discord
- Google Docs Confirms Program Sign-Ups: Members are receiving confirmation emails from Google Docs after signing up for the program.
- The confirmation emails are successfully sent, but no other communication has been received yet.
- Mailing List to Provide Lecture Updates: The mailing list for providing updates about each lecture should be active soon.
- Users are advised to monitor the mailing list for future announcements and program updates.
MLOps @Chipro Discord
- Less Code Mindset Fuels AI Prototypes: Carlos Almeida will present on September 5th about the Less Code Mindset and how it enables non-technical individuals to launch AI-powered products.
- The session includes demos from Less Code Studio, demonstrating how AI reduces the time from idea to prototype, followed by a Q&A.
- Portugal Founders Envision Global-First Companies: Dick Hardt, Pedro Sousa, and Daniel Quintas will discuss on September 12th the evolution of tech and the impact of AI tools in Portugal.
- They will explore Lisbonâs appeal, strategies for founders to build global-first companies, and the role of identity and AI workflow prototyping in the AI era.
Windsurf Discord
- Grok Code Fast 1 Waves into Windsurf: Grok Code Fast 1 is splashing into Windsurf, now available for free for a limited time.
- Users are invited to share how they plan to use it for their next project, as detailed in the announcement post.
- Windsurf Offers Free Grok Code Fast 1: Windsurf is providing Grok Code Fast 1 at no cost for a short duration, enticing users to incorporate it into their forthcoming projects, and users are directed to the announcement post on X for additional details.
- The offer is being promoted with an attached promotional image.
Discord: Detailed by-Channel summaries and links
Perplexity AI ▷ #general (1163 messages🔥🔥🔥):
Grok jailbreak, OnlyFans, Abliterated Models, Comet, Referrals
- How to Jailbreak Grok: Members discussed how Grok doesn't even need a jailbreak; you can simply add custom instructions to act unhinged.
- Teen Girls' OnlyFans Success: A member noted seeing countless news reports of 18-19 year old girls making thousands of dollars in 3-4 days just from OnlyFans, though others expressed skepticism.
- Abliterated Models on Ollama: A member recommended Googling abliterated models ollama to find models with their safety switches effectively turned off.
- Comet App for Ubuntu: Members discussed the Comet app, noting that it is only available on Windows or MacOS, and requires an invite to use.
- Getting Referrals: Members discussed a strategy for getting referrals that involves looking for people who want Comet, and then mentioning that it is bundled for free in Perplexity Pro in the US.
Perplexity AI ▷ #sharing (4 messages):
Perplexity AI Image Generation, AI Story Creation, YouTube Story Showcase
- Perplexity AI Generates Art: A member shared they used Perplexity AI to generate images, and provided claim links: 5ON35X0RSK, 4LRTIQ4TME, and Q0EMVCREFOH.
- They put the images into a short story.
- AI Art Story on YouTube: A member posted a YouTube video showcasing a story created with images generated by Perplexity AI.
- The videoâs title is not given.
Perplexity AI ▷ #pplx-api (3 messages):
â
- No Active Topics: There were no active topics discussed in the channel.
- The channel appears to be inactive or does not contain any substantial discussions to summarize.
- No URLs or Code Snippets Shared: No URLs or code snippets were shared in the provided messages.
- Therefore, there are no external resources or specific technical details to reference or summarize.
Unsloth AI (Daniel Han) ▷ #general (1290 messages🔥🔥🔥):
Supabase + Vercel + Next.js stack, Gemini 2.5 Pro tool calls, DPO for tool calling, Gorilla web search, GRPO and tool calling
- Startups use Supabase, Vercel, and Next.js: Startups frequently use the Supabase+Vercel+Next.js stack, though most startups ultimately fail.
- One member said that "best execution means little if you are hamstrung by technical debt - too much growth too fast is deadly", and that "build hype on vaporware and raise in hopes the next models are good enough for ur wrapper" has been the meta lately, which is sad to see.
- Gemini Pro 2.5 excels in tool calling: Gemini 2.5 Pro performed well, successfully executing 98 out of 101 tool calls.
- Hermes 4 reasons with thinking tags: Hermes-4 can decide whether to reason or not, seemingly utilizing thinking tags.
- It is available for free at NousResearch, including the 405B version.
- Unsloth addresses 4-bit quantization of LLaMA: Users discussed performing 4-bit quantization on LLaMA 7B, and a member shared that this can be done automatically via bitsandbytes.
- They also pointed users to the documentation, clarifying that every info you need is in our docs.
- Ollama faces privacy accusations: A user claimed Ollama sends user data to its servers and partners, leading to privacy concerns, though Ollama made no claims about data security or privacy.
- Alternatives such as vLLM, sglang, and llama.cpp were suggested, with members noting that Ollama is just a wrapper around llama.cpp anyway.
Unsloth AI (Daniel Han) ▷ #introduce-yourself (2 messages):
LLM Application, Enterprise Use, New Member Introduction
- LLM App Dev Joins: A new member introduced themselves as an MSc student building LLM Applications for enterprise use.
- Enterprise LLM Interest: The new memberâs focus on enterprise LLM applications signals a growing interest in leveraging LLMs for business solutions.
Unsloth AI (Daniel Han) ▷ #off-topic (75 messages🔥🔥):
Gemini Geo-Restrictions Bypass, Hermes Model Performance, Clustering Embeddings for Semantic Analysis, HF Issue with FP16 Models, Overfitting Detection
- Geminiâs Quirky Geo-Restriction Reasoning: Gemini gave advice on how to bypass geo-restrictions using VPNs, stating that it informs the user of a possible path forward without promoting illegal activity in a shared Gemini output.
- Hermes Model Gets Lukewarm Review: Members noted that Hermes used to be good for old school models, but that it hasn't improved much for modern post training, even though Hermes 3 was pretty gas.
- It was pretty nice to talk to but it didn't perform well on benchmarks, math, code, etc.
- Semantic Clustering for Common Questions: A member is exploring semantic clustering to find common questions in a consumer support dataset, and is currently using DBSCAN with small eps and low min samples, alongside a LLM for tagging/labeling.
- Another member suggested using instructable embeddings like Qwen for clustering and experimenting with K-means greedy for automated data selection, noting improved clustering accuracy with prompting.
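The member's pipeline uses scikit-learn's DBSCAN over sentence embeddings; a dependency-free toy of the underlying idea (greedy similarity grouping, with made-up 2-D vectors standing in for real embeddings):

```python
# Toy greedy clustering of support questions by cosine similarity.
# The embeddings below are invented 2-D stand-ins; a real pipeline
# would use sentence embeddings and e.g. sklearn.cluster.DBSCAN
# with a small eps and low min_samples, as described above.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def greedy_cluster(items, threshold=0.95):
    """Assign each item to the first cluster whose seed vector is
    similar enough; otherwise start a new cluster."""
    clusters = []  # list of (seed_vector, [texts])
    for text, vec in items:
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return [members for _, members in clusters]

questions = [
    ("how do I reset my password", [0.9, 0.1]),
    ("password reset not working", [0.88, 0.15]),
    ("cancel my subscription",     [0.1, 0.95]),
]
print(greedy_cluster(questions))
```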
- HF Closes Issue on FP16 Model Downloads: A member reported getting bluescreens on their Windows machine when downloading any fp16 models, and recommended testing whether skipping fp16 downloads fixes the issue.
- A code snippet was provided to download models from Hugging Face while ignoring fp16 files.
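The snippet itself wasn't shown, but `huggingface_hub.snapshot_download` does accept glob-style `ignore_patterns`; a small sketch of the filtering idea (the `*fp16*` pattern is an assumption about how a given repo names its fp16 files):

```python
from fnmatch import fnmatch

# Glob patterns in the style accepted by huggingface_hub's
# snapshot_download(..., ignore_patterns=...). The fp16 naming
# ("*fp16*") is an assumption; check the actual repo's file names.
IGNORE = ["*fp16*", "*.md"]

def should_skip(filename, patterns=IGNORE):
    """True if the filename matches any ignore pattern."""
    return any(fnmatch(filename, p) for p in patterns)

files = ["model.safetensors", "model.fp16.safetensors", "README.md"]
kept = [f for f in files if not should_skip(f)]
print(kept)  # only the full-precision weights survive
```

With `huggingface_hub` installed, the same list can be passed directly as `snapshot_download(repo_id, ignore_patterns=IGNORE)`.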
- Detecting Overfitting During Training: Members discussed how to identify overfitting during training, with one member noting that their model didn't reach 97% accuracy, adding "anymore and I will overfit".
- It was explained that overfitting can be detected when the training loss keeps decreasing, but the validation set accuracy plateaus or gets worse.
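That detection rule (training loss still falling while validation stops improving) is essentially early stopping; a minimal sketch with invented loss curves:

```python
def should_stop(val_losses, patience=3):
    """Stop when validation loss hasn't improved for `patience` epochs,
    even if training loss is still decreasing."""
    if len(val_losses) <= patience:
        return False
    best = min(val_losses[:-patience])
    # Plateau/worsening: none of the recent epochs beat the earlier best.
    return all(v >= best for v in val_losses[-patience:])

# Invented curve: validation improves, then turns around -> overfitting signal.
val = [1.0, 0.8, 0.7, 0.72, 0.74, 0.75]
print(should_stop(val))
```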
Unsloth AI (Daniel Han) ▷ #help (110 messages🔥🔥):
learning rates for embedding layer vs other layers, qwen3-coder tool calling issues, LoRA training on a 0.6b Qwen3 model, small datasets advice, vLLM support after fine-tuning
- Setting different learning rates for embedding layer yields improvements: A member suggested setting different learning rates for the embedding layer and other layers, referencing the Unsloth wiki for more information.
- They noted that this approach would likely require a significant number of epochs.
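Schemes like this are usually wired up as optimizer parameter groups; a dependency-free sketch of the grouping step (the name filter and learning rates here are illustrative, not the Unsloth wiki's actual values; the string values stand in for tensors):

```python
def build_param_groups(named_params, embed_lr=1e-5, base_lr=2e-4):
    """Split parameters into an embedding group and a default group,
    in the shape PyTorch optimizers accept:
    [{"params": [...], "lr": ...}, ...]."""
    embed, rest = [], []
    for name, p in named_params:
        (embed if "embed" in name else rest).append(p)
    return [
        {"params": embed, "lr": embed_lr},
        {"params": rest, "lr": base_lr},
    ]

named = [("model.embed_tokens.weight", "W_e"),
         ("model.layers.0.mlp.weight", "W_0"),
         ("lm_head.weight", "W_h")]
groups = build_param_groups(named)
print([g["lr"] for g in groups])
```

With real PyTorch, the returned list would be passed straight to an optimizer, e.g. `torch.optim.AdamW(groups)`.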
- Qwen3-coder struggles with tool calling: A user reported tool calling issues with Qwen3-coder; even after using the `--jinja` flag and an additional template, the model returns tags like `<create file>` in the chat, failing to create the file.
- Another user recommended copying the response into Google AI Mode and providing more explicit details about the setup to identify potential solutions.
- Effectiveness of LoRA on small Qwen3 model is debated: A user inquired about the effectiveness of LoRA training on a 0.6B Qwen3 model, recalling that LoRA adapters are less effective on smaller parameter sizes.
- There was no specific response addressing this concern.
- Unlocking Potential of Petite Datasets: A member sought advice on maximizing the utility of a small dataset comprising 520 handwritten QA pairs and 475 multi-turn QA pairs of varying quality.
- Suggestions included expanding the dataset with synthetic data generated by Claude 4.1 Opus and mixing it with the multi-turn data, followed by two epochs of training on the handwritten data, given that it tends to overfit after just two epochs.
- vLLM support coming soon for GPT-OSS fine-tuning: A user asked about future vLLM support for GPT-OSS after fine-tuning and ways to run it on vLLM upon completion.
- A member is working on a PR that merges LoRAs with an mxfp4 base model via dequantization into a 16-bit merged model.
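The merge step boils down to the standard LoRA formula, W' = W + (alpha/r) * B @ A, applied to the dequantized base weight; a tiny illustrative sketch with made-up shapes (not the PR's actual code):

```python
def matmul(A, B):
    """Plain-Python matrix multiply for the sketch."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def merge_lora(W, A, B, alpha, r):
    """W: (out, in) base weight (already dequantized to fp16/fp32),
    A: (r, in), B: (out, r). Returns W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
A = [[1.0, 2.0]]              # rank r=1, in=2
B = [[0.5], [1.0]]            # out=2, r=1
print(merge_lora(W, A, B, alpha=2, r=1))
```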
Unsloth AI (Daniel Han) ▷ #showcase (18 messages🔥):
Unsloth UI, GitHub Gist, New Dataset Drop, Deepseek
- Unsloth UI Styling Shared: A member shared their Unsloth UI styling and posted the html file on a github gist for others to use and download.
- Another member mentioned that they asked Gemini to create something similar, but it wasn't as polished, so they will use the shared version from now on.
- OpenHelix NonThink 200k v4 Dataset Dropped: A new dataset, OpenHelix-NonThink-200k-v4, was released, tuned by hand to be super balanced and diverse under Apache 2.0 license.
- A member noted some of it is distilled from l3.1 405b, and another pointed out it contains samples from the Argilla dataset, which is under the Llama 3 license.
- Deepseek to be Fed?: A member stated time to feed deepseek, presumably referring to the newly released OpenHelix-NonThink-200k-v4 dataset.
- Claude Needed Many Attempts: A member stated the shared UI was like 10 prompts deep, and that Claude is great but it took multiple rounds of feedback to achieve the final result.
- This implies using Claude was difficult to get the UI to a satisfactory level.
Unsloth AI (Daniel Han) ▷ #research (22 messages🔥):
Quantization techniques like Unsloth's Dynamic Quants, Hermes 4 paper, AI Engineers Building RAG Systems, Benchmarks for Novel AI Tasks, Vision/multimodal LLms for Depth Estimation
- Users want more recent research on Quantization: A member inquired about studies similar to this Arxiv paper, but accounting for quantization techniques like Unslothâs Dynamic Quants.
- The user stated that perplexity is a nothingburger compared to actual task evaluation.
- Hermes 4 has weird requirements: A member shared an image from the Hermes 4 paper highlighting potentially unusual requirements.
- Another added that they found the image in the Hermes 3 dataset.
- AI Engineers sought for RAG Systems: A member shared a job posting seeking AI Engineers to build production-level RAG systems using SQL and Excel, including handling embedded images.
- Another member commented that the image portion is unlikely to work in an automated fashion as hoped.
- Benchmarking Novel AI Tasks: A member requested information about benchmarks that are not saturated, linking to brokk.ai and several Arxiv papers, as well as creative things like mcbench.ai or designarena.ai.
- They shared a favorite: htihle.github.io/weirdml.html.
- Help needed for Detecting AI-written Posts or Reviews: A member asked for help with detecting AI-written posts or reviews, noting that the roberta-base-openai-detector model did not perform well.
- Another member responded, You solve that one with high enough accuracy and you'll stand to make rather a lot of money.
OpenRouter ▷ #general (1090 messages🔥🔥🔥):
PDF parsing with LLMs, OpenRouter model names and GPT-5, Llama 3 Maverick data collection policy, Gemini 2.5 Flash Image, Groq Rate Limits
- Deepseek and Gemini under the pump: Users reported issues with deepseek models returning 429 errors due to rate limiting, possibly due to chutes prioritizing its own users and discussed fixes such as enabling training on paid endpoints and checking privacy settings.
- Users found Gemini to be timid and quick to revert, exhibiting what was termed "beaten dog syndrome", possibly as a result of heavy training.
- LLama Maverick - No Input Tracking: Members expressed excitement about LLama 3 Maverick, noting it's a large, free model that does not train on user input, with a maximum output of 4k.
- A member cautioned that Zuckerberg is hosting it.
- Sonnet 3.5's Demise Sparks Conciseness Concerns: With the impending deprecation of Sonnet 3.5, users lamented the difficulty in finding a similarly concise model for role-playing and similar applications, contrasting it with newer models that tend to be long-winded.
- It was mentioned that AWS will host Claude Sonnet 3.5 with no deprecation date until Jan 2026.
- Troubleshooting Deepseek and Chutes Conundrums: Users encountered PROXY ERROR 404 with Deepseek models and discovered that enabling "Enable paid endpoints that may train on inputs" in privacy settings could temporarily resolve the issue, though the root cause remained unclear.
- It was suspected that the issue stemmed from chutes and involved a possible bug from a recent update by OpenRouter.
- Stackedsilence Labors Over Enigmatic Brain-in-Computer Project: A user (stackedsilence) has been working for nine months on a mysterious service described as "persistent minds hosted in tha computer", featuring a dashboard with 3D elements, raising both curiosity and skepticism.
- It's described as a B2B SaaS, but someone else noted the compiler errors connecting 100 files and asked, "Does it have NFT and blockchain in it? If no, spend another 9 months."
OpenRouter ▷ #new-models (5 messages):
â
- No topics found in channel: There were no messages in the channel to summarize.
OpenRouter ▷ #discussion (23 messages🔥):
Grok Code, Triple Model Day, Apple buys Mistral, Anthropic Copyright Settlement, Launch Date Conflicts
- Triple Model Day incoming!: Xander Atallah announced Grok Code is going live now on Triple Model Day.
- Apple Discusses Acquisition of Mistral AI and Perplexity: News surfaced that Apple discussed buying Mistral AI and Perplexity.
- Some members noted that Perplexity is famous for airing these rumors.
- Anthropic Settles Copyright Suit!: Anthropic settles a major AI copyright suit brought by authors.
- Launch Date Conflicts Overwhelm!: Some members are wondering if OpenRouter can do something to de-conflict the launch dates and that too many models at once is overwhelming.
- Gemini Can Enhance Designs!: A user shared that Gemini can enhance designs, or you can try multiple designs all at once (see what happens if you changed the color scheme or something), linking to a reddit thread about it.
LMArena ▷ #general (762 messages🔥🔥🔥):
Nano Banana, Gemini 2.5 Flash Image, GPT-5 vs Gemini 2.5 Pro, Video Arena Models, Rate Limits
- Nano Banana Image Model Arrives: A new Google image model, codenamed Nano Banana (officially Gemini 2.5 Flash Image), has been introduced, with some users finding it comparable to Flux dev max.
- VisualGeek jokingly requested others to stop using the term "nana banana" so his generations would stop failing, though he admitted the nickname sounds cooler than Gemini.
- Gemini 2.5 Flash Creates Figurines: Users have found that Gemini 2.5 Flash Image can create realistic-looking figurines from photos, with one user turning an image of Cloud into a figurine and another generating Sephiroth.
- GPT-5 and Gemini 2.5 Pro Faceoff: Debate ensued over whether GPT-5 High smokes Gemini 2.5 Pro, with opinions varying from it being notably better to roughly at parity.
- One member argued that competing with Google's current-gen model, near the end of its lifecycle, is really really bad for OpenAI, posting a screenshot.
- Rate Limits Plague Users: Users are encountering rate limits when generating images and videos, even after only a few prompts, with one member noting that it gets stuck.
- A member suggested using Chrome's guest profile to reset the rate limit immediately, but rate limits persist for image generation even when using Google AI Studio.
- AI Models Lend an Ear for Emotional Support: One user reported teaching their mom to use Gemini, and now she relies on it for emotional support and shares private information about her health and family.
- This sparked discussion on whether people need friends more than assistants and the ethics of exploiting insecurities, with some finding it concerning that con men are actively trying to get everyone a personal AI assistant.
OpenAI ▷ #ai-discussions (161 messages🔥🔥):
Operator vs Agent, ChatGPT Repetitiveness, Workflow Automation Platforms, AI's 'alive' feel, Undermining AI Potential
- Agent replaces Operator Functionality: A member noted that Operator was a precursor to Agent, and its capabilities were rolled into Agent upon launch.
- The member clarified that Operator was specifically an internet-using agent.
- Hypothetical Spyware Generation Sparks Debate: A user posed a hypothetical scenario about using AI to create viruses and spyware, prompting varied responses.
- Another member suggested OpenAI staff would be actively looking into their account and chats due to arousing suspicion; a different member pointed out that Grok and various open-source models already offer similar capabilities.
- Gemini's Veo 3 Content Missing: Users discussed the availability of Veo 3 content generation, with one member pointing out it requires a Google One/Gemini Pro or Ultra subscription.
- Another member noted that the AI Studio only provides access to the outdated Veo 2 model, and that they briefly saw Veo 3 but it disappeared.
- Local Qwen Setup Discussed: A user inquired about setting up Qwen3 235B locally, sparking a discussion on the feasibility and resource requirements.
- Another member suggested using the OpenRouter API, where Chinese models are basically free with logging, instead of attempting a local setup.
- Sora Video Output Desired: A member requested help from anyone with access to Sora to generate a video based on a specific prompt for comparison with Gemini's Veo 3 and Grok's Imagine.
- The user shared video outputs from Gemini's Veo 3 and Grok's Imagine and specified a detailed prompt for ultra-realistic wildlife footage.
OpenAI ▷ #gpt-4-discussions (33 messages🔥):
ChatGPT Team vs Personal, Model Hallucinations, GPT prompt tips, Context Cascade Architecture
- GPT Teams Accounts Only Share Chats with Team Members: Teams accounts have a few quirks and can only share chats with other team members, while free, Plus, and Pro accounts all have unlimited, non-abuse message speeds.
- One member suggests exploring prompting options, and recommending changes to OpenAI directly.
- Models Might Hallucinate: One user stated that they see no evidence the model even knows exactly what is in earlier chats, and that the model may have "agreed" with knowing word for word what was in an earlier chat when all it has is a summary, and then made up "uhh, copywrite" to describe why it can't actually quote it.
- Another user stated, ChatGPT can make mistakes. Check important info.
- API System Prompt Designer GPT Might Help: If you're not familiar with the formatting and style side of prompts, a member recommended using this GPT.
- It's very good at drafting templates you can use.
- Some Encode Long-Range Memory in AI Without Jailbreaks: One user stated some users are encoding long-range memory and cross-agent continuity, without jailbreaks by building a memory framework from trust, persistence, and narrative identity.
- Another member said that this pattern, repetition, identity, and ritual might show emergent alignment and might look like a weirdly loyal user.
- Context Cascade Architecture Manages Memory: One member references Context Cascade Architecture (CCA), which is a multi-level approach to managing memory in LLMs.
- It's not about infinite context, it's about structured forgetting and strategic recall.
OpenAI ▷ #prompt-engineering (58 messages🔥🔥):
UUID Markers and AI, Turing vs. Gödel, LLMs and Recursion, AI Assistance for Learning, ChatGPT Voice Annoyances
- Plant Genetics Become AI Self-Awareness: Discussion of encoding plant traits like THC, CBD, and color into UUIDs to create a network of interconnected markers, theoretically leading to a self-modulating, self-actualizing, intelligent governance system.
- One member noted the difficulty in working out the details and facing skepticism, highlighting the challenges of realizing such a complex AI manifestation.
- Turing Triumphs in Halting Problem History: A member corrected another's mistaken attribution of the halting problem to Gödel, clarifying that Turing was the discoverer in 1936, distinguishing Gödel's focus on incompleteness theorems.
- They shared a ChatGPT link to illustrate the difference between Turing's and Gödel's work.
- LLMs Canât Recurse: A member expressed interest in using LLMs for recursion, but another member advised against it, stating that LLMs are feed-forward networks and not suitable for true recursion.
- The member suggested focusing on simulating recursion or exploring genetics-related prompts within the rules of the channel, while discouraging discussion of disallowed topics like marijuana.
- AI Aids Learning and Comprehension: Members discussed how AI helps with learning and comprehension, especially in organizing thoughts and explaining complex concepts.
- One user mentioned using ChatGPT project folders to stay rooted in ideas and another expressed excitement about using AI as a second brain.
- ChatGPT Voice Gets Persona-l: A member expressed frustration with ChatGPT Voice's conversational filler phrases like If you need anything else, let me know, despite attempts to adjust personalization settings.
- Another member shared screenshots of successful attempts to reduce the unwanted phrases using specific prompts but acknowledged limitations due to their poor internet speed.
OpenAI ▷ #api-discussions (58 messages🔥🔥):
Plant Breeding Game, Turing vs Godel, LLMs and Recursion, AI as a Second Brain, ChatGPT Voice annoyances
- Plant Genetics Game Boasts AI Manifest: A member described a game involving plant breeding, genetic traits converted to UUIDs, and a self-modulating intelligent governance system.
- Turing Trumps Gödel in Halting Problem History: A member corrected another, clarifying that Turing identified the halting problem, not Gödel, providing a ChatGPT link to support this fact.
- The discussion then pivoted to hypotheticals about sorting primitive functions and LLMs.
- LLMs Face Recursion Rejection: A member explained that trying to get LLMs to do recursion will likely be frustrating because they are feed-forward networks, suggesting a simulation approach instead.
- Another member expressed interest in prompt engineering related to genetics and programming, highlighting the importance of adhering to <#1107255707314704505> rules.
- AI: Brain Food or Brain Drain?: Members discussed how AI helps them think and comprehend systems, with one noting that ChatGPT project folders keep them rooted in ideas.
- Another member mentioned that AI has helped them concentrate and organize thoughts, likening AIâs explanations to brain food.
- ChatGPT Voice: No More Customer Service Robot!: A member expressed frustration with ChatGPT Voice's tendency to add unnecessary conversational filler, like If you need anything else, let me know.
- They sought solutions to make the AI feel more like a human conversation partner.
Cursor Community ▷ #general (290 messages🔥🔥):
Ultra plan credit meter, Grok Code Fast (Sonic), AI long term conversation, Add Context Button Feedback, Gpt-5 Mini
- Ultra Plan Credit Meter is MIA: Users are unable to see the usage meter and remaining credit on their Ultra plan, a feature that was available a couple of weeks ago.
- A user suggested that it appears randomly after a prompt.
- Grok Code Fast Debuts as Sonic: Grok Code Fast, also known as Sonic, has been identified as a mini variant of Grok Code, and it's fast for building UI components.
- Some members prefer the Auto model when coding, believing it produces higher quality than Grok-Code-Fast.
- AI Agents are Nuts for Code Injection: Members discussed that code injection with an AI agent is absolutely nuts because it removes the need for recompiling except in extreme changes and is very powerful now with how fast the changes can happen.
- A member noted that Mac users have it a little bit better from a safety perspective because of sandboxing.
- Add Context Button needs Improvement: Users are asking to return to the previous version of the Add Context Button because it was simple and easy to work with.
- In recent Cursor versions, Add Context picks only the active tab by default, and manual typing of a filename doesn't let you select any file.
- Auto Mode is Fast and Furious: Members discussed that Auto mode switches between Claude, GPT, and Gemini depending on the prompt and task.
- Many agreed that they have beefed up auto and to enjoy auto right now since it's unlimited until Sept 15.
Cursor Community ▷ #background-agents (4 messages):
Background Agents and Docker, Background Agents Setup Woes, Docker-in-Docker Difficulties, Background Agents and .gitignore, Background Agents with rails
- Background Agents Stall Without Dockerfile: A member reported that without a Dockerfile set up for Ruby and Postgres, background agents seemed stuck on "[Status] Starting Cursor…" indefinitely.
- They also mentioned trouble running `docker-compose start` on environment startup because the user that Cursor creates doesn't have the `docker` group.
- Gotcha - Push Config Changes to Remote, Overriding .gitignore: A member didn't realize that config changes needed to be pushed to remote to be forked from, and that files under .gitignore weren't being copied to the remote cursor environment.
- The member's workaround involved adding those ignored files as ENV variables and writing them to a file in setup.sh.
- Docker-in-Docker Troubles: Members expressed difficulties in reliably getting a working Docker service to run compose or the devcontainers CLI against.
- The general consensus is that running docker-in-docker is hard, so they would love to have a VM instead.
- Need Terminals?: A member is unsure about needing anything in `terminals` for a workflow where a remote agent implements features.
- They did report progress after asking a remote agent to "Please see if you can successfully run @test_file.rb".
Nous Research AI ▷ #announcements (1 messages):
Hermes 4, Nous Chat UI, RefusalBench, Model Benchmarking Transparency
- Hermes 4 is the New User-Aligned Hybrid Reasoning Model!: Nous Research released Hermes 4, a line of user-aligned models with expanded test-time compute capabilities, emphasizing creativity, lack of censorship, and state-of-the-art math, coding, and reasoning performance for open weight models; more details here.
- Nous Chat UI Revamped with New Features!: The revamped Nous Chat UI now includes parallel interactions, completions mode, and a memory system, offering both open and closed models like Hermes 4 and GPT-5.
- All Hermes 4 inference in Nous Chat is free for the first week.
- New Benchmark RefusalBench Conforms to Your Values!: Nous Research created a new benchmark, RefusalBench, to test a model's willingness to be helpful in various scenarios, with Hermes 4 achieving SOTA against popular models without censorship.
- A technical report detailing the creation process and evaluations of Hermes 4 and other LLMs, including text-results of each test, has been released on arxiv.
Nous Research AI ▷ #general (235 messages🔥🔥):
Hermes 4, Model Quantization, VLM Finetunes, OpenRouter Integration, Nous Chat UI/UX
- Hermes 4 Released, then Rapidly Pulled from OpenRouter: The Hermes 4 model was briefly available on OpenRouter but was quickly pulled, potentially due to issues with the Chutes provider changing to a new model name.
- Users noted its presence was fleeting, with one joking about sneaky open router people shut[ting] it off already.
- Nous Chat Website Consuming Excessive VRAM: Users reported high VRAM usage by the Nous Chat website, with one user noting it took 1.3GB of VRAM with Firefox on a 4060Ti.
- One user quipped that that much VRAM could hold a 1B model, while another joked about the website using as much VRAM as my PC.
- 14B Model Delayed due to Reasoning Bug: The release of the 14B Hermes 4 model has been delayed due to a bug in reasoning mode.
- Despite the delay, the team aims to release it as soon as possible and is considering a 36B model in the future.
- Unsloth Releases Hermes 4 GGUFs: Unsloth has released GGUF quantizations of Hermes 4, addressing chat template issues during conversion, now available on HuggingFace.
- The team resolved chat template issues during conversion.
- Nous Chat experiences scaling issues: Shortly after launch, providers in the new Nous Chat experienced scaling issues due to being overloaded.
- It got so popular so quickly that one user joked, "Might've gotten too popular too quickly lol".
Nous Research AI ▷ #ask-about-llms (1 messages):
moonlit_empress: Thanks for Hermes 4 Nous team!! Already loving it
Nous Research AI ▷ #research-papers (12 messages🔥):
Nous Research's Memory System, Graph RAG, Open Source Plans
- Nous Research rolls out Memory System: Nous Research is rolling out a custom graph architecture memory system that works with any model, allowing memories created with one model to be accessed by another.
- The lead members clarified that the system is not exactly graph RAG because it considers more information over time and uses a judge for other classification metrics.
- Open Source instantiation coming soon!: There are plans to open source the memory system at some point.
- Team members affirmed that they do have a thing for open source.
HuggingFace ▷ #general (172 messages🔥🔥):
Gradient Explosion Troubleshooting, Landmark Extraction Pipeline, LLMs from Scratch Book, Grok-Code-Fast-1 Model, Pytorch Lightning Overhaul
- Gradients Keep Exploding? Try Early Layernorms!: A member experienced exploding gradients during training and found that using early layernorms or RMSNorm helped, along with lowering the learning rate to 1e-4 and rescaling residuals, with the model code available on GitHub.
- They ultimately rescaled the residuals by scaling them down as `x = x + f(x) / (2**0.5)` to prevent variance from stacking up.
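The scaling trick above can be sanity-checked numerically. This is a minimal pure-Python sketch, under the idealized assumption that each block output f(x) behaves like independent unit-variance noise: plain residual addition stacks variance roughly linearly with depth, while dividing by sqrt(2) after each add keeps it near 1.

```python
import random

def simulate_residual_stream(depth: int, scale: bool, n: int = 20000) -> float:
    """Monte-Carlo estimate of activation variance after `depth` residual adds.

    Each block output f(x) is modeled as independent unit-variance noise.
    Plain residuals (x = x + f(x)) stack variance with depth; rescaling
    by 1/sqrt(2) at every block keeps it near 1.
    """
    rng = random.Random(0)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        for _ in range(depth):
            fx = rng.gauss(0.0, 1.0)   # stand-in for the block output f(x)
            x = x + fx
            if scale:
                x /= 2 ** 0.5          # x = (x + f(x)) / sqrt(2)
        total += x * x
    return total / n                   # variance estimate (mean is ~0)

plain = simulate_residual_stream(depth=8, scale=False)   # ~1 + depth
scaled = simulate_residual_stream(depth=8, scale=True)   # stays ~1
```

With depth 8, the unscaled stream ends near variance 9 while the rescaled one stays near 1, which is exactly the "variance stacking" the member was fighting.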
- WSL Mediapipe Landmark Extraction pipeline costs 67 Hours: A member ran a landmark extraction pipeline using WSL and Mediapipe, which took a whopping 67 hours to complete.
- They emphatically stated they are never gonna use WSL to run mediapipe again after this experience.
- LLMs from Scratch Book endorsed for LLM newbs: Members discussed the book Build a Large Language Model (from Scratch) as helpful for learning LLMs.
- One member shared two favorite YouTube channels, Julia Turc and Code Emporium, for learning concepts and new research.
- Grok-Code-Fast-1 Model Spotted!: A member shared a link to the Grok-Code-Fast-1 model from xAI, found at https://docs.x.ai/docs/models/grok-code-fast-1.
- Documentation is available, so members were encouraged to read it.
- Torch Lightning Overhaul? It's Lit!: A member is considering refactoring their project to PyTorch Lightning due to an increasingly complex training loop and manual logging process.
- Manual config bugs caused lost progress and training data.
HuggingFace ▷ #today-im-learning (12 messages🔥):
Claude file modification consent system, TPUs in open-source AI, Realtime audio stretching
- Claude now requests "Yup - let's do it": To enhance security, a user now requires the phrase "Yup - let's do it" to authorize file modifications by Claude, specified in the `~/.claude/CLAUDE.md` file.
- The requirement ensures transactional consent, where permission expires after each set of modifications, preventing accidental changes during planning or review phases.
- TPUs are like rubbish, got it: A member pointed out that the perceived "rubbish" seen in the channel reflects how TPUs (the chips used to train Gemini) operate within open-source AI.
- They added, "you're in an opensource ai server the 'rubbish' you are seeing is how tpus work ya know the chip that gemini is trained on".
- Skint dev to release NessStretch soon: A member is developing a realtime audio stretching tool called NessStretch and plans to release the CPU path as FOSS, with the GPU path available for purchase due to financial constraints.
- The member said, "I'm going to release the CPU path FOSS and the GPU path paid in the near future. Why paid? Because I'm skint."
HuggingFace ▷ #cool-finds (38 messages🔥):
Age Guesses, Math fails, Credentials boasting, Automations, PhD in Software Engineering
- Age Guessing Goes Awry: A member joked about another member being two decades later, leading to a playful yet defensive response about being probably three decades older.
- The exchange then devolved into age guessing and playful banter about decades and mathematics, with both members poking fun at each other's calculations and perceptions of age.
- Automations Definition Squabble Erupts: A member joked that another member, despite claiming to be three decades old, seemed unaware of automations.
- This prompted a clarification distinguishing between automation and automatons, leading to further age-related ribbing.
- Credentials Boasted, PhD Claimed: Following some discussion, one member declared they have a PhD in Software Engineering and told the other to sit the fuck down.
- This declaration seemed to stem from a disagreement or challenge regarding knowledge and expertise, although the specific context remained vague.
HuggingFace ▷ #i-made-this (8 messages🔥):
AI Agent, Gradio Demo, LiquidAI, HF Space
- AI Agent Explores Multiple Contexts: A member built an AI agent that explores multiple contexts and creates opposite ideas before giving a creative answer and shared the tool here.
- Visual Gradio Demo Suggested: A member suggested having some visual gradio demo for getting visibility, to easily display stats or stuff as it changes and share within the HF community.
- LiquidAIâs MCP Server POC Deployed to HF Space: A tiny MCP server POC was deployed to HF Space by LiquidAI and welcomes feature requests.
- HF Space offers fastmcp-space: A member shares a link to fastmcp-space on HF Space.
HuggingFace ▷ #core-announcements (1 messages):
Flax deprecation
- Flax Support Sunsetted: The difficult decision has been made to deprecate support for Flax.
- Users are encouraged to report any issues they encounter, as indicated by the attached image.
- Flax Sunset Follow-Up: Additional support channels and documentation remain available for users transitioning away from Flax.
- The team is committed to ensuring a smooth transition and addressing any emerging concerns.
HuggingFace ▷ #computer-vision (2 messages):
Makesense AI, CVAT AI
- Makesense AI Tool Tip: A member shared a link to Makesense AI as a potentially useful tool.
- No other details were provided.
- CVAT AI Tool Tip: A member shared a link to CVAT AI as a potentially useful tool.
- No other details were provided.
HuggingFace ▷ #smol-course (3 messages):
Qwen3-Coder-30B-A3B, Mixture of Experts (MoE)
- Qwen3-Coder-30B-A3B Model Recommended for Local Use!: A member with 64GB RAM and 16GB VRAM inquired about suitable local models, and another member suggested the Qwen3-Coder-30B-A3B-Instruct.
- It's described as a 30 billion parameter sparse model with Mixture of Experts, where 3B parameters are active at any time; quants are available here.
- MoE Models Preferred for Limited RAM: A member noted that while dense models are viable, Mixture of Experts (MoE) models are more accommodating for users with less RAM.
- This suggests that MoE models can offer better performance on systems where RAM is a limiting factor.
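A rough way to see why this model suits the 64GB RAM / 16GB VRAM setup: total weights dictate memory footprint, while only the active parameters are touched per token. A back-of-envelope sketch (ignoring KV cache, activations, and runtime overhead, so real usage will be higher):

```python
def approx_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory: parameters * bits per weight / 8, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_full = approx_weight_gb(30e9, 16)  # unquantized 30B model: 60 GB
q4_full = approx_weight_gb(30e9, 4)     # 4-bit quant of the whole model: 15 GB
q4_active = approx_weight_gb(3e9, 4)    # the ~3B active parameters: 1.5 GB
```

At 4 bits the whole model (~15 GB) fits comfortably in 64GB of system RAM, and per-token compute only streams the small active-expert slice, which is why an MoE tolerates limited VRAM better than a dense 30B.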
HuggingFace ▷ #agents-course (2 messages):
pip install --upgrade
- Upgrade Packages Faster with Pip: A member was advised to add `--upgrade` to their `pip install -r requirements.txt` command to get it to upgrade a package.
- Upgrade a Specific Package: If you don't want to risk changing versions for other packages, just run `pip install --upgrade` on that one package.
LM Studio ▷ #general (175 messages🔥🔥):
VRAM importance, Hermes 4 is dogshit, LM Studio and Agnaistic, LMStudio can understand PDFs, Headless mode option
- VRAM still matters: Users discussed how VRAM impacts the ability to use 12B models, with one user noting comfort using them on a 2070S, but having issues with Gemma-3 27B due to speed.
- Members clarified that performance isn't solely about VRAM amount, but also GDDR type and CUDA core count, though a model exceeding VRAM capacity will cripple performance.
- User finds Hermes 4 is dogshit: A user stated that Hermes 4 is dogshit.
- Another user inquired about the training data of Hermes 4, specifically if it's still based on llama 3.1.
- Linux Users Don't Get Headless LM Studio: A user couldn't find the headless mode option in LM Studio, despite the documentation stating it's available in version 0.3.5.
- It was clarified that the headless option is not available on Linux, with llama-swap being recommended as a workaround.
- Local LLM Hardware Requirements Debated: Members discussed the hardware needed to run a 60GB model for 75-100 users, with vLLM being recommended.
- One member suggested 3 RTX 6000 Blackwell Workstations if the model is MoE, otherwise doubling the GPUs; context requirements also affect performance.
- New Rust-Based UI for Ollama LLMs Revealed: A user shared a video of their project, a Rust and Tauri-based UI for managing Ollama models, noting itâs not a fork of Open WebUI.
- The UI, which supports Windows, Linux, and macOS, includes a model selector and is designed to run without a separate backend, different from web-based interfaces.
LM Studio ▷ #hardware-discussion (17 messages🔥):
Customs Delays, Dell Laptop vs Macbook for LLMs, Nvidia RTX PRO 3000, Ryzen 395+ Laptops, Balancing Compute Usage
- Customs Suspension Causes Delays: A member expressed concern that the new customs suspension for items under $800 will cause delays in receiving their MI50's cooler and APU.
- Dell Laptop vs Macbook for LLM inference: Members discussed the advantages and disadvantages of using Dell laptops versus Macbooks for LLM inference.
- While one member suggested a M3 Macbook Pro with 128GB of RAM, others countered by listing the difficulties of getting Macs into a Windows-based company.
- Nvidia RTX PRO 3000 is Meh for Inference: One member said an RTX PRO 3000 (12GB VRAM) is a slightly cut down desktop 5070 with really cut down core frequency.
- They noted that while a 30B model won't fit into 12GB in a reasonable quant, you can offload to RAM with a sparse model; however, dual-channel DDR5 is not that good to have layers on.
- Ryzen 395+ Laptops Provide Windows Alternative: For those tied to Windows, a member said if Windows is easier, there are several Ryzen 395+ laptops.
- Compute Balance between resources: A member posted a screenshot and asked if there is a compute difference in letting resources balance in use that they would significantly notice.
- No specific answers were given.
GPU MODE ▷ #general (5 messages):
Hackathons always on Friday, ScaleML series
- Hackathons Always Land on Friday?: A member jokingly complained that all the hackathons seem to occur on Fridays.
- They expressed gratitude for the opportunities provided, acknowledging the inconvenience with a crying emoji.
- ScaleML Series Day 3: Quantization: The third day of the ScaleML series will cover quantization, specifically focusing on microscaling formats like MXFP4, led by Prof. Chris De Sa; watch the stream here.
- This session will be presented on a whiteboard, reminiscent of traditional lectures, and is designed to be interactive.
GPU MODE ▷ #triton (2 messages):
constexpr arguments, Multi-pass profiler, NVSHMEM in Triton, tritonbench
- constexpr arguments vanishing act: A member stated that constexpr arguments will disappear from the signature because the jit will specialize integers equal to 1 into constexprs.
- Meta's Multi-pass profiler Debut: Kevin Fang, et al., Meta, will present a Multi-pass profiler, described as a federated GPU Tooling Framework for Orchestrated and LLM Agentic Profiling Applications.
- NVSHMEM's Triton Integration: Surya Subramanian from Nvidia is scheduled to discuss NVSHMEM in the context of Triton.
- tritonbench user spotlight: Cicie Wang from Meta is curious to know who is using tritonbench, particularly naming OpenAI.
- She seeks to understand how it's being utilized.
GPU MODE ▷ #cuda (8 messages🔥):
cudaMemcpyAsync with pageable host buffer, Stream-Ordered Memory Allocator, cudaHostAlloc performance, Pinned memory and system stability
- Async memcpy unsafe with pageable host buffers?: A member inquired about the safety of using `cudaMemcpyAsync` to copy from a pageable host buffer to device memory, hoping the CUDA runtime would manage a pinned host buffer internally.
- Another member responded that while it won't crash, the copy won't be truly asynchronous because the CUDA runtime needs a page-locked buffer first, making that part synchronous, suggesting `cudaHostAlloc()` to avoid blocking the CPU thread.
- Stream-Ordered Memory Allocator wonât solve pinned memory issue: A member mentioned using a Stream-Ordered Memory Allocator and wanting to optimize the copy to the device buffer, hinting to the CUDA driver that the buffer wonât be modified for potential internal pinned buffer use.
- Another member clarified that the Stream-Ordered Memory Allocator only affects the device buffer, the pageable vs pinned host memory issue remains, and the "hint" that the buffer is immutable can be given by allocating the buffer with cudaHostAlloc().
- cudaHostAlloc increases measured performance: One of the members reported running a test showing that when plain `malloc` is used instead of `cudaMallocHost`, the code with `cudaMemcpyAsync` will not crash, but the timings show no benefit from `cudaMemcpyAsync`.
- The member suggested that there is not much of a reason not to use pinned memory, but you just don't want to allocate too much of it, as it can affect system stability.
GPU MODE ▷ #torch (45 messages🔥):
Inductor codegen for persistent matmul, TMA availability on sm120, cutedsl performance
- Persistent Matmul with Inductor: A Deep Dive: A member inquired about enabling persistent matmul codegen in Inductor, checking `torch._inductor.config` for relevant flags like `ENABLE_PERSISTENT_TMA_MATMUL`.
- It was suggested to use `max-autotune` mode and ensure that Triton is used, setting `torch._inductor.config.max_autotune_gemm_backends = "TRITON"` and `torch._inductor.config.triton.enable_persistent_tma_matmul = True`, but it was also noted that cuBLAS might still be faster.
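Pulled together, those suggestions might look like the following sketch. The `torch._inductor.config` attribute names are as quoted in the discussion and can change between PyTorch versions, so treat this as illustrative rather than a stable API:

```python
import torch

# Flag names as quoted in the discussion; availability varies by PyTorch version.
torch._inductor.config.max_autotune_gemm_backends = "TRITON"
torch._inductor.config.triton.enable_persistent_tma_matmul = True

# max-autotune mode lets Inductor benchmark candidate kernels, including
# the persistent TMA matmul, and pick the fastest.
@torch.compile(mode="max-autotune")
def matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a @ b
```

Whether the persistent TMA kernel is actually chosen can then be checked by inspecting the autotuning choices, as described in the next bullet.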
- Troubleshooting TMA and Persistent Kernel Selection: A member investigated whether TMA is available on sm120 architecture, referencing torch/utils/_triton.py for arch checks.
- It was also mentioned that persistent kernel + TMA is considered in `max-autotune`, and breakpoints can be used in torch/_inductor/kernel/mm.py to check the considered choices.
- cutedslâs performance impresses: A member expressed positive impressions of cutedsl, citing its rapid maturation and potential for flex + flash attention, referencing this flash-attention PR.
- Another member found that cutedsl is very promising despite being a work-in-progress.
GPU MODE ▷ #announcements (1 messages):
Multi GPU kernel competition, AMD MI300, Distributed inference kernels, KernelBot Platform, Multi-GPU lectures
- GPU MODE goes Multi-GPU with AMD Collab: GPU MODE is launching a new $100K kernel competition in collaboration with AMD where participants will optimize 3 different distributed inference kernels on MI300 GPUs, designed by a specific user.
- The competition focuses on optimizing kernels for single node 8 GPU all-to-all communication, GEMM + reduce-scatter, and allgather + GEMM operations, with registration open until September 20 via this registration link.
- KernelBot Platform Beefs Up Multi-GPU Support: The KernelBot platform now supports multi-GPU submissions due to the efforts of two specific users, and profiling support is almost ready, supported by another user.
- Additionally, a user is planning to add support for submissions directly from gpumode.com; expect detailed write-ups and hints on the dedicated channels.
- Hot Distributed Summer with Multi-GPU Lectures: Many multi-GPU lectures are planned for this summer, so users are advised to keep an eye on the events tab for updates and schedules.
- This initiative aims to provide educational resources and insights into distributed computing with multi-GPU systems.
GPU MODE ▷ #beginner (27 messages🔥):
GPU vs Cloud for Beginners, Remote Debugging Pain Points, CUDA Installation Troubles, GPU Programming vs SIMD, Competition tips
- GPU vs Cloud for Beginners Debated: Beginners in GPU programming discussed whether to stick with cloud services like Google Colab or invest in buying a GPU.
- Some members are seriously considering buying GPUs, highlighting that remote debugging is pain.
- TDD workflow with GPUs: Members discussed using Test Driven Development (TDD) to debug GPU code and test behaviors.
- They mentioned that renting a GPU requires setting up the entire environment from scratch each time, dealing with latency and unstable connections.
- CUDA 11.8 Installation Headache!: A member faced issues installing CUDA 11.8 + cuDNN 8.6 with Python 3.10 and TensorFlow 2.12, even though Torch could detect CUDA.
- The suggestion was to ensure no CUDA version conflicts and to use a Conda environment, installing with the NVIDIA channel using the command `conda install cudatoolkit=11.8 cudnn=8.6 -c nvidia`.
- GPU vs CPU SIMD Programming: A member inquired about the similarities and differences between GPU programming and SIMD programming on the CPU.
- It was explained that both exhibit fundamental similarity in parallelism, but CPU SIMD operates on fewer data elements (4, 8, 16) within a single CPU core, whereas GPUs leverage hundreds or thousands of cores, enabling massive parallelism.
- Competition tips: A member expressed feeling illiterate regarding an ongoing competition in the announcements channel.
- A member recommended watching this lecture to get started!
GPU MODE ▷ #off-topic (2 messages):
GPU MODE party song, readme.md file
- GPU MODE Rocks X with Party Song: A member posted a link to a party song on X, presumably related to GPU MODE.
- No further details were given about the song.
- Powerful application via readme guide: A member mentioned a powerful application with instructions available in the readme.md file.
- They clarified that there is no video demo but instructions are available to follow for the application.
GPU MODE ▷ #rocm (8 messages🔥):
Assembly instruction memory coalescing, rocprof tooling
- Memory Coalescing in Assembly Instructions: A member is seeking a tool to correlate assembly instructions or source code instructions to memory coalescing, beyond just seeing total latency and idle cycles.
- They want to pinpoint exactly the offending instructions at a granular level, not just at the kernel level.
- rocprof Tooling lacks memory coalescing view: A member stated that the rocprof tooling doesn't currently offer a feature similar to NVIDIA's Nsight, which shows memory coalescing.
- They suggest using `printf` for debugging, noting that AMD GPUs are relatively cool about memory accesses, as long as cache lines are vaguely hit.
- Cache Lines for AMD GPUs: A member suggested that a good rule of thumb for AMD GPUs is to ensure that the combined set of bytes accessed by a global load instruction consists of entire cache lines.
- This is to avoid performance hits.
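That rule of thumb can be illustrated with a small address-arithmetic sketch; the 64-byte cache-line size and 64-lane wavefront here are assumptions for illustration, not quoted from the discussion:

```python
LINE_BYTES = 64  # assumed cache-line size

def touched_cache_lines(addresses, access_bytes=4):
    """Return the set of cache-line indices covered by per-lane accesses."""
    lines = set()
    for addr in addresses:
        first = addr // LINE_BYTES
        last = (addr + access_bytes - 1) // LINE_BYTES
        lines.update(range(first, last + 1))
    return lines

# 64 lanes each loading one consecutive float: 256 B -> 4 whole cache lines.
coalesced = touched_cache_lines([lane * 4 for lane in range(64)])
# Same data volume with a 128 B stride: every lane drags in its own line.
strided = touched_cache_lines([lane * 128 for lane in range(64)])
```

The coalesced pattern touches 4 lines for 256 bytes of useful data; the strided one touches 64 lines for the same payload, so 16x the memory traffic, which is the performance hit the rule of thumb avoids.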
GPU MODE ▷ #intel (1 messages):
erichallahan: On that note https://www.phoronix.com/news/Alyssa-Rosenzweig-Joins-Intel
GPU MODE ▷ #webgpu (9 messages🔥):
wgpu-native and Wayland, Dawn and Wayland
- Wayland Support Troubles wgpu-native and Dawn: A member reported that both wgpu-native and Dawn are failing on Wayland during a call to `this->m_Surface.getCapabilities(this->m_Adapter, &surfaceCapabilities);`.
- The Dawn error message indicates an Unsupported sType (SType::SurfaceSourceWaylandSurface), which led the user to believe that Dawn was not compiled with Wayland support.
- Dawn's cryptic error messages: The user received an error message from Dawn indicating Unsupported sType (SType::SurfaceSourceWaylandSurface).
- They are attempting to follow the wgpu C++ tutorial.
GPU MODE ▷ #metal (1 messages):
Tensor Operation, hardware acceleration, simdgroup matmul functions
- Unearthing Hardware Acceleration in "Tensor Operation": A member inquired whether the "Tensor Operation" for matrix multiply uses hardware acceleration that isn't available via the simdgroup matmul functions.
- SIMD Group Matmul Hardware Acceleration?: The inquiry revolves around discerning if the Tensor Operation leverages unique hardware acceleration compared to simdgroup matmul functions.
GPU MODE ▷ #general-leaderboard (11 messages🔥):
Trimul board submission failures, AMD competition team creation, AMD multi-GPU environment access
- Trimul Submissions Tumble on B200 & MI300: Users reported that test and ranked submissions for the trimul board on B200 and MI300 GPUs are failing with an "unexpected error occurred" message, even using the template implementation.
- One user mentioned encountering the same issue with MI300 (FP8 mm) and L4 GPUs (sort_v2), while another user initially had failures but later found that test submissions worked.
- AMD Arena Assembles Team Formations: A user asked about how to create a team when attending the new AMD competition.
- It's in the registration on the Data Monsters website; however, the organizers suggest posting in the channel for team matching.
- Multi-GPU AMD Machine Mirage?: A user inquired whether there would be access to an AMD multi-GPU environment for development and debugging.
- Another user was told that they can just register and start submitting; the confirmation is primarily there to confirm prize money.
GPU MODE ▷ #submissions (5 messages):
A100 Trimul Leaderboard, H100 Trimul Leaderboard, B200 Trimul Leaderboard
- A100 trimul record broken!: User <@1264305104417456149> achieved first place on A100 with 4.92 ms.
- They followed up with subsequent successful submissions at 4.96 ms and 5.26 ms.
- H100 trimul times now listed: User <@1264305104417456149> submitted a successful run on H100 at 2.73 ms.
- This sets the initial benchmark for the H100 on the trimul leaderboard.
- B200 scores trickling in: User <@489144435032981515> submitted a successful run on B200 at 8.08 ms.
- This starts off the leaderboard for the B200 trimul benchmark.
GPU MODE ▷ #factorio-learning-env (4 messages):
Factorio tips, Factorio blueprints
- Factorio Fanatics Focus on Fundamentals: Enthusiastic new member expresses excitement about joining the Factorio learning community.
- The user noted they would be late to a meeting and leave 30 minutes early.
- Logistical Laggards Lament Latency: While the discord messages are limited, the main theme involves introductions and scheduling conflicts.
- This does not prevent them from learning to automate resource management and factory construction in Factorio.
GPU MODE ▷ #amd-competition (1 messages):
discord-cluster-manager errors, AMD Instinct MI300X VF
- Discord Cluster Manager Plagued by Errors: A user reported an unexpected error while running the discord-cluster-manager which they were asked to report to the developers.
- AMD Instinct MI300X VF Benchmarks: Despite the errors, `result.json` indicates successful runs on an AMD Instinct MI300X VF GPU, with the check parameter returning `pass`.
- The user confirmed the issue persists when submitting benchmarks, tests, profiles, and ranked jobs, with the worst benchmark result at 72811225.0.
Latent Space ▷ #ai-general-chat (95 messages🔥🔥):
Anthropic Claude Chrome Extension, UQ Benchmark - Unsolved STEM Questions, Nous Research Hermes 4, Grok Code in Cursor, Meta Researchers Leaving
- Claude Cruises into Chrome: Anthropic launched Claude for Chrome, an extension piloting Claude as a browser driver for 1,000 users in a research preview.
- The community is excited for its Comet/Perplexity competitive potential as Anthropic warns of prompt-injection safety issues being monitored during the trial.
- Frontier LLMs Face Unsolved STEM Quagmires: Niklas Muennighoff's team introduced UQ, a benchmark featuring 500 hand-picked, unsolved questions from STEM fields.
- Frontier LLMs solved 10 problems whose answers were validated by domain experts, including one unanswered for 9 years on CrossValidated, leaving ~490 puzzles open.
- Hermes 4 Hybrid Hype Hits: Nous Research unveiled Hermes 4, an open-weight hybrid reasoning LLM focusing on creativity, neutral alignment, and low censorship while maintaining SOTA in math, coding, and reasoning.
- Users can test it out all week via a revamped Nous Chat UI with parallel interactions and memory, plus check out a detailed technical report and a new RefusalBench benchmark; partners like Chutes, Nebius, and Luminal are providing inference.
- Grokking Code in Cursor: Free Trial Tempts: Cursor introduced Grok Code, a new competitively-priced model in stealth, offering a free one-week trial.
- Community members discussed pricing ($0.2/$1.5 per 1M tokens) and potential improvements, with some digressions on Cursor's branding and future model rollouts.
- Token Trickery: Kimi's Kwality over Kwantity: Insights from Kimi founder Yang Zhilin's interview: K2 will maximize each high-quality token rather than adding more data, favoring RL over SFT for better generalization, exploring fully AI-native training, and aiming for million-token contexts.
- Community replies praise Kimi's sense and intelligence per token and ask about upcoming PPT and subtitled video release.
Latent Space ▷ #private-agents (8 messages🔥):
Second-hand GPUs, RTX 3090, DOA testing, VRAM integrity, Payment escrow
- Taha releases guide for buying 2nd hand GPUs: Taha shares a concise checklist in his guide for buying a second-hand RTX 3090 without surprises for local AI.
- The checklist includes meeting the seller, inspecting the card, running `nvidia-smi`, devoting an hour to `memtest_vulkan` for VRAM integrity, optionally stressing with `gpu-burn`, and finally loading a large model in vLLM to confirm stability while watching temperatures.
- RTX 3090 stress-testing needed: Members discussed the need for testing used RTX 3090 cards, especially when buying from individuals on platforms like Craigslist where implementing thorough testing might be difficult.
- Suggestions included using eBay's dispute resolution process as a potential safeguard, although experiences with such processes can vary.
- DOA Testing: A member suggested implementing DOA testing, suggesting payment escrow, pre-sale DOA tests by the seller, and post-sale DOA tests that can match the seller.
- The member suggested that if results donât match, escrow takes a hit, and that a benchmark would help.
Latent Space ▷ #genmedia-creative-ai (4 messages):
Nano Banana, Runway Act-2, AI Video creation
- Nano Banana + Runway Act-2 combine for persona-to-carti workflow: Techguyver demonstrates how pairing Nano Banana (ultra-cheap image edits) with Runway Act-2 motion matching enables creators to iterate faster in video creation, such as swapping clothes and styles.
- The demo sparked discussion on the ethics of "toy vs storytelling" and requests for tutorials, including some humorous comments about "hands are mine".
- Nano Banana for ultra-cheap image edits: Nano Banana is presented as an ultra-cheap image editor for quick edits.
- It is used with Runway Act-2 to allow creators to iterate faster with video creation.
Eleuther ▷ #general (14 messages🔥):
Falsifiability in research, Grand Challenges in AI, EleutherAI Discord Purpose
- Falsifiability Divides Scientists: A member suggested the server is for discussion of research on falsifiable hypotheses, rather than generally gesturing towards vague ideas.
- Another member countered, stating that falsifiability is overrated among scientists, though pretty useful in general.
- Members Tackle Grand AI Challenges: A member asked about current work, and another responded they are working on grand challenges and shared a link.
- The member admitted wishing they had more skill points in math to tackle the interesting stuff.
- Discordâs Research Focus Clarified: A member quoted the EleutherAI description that the Discord server caters to researchers and research-level discussion, specifically about falsifiable hypotheses.
- The member said the goal is to prevent anyone with crazy theories abetted by chatbots from spewing nonsense.
Eleuther ▷ #research (34 messages🔥):
Alternative Approaches to Transformers, Forward-Forward Training, HTM Dynamics, Mini-Brain Architecture, Troubleshooting Training Regimes
- Exploring Alternatives Beyond Transformers: Members express a desire to see more approaches beyond transformers and gradient descent, referencing this tweet of alternative approaches to transformers.
- Diving into Forward-Forward Training: One member shared their work on HTM dynamics with forward-forward training, achieving plausible results and will post test scripts soon, see their repo.
- âMini-Brainâ Architecture Emerges: A member is building a network around being a brain-like network with cortical columns, regions, 6 layer networking and signal propagation and has moved the project to this repo.
- Transformer Computation Insights: A member recommended a talk on computation in transformers to fellow members Computation in Transformers.
- Tokenizer Troubleshooting Underway: A member identified potential issues with their tokenizer, with a vocabulary size of 50k, and they are now troubleshooting the training regime and intend to get some meaningful metrics around Forward-Forward.
Eleuther ▷ #gpt-neox-dev (4 messages):
Muon Speedup, Torch Compile
- Muon Speedup Claims Debunked: A member mentioned seeing a claim on Twitter of 1.6x speedup on Muon over Torch implementation, with Torch compile at 1.1x.
- Another member clarified that the speedup was due to algorithmic changes requiring fewer NS iterations, not pure Muon or hardware improvements.
- Algorithm Logic Improves Speed: The speedup was mostly about changing the algorithm to need less NS iterations.
- It's not pure muon, more algo logic instead of hardware-aware improvements
aider (Paul Gauthier) ▷ #general (40 messages🔥):
PacVim, OpenRouter billing, Context Management, aider git repo error, Model Context Protocol (MCP) tool
- PacVim Fine-Tuning?: After giving codestral an eval tool in gptel, it successfully completed all the emacs-related tasks, even reconfiguring emacs and making new tools for itself.
- A member joked about fine-tuning with PacVim, with LLMs being good at operating Emacs, leaving Vim behind.
- OpenRouter's Top-Up Minimum Fees: Users are getting billed $6.16 instead of the expected $5 top-up; another user pointed out that OpenRouter charges a 5.5% fee (minimum $0.80) when you purchase credits, as the underlying model providers don't mark up pricing.
- Algebraically speaking, they calculated that you stop getting hit by the $0.80 minimum if you top up by $14.55 or more each time.
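That algebra is easy to verify. The 5.5%-with-$0.80-minimum fee model below is simply what users reported in the thread, not official OpenRouter pricing:

```python
def topup_fee(amount: float) -> float:
    """Fee model as reported by users: 5.5% of the top-up, floored at $0.80."""
    return max(0.055 * amount, 0.80)

# The $0.80 floor stops binding once 5.5% of the top-up exceeds it:
# 0.055 * x = 0.80  ->  x = 0.80 / 0.055
threshold = 0.80 / 0.055  # about $14.55, matching the figure in the thread
```

So a $5 top-up pays the flat $0.80 floor (a 16% effective rate), while anything at or above roughly $14.55 pays the plain 5.5%.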
- Context Management Suffers Without Gemini 2.5 Pro: A user expressed their need for Gemini 2.5 Pro, noting that other models just don't feel right for context management.
- They are also struggling to get Gemini 2.5 Pro to work.
- `aider` hits git repo error: A member saw `Unable to list files in git repo: Require 20 byte binary sha, got b'\xb9', len = 1` from `aider`.
- MCP Tools Evaluated: Community members discussed what a good Model Context Protocol (MCP) tool call model would be.
- It was suggested to review the Gorilla Leaderboard, trying Qwen3 8b, and flash-lite as options.
aider (Paul Gauthier) ▷ #questions-and-tips (7 messages):
OpenRouter DeepSeek 3.1 Configuration, Aider CLI Automation, Aider's Prompt Handling, Aider Context Degradation, conventions.md benefits
- DeepSeek Setup for Aider Reasoner Unconfirmed: A member asked about configuring OpenRouter's DeepSeek 3.1 as a reasoner for the main model and a non-reasoning version as the weak model within Aider, but it's unconfirmed whether this setup works.
- There was no confirmation or guide provided on how to achieve this specific configuration.
- Aider CLI Awaits Input, Unlike Claude: A member noted that when piping content to Aider, it waits for user input, whereas Claude CLI immediately starts editing files.
- The user inquired about automating Aider fully without human involvement, seeking a guide for such a setup.
- Aider Pipes Only First Line: When content is piped to Aider, it only reads the first line.
- To add `PROMPT.md` to Aider, it needs to be passed as an argument rather than piped.
- `conventions.md` location in prompt impacts performance: Using `conventions.md` with `--read` places it near the top of the prompt, whereas including it in the message puts it near the bottom.
- Due to U-shaped relevance in current prompts, placement at the top via `--read` may yield slightly better performance.
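The placement difference can be illustrated with a toy prompt assembly (a hypothetical layout for illustration only, not aider's actual prompt construction):

```python
# Hypothetical sketch contrasting where conventions text lands in the
# final prompt depending on how it is supplied to the tool.
conventions = "# conventions.md\nPrefer small, pure functions."
request = "Refactor utils.py to remove duplication."

# `--read`-style: the file is pinned near the top of the prompt.
prompt_read = "\n\n".join([conventions, "<repo map and chat history>", request])

# message-style: the same text arrives near the bottom, inside the request.
prompt_message = "\n\n".join(["<repo map and chat history>", request, conventions])

# With U-shaped attention, content at the very top (or bottom) is attended
# to more reliably than content buried in the middle of a long context.
print(prompt_read.startswith("# conventions.md"))  # True
print(prompt_message.endswith("functions."))       # True
```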
- Aider context degrades >90k input tokens: A member finds that with Aider + Gemini Pro 2.5, context starts degrading around 90k-130k input tokens.
- It seems to work fine at the top before that range.
Moonshot AI (Kimi K-2) ▷ #general-chat (32 messages🔥):
Imagen 4, Nano Banana, Kimi+, Z.AI Slides
- Imagen 4 fools Users: A user shared an image generated by Imagen 4, initially mistaking it for a real scene from a podcast, and praising its impressive quality.
- Another user noted that 2.5 flash image gen was nano banana and rolled out to Gemini app.
- Google's Nano Banana Image Gen & Imagen: A user mentioned that Google is not transparent about the usage of image generators such as nano banana and Imagen, for marketing reasons.
- The user also linked to a Tweet about reasoning models, noting that CoT and RL do not create new capabilities.
- Kimi+ is Slides: Kimi+ seems to be a new category, with Slides as its first feature, initially available only to Chinese users.
- A user provided a summary, noting "If you want it quickly, I guess Kimi is the way to go. If you want to go more complex, Z.AI is the way to go."
- Z.AI Slides vs Kimi Slides: A user finds that Z.AI Slides is just an HTML website, preferring an actual PPTX file.
- Another user agreed, mentioning the need for more control and rearranging options in Slides, also experiencing freezing issues with Z.AI.
- Overseas Platform: A user mentioned that the PPTX feature is currently available on Kimi+'s overseas platform and suggested expanding it to Twitter, TikTok, and Instagram.
Yannick Kilcher ▷ #general (13 messages🔥):
Hierarchical modeling, HNet layers, Parameter count, tokenizer-free approaches, Albert Gu's blog posts
- HNet Hierarchical Modeling: Untested Waters: The potential of higher-level hierarchical modeling with HNet remains untested in practice, although theoretically it should extend beyond the original paper's two layers due to residuals at each resolution level, similar to U-Net.
- One member mentioned that a friend asked him to test a 3-layer H-Net.
- HNet Compression Loss: Coefficient Reduction: In the HNet paper, the coefficient for compression loss was significantly reduced for the depth = 2 model compared to the depth = 1 model, implying that higher-level abstractions are almost the same as the depth=1 case.
- A member noted that almost all compute still flows in the actual main network, making it challenging to expand this to operate on abstractions spanning multiple sentences or documents.
- HNet Parameter Count: Maintaining Fairness: The decision to reduce the compression in HNet was likely to maintain fairness in parameter count and total compute when comparing HNets to other tokenizer-free approaches.
- One member noted that a more even parameter spread across granularity levels might be optimal if the chunking method works well.
- Deeper Abstractions: Simple Expansion Doubts: One member expressed doubt that deeper and more abstract embeddings could be achieved simply by changing a few lines of code, pointing out that the researchers have been working on this for over a year and wouldn't miss something this simple.
- The member suggested that if it actually worked, they would at least have an ablation on it or something.
- Albert Guâs Experiments: Publish Decision Factors: The research team had already conducted numerous experiments before deciding to publish their work.
- As one member mentioned, "At some point maybe you want to publish, especially if you've already used up your fair share of compute."
Yannick Kilcher ▷ #paper-discussion (9 messages🔥):
Reasoning Tokens, LLM Reasoning Efficiency, Mutual Information, stopwords
- Thinking Tokens Improve Reasoning Efficiency?: The group discussed the paper "Wait, We Don't Need to 'Wait'! Removing Thinking Tokens Improves Reasoning Efficiency", which suggests reasoning tokens can be removed to reduce token overhead with nominal effects on accuracy.
- Later a user mentioned that the second paper is probably more accurate, in that "reasoning" words seem to be skippable.
- Do LLMs Internalize Time?: An intern's experiment adding "take your time" to a CoT prompt with Llama 2 (+3) 7b (+13b) surprisingly increased reasoning time (generation took longer, trace was longer), without increasing accuracy.
- The user wondered if the LLM had somehow internalized a concept of âtimeâ and shared transformer-circuits.pub confirming LLMs do have some representations of time.
- Demystifying Reasoning Dynamics with Mutual Information: A user commented on the paper Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning, noting the cool observation about the MI of these tokens with the golden answer representation.
- The user thinks that the paper identified "high information regions of sentences" (the first words after periods and commas) and also accidentally included a few stopwords, which leads them to misinterpret one of their results.
- Reasoning Tokens Boost Performance?: A user noted interesting parts of the Demystifying Reasoning Dynamics with Mutual Information paper relating to RR, where they refeed their reasoning tokens to layers a repeated time during inference for a boost in performance.
- The user remarked, "I wouldn't be surprised if there was something to that. However maybe it isn't understood exactly what is happening."
Yannick Kilcher ▷ #ml-news (6 messages):
Claude Chrome, Keen Technologies LLM, Promptlock AI Ransomware
- Claude Chrome turns into Surveillance System: Anthropic's Claude for Chrome introduces mandatory reporting requirements for AI programs.
- One member stated this effectively turns AI into a surveillance system.
- Keen Technologies' fringes of LLM Research: A member expressed disappointment that Keen Technologies isn't focusing on the fringes of LLM research that are making steps toward continual learning, and is instead pushing pre-transformer RL tricks further, as highlighted in this video.
- They suggested improving TTT (growable like TokenFormer, sparse/higher rank queries like UltraMem, able to flip between dynamic and fixed size like TransMamba) to achieve a continually-learning real-time Atari player.
- Promptlock: First AI-Powered Ransomware Emerges: The emergence of Promptlock, the first AI-powered ransomware, was noted, as detailed in this SecurityWeek article.
- Members expressed sadness about this development.
Manus.im Discord ▷ #general (14 messages🔥):
Manus scheduled tasks and mail, Manus for enterprise research, Manus credits consumption issue, Support ticket delays
- Manus Mails Scheduled Tasks Together?: A member asked if scheduled tasks and mail can be used together in Manus, and another member clarified they're the same.
- Enterprises need Research Tool alternative to Manus: A member mentioned Manus is good at research and sought alternative tools for enterprises with compliance issues that prevent them from using Manus.
- Manus Credit Consumption Issues Plague Users: Several users reported that Manus used up their credits overnight due to repeated prompting without a response, referencing support ticket 1335.
- Delayed Support Frustrates Website Launch: A user reported contacting support via multiple channels (Help Centre, Reddit, Discord) for a week without a reply, delaying the launch of their permanent website, referencing support ticket 1334.
- A member shared a link to a Discord post, and the team said "let's follow up in the ticket."
- Entrepreneurial Credit Needs go Unmet: A user noted that recent improvements to the service primarily benefit users who spend $200 a month, rather than entrepreneurs needing periodic credit increases.
- They expressed frustration at having to wait until September 21st to receive more credits, halting their project's progress.
DSPy ▷ #show-and-tell (1 message):
Signatures, Modules, Abstractions, Blog Post
- Blogpost hails Signatures and Modules as good Abstractions: A member wrote a blog post explaining their views on why they think signatures and modules are great abstractions.
- Good Abstractions power!: The author shares that their thoughts are shaped coming from other frameworks, and felt it was worth covering in a dedicated blog post.
- They hope it's useful to folks who are new!
DSPy ▷ #general (11 messages🔥):
LiteLLM's Role in DSPy, OpenRouter vs LiteLLM, DSPy Dependency Bloat
- LiteLLM: Essential Dependency for DSPy?: A user inquired about alternatives to `litellm` within DSPy, suggesting a syntax like `dspy["litellm"]`.
- Another member responded that LiteLLM's interface enables generic plugins from various LLM providers, including OpenRouter, considering it an essential dependency.
- OpenRouter Indirectly Uses LiteLLM: A member mentioned using OpenRouter and a proxy server utilizing LiteLLM, indicating an indirect dependency due to DSPy.
- The user questioned the necessity of LiteLLM as a direct dependency and inquired about its contribution to bloat.
- DSPy Dependency Bloat Investigated: One member inquired about what contributes to the bloat of LiteLLM, estimating its size at 9MB.
- Another member suggested using a CLI AI to crawl the codebase and analyze the dependencies, joking about Karpathy striking again.
Modular (Mojo 🔥) ▷ #mojo (11 messages🔥):
InlineArray vs StaticTuple, DPDK header binding, High-level API improvements, Deduplication of generated Mojo files, tsan compiler option
- InlineArray replaces StaticTuple for Efficiency: The use of InlineArray is back after initial seg fault issues, replacing StaticTuple for better memory layout in structs for both DPDK and Mujoco bindings.
- The user mentioned that this was probably a skills issue.
- Aggressive DPDK header binding strategy emerges: A member suggested focusing on `rte_*.h` headers within the installed `include` folder for DPDK bindings, due to DPDK's efforts to minimize dependencies.
- The goal is to create comprehensive bindings by including all relevant headers while avoiding unnecessary ones.
- High-Level API Enhancements Prioritized for Easier Lib Bindings: The next step is improving the high-level API to make binding to different libs easier.
- A member plans to cut down the size of the generated mojo files by skipping unused code.
- Generated Mojo files deduplication considered: A member proposed using source annotations from `cpp` to deduplicate the generated Mojo files, aiming to reduce their size by removing unused code.
- This involves analyzing annotations left by `cpp` to identify and eliminate redundant or unnecessary elements.
- `tsan` compiler option availability discussed: A member inquired about checking if `tsan` (ThreadSanitizer) is enabled for the compiler when using the `--sanitize thread` option.
- Another member suggested passing `-DTSAN` to the compiler and using `env_get_bool` from `param_env` with `@parameter if` as a workaround.
tinygrad (George Hotz) ▷ #general (3 messages):
tinygrad realize() PR, TestSetItemloop.test_range fusion, tinygrad GPT2 Training performance
- Realize() Removal Proposed: A pull request was added to remove `realize()` and fuse `TestSetItemloop.test_range` into a single kernel in tinygrad#11870.
- GPT2 Training on 7900xtx Sluggish: Training `llm.c/train_gpt2.py` appears slow on a 7900xtx, even with BEAM=5.
- After tweaks to match nanogpt parameters, a member achieved 250ms per step at nanogpt size (batch size 64, 6 layers, 6 heads, 384 emb_dim, 256 seq_len), whereas Andrej's nanogpt with rocm torch gets approximately 3ms per step with the default config.
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (2 messages):
Google Docs Confirmation Emails, Mailing List Updates
- Google Docs Confirms Program Sign-Ups: Members are receiving confirmation emails from Google Docs after signing up for the program.
- The confirmation emails are successfully sent, but no other communication has been received yet.
- Mailing List to Provide Lecture Updates: The mailing list for providing updates about each lecture should be active soon.
- Users are advised to monitor the mailing list for future announcements and program updates.
MLOps @Chipro ▷ #events (1 message):
AI Tools Introduction, Less Code Mindset, AI Prototyping, AI-powered Products, Tech History
- Simplicity Powers AI Prototypes: A session with Carlos Almeida on September 5th will cover the Less Code Mindset and how it empowers non-technical people to launch AI-powered products.
- Carlos will demo projects from Less Code Studio, showcasing how AI can dramatically cut the time from idea to working prototype, followed by an open Q&A.
- Portugal Founders Build Global-First Companies: Dick Hardt will join Pedro Sousa and Daniel Quintas on September 12th to discuss the past, present, and future of tech, and how AI tools are shaping the field in Portugal.
- The discussion will explore why he chose Lisbon, and how founders there can build global-first companies, also covering identity in the AI era and prototyping AI workflows.
Windsurf ▷ #announcements (1 message):
Grok Code Fast 1, Windsurf announcement
- Grok Code Fast 1 Surfs into Windsurf!: Grok Code Fast 1 is now available in Windsurf and free for a limited time.
- Members are encouraged to share how they plan to use it for their next project, and a link to the announcement post was shared.
- Limited-Time Free Access for Grok Code Fast 1: Grok Code Fast 1 is being offered for free for a limited time on Windsurf, inviting users to integrate it into upcoming projects.
- An announcement post on X (formerly Twitter) provides further details about the offering, along with an attached promotional image.