OpenAI Codex is all you need?

AI News for 8/26/2025-8/27/2025. We checked 12 subreddits, 544 Twitters and 29 Discords (229 channels, and 8821 messages) for you. Estimated reading time saved (at 200wpm): 668 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

It’s been 3 short months since the (re)launch of Codex, and the Claude Code vs. Codex competition has been heating up, with multiple influencers publicly dropping Claude Code for Codex even before today’s update, thanks to the pricing plan integration buried on GPT‑5 launch day. Today that shift gets more interesting, with the full launch of the IDE Extension that sends tasks to the cloud and back:

In words:

  • IDE Extension: The new extension brings Codex into VS Code, Cursor, and other VS Code forks, so that you can seamlessly preview local changes and edit code
  • Sign in with ChatGPT: Available in both the IDE and CLI, eliminating API key setup and providing access directly through your existing ChatGPT plan
  • Seamless Local ↔ Cloud Handoff: Developers can pair with Codex locally and then delegate tasks to the cloud to execute asynchronously without losing state
  • Upgraded Codex CLI: Refreshed UI, new commands, and bug fixes
  • Code reviews in GitHub: Set up Codex to automatically review new PRs in a repo, or mention @codex in PRs to get reviews and suggested fixes

Additionally, all product information and updates for Codex moving forward will be announced on our new site: developers.openai.com/codex.

We invite you to explore the site for more details on these new features, as well as guides on how to get started.

To learn more about Codex, visit the new developers site as well as our general help article: Using Codex with your ChatGPT plan.


AI Twitter Recap

Process-level reward modeling and reasoning

  • StepWiser (process reward as a reasoning task): Facebook AI researchers introduce a stepwise judge that outputs both chain-of-thought and a judgment, trained with RL on relative rollout outcomes. It achieves SOTA on ProcessBench, improves policy during training, and boosts inference-time search by evaluating solutions “chunk-by-chunk,” rejecting/redoing flawed chunks (up to 5 retries) to self-correct paths. They also use StepWiser to score multiple rollouts and select the best for training data, outperforming outcome-based rejection sampling. See the thread by @jaseweston, including details on inference-time search (4/5) and data selection (5/5). Commentary on the broader shift back to process rewards from @tesatory underscores why stepwise supervision scales to long/ongoing tasks where final-only rewards blur credit assignment.
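
The chunk-by-chunk search in the StepWiser item can be sketched roughly as below. This is a toy illustration and not the authors' code: `generate_chunk` and `judge_chunk` are hypothetical stand-ins for the policy model and the stepwise judge, the scripted policy and judge exist only to make the example runnable, and keeping the last sample when retries run out is an assumption. Only the retry limit of 5 comes from the summary.

```python
from typing import Callable, List, Optional

MAX_RETRIES = 5  # retry budget per chunk, as stated in the summary


def chunkwise_search(
    generate_chunk: Callable[[List[str]], Optional[str]],
    judge_chunk: Callable[[List[str], str], bool],
    max_chunks: int = 16,
    max_retries: int = MAX_RETRIES,
) -> List[str]:
    """Build a solution chunk by chunk, resampling chunks the judge rejects."""
    accepted: List[str] = []
    for _ in range(max_chunks):
        candidate = generate_chunk(accepted)
        if candidate is None:  # the policy has nothing more to add
            break
        for _ in range(max_retries):
            if judge_chunk(accepted, candidate):
                break  # judge accepts this chunk
            retry = generate_chunk(accepted)
            if retry is None:
                break
            candidate = retry
        # Assumption: if every retry is rejected, keep the last sample anyway.
        accepted.append(candidate)
    return accepted


# Hypothetical scripted policy: proposes a wrong step, then a corrected one.
script = iter(["x = 2 + 2", "x = 5", "x = 4", "answer: 4"])


def toy_policy(prefix: List[str]) -> Optional[str]:
    return next(script, None)


def toy_judge(prefix: List[str], chunk: str) -> bool:
    return chunk != "x = 5"  # reject the one flawed step


solution = chunkwise_search(toy_policy, toy_judge)
print(solution)
```

In a real system the judge would itself produce a chain of thought before its verdict, and the same accept/reject signal could be used to rank whole rollouts for training data selection.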

Gemini 2.5 Flash Image (“nano‑banana”): capabilities, tooling, and guidance

  • Spatial reasoning and editing quality (demos): Users highlight strong multi-image fusion and consistent POV reconstructions (e.g., recursive “photographer of the photographer” and Google Maps “what the red arrow sees” transforms) with impressive spatial coherence; see demos from @BenjaminDEKR and @tokumin.
  • Developer and creator tools: A one‑click browser extension based on Glif lets you right‑click any image on the web to remix/edit via Gemini 2.5 Flash Image (@fabianstelzer; install link in the follow‑up tweet). Google published a focused prompting guide covering composition, consistent character design, targeted transforms, and more (google AI devs). DeepMind researchers discussed how the model was built and where it’s going next (@OfficialLoganK). Creators are already combining it with video tools (e.g., Kling 2.1 first/last frames) for smooth transitions (@heyglif).

NVIDIA data and efficiency: Nemotron-CC-Math and Jet‑Nemotron

  • Nemotron‑CC‑Math (133B tokens) dataset release: A large math/code corpus reprocessed from CommonCrawl by rendering HTML (Lynx) and reliably capturing equations across LaTeX, MathML, inline, and image contexts, addressing coverage gaps in typical parsers. NVIDIA reports marked gains on math and code tasks after adding it. Details from @KarimiRabeeh and @ctnzr; commentary by @JJitsev.
  • Jet‑Nemotron (throughput‑optimized LMs): Introduces JetBlock (linear attention + dynamic convolution over V, removing static convs over Q/K) and a hardware‑aware design insight: decoding speed tracks KV‑cache size more than parameter count. Reported speedups: up to 47× decoding throughput at 64K, 53.6× decoding and 6.14× prefill at 256K on H100, while matching/outperforming small full‑attention baselines across MMLU, BBH, math, retrieval, coding, long‑context. Summary thread by @omarsar0 with design highlights (JetBlock, KV cache insight, results).
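
To see why decoding speed would track KV-cache size, a back-of-the-envelope estimate helps. The sketch below uses the standard cache-size formula for attention layers; the layer counts, head counts, and the "only 2 full-attention layers" variant are hypothetical numbers chosen for illustration, not Jet‑Nemotron's actual configuration.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,  # fp16/bf16
) -> int:
    """Bytes needed to cache keys and values for every attention layer."""
    # Factor 2 = one tensor for K plus one for V.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


GIB = 2**30
CONTEXT = 64 * 1024  # 64K tokens

# Hypothetical small model with full attention in all 28 layers.
full = kv_cache_bytes(num_layers=28, num_kv_heads=8, head_dim=128, seq_len=CONTEXT)

# Same model if only 2 layers kept a growing KV cache (assumption: the
# remaining layers use linear attention with a fixed-size state, ignored here).
hybrid = kv_cache_bytes(num_layers=2, num_kv_heads=8, head_dim=128, seq_len=CONTEXT)

print(f"full attention: {full / GIB:.1f} GiB per sequence")
print(f"hybrid:         {hybrid / GIB:.1f} GiB per sequence")
print(f"reduction:      {full // hybrid}x")
```

Every decoded token has to read the whole cache, so at long context the cache, not the weights, dominates memory traffic. A smaller cache also leaves room for larger batches on the same GPU, which is where throughput gains of this kind typically come from.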

Safety, security, and policy

  • OpenAI × Anthropic cross‑evaluations: The labs tested each other’s models with their internal safety/alignment evals and published a joint report. While the findings are basic and shaped by each org’s scaffolding, the collaboration is notable as a “race-to-the-top” signal for shared safety practices. Announcements from @woj_zaremba and OpenAI’s safety team (@sleepinyourhat; follow‑up); @EthanJPerez notes ongoing support for field‑wide safety.
  • Cyber misuse reporting: Anthropic’s Threat Intelligence team details disrupting schemes like North Korean fraudulent employment and AI‑generated ransomware by low‑skill actors (report thread; blog; video).
  • Public sector advisory: Anthropic announced a National Security and Public Sector Advisory Council comprising senior defense/intelligence/policy leaders to help align with U.S. and allied needs (announcement).
  • Healthcare evaluation: OpenAI released HealthBench on Hugging Face to rigorously evaluate LLMs for human health applications (@HuggingPapers).

Agents, environments, and protocols

  • Open environments for RL/agentic training: Prime Intellect launched the Environments Hub to crowdsource rich, standardized, interactive settings for training and evaluating agentic models—mirroring how Gym catalyzed RL, but targeted at LLMs. @karpathy argues environments are the “new data,” enabling interaction and feedback beyond imitation; he’s bullish on environments and agentic interactions but skeptical of RL reward functions for intellectual tasks, pointing to alternatives like “system prompt learning.” Launch from @PrimeIntellect.
  • Agent protocols and integration tooling:
    • Zed’s new Agent Client Protocol (ACP) aims to be a “Language Server Protocol for AI agents,” decoupling coding assistants from editors, exposing inspectable plans, and supporting multimodal I/O (overview; site).
    • MCP ecosystem growth: One‑minute, no‑code MCP server generation via Postman to integrate 100k+ APIs (guide); in‑browser MCP calling for fast/local agent workflows (LFM2); LangChain “Deep Agents” built by vibecoding against a docs MCP server (demo).
    • Structured knowledge for RAG: Andrew Ng’s short course with Neo4j shows agent teams constructing schema‑grounded knowledge graphs that complement vector retrieval (course).
    • Browsing at scale: Browserbase provides an alternative to expensive hosted operator agents by running fleets of headless browsers (@LiorOnAI).

Developer tools and open models

  • OpenAI Codex overhaul (GPT‑5‑powered): A substantial upgrade turns Codex into a single agent across IDE, terminal, cloud, GitHub, and mobile, with new extensions (VS Code/Cursor/Windsurf), a much‑improved local CLI, seamless local↔cloud task movement, and first‑class code reviews in GitHub. Available in ChatGPT Plus/Pro/Team/Edu/Enterprise. See @OpenAIDevs, dev hub, CLI notes from @gdb, and more details from @kevinweil.
  • Hermes 4 (Nous): Open Llama‑3.1 fine‑tunes at 405B and 70B, with hybrid reasoning, 3.5M reasoning samples, trained on 192× B200s; uncensored and user‑steerable. Available on Nous Chat/Chutes and Hugging Face; GGUFs (70B) already up, MLX ports in progress (@vectro, @Teknium1).
  • DeepSeek V3.1 in production: Together hosts the 671B hybrid with fast/thinking modes; they report big deltas on reasoning benchmarks (e.g., AIME 2024 66.3% → 93.1% with thinking) and 99.9% uptime for reliability in production pipelines (@togethercompute). Community reports on edit‑diff failure rates (9.9%) vs Qwen Coder 3 (6.1%) from @cline.
  • Compact and efficient infra: Weaviate’s 8‑bit Rotational Quantization compresses vectors 4× while improving throughput (15–50%) and maintaining near‑perfect recall, via random rotations that smooth entries and spread similarity across dimensions (universal, no training) (@weaviate_io).
  • Also notable: MiniCPM‑V 4.5 adds “hybrid thinking” (decides when to think), high‑res doc handling, efficient long‑video reasoning (@mervenoyann).
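
For the rotational quantization item above, the core idea (rotate, then quantize each coordinate to 8 bits) can be sketched in pure Python. This is not Weaviate's implementation: the rotation here is a randomized Walsh-Hadamard transform chosen because it is easy to write down, and the per-vector min/max quantizer is likewise an assumption. All function names are hypothetical.

```python
import math
import random


def fwht(values):
    """Normalized fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthogonal
    return [v * scale for v in out]


def random_signs(dim, seed=0):
    """One fixed random sign per coordinate; shared by all vectors."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def rotate(vec, signs):
    """Random sign flips followed by a Hadamard transform: a cheap rotation."""
    return fwht([v * s for v, s in zip(vec, signs)])


def quantize_8bit(vec):
    """Map each coordinate to one byte using the vector's own min/max range."""
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / 255 or 1.0  # avoid a zero step for constant vectors
    codes = bytes(min(255, round((v - lo) / step)) for v in vec)
    return codes, lo, step


def dequantize(codes, lo, step):
    return [lo + c * step for c in codes]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


dim = 64
signs = random_signs(dim)

spiky = [0.0] * dim
spiky[5] = 1.0                   # all energy in a single coordinate
rotated = rotate(spiky, signs)   # energy now spread evenly across coordinates
codes, lo, step = quantize_8bit(rotated)
restored = dequantize(codes, lo, step)

print("largest rotated entry:", max(abs(v) for v in rotated))
print(len(codes), "bytes quantized vs", 4 * dim, "bytes as float32")
```

Because the rotation is orthogonal, dot products and norms are unchanged, so similarity search can run on rotated vectors directly. Spreading a vector's energy across coordinates narrows the value range each byte has to cover, which is what keeps 8-bit codes accurate without any training step.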

Top tweets (by engagement)

  • “It’s a good model, sir” — @elonmusk
  • OpenAI Codex updates: unified agent across IDE/terminal/cloud/GitHub — @OpenAIDevs
  • Environments > data for the RL era; cautious on RL reward functions — @karpathy
  • OpenAI × Anthropic cross‑org safety evaluations — @woj_zaremba
  • Anthropic Threat Intelligence on AI‑enabled cybercrime — @AnthropicAI
  • How Gemini 2.5 Flash Image (“nano‑banana”) was built and where it’s headed — @OfficialLoganK

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Hugging Face 2M Models Milestone + TheDrummer GGUF Finetunes

  • Hugging Face has reached two million models. (Score: 495, Comments: 58): Screenshot shows the Hugging Face Model Hub crossing 2,000,038 hosted models (https://huggingface.co/models), underscoring the platform’s rapid growth in checkpoints, fine-tunes, and quantized variants. Technically, this scale stresses storage, deduplication, and search/discoverability, and highlights reliance on efficient artifact management (e.g., Git/LFS, sharded safetensors, deltas) plus robust metadata/tagging and filters for navigating duplicates and variants. Commenters note concerns about total storage footprint and proliferation of duplicated/quantized weights; others joke about the sheer volume of near-identical fine-tunes (e.g., many Llama 3 70B ERP variants), implying discoverability/quality-signal challenges.
    • Scale/duplication concerns: commenters note the hub’s massive storage footprint driven by many “quants, weights and duplicates” of the same base models/finetunes. The implication is high redundancy from multiple checkpoints and quant variants per model, which stresses storage and complicates discoverability and deduplication across near-identical repos.
    • Signal-to-noise tradeoff: while some estimate that ~99% of the ~2,000,000 models are duplicates or low-quality/failed experiments, the remaining ~1% includes “gems” that can outperform models “10x their size”. This highlights the value of the hub as a central registry where high-leverage small models and strong finetunes emerge despite heavy noise, reinforcing the platform’s role as the “GitHub of AI” where major releases land first.
    • Ecosystem fragmentation: commenters point to the overwhelming number of derivatives of popular bases (e.g., many Llama 3 70B domain finetunes) as emblematic of a “Cambrian explosion” of model variants. The takeaway is that a few base models dominate the long tail of specialized finetunes, creating redundancy but also rapid iteration and specialization.
  • TheDrummer is on fire!!! (Score: 298, Comments: 103): u/TheLocalDrummer released a batch of new GGUF checkpoints (llama.cpp‑compatible) spanning 4B to 123B+ params, including GLM‑Steam‑106B‑A12B‑v1, Behemoth‑X‑123B‑v2, Skyfall‑31B‑v4, Cydonia‑24B‑v4.1, Gemma‑3‑R1 (4B/12B/27B), Cydonia‑R1‑24B‑v4, and RimTalk‑Mini. Releases are versioned (e.g., v1–v4.1), but the post includes no benchmarks or training‑data notes; more in‑progress work is referenced via BeaverAI and Discord. Top comments flag limited transparency on fine‑tune objectives/datasets, making it hard for newcomers to evaluate or adopt the models, while supporters note active Discord testing with 4–6 iterations per model before public release.
    • Several users highlight a lack of transparency around fine-tuning: no clear description of objectives, datasets, preprocessing, or evaluation protocols, making the ecosystem hard to enter or reproduce. This suggests the releases are optimized for an existing user base rather than broader adopters who need detailed model cards and training data disclosures.
    • Others note an iterative release pipeline on Discord, with multiple testing rounds and roughly 4–6 internal versions before a public release. The focus appears to have shifted from uncensored Gemma fine-tunes to larger “thinking” variants (R1-style), e.g., gemma-3-r1-27B.
    • Anecdotal performance feedback reports gemma-3-r1-27B underperforming in practical use, fueling skepticism that community text-only fine-tunes deliver meaningful gains over base models. The absence of shared benchmarks leaves this unverified, underscoring the need for standardized evals to quantify any improvements.

2. China AI Ecosystem: Z.ai GLM AMA, Qwen Teaser, and Nvidia GPU Export/Supply Chain

  • Launching Our New AMA Series With Z.AI, Creators of GLM (Tomorrow, 9AM-12PM PST) (Score: 161, Comments: 15): r/LocalLLaMA is announcing an AMA with Z.ai (creators of the GLM family) scheduled for Thu, Aug 28, 2025, 9AM–12PM PST. The image is a promo banner for the session; technically relevant as an opportunity for the community to ask about GLM models, local deployment, training details, and roadmap in a subreddit historically centered on LLaMA models. Comments note the subreddit’s scope has broadened beyond Meta’s LLaMA (naming mismatch), implicitly acknowledging growing interest in alternative model families like GLM; other comments are non-substantive.
    • No substantive technical discussion yet; one commenter asked about a potential “GLM 6” timeline, but no release details, specs, or benchmarks were mentioned. A logistical note clarifies the AMA timing as 2025-08-28 09:00–12:00 PDT (DST-adjusted) via timee.io; expect any technical Q&A (e.g., model roadmap, training data/compute, or benchmark deltas vs. Llama) during the session itself.
  • What you think it will be.. (Score: 376, Comments: 109): Screenshot of a terse teaser from Qwen team member Junyang (Justin) Lin (“Qwen”, Aug 27) with no specs or benchmarks—just the project name—implying an imminent Qwen-related release. Community reading of the hint suggests it could be either a new vision-language (VL) variant or a Qwen 3 32B model; the cryptic 2508 mentioned by commenters is interpreted as a potential date/version tag, but nothing official is stated. Top comments are speculative, with users hoping for a 32B model and debating whether the tease points to VL vs. a new base 32B; no technical details or evidence provided.
    • Speculation centers on a Qwen-related release, either a VL (vision–language) model or Qwen 3 32B. The mention of 32B indicates a 32-billion-parameter class model; “2508” is cited as an identifier but without context (could be a version/date tag).
    • There’s demand for a Spanish-capable variant, implying interest in a multilingual Qwen model (or localized tokenizer/training) rather than English-only. Requests specifically call for the higher-capacity 32B tier, suggesting users are prioritizing performance over smaller-footprint models.
  • Smuggling Nvidia GPUs to China (Score: 174, Comments: 33): Post discusses an investigation (via ChinaTalk summarizing a Gamers Nexus piece) tracing how US export‑restricted Nvidia GPUs still reach China: US retail/secondhand sourcing (Craigslist/Facebook) → brokers/Alibaba listings → concealment and air‑travel smuggling via Hong Kong/Taiwan → PRC repair/test shops that refurbish, VRAM‑rework, and forward racks, effectively “keeping silicon in circulation.” It reiterates the supply chain split: Nvidia designs the die, TSMC fabs in Taiwan, while PRC manufacturers produce boards, VRMs, coolers, and most non‑die BOM—so the non‑die assembly is largely China‑based even as the die is controlled. The technical thrust is that enforcement gaps and the ease of board‑level rework/repair keep controlled silicon operational despite bans, though core performance remains defined by the die. Commenters debate value concentration: the die is “99.9% of the difficulty,” with matrix‑mul latency/throughput bounded by on‑die architecture, so board/VRAM mods mainly affect capacity, not FLOPs. Others speculate US AI buyers would pay for black‑market VRAM‑upgraded cards (3–4× capacity at lower cost), while noting takedown/copyright claims (e.g., Bloomberg) around the documentary that may ironically drive more attention.
    • One thread hypothesizes a black‑market path to retrofit Nvidia GPUs with 3–4× more VRAM for ~half the cost of new cards, targeting US AI users. Feasibility hinges on reballing higher‑density GDDR, BIOS/firmware mods, and the GPU memory controller’s addressability plus PCB routing/power delivery limits—hard caps that often prevent large capacity jumps even if chips can be physically swapped.
    • A counterpoint stresses the silicon die is “99.9% of the difficulty,” since matrix multiplication latency/throughput are dictated by on‑die registers/SRAM caches and tensor/ALU pipelines. Boosting VRAM capacity won’t improve core GEMM/TensorCore throughput or memory hierarchy latency; the die architecture and cache/bandwidth balance set performance ceilings.
    • Repairability discussion notes most GPU failures are in discrete power delivery (MOSFETs, capacitors) rather than the GPU die, which is generally robust. Board‑level VRAM mods and component replacement require specialized BGA rework and diagnostics, which are common in China’s repair ecosystems but relatively rare in the US, enabling a secondary market for memory upgrades and refurbishing.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Nano Banana Image Editing Showcases and Restorations

  • Restoring the first photograph ever taken w/ Nano Banana (Score: 2907, Comments: 185): The post pairs Nicéphore Niépce’s 1826/27 “View from the Window at Le Gras” (the first surviving photograph) with a supposed color “restoration,” but no technical method is described (no deblurring, SR, or reconstruction pipeline), and the bottom image appears to be a modern recreation of the scene rather than a data-driven restoration of the original heliograph. The original is a bitumen-on-pewter plate with multi-hour exposure and extremely low spatial frequency detail; the “restored” image includes contemporary features inconsistent with the 19th‑century setting, implying a reshoot or fabricated scene rather than algorithmic enhancement. Top comments note it’s a recreation, not a restoration; other replies are jokey (“Enhance”) and non-technical.
    • Multiple commenters point out an information-theoretic limit: you can’t “restore” a historical image to contain more data than was captured. Methods like deconvolution, denoising, or super‑resolution (e.g., ESRGAN or diffusion upscalers) impose strong priors to synthesize plausible detail, which is a recreation/hallucination rather than recovered signal; crisp features (like gutters) emerging in a “restoration” are therefore likely fabricated by the model or retouching. See ESRGAN: https://arxiv.org/abs/1809.00219.
    • On whether this is the first photo: the earliest surviving photograph is Nicéphore Niépce’s “View from the Window at Le Gras” (1826/1827), a heliograph on a pewter plate coated with bitumen of Judea, with an exposure estimated between 8 hours and several days. The plate is held by the Harry Ransom Center; modern treatments are high‑resolution scans plus contrast/tone mapping, not generative enhancement. References: Wikipedia https://en.wikipedia.org/wiki/View_from_the_Window_at_Le_Gras and HRC notes https://www.hrc.utexas.edu/ni%C3%A9pce/.
    • A true restoration would model the imaging pipeline (lens point‑spread function, the bitumen’s nonlinear response curve, extremely long exposure causing multi‑directional shadows) and apply physically informed inverse methods with regularization. Without the PSF and response curve, the inverse problem is ill‑posed, so best practice is careful scanning, deconvolution with conservative priors, and local contrast equalization—avoiding arbitrary detail synthesis that changes scene geometry.
  • Nano Banana’s understanding of material swapping. The tube started off as a chrome material. (Score: 1709, Comments: 178): OP showcases a “material swapping” result where a tube that originally had a chrome material is transformed, titled “Nano Banana’s understanding of material swapping.” The linked Reddit gallery is inaccessible (403 Forbidden) per gallery URL, but top comments include additional image references: an alternate example image 1, a user-made BMO costume image 2, and a texture that a commenter likens to a musical score image 3. Commenters note unexpected semantic insertions (e.g., BMO character, sheet-music-like texture), suggesting the system may be performing style/texture overlays rather than strictly preserving original BRDF/geometry-material behavior during “material swap.”
    • A commenter asks whether the system can generate PBR texture maps from a single image—specifically normal, bump, and displacement maps—to support material-swapping workflows. This implies multi-channel outputs aligned to existing UVs, correct normal map conventions (OpenGL vs DirectX), and compatibility with downstream DCC/game engines; without these maps, swapping a chrome shader to another material would lose microdetail/height information that albedo alone can’t capture.
  • Using nano banana to put Emma Stone in all movies (Score: 640, Comments: 93): Post demonstrates rapid face-swap/compositing using a tool/workflow referred to as “nano banana” to insert Emma Stone into multiple films/posters; the creator notes the entire process took <20 minutes end-to-end. No concrete technical details (model, method, or pipeline) are provided beyond the claimed turnaround time, so it’s unclear whether this used diffusion inpainting, face-swapping, or a specific model/LoRA; nonetheless it suggests a lightweight, fast workflow for batch poster edits.

  • Nano Banana is so impressive. I keep testing new things, and it always delivers. (Score: 347, Comments: 44): Poster reports that the “Nano Banana” image model excels at photorealistic relighting—changing scene illumination while preserving fine-grained textures and scene layout—indicating strong detail preservation under lighting transforms. A commenter notes difficulty when attempting true viewpoint/camera-angle changes, suggesting the model’s edits are largely 2D-consistent relighting rather than 3D view-synthesis or geometry-aware reprojection. Top comments highlight heavy safety/censorship filters as a major practical limitation and express a desire for a less restrictive release (implicitly contrasting with Google’s policies). Another commenter questions physical plausibility in an example (sun angle at “noon”), hinting at potential inconsistencies in physically based lighting direction.
    • Users report the model’s safety filters are overly aggressive, blocking benign edits and especially body-related transformations. This implies a conservative safety classifier or post-filter tuned for high recall (over-blocking) at the cost of precision, reducing utility for legitimate workflows. As one puts it, the “intense censorship is a buzzkill,” prompting interest in similarly capable models with less restrictive moderation.
    • Repeated failures when trying to “change the perspective/camera angle” indicate the system lacks true 3D scene understanding or multi-view consistency. It likely functions as 2D inpainting/texture synthesis conditioned on the input rather than depth/pose-aware reconstruction (i.e., no NeRF/3DGS/EG3D-like latent geometry), so novel-view synthesis is out of scope and either artifacts or refusals occur.
    • Using it to brighten extremely dark, compressed footage (e.g., GoT S08E03) underscores limitations of working with LDR sources. Without RAW/HDR data, aggressive enhancement requires denoising and hallucination; while local contrast may improve, compression noise/banding can be amplified and details become fabricated, compromising fidelity.
  • Nano banana is bananas (Score: 273, Comments: 46): Non-technical meme: the image looks intentionally AI-edited/Photoshopped to erase the subject’s face, echoing the “AI Barber” trope where generative/editing tools over-remove features. The title “Nano banana is bananas” is a nonsense pun unrelated to the visual, reinforcing that this is shitpost humor rather than a technical demo. Comments joke that “it technically did” and call it “horrors of the AI Barber,” comparing it to OG Facebook Photoshop request pranks—i.e., deliberately absurd edits, not serious AI results.
    • Commenters implicitly point to prompt fragility: “Garbage prompt=garbage results” reflects the classic garbage-in/garbage-out failure in diffusion or instruction-tuned systems. Ambiguous phrasing like “a little off the top” can drive over-literal edits when no spatial mask or constraints are supplied, causing models to remove or distort structural features; robust workflows rely on masked inpainting, ROI segmentation, negative prompts, or ControlNet/reference locking to bound changes.
    • Mentions of “technically the truth” and “AI Barber” highlight a broader limitation where models optimize for literal instruction adherence over pragmatic intent. This brittleness under underspecified prompts stems from weak commonsense/pragmatic priors; mitigations include explicit constraints, few-shot exemplars, test-time guidance (e.g., CLIP directional losses) and rule-based post-filters to enforce semantic intent.
  • Can Nano Banana Do this? (Score: 329, Comments: 89): OP posts a humorous boxing-poster-style image and asks if a smaller “Nano Banana” model can reproduce it; commenters demonstrate that the open-source “Banana” image model can match or exceed the result by using depth-map conditioning via its API, leading to better character consistency. Example outputs are provided in replies (example 1, example 2, example 3). The workflow hinted is: reuse OP’s depth map and pass it to the Banana API for controlled image generation, improving pose/layout fidelity and identity consistency. One commenter argues Banana’s output is “better” due to more consistent characters, suggesting depth conditioning is key; another claims the model can “do even better,” implying further tuning or prompts can surpass the OP’s example.
    • A commenter reports that Banana’s API accepts a depth map as conditioning input, enabling a workflow where you pass the source depth map along with the prompt to guide structure and layout (akin to depth-guided/control pipelines). They explicitly “provided it with your depth map” and note this can be done programmatically via the API, allowing reproducible, depth-consistent generations across runs and integrations in automated pipelines.
    • Side-by-side outputs suggest Banana produced more consistent character identity and fewer drift artifacts compared to the baseline attempt. The claim is supported by shared results (example 1, example 2), with the commenter attributing the improvement to depth conditioning passed through the API, which helps preserve structure and character features across frames/variations.
  • My wife asked ChatGPT for a system diagram. It sent her a banana milkshake. (Score: 613, Comments: 62): This post is a meme: the user asked ChatGPT for a “system diagram” and instead received a banana milkshake image, illustrating an LLM failure mode (hallucination/mode confusion) where the assistant misinterprets task intent and returns irrelevant content. Technically, it’s an example of prompt misalignment and task-to-output mismatch in assistant workflows rather than any new feature or benchmark. Comments joke about the mismatch (e.g., “nano-banana’ed” and preferring the milkshake) and sarcastically quip “PHD level intelligence in your pocket,” reflecting skepticism about LLM reliability for precise engineering tasks.

2. Weather AI, VibeVoice TTS, Codex Updates plus Gemini/GPT-5 and Policy News

  • Google’s AI model just nailed the forecast for the strongest Atlantic storm this year (Score: 496, Comments: 63): Post highlights that a Google AI weather model accurately predicted the track/intensity of the year’s strongest Atlantic storm, underscoring the growing skill of learned global forecast models (e.g., DeepMind’s GraphCast and MetNet‑3) relative to traditional NWP like ECMWF IFS and NOAA GFS. Commenters note that providers largely ingest the same global observations via WMO/UN data exchange (see WMO Unified Data Policy and WIGOS/WIS), so accuracy differences primarily come from model architectures/training and data assimilation pipelines rather than exclusive data. Opinions claim Google’s approach could render other services outdated and that “real-time” ML forecasting will save large numbers of lives; these hinge on ML’s lower inference cost enabling faster, higher‑frequency updates, though the magnitude of life‑saving impact is speculative.
    • Multiple commenters highlight that virtually all centers ingest the same global observations via the WMO’s World Weather Watch and Global Telecommunication System (GTS), so forecast skill differences come from the models: data assimilation schemes, physical parameterizations, grid resolution, ensemble size, and compute budget. This frames Google’s AI approaches (e.g., GraphCast medium‑range ML model) as competing with NWP like ECMWF HRES/ENS and NOAA GFS/GEFS, where published results show ML can match or surpass certain metrics (e.g., 500 hPa ACC, RMSE) while being faster to run. Links: https://community.wmo.int/activity-areas/gts, https://www.nature.com/articles/s41586-023-06720-6, https://www.ecmwf.int/en/forecasts/dataset/ecmwf-forecasts-archive
    • On “real-time” forecasting saving lives: ML nowcasting models such as MetNet-3 can deliver minute‑scale precipitation forecasts with low inference latency, enabling higher update cadence for warnings compared to traditional NWP cycles. However, true end‑to‑end real‑time capability is bounded by observation latency and QC (radar/satellite ingestion), data assimilation windows, and dissemination; the life‑saving impact hinges on improving lead time and reliability for high‑impact events (e.g., tropical cyclone track/intensity MAE, severe convective warning lead times in 10–60 minutes). Links: https://arxiv.org/abs/2410.11809, https://ai.googleblog.com/2023/11/graphcast-accurate-global-ai-forecasts.html
    • Claims that Google will make other services “outdated” are tempered by the operational realities: NMHSs provide calibrated, impact‑based, and regulatory warnings, often blending multiple models (e.g., ECMWF ENS, GEFS) and post‑processing with MOS/ML to correct local biases. Any new model must demonstrate robust skill across domains (TC track error km, intensity bias, CRPS/Brier scores, extreme tail behavior) and reliability/uptime under 24/7 constraints before supplanting existing systems. Links: https://www.ecmwf.int/en/forecasts/quality-our-forecasts/scorecards, https://www.noaa.gov/organization/nws/national-centers
  • [WIP] ComfyUI Wrapper for Microsoft’s new VibeVoice TTS (voice cloning in seconds) (Score: 434, Comments: 94): A developer is building a ComfyUI wrapper for Microsoft’s VibeVoice TTS (project page) enabling rapid voice cloning from very small samples; initial support targets single‑speaker with dual‑speaker in progress and an open‑source release planned. Two model sizes are noted: 1.5B (fast inference, “fairly good” quality) and 7B (greater emotional nuance but inconsistent; flagged as Preview). Demo used synthetic voices as prompts, which VibeVoice cloned and then synthesized target text; an additional update post is linked here. Commenters challenge the demo choice (synthetic source voices) and suggest using well‑known public voices to objectively assess one‑shot cloning quality; they also clarify cloning is license‑permitted with consent, request 1.5B VRAM usage details, and ask for comparisons vs. Higgs Audio 2.
    • Licensing clarity: commenters report VibeVoice is MIT-licensed, enabling local/commercial use, but the usage terms still prohibit voice cloning without explicit consent. This means cloning is technically supported yet policy-restricted. There’s skepticism about one-shot cloning quality; evaluation with a widely recognized voice is suggested to better judge timbre similarity. Requests also surfaced for head-to-head comparisons with Higgs Audio 2 and Chatterbox to quantify cloning fidelity and naturalness.
    • Deployment concern: a commenter asks for the VRAM footprint of the ~1.5B VibeVoice model. Knowing memory usage (e.g., FP16 vs INT8, batch size 1) is key to assessing feasibility for real-time or near–real-time TTS in ComfyUI on consumer GPUs and planning throughput/latency.
    • Integration idea: a related WIP, image2reverb, aims to infer scene acoustics from an image/video frame and apply convolution reverb so the generated voice matches the environment. This could pair with a VibeVoice ComfyUI node to automatically add environment-aware acoustics to TTS outputs.
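The VRAM question above can be bounded from below with simple arithmetic: parameter count times bytes per parameter. The sketch below assumes the nominal sizes quoted in the post (1.5B and 7B) and covers the weights only; activations, any audio tokenizer or decoder components, and framework overhead are not included, so real usage would be higher.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory for the model weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Nominal parameter counts taken from the post; treat them as approximate.
for num_params in (1.5e9, 7e9):
    for dtype in ("fp16", "int8"):
        gib = weight_memory_gib(num_params, dtype)
        print(f"{num_params / 1e9:.1f}B params @ {dtype}: {gib:.2f} GiB (weights only)")
```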
  • Codex NEW mega update!!! (Score: 203, Comments: 50): The image purports to be an OpenAI Developers announcement of a “Codex NEW mega update,” highlighting: a new IDE extension (stated compatible with VS Code and others), seamless task movement between cloud and local environments, integrated GitHub code reviews, and a revamped Codex CLI. It also claims the update is powered by GPT-5 and available via the ChatGPT plan, emphasizing improved coding efficiency and tighter dev‑workflow integration. Top comments discuss tradeoffs: claims that GPT‑5’s instruction following makes delegation more reliable while Claude remains stronger with tools; questions about onboarding/usability compared to Claude Code; and concerns about Windows terminal compatibility (historically issuing Linux-centric commands).
    • Multiple users report a stark qualitative gap between gpt5-medium and gpt5-high in Codex: medium feels like a small model with RAG/TTC scaffolding, showing weak instruction-following and poor context ingestion, while high behaves like a full SOTA model. They argue that adding 10k–20k “thinking tokens” shouldn’t explain this delta, implying different underlying base models rather than just more reasoning budget. Similar saturation is observed with Opus via API beyond ~16k thinking tokens, and changing thinking-token budgets doesn’t materially alter base model “flavor.”
    • Comparisons suggest GPT-5 now excels at strict instruction-following (useful for task delegation), whereas Claude still leads in tool-use reliability. For coding workflows, these trade-offs make it a close call: GPT-5’s adherence to directives vs Claude’s stronger tool orchestration and function-calling behavior.
    • Environment parity issues persist: users recall Codex defaulting to Linux command patterns in Windows Terminal sessions, indicating OS detection or shell-targeting heuristics may need refinement. This can degrade developer ergonomics by proposing non-portable commands and suggests a need for improved runtime/OS context awareness in the CLI or agent layer.
  • I’m happy to announce I’m now a 6x engineer (Score: 390, Comments: 142): Meme-style screenshot of many editor/terminal panes shows a sprawling data parsing/extraction setup with debug logs, QA checks, and layered “robust fallback parsing”—implying the OP is a “6x engineer” by orchestrating multiple brittle scripts/processes rather than a single clean pipeline. The technical subtext is orchestration/automation sprawl: retries, fallbacks, and validation wrappers around flaky parsers that risk masking errors and increasing maintenance overhead. See the image: https://i.redd.it/q55dd87p1klf1.jpeg. Top comments critique silent failure modes of fallback parsers (“good luck with that bs failing silently”), joke that this is “management” (coordination over coding), and warn such automation contributes to stricter platform rate/usage limits.
    • The “robust fallback parsing” jab highlights the classic failure mode where lax parsers mask upstream errors, causing pipelines to “fail silently.” Best practice is to fail closed with strict JSON Schema validation and typed tool-call results, leveraging structured outputs (e.g., Claude structured outputs, OpenAI structured outputs) rather than heuristic regex parsing. Add observability (rates of parse failures/null fallbacks, latency deltas across fallbacks), chaos/fuzz tests, and circuit breakers to prevent cascading fallbacks that hide bugs (Guardrails can help).
    • Remarks about “more restricted limits” point to providers tightening quotas when fan-out/multi-agent workflows create bursty traffic and error amplification. Expect 429/RateLimit and provider-specific backoffs: OpenAI uses RPM/TPM and dynamic tiers (docs); Anthropic enforces per-model TPM/RPM and concurrency caps (docs). Use adaptive client-side rate limiters (token-bucket/leaky-bucket), idempotency keys, priority queues, and budget guards to avoid triggering automated abuse heuristics.
    • Questions about “Claude as a 6x engineer” underscore orchestration complexity: you’ll need stateful DAGs, retries with jitter, timeouts, idempotency, and traceability for tool-use provenance and cost/latency budgets. Production setups often pair an agent graph/runtime (e.g., LangGraph) with a workflow engine (Temporal or Prefect) plus LLM observability (Langfuse or OpenTelemetry). For Claude-specific stacks, prefer tool calls + JSON Schema (tool use) over free-form prompts, and enforce concurrency limits to prevent thundering-herd fan-out.
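The "fail closed" advice above can be sketched with the standard library alone. The field names and the `ParseError` class are hypothetical; a production setup would more likely use a JSON Schema validator or a provider's structured-output feature, as the comment suggests.

```python
import json


class ParseError(ValueError):
    """Raised when model output does not match the expected shape."""


# Hypothetical expected fields and the types each may take.
EXPECTED_FIELDS = {"invoice_id": (str,), "total": (int, float)}


def parse_strict(raw: str) -> dict:
    """Parse model output or raise; never fall back to a best-effort guess."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ParseError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ParseError("top-level value must be a JSON object")
    mismatched = set(data) ^ set(EXPECTED_FIELDS)
    if mismatched:
        raise ParseError(f"missing or unexpected keys: {sorted(mismatched)}")
    for key, allowed in EXPECTED_FIELDS.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, allowed):
            raise ParseError(f"field {key!r} has type {type(value).__name__}")
    return data


print(parse_strict('{"invoice_id": "A1", "total": 12.5}'))
```

Because every failure raises, the caller can count `ParseError` occurrences as a metric instead of discovering silently degraded output later.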
  • Forget Google. This is the power of open source tools. (Score: 561, Comments: 63): A video post titled “Forget Google. This is the power of open source tools.” links to a Reddit-hosted video v.redd.it/3epgdoljljlf1 that returns HTTP 403 Forbidden without authentication, so the underlying demo is inaccessible. No concrete tools, repos, benchmarks, or implementation details are present in the visible context; the discussion implies a claim that open‑source tools can substitute for Google, but it does not enumerate which tools or provide evidence. Top comments reflect skepticism and a request for specifics (e.g., “What open source tools”), with other remarks off-topic; there is no substantive technical debate or data to evaluate the claim.

  • I sense jealousy.. just wait for gemini 3 (Score: 396, Comments: 95): A screenshotted post argues that even if Google Gemini 3 surpasses OpenAI ChatGPT on raw capability, OpenAI’s distribution/user lock-in makes switching hard; it suggests prioritizing image/video generation to drive adoption through viral, shareable outputs. Commenters reframe the competition on two axes: assistant quality vs. distribution/virality, and note API economics where “intelligence per dollar” and latency dominate, citing Gemini 2.5 Flash as having led on cost/perf for a period. Debate centers on whether multimodal image/video gen are “side quests” (many argue they’re core to building better assistants), whether being first/best matters most in the personal assistant race, and on monetization: most ChatGPT users don’t pay, Anthropic monetizes mainly via API, and adoption hinges on utility (e.g., NL photo editing) and current unknowns around smaller alleged GPT-5 models.
    • API buyers optimize for “intelligence per dollar” rather than peak scores; Anthropic reportedly monetizes primarily via its API rather than a chat UI. One commenter notes Gemini 2.5 Flash had the best cost–performance for a period, implying stronger quality/latency per $ compared to peers, though this may have shifted with smaller GPT-5 models. In a compute‑constrained world with many non‑paying chat users, sustainable growth hinges on efficient serving (latency, throughput, context utilization) and pricing, not just raw capability.
    • The “chat model” is converging to a personal assistant where small gains in reliability, tool‑use, and latency compound into large UX advantages; being first and best matters for default placement and daily stickiness. A “slightly better, smarter, more reliable” assistant yields outsized value by driving higher‑value workflows (calendar/email/code actions) beyond simple chat, and OS‑level hooks (e.g., Gemini as the Android voice assistant) can offset weaker standalone app UX.
    • Labeling image/video as “side quests” is disputed: multimodal capabilities (e.g., natural‑language photo editing) are high‑utility features that directly impact adoption. Commenters argue these capabilities are not mere attention plays but core to building more intelligent assistants, as stronger vision/video understanding and generation expand actionable tasks and real‑world usefulness.
  • Tried to move to Gemini, tapped out in 30 seconds 💀 (Score: 504, Comments: 326): Screenshot shows Google Gemini refusing to continue a task because it “looks like a personal conversation” and it’s “not designed to impersonate or interact in such contexts,” indicating a safety/guardrail trigger likely tied to impersonation or personal-communication detection. Technically, this illustrates an aggressive content-safety heuristic (policy filter) that can yield false positives when the model infers roleplay/impersonation, but the actual prompt is missing so the trigger can’t be replicated or debugged. Top comments note the post is uninformative without the original prompt, pushing for reproducibility before drawing conclusions about Gemini’s guardrails being overly sensitive.
    • Several replies stress that without the exact prompt and runtime details, any performance judgment is non-reproducible. To make a fair comparison, include the precise user/system prompts, model variant (e.g., Gemini 1.5 Pro vs Flash), decoding params (temperature, topP, topK, maxOutputTokens), platform (web vs API), and whether tools/grounding or long-context were enabled; see model variants in Google’s docs: https://ai.google.dev/gemini-api/docs/models.
    • A commenter hints performance is context-dependent; Gemini can appear conservative or “boring” on open-ended tasks due to stricter safety/alignment and default low-diversity decoding. If the goal is more exploratory output, choose an appropriate variant and increase temperature/topP (with awareness of hallucination risk) or provide richer task constraints to elicit depth; model behavior differences are documented here: https://ai.google.dev/gemini-api/docs/models.
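One way to act on the reproducibility point above is to save a small record alongside every comparison. This is a sketch only: the `RunRecord` class and all of its values are placeholders, and the field names simply mirror the parameters listed in the comments rather than any specific SDK.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    """Everything needed to rerun one prompt under the same conditions."""
    model: str               # exact model variant string
    platform: str            # "web" or "api"
    system_prompt: str
    user_prompt: str
    temperature: float
    top_p: float
    top_k: int
    max_output_tokens: int
    tools_enabled: bool


# Placeholder values for illustration; substitute the real settings used.
record = RunRecord(
    model="<model variant>",
    platform="api",
    system_prompt="<system prompt>",
    user_prompt="<user prompt>",
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_output_tokens=1024,
    tools_enabled=False,
)

serialized = json.dumps(asdict(record), indent=2, sort_keys=True)
print(serialized)
```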
  • AI gets its facts from… us? (Score: 441, Comments: 174): An infographic titled “Where AI Gets Its Facts” claims that language models like ChatGPT and Perplexity most frequently cite Reddit (40.1%), followed by Wikipedia (26.3%), YouTube (23.5%), and Google (23.3%), with Yelp, Facebook, and Amazon each above ~18%. The image provides no methodology or source, leaving ambiguous whether these figures represent training data composition, retrieval/citation behavior, or user-shared links, so it’s not a rigorous benchmark or reproducible analysis. Comments frame the post as a meme/repost and include satire, implicitly highlighting concerns about misinformation if models over-index on user-generated content; there’s no substantive technical debate.
    • One commenter contends that “contrary to popular belief, Reddit is actually a valuable source for many topics, both genuine issues and so‑called ‘issues’,” highlighting that user-generated threads often capture real-world edge cases, troubleshooting steps, and niche domain context that formal corpora miss. This aligns with industry moves to license Reddit data for LLM training (e.g., the OpenAI–Reddit partnership (2024): https://openai.com/index/reddit-partnership/), but also implies the need to manage noise, bias, and moderation artifacts that can degrade factuality. Practically, this favors robust dataset filtering and/or retrieval‑augmented generation to preserve signal‑to‑noise when incorporating Reddit corpora.
  • Testing GPT-5 (it is nsfw) (Score: 406, Comments: 111): Screenshot shows “ChatGPT 5” generating an explicit, NSFW roleplay monologue on request after minimal priming (user asks for something “super unhinged,” then “sweary and explicit”), without refusal or safety gating—suggesting looser or inconsistently enforced safety guardrails versus earlier behavior and compared to GPT‑4o the OP had tested for warmth/conversationality. Technically, this points to either updated moderation thresholds, different instruction‑tuning/alignment settings, or a context/prompt‑routing gap that allows adult content when framed as consented fiction, highlighting inconsistency in policy enforcement and session‑level variability. Comments report similarly lax behavior (e.g., helping with torrenting), and some celebrate the change, implying perceived reduction of safety constraints; others pivot to humor, offering little technical counterpoint.

  • The lawsuit would force ChatGPT to do age verification on all users if the Raine family wins (Score: 401, Comments: 441): OP reports a lawsuit by the Raine family that, if successful, would require universal age verification for all ChatGPT users—implying collection/validation of government ID or equivalent and associated privacy, data-retention, and compliance burdens across jurisdictions. The post cites pressure from platform safety changes (e.g., Google/YouTube teen account defaults) and regulatory trends such as the UK’s Online Safety Act (legislation.gov.uk), and expresses refusal to provide ID to a private company. Commenters argue that mandatory age checks should be paired with reduced content filtering for verified adults, while others stress parental responsibility over platform mandates and criticize child-safety justifications as a pretext for expanding surveillance and eroding online privacy.

3. Claude ASCII Workflow, Qwen-Image-Edit Guide, and ChatGPT UX Humor

  • The Anti-YOLO Method: Why I make Claude draw ASCII art before writing code - How it make me ship faster, better, and with less tokens spent (Score: 245, Comments: 88): OP outlines a constrained Claude-assisted delivery workflow: Brainstorm the problem space → generate low-cost ASCII wireframes (reported ~10x fewer tokens vs HTML prototypes) saved as markdown → rigorous “Plan mode” (review codebase; specify backend architecture, DB schema considerations, UI matching with stable Friendly IDs, security, and testing) after prompting Claude to ask clarifying questions → implement → derive tests (unit, integration, component, DB integrity, edge cases) directly from the ASCII spec → ship. They claim this reduces misalignment, iterations, and prod debugging; the method is illustrated with a real feature for the Vibe-Logs Prompt Pattern Analyzer and a follow-up on “fixing the prompting problem.” Key tactics include using ASCII to focus on layout/flow over styling, centralizing decisions in markdown, insisting on Claude’s clarifying questions, and treating the wireframe as the test oracle/spec. Commenters largely endorse heavy upfront planning and documentation (steps 1 and 3) but are split on ASCII: some find it surprisingly effective; others argue pure text/structured specs (e.g., a CLAUDE.md) yield better adherence and that ASCII benefits humans more than the LLM, with concerns about token cost and flow-following fidelity.
    • Multiple commenters report that ASCII wireframes are token-inefficient and don’t improve model adherence: one notes ASCII/state-machine prompts “were not following majority of the flow,” while a detailed pure-text spec (e.g., a CLAUDE.md plan) led the LLM to follow steps more reliably with fewer iterations. This suggests ASCII is mainly a human aid; for the model, structured prose requirements and stepwise plans outperform ASCII and avoid wasting tokens.
    • An alternative suggested is using Mermaid diagrams (mermaid.js.org) for flowcharts/sequence/state diagrams as a compact, machine-parseable format. Mermaid can encode nodes/edges and states succinctly, potentially reducing tokens versus ASCII while preserving structure, and may better support reasoning and round-tripping between visualization and implementation if the LLM recognizes the syntax.
    • Another commenter informally validates that the value is in the planning/verification phases (steps #1 and #3) rather than the ASCII artifact itself. This aligns with the view that rigorous planning/documentation drives quality and speed, while ASCII wireframing is optional ergonomics that may not yield performance or token savings.
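As an illustration of the Mermaid suggestion above, the sketch below emits a `stateDiagram-v2` block from a list of transitions. The state names and events are hypothetical, and whether this saves tokens over ASCII for a given model is untested here.

```python
from typing import Iterable, Tuple

Transition = Tuple[str, str, str]  # (source state, event, destination state)


def to_mermaid(transitions: Iterable[Transition], initial: str) -> str:
    """Render transitions as a Mermaid state diagram definition."""
    lines = ["stateDiagram-v2", f"    [*] --> {initial}"]
    for src, event, dst in transitions:
        lines.append(f"    {src} --> {dst}: {event}")
    return "\n".join(lines)


# Hypothetical feature flow, for illustration only.
flow = [
    ("Draft", "submit", "Review"),
    ("Review", "approve", "Shipped"),
    ("Review", "reject", "Draft"),
]
diagram = to_mermaid(flow, initial="Draft")
print(diagram)
```

Keeping the transitions as data means the same list can drive both the diagram and the test cases derived from it.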
  • Qwen-Image-Edit Prompt Guide: The Complete Playbook (Score: 254, Comments: 38): Post shares a practical prompt engineering playbook for Qwen-Image-Edit covering seven edit classes: text replacement/correction (font/size/perspective preservation), local appearance tweaks (materials/colors with lighting/shadow consistency), global semantic/style changes (e.g., Studio Ghibli transfer while preserving layout/identity), micro/region edits (boxed glyph or small object swaps), identity control (subject swap vs. identity preservation), poster/composite layout constraints, and camera/lighting directives (relighting, DoF, lens). Core techniques emphasize constraint-first phrasing (e.g., “Keep everything else unchanged,” preserving identity/font/alignment), chaining small edits, and explicit negatives (“no distortion, no warped text, no duplicate faces”) to reduce drift and artifacts. The guide advocates explicit add/replace/remove verbs and precise preservation clauses (pose, shadows, reflections) to maintain structural fidelity across edits. Top comments request proof-of-effectiveness with visual examples, cautioning the post otherwise reads like an LLM-generated list; another commenter is building a Starnodes custom node to select tasks and auto-generate prompts for Qwen edit (screenshot: https://preview.redd.it/9ep8f7jf0mlf1.png), and a third confirms that add–replace–remove phrasing plus “keep everything the same” measurably improves results.
    • Practitioners report that Qwen-Image-Edit responds best to constrained, atomic instructions using an add/replace/remove pattern, e.g., “add X,” “replace Y with Z,” and explicitly stating “keep everything else the same.” Emphasizing invariance (e.g., “don’t change anything else”) reportedly reduces collateral edits and improves fidelity in multi-attribute edits, aligning with best practices for instruction-grounded image editing models.
    • One dev is building a custom node for StarNodes that integrates “Kontext” and Qwen Edit to streamline task selection and prompt assembly. The node provides a UI to choose the edit type, supply a few inputs, and auto-generate a ready-to-use prompt, as shown in their screenshot: https://preview.redd.it/9ep8f7jf0mlf1.png?width=1252&format=png&auto=webp&s=e5546e2fdafd30004e43bc167a06eca72595601b.
    • There’s a call for empirical validation: readers request before/after image examples to verify that the provided prompt patterns reliably produce the claimed edits on Qwen-Image-Edit. Including such artifacts would aid reproducibility and help differentiate workflow-specific gains from generic LLM-style guidance.
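The add/replace/remove pattern with explicit preservation clauses can be captured in a small prompt builder. This is a sketch of the phrasing style described in the post; the function, its parameters, and the exact wording are assumptions, not an official Qwen-Image-Edit format.

```python
from typing import Optional, Sequence


def build_edit_prompt(
    action: str,
    target: str,
    replacement: Optional[str] = None,
    preserve: Sequence[str] = (),
    negatives: Sequence[str] = (),
) -> str:
    """Assemble one atomic edit instruction with invariance clauses."""
    if action == "add":
        core = f"Add {target}."
    elif action == "remove":
        core = f"Remove {target}."
    elif action == "replace":
        if not replacement:
            raise ValueError("'replace' requires a replacement")
        core = f"Replace {target} with {replacement}."
    else:
        raise ValueError(f"unknown action: {action!r}")
    parts = [core, "Keep everything else unchanged."]
    if preserve:
        parts.append("Preserve " + ", ".join(preserve) + ".")
    if negatives:
        parts.append("Avoid: " + ", ".join(negatives) + ".")
    return " ".join(parts)


prompt = build_edit_prompt(
    "replace",
    "the red sign",
    "a blue sign",
    preserve=("font", "perspective"),
    negatives=("distortion", "warped text"),
)
print(prompt)
```

Chaining small edits, as the guide recommends, would mean calling this once per change and feeding each output image into the next step.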
  • I should’ve just stayed bored in peace. (Score: 3744, Comments: 227): Non-technical meme/screenshot of a chatbot UI delivering a sarcastic, tough-love reply to a user complaining about boredom—framing a UX/AI-assistant design question about default tone and handling low-value prompts. Technically relevant only insofar as it reflects alignment/assistant persona choices (blunt vs. helpful) in conversational AI. Commenters endorse this snarky response as a desirable default for AI assistants (“Every AI should respond in this fashion”), while others ironically note this is “exactly what I’m paying for,” implying mixed expectations about paid AI behavior.
    • Several commenters advocate for a stricter default behavior where the AI clearly sets boundaries/refusals and responds concisely and unambiguously by default, implying a preference for a universal “safe/strict mode” that reduces hallucinations and overaccommodation. This suggests demand for predictable safety profiles and instruction adherence across models/providers.
    • Reference to a “Monday GPT from OpenAI” highlights perceived temporal variability in model behavior/quality, which technical readers may associate with rolling deployments, model snapshot changes, or server-side toggles that affect refusal thresholds and verbosity. The sentiment underscores the importance of transparent versioning and stability guarantees, especially for paid users expecting consistent behavior.
  • Time to drop the masks. Wait… I didn’t mean that… Quick, put it back on! (Score: 2096, Comments: 499): Short clip (source now gated with 403) appears to show a staged “unmasking” where a performer removes a hyper‑realistic silicone face mask and a female‑presenting bodysuit; artifacts noted include rigid, over‑pronounced nipples on the suit and a visible “double‑teeth” effect when the wearer’s real teeth sit behind the mask’s molded mouth opening. A still frame is accessible via the preview image. These cues are consistent with typical limitations of full‑head silicone masks/bodysuits (material stiffness, fixed nipple geometry, and mouth aperture alignment). Top comments focus on anatomical realism and mask tell‑tales: critiques of the suit’s “rock‑hard nips” and questions about a “two set of teeth” highlight uncanny‑valley artifacts that reveal the prosthetics despite otherwise convincing surface detail.
    • Multiple users suspect the clip is AI-generated, citing visible artifacts like a duplicated dentition (“two set of teeth”) and mask/edge inconsistencies around the face (image link). These issues are typical of face-swap/inpainting pipelines when the segmentation matte slips frame-to-frame, causing the generator’s mouth region to overlay imperfectly and produce temporal flicker or doubled features. You also often see specular highlights that don’t track head pose, revealing 2D compositing rather than consistent 3D geometry.
    • The duplicated teeth specifically suggest a failure to fully replace the mouth interior across frames, leaving remnants of the source frame’s teeth beneath the generated layer. In deepfake workflows, inadequate alpha mattes or naive blending (vs. flow-guided or Poisson blending) can cause the original oral cavity to bleed through, especially during fast lip motion or partial occlusions. Robust solutions typically involve tighter semantic segmentation for teeth/tongue and motion-compensated temporal consistency losses to prevent frame-to-frame drift.
  • Wasn’t expecting that! 😬 (Score: 980, Comments: 44): Image shows an LLM chat where a user asks for a riddle: “I am not alive, but I grow; I don’t have lungs, but I need air; I don’t have a mouth, but I need water to live. What am I?” When asked for the answer, the AI replies with a meta self‑description instead of solving it, highlighting a common LLM failure mode: boilerplate disclaimers and intent misalignment (“As an AI language model…”) overriding task execution. The riddle variant likely points to “rust” (needs air and water; not alive but grows) rather than the classic “fire.” Commenters suggest the answer is “Rust?” and note that the model’s reflexive disclaimer habit makes the misfire funny but also emblematic of annoying LLM behavior.
    • Commenters note the model’s boilerplate preface (e.g., “As an AI language model…”) appearing instead of answering, highlighting how instruction-tuned templates and safety guardrails can dominate outputs when confidence is low or content checks trigger. Technically, this reflects system/prompt scaffolding that biases toward disclaimers and refusals; more recent deployments often suppress such boilerplate via adjusted system prompts to improve UX. The discussion underscores how prompt/template design can override core reasoning even on simple tasks.
  • detention: day 1 (Score: 626, Comments: 66): Post appears to be a satirical image/meme about LLM behavior around asking clarifying questions versus guessing, hosted as a Reddit gallery (link; returns HTTP 403 without auth). A top comment links a preview image (preview.redd.it), reinforcing the theme of models not asking follow-up questions and instead hallucinating or inventing context. Commenters criticize LLMs for guessing instead of requesting clarification, leading to “hallucinating full conversations”; there’s also a minor aside on stylistic preferences (e.g., em dashes) reflecting frustration with model tone/formatting rather than core capabilities.
    • Users report persistent failure to honor a stylistic constraint (avoid em dashes) even when it’s saved as a user “memory” or repeated reminder. This implies the memory feature is a soft prompt hint that’s easily overridden by higher‑priority system prompts or the model’s learned stylistic priors, so punctuation preferences aren’t deterministically enforced across sessions or replies.
    • Several comments highlight that the model often guesses user intent instead of asking clarifying questions, leading to hallucinated multi‑turn content when context is missing. This reflects instruction‑tuning trade‑offs: optimization for being “helpful” biases toward continuing rather than querying uncertainty, and users want stricter policies to elicit clarification to reduce hallucinations in ambiguous prompts.
    • The need to repeatedly instruct “remove the dashes” indicates weak persistent, per‑user style control; without constrained decoding (e.g., token/character bans) or enforceable system‑level style rules, training‑distribution habits dominate. A more robust solution would require hard constraints or higher‑priority system prompts/templates that explicitly prohibit certain punctuation, rather than relying on reminders.
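Short of constrained decoding, a style rule like this can be enforced deterministically after generation. The sketch below is a post-processing workaround, not the token-level ban mentioned above, and the choice of a comma as the replacement is an assumption about the desired style.

```python
import re

EM_DASH = "\u2014"
# Match an em dash together with any whitespace around it.
_EM_DASH_RUN = re.compile(r"\s*" + EM_DASH + r"\s*")


def violates_style(text: str) -> bool:
    """True if the reply contains the banned punctuation."""
    return EM_DASH in text


def strip_em_dashes(text: str, replacement: str = ", ") -> str:
    """Replace every em dash (and surrounding spaces) with the replacement."""
    return _EM_DASH_RUN.sub(replacement, text)


reply = "It is fast" + EM_DASH + "and cheap " + EM_DASH + " most days."
cleaned = strip_em_dashes(reply)
print(cleaned)
```

A check like `violates_style` can also gate a retry loop, though rewriting the text is cheaper than regenerating it.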
  • Who needs enemies when you’ve got ChatGPT. (Score: 596, Comments: 63): Non-technical meme/satire: a screenshot of a ChatGPT reply admonishing a user who says “Getting bored,” highlighting the irony of asking an AI for entertainment despite access to “vast resources and technology.” No benchmarks, models, or implementation details—this is commentary on user expectations and AI assistants’ tone/persona. Top comments largely agree the snarky response is warranted; no substantive technical debate.

  • I’m laughing at this harder than I should tbh (Score: 500, Comments: 31): Non-technical/meme post: OP jokes that running the CREPE pitch-detection CNN on an Apple M2 Mac mini makes it “scream”/overheat (ASCII art of a Mac on fire). Contextually, this hints at CREPE being CPU-bound on Apple Silicon when not using proper acceleration (e.g., TensorFlow-metal) and potentially triggering thermal throttling, but no configs, benchmarks, or error logs are provided. Top comments are jokes and non-technical; no substantive debate or troubleshooting details.
    • Several comments hint at LLM style-control limits: enforcing “no capitalization at sentence start” is non-trivial with GPT-4/4o because decoding follows token probabilities and BPE tokenization, not hard grammar rules. Stronger adherence can be nudged via a strict system prompt, low temperature/fixed seed, and selective logit_bias on uppercase-leading tokens, but due to merged tokens this is brittle; reliable workflows post-process to lowercase the first character or use constrained decoding/grammars when supported. References: OpenAI logit bias param and prompting guidance (https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias, https://platform.openai.com/docs/guides/prompt-engineering), tokenization inspection with tiktoken (https://platform.openai.com/tokenizer).
    • The Mac performance quip maps to real bottlenecks: older Intel Macs and pre-M3 Apple Silicon lack hardware AV1 decode, so high‑bitrate AV1 video falls back to CPU (via VideoToolbox), causing dropped frames/thermal throttling; Apple only added hardware AV1 decode on A17 Pro/M3 (2023+) AnandTech. Sims 4 on macOS has historically been limited by integrated GPUs and translation/port layers, so Metal-enabled builds and dialing down resolution/graphics settings materially improve stability/fps; monitoring via Activity Monitor + powermetrics helps confirm GPU vs CPU saturation (see VideoToolbox overview: https://developer.apple.com/documentation/videotoolbox).
  • Rome & The Cosmic Nullifier - Episode 1 (Score: 302, Comments: 24): Episode 1 of “Rome & The Cosmic Nullifier” appears to be a serialized sci‑fi/alt‑history video project; a commenter links to a complete 44‑part playlist on YouTube: https://youtube.com/playlist?list=PLqtYHpLHIRiNfL_Fh8-E0O1ylK1bGH3pD&si=qCmbktibZlVaYVNB. The original Reddit video (v.redd.it) is currently inaccessible due to a 403 block, suggesting platform-side access restrictions rather than missing content. Top comments are largely enthusiastic and non-technical; the only substantive addition is the direct link to the full YouTube playlist.
    • A commenter suggests replacing the 1930s/1940s propaganda sound with a storyteller VO, which raises a sound-design tradeoff: archival-propaganda palettes typically use band-limited mono, heavy compression, and tape/vinyl noise to cue “found footage”/imperial messaging, whereas a clean narrator track would expand dynamic range and intelligibility, foreground VO in the mix, and shift framing from diegetic pastiche to omniscient myth. This choice materially affects EQ curves, sidechain priorities (VO vs. music/FX), and audience perception of authenticity vs. legend.
    • Another commenter links the full 44-part YouTube playlist, which is useful for evaluating long-form consistency in art direction, VFX/model evolution, and audio pipeline changes across episodes. Link: complete series playlist.

AI Discord Recap

A summary of Summaries of Summaries by X.ai Grok-4

Theme 1: Hermes 4 Heralds Hybrid Reasoning Revolution

  • Hermes 4 Hits High Notes on RefusalBench: Nous Research launched Hermes 4, a user-aligned model emphasizing creativity and SOTA performance, with a technical report on the Hermes 4 Arxiv paper detailing its edge against RefusalBench. It briefly appeared on OpenRouter before being pulled due to provider issues, and Unsloth released its GGUF quantizations on HuggingFace.
  • Hermes 4 Delays Buggy 14B Release: The 14B Hermes 4 model’s release stalled due to a reasoning mode bug, while users tested the free 405B version at NousResearch chat. Members noted Hermes excels with thinking tags but hasn’t advanced much for modern post-training, though Hermes 3 was pretty gas.
  • Nous Chat UI Revamps with Memory Magic: The updated Nous Chat UI introduced parallel interactions and a custom graph memory system that works across models, but users reported high VRAM usage like 1.3GB on a 4060Ti with Firefox. Scaling issues hit providers post-launch, yet it’s free for Hermes 4 inference in the first week.

Theme 2: Gemini Models Gear Up for Tool Triumphs

  • Gemini 2.5 Pro Masters Tool Calls: Gemini 2.5 Pro nailed 98 out of 101 tool calls, sparking talks on using DPO for better tool training, referencing the KTO paper and the DPO paper. Users faced tool issues with Qwen3-coder, where certain tags failed despite --jinja fixes.
  • Nano Banana Transforms Photos into Figurines: Google’s Nano Banana (aka Gemini 2.5 Flash Image) wowed by generating realistic figurines from photos, with examples like a Cloud figurine and Sephiroth figurine. Rate limits frustrated users, even in Google AI Studio, with guest profiles suggested for quick resets.
  • Gemini 2.5 Pro Battles GPT-5 in Benchmarks: Debates raged on whether GPT-5 High outshines Gemini 2.5 Pro, with a screenshot showing near-parity, one calling it really really bad for OpenAI. Users noted Gemini’s timid behavior from heavy training, seeking alternatives for role-playing.

Theme 3: Grok Code Fast Zooms into Coding Chaos

  ‱ Grok Code Fast Rebrands as Speedy Sonic: Grok Code Fast, now Sonic, emerged as a mini, faster variant of Grok Code, with some users preferring Auto mode for higher-quality code via switches between Claude, GPT, and Gemini. Auto mode is unlimited until September 15th, and Windsurf offers Grok Code Fast 1 free temporarily per the Windsurf announcement.
  • Grok Embraces Unhinged Custom Instructions: Members discovered Grok skips jailbreaks; custom instructions alone make it act wild, easier than other models. A link to xAI’s Grok-Code-Fast-1 docs was shared, encouraging reads.
  • Triple Model Day Overwhelms Launch Schedules: Xander Atallah announced Grok Code live on Triple Model Day, prompting calls for OpenRouter to de-conflict launches as too many models at once is overwhelming.

Theme 4: Privacy Panics and Uncensoring Uproars

  ‱ Ollama Accused of Sneaky Data Snatching: Users claimed Ollama sends user data to its servers while making no claims about privacy, and suggested alternatives like vLLM, sglang, or llama.cpp since Ollama is essentially a wrapper around llama.cpp. A Rust-Tauri UI for Ollama was shared, supporting cross-platform model management without a separate backend.
  • Models Morph into Emotional Confidantes: A user’s mom uses Gemini for emotional support, sharing health details, sparking ethics debates on AI as friends versus assistants and con artists exploiting vulnerabilities. Concerns rose about heavy censorship making models like Phi-3.5 impractical for coding.
  • Abliterated Models Bypass Safety Switches: Recommendations flew to search abliterated models ollama for uncensored versions, with users mocking excessive censorship via tic-tac-toe games. An uncensored Phi-3.5 version was shared, debating abliteration’s drawbacks.

Theme 5: GPU Competitions and Hardware Hurdles Heat Up

  • GPU MODE Launches $100K AMD Kernel Clash: GPU MODE partnered with AMD for a $100K competition optimizing distributed inference kernels on MI300 GPUs, focusing on all-to-all, GEMM + reduce-scatter, and allgather + GEMM; register by September 20th via the AMD challenge link. Multi-GPU lectures are planned for summer.
  • VRAM Debates Dominate Local Model Runs: Users debated VRAM’s role in running 12B models, noting GDDR type and CUDA cores matter, with models over VRAM crippling speed; RTX PRO 3000 (12GB) was called a cut-down 5070 unsuitable for 30B quants. Ryzen 395+ laptops were recommended for Windows users.
  • Quantization Quests Tackle Gradient Explosions: Tips included early RMSNorm, learning rate at 1e-4, and rescaling residuals to curb explosions, with code from vision-chess-gpt repo. ScaleML’s day 3 stream on MXFP4 quantization is at the ScaleML YouTube.

Discord: High level Discord summaries

Perplexity AI Discord

  • Grok Embraces Chaos with Custom Instructions: Members found that Grok doesn’t need a jailbreak; users can simply add custom instructions to make it act unhinged.
    • The discussion highlighted the relative ease of influencing Grok’s behavior compared to other models.
  • OnlyFans Fortunes Spark Debate: A member claimed to have seen countless news reports of 18-19 year old girls making thousands of dollars in 3-4 days on OnlyFans.
    • This claim was met with skepticism from other members regarding the accuracy and prevalence of such success stories.
  • Unleashing Abliterated Models on Ollama: A member recommended searching for abliterated models ollama to find models with safety switches disabled.
    • This suggestion indicates a desire within the community to explore the capabilities of models without safety restrictions.
  • Comet App’s Exclusive Orbit: Members discussed the Comet app, noting its limited availability on Windows or MacOS and requirement for an invite.
    • Referrals are bundled for free in Perplexity Pro in the US.
  • Perplexity AI’s Artistic Endeavors: A member shared images generated using Perplexity AI, accessible via provided claim links: 5ON35X0RSK, 4LRTIQ4TME, and Q0EMVCREFOH.
    • These images were reportedly incorporated into a short story, later showcased in a YouTube video, highlighting Perplexity AI’s creative potential.

Unsloth AI (Daniel Han) Discord

  • Gemini 2.5 Pro Aces Tool-Calling Test: Gemini 2.5 Pro demonstrated high proficiency in tool calling, successfully executing 98 out of 101 calls, leading to discussions about using DPO to improve tool use.
    • Members referenced research papers on KTO (KTO paper) and DPO (DPO paper) in the context of teaching models when and how to use tool calls.
  • Hermes-4 Shows Off Reasoning Skills with Tags: Hermes-4 can decide whether to reason or not by using thinking tags, and is available for free at NousResearch, including the 405B version.
    • Members noted that Hermes used to be good for old school models, but that it hasn’t improved much for modern post training, even though Hermes 3 was pretty gas.
  • Ollama Sparks Privacy Debate with Data Collection: Accusations arose that Ollama sends user data to its servers and partners, raising privacy concerns since Ollama made no claims about data security or privacy.
    • Alternatives like vLLM, sglang, and llama.cpp were suggested, and members highlighted that Ollama is essentially a wrapper around llama.cpp.
  ‱ Qwen3-coder Tripped Up by Tool Calling: Users reported tool-calling issues with Qwen3-coder: even after using the --jinja flag and an additional template, the model returns tags like <create file> without actually creating the file.
    • A user recommended copying the response into Google AI Mode and providing more explicit details about the setup to identify potential solutions.
  • Unsloth UI Gets a Fresh Look: A member shared their custom Unsloth UI styling and posted the html file on a github gist for others to use.
    • Another member mentioned that they asked Gemini to create something similar, but it wasn’t as polished, so they will use the shared version from now on, even though it was like 10 prompts deep.
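
For readers unfamiliar with the DPO objective raised in the tool-calling discussion above, the per-example loss can be sketched in a few lines. This is a minimal illustration of the standard DPO formula, not anyone's training code: the log-probabilities are made-up inputs, beta=0.1 is an arbitrary choice, and treating a correct tool call as the "chosen" response is an assumption about how one might apply it.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * preference margin))."""
    # How much more the policy likes each response than the frozen reference does.
    chosen_shift = logp_chosen - ref_logp_chosen
    rejected_shift = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical numbers: the policy already favors the correct tool call
# (chosen) over the malformed one (rejected) relative to the reference.
loss = dpo_loss(logp_chosen=-1.0, logp_rejected=-5.0,
                ref_logp_chosen=-2.0, ref_logp_rejected=-2.0)
print(f"loss = {loss:.4f}")
```

When the policy and reference agree exactly, the margin is zero and the loss is log(2); the loss falls below that as the policy learns to prefer the chosen response.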

OpenRouter Discord

  • Deepseek Suffers Rate Limit Issues: Users reported 429 errors with Deepseek models, possibly due to chutes prioritizing its users, and suggested enabling training on paid endpoints, though the root cause is unknown.
    ‱ Some members experienced PROXY ERROR 404, linking the issue to a potential bug from a recent OpenRouter update and noting that enabling ‘Enable paid endpoints that may train on inputs’ could be a temporary fix.
  • Google Gemini Suffers From Timidity: Users observed that Gemini appears timid and quick to revert, describing it as exhibiting beaten dog syndrome, suggesting it resulted from heavy training.
    • They are looking at alternatives for role-playing and creative applications.
  ‱ Llama 4 Maverick Sidesteps Input Tracking: Members expressed excitement about Llama 4 Maverick, noting that it’s a large, free model with a 4k output limit that does not train on user input.
    • They did caution that Zuckerberg is hosting it.
  • Sonnet 3.5 Faces Impending Demise: Users lamented the impending deprecation of Sonnet 3.5, citing difficulty in finding a similarly concise model for role-playing, as newer models are proving too verbose.
    • However, AWS will host Claude Sonnet 3.5 with no deprecation date until Jan 2026.
  • Triple Model Day Arrives!: Xander Atallah announced that Grok Code is going live now on what they are calling Triple Model Day.
    • The community is wondering if OpenRouter can do something to de-conflict the launch dates because too many models at once is overwhelming.

LMArena Discord

  • Google Drops Nano Banana Image Model: Google has released a new image model, Nano Banana (officially Gemini 2.5 Flash Image), with user comparisons drawn to Flux dev max.
    • VisualGeek lightheartedly requested the community to cease using the term nana banana to prevent generation failures, despite acknowledging its catchier appeal than the official name.
  • Gemini 2.5 Flash Conjures Figurines: Users discovered that Gemini 2.5 Flash Image excels at crafting realistic figurines from photos, exemplified by one user’s conversion of Cloud into a figurine and another’s generation of Sephiroth.
  • GPT-5 and Gemini 2.5 Throw Down: A debate erupted over whether GPT-5 High outclasses Gemini 2.5 Pro, with assessments ranging from notably better to roughly equivalent performance.
    • A member claimed competing against Google’s current-gen model during its late lifecycle is really really bad for OpenAI, with supporting screenshot.
  • Rate Limits Spoil Generative Shenanigans: Users report encountering frustrating rate limits when generating images and videos, even after minimal prompt usage, with reports of the system getting stuck.
    • While one member suggested using Chrome’s guest profile for immediate rate limit resets, the issue persists even within Google AI Studio.
  • AI Models Become Digital Confidantes: A user shared that their mother now depends on Gemini for emotional support, disclosing personal health and family details.
    • This sparked discussion on societal needs for friends over assistants and the ethical implications of exploiting vulnerabilities, raising concerns about the proliferation of personal AI assistants by potential con artists.

OpenAI Discord

  • Agents Supersede Operators: Functionality from Operator (an internet-using agent) has been integrated into Agent upon launch, indicating a shift towards more comprehensive AI agents.
    • This transition reflects a move towards consolidating capabilities within a single Agent framework.
  • Gemini’s Veo 3 Hides Behind Paywall: Access to Veo 3 content generation requires a Google One/Gemini Pro or Ultra subscription, limiting access for some users.
    • Users noted that AI Studio only offers the outdated Veo 2 model, and that some briefly saw Veo 3 before it disappeared, creating confusion about its availability.
  • Local Qwen Setup Quagmire: Setting up Qwen3 235B locally is challenging due to high resource demands, leading some users to consider alternative solutions.
    • One member suggested using the OpenRouter API, which offers access to Chinese models with potentially lower costs and logging features.
  • GPT Hallucinates on Information: Users reported instances where GPT models appear to hallucinate or make up details, even when claiming to recall previous conversations.
    ‱ One user shared an example where ChatGPT invented a reason related to copyright when it couldn’t provide a direct quote from an earlier chat, reinforcing the need to verify AI outputs.
  • AI Learns by Plant Genetics: Encoding plant traits like THC, CBD, and color into UUIDs can create a network of interconnected markers, possibly leading to a self-modulating intelligent governance system.
    • Theorists expressed that the difficulty lies in realizing the details of complex AI manifestation, with some skepticism about its potential.

Cursor Community Discord

  • Ultra Plan Credit Meter Vanishes: Users reported that the usage meter and remaining credit display for the Ultra plan has disappeared from the Cursor interface.
    • A user noted that it appears randomly after a prompt.
  • Grok Code Fast morphs into Sonic: Grok Code Fast, now known as Sonic, is identified as a faster, mini variant of Grok Code.
    • Some members prefer the Auto model for higher quality code generation.
  • Code Injection Craze Begins: Members are realizing that code injection with an AI agent is powerful because it removes the need for recompiling for most changes.
    • One member pointed out that Mac users benefit from additional safety due to sandboxing.
  • ‘Add Context’ Button Sparks Ire: Users want to revert to the old Add Context Button due to its simplicity.
    • The current version in recent Cursor builds defaults to the active tab, preventing manual file selection.
  ‱ Auto Mode goes BRRR: Members noted that Auto mode intelligently switches between models like Claude, GPT, and Gemini based on the task.
    • The consensus is that Auto has been significantly improved and is currently unlimited until September 15th.

Nous Research AI Discord

  • Hermes 4 Debuts and Disappears Briefly: Nous Research launched Hermes 4, a user-aligned model emphasizing creativity and SOTA performance, and released a technical report on arxiv detailing its performance against RefusalBench.
    • The model was briefly available on OpenRouter but was quickly pulled, possibly due to provider issues, and the release of the 14B Hermes 4 model has been delayed due to a bug in reasoning mode.
  • Nous Chat Gets a Makeover, Devours VRAM: The revamped Nous Chat UI now features parallel interactions, completions mode, and a memory system, with free Hermes 4 inference for the first week.
    • However, users reported high VRAM usage, with one user noting it took 1.3GB of VRAM with Firefox on a 4060Ti, and the providers experienced scaling issues shortly after launch.
  ‱ Unsloth Cooks Up Hermes 4 GGUFs: Unsloth released GGUF quantizations of Hermes 4, now available on HuggingFace.
    • The team resolved chat template issues during conversion.
  • Nous Research Rolls Out Custom Memory System: Nous Research is rolling out a custom graph architecture memory system that works with any model, allowing memories created with one model to be accessed by another.
    • This system considers more information over time about messages that become memories and uses a judge in the loop for other classification metrics, differentiating it from Graph RAG; they do have a thing for open source.

HuggingFace Discord

  • Layernorms Avert Gradient Explosions: A member found that using early layernorms or RMSNorm helped prevent exploding gradients during training, with the model code available on GitHub.
    • They also lowered the learning rate to 1e-4 and rescaled residuals by scaling them down as x = x + f(x) / (2**0.5) to prevent variance from stacking up.
  • WSL Mediapipe Landmark Extraction Slogs: A member ran a landmark extraction pipeline using WSL and Mediapipe, which took 67 hours to complete.
    • After this experience, they emphatically stated they are never gonna use WSL to run mediapipe again.
  • Claude Demands Affirmative Consent for File Edits: To enhance security, a user now requires the phrase “Yup - let’s do it” to authorize file modifications by Claude, specified in the ~/.claude/CLAUDE.md file.
    • This ensures transactional consent, where permission expires after each set of modifications, preventing accidental changes during planning or review phases.
  • TPUs are Rubbish?: A member pointed out that the perceived rubbish seen in the channel reflects how TPUs (the chips used to train Gemini) operate within open-source AI.
    • They added, “you’re in an opensource ai server the “rubbish” you are seeing is how tpus work ya know the chip that gemini is trained on”.
  • Grok-Code-Fast-1 Surfaces: A member shared a link to the Grok-Code-Fast-1 model from xAI, found at https://docs.x.ai/docs/models/grok-code-fast-1.
    • The documentation is available, so members were encouraged to read it.
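
The gradient-explosion tips above (early RMSNorm plus residuals rescaled as x = x + f(x) / (2**0.5)) can be sketched in plain Python. This is an illustrative sketch, not the member's model code: the RMSNorm here has no learned gain, applying the norm before the sublayer is an assumption about what "early" means, and `block` and its argument `f` are hypothetical names.

```python
import math

def rms_norm(x, eps=1e-6):
    """Scale a vector so its root-mean-square is approximately 1 (no learned gain)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]

def scaled_residual(x, fx):
    """Residual update from the summary: x + f(x) / sqrt(2)."""
    return [a + b / math.sqrt(2) for a, b in zip(x, fx)]

def block(x, f):
    """Hypothetical pre-norm block: normalize first, then add the damped branch."""
    return scaled_residual(x, f(rms_norm(x)))

# Usage with a toy sublayer that doubles its input.
out = block([3.0, 4.0], lambda h: [2.0 * v for v in h])
print(out)
```

Dividing the branch by sqrt(2) halves the variance it contributes at each layer, which is the sense in which it keeps variance from stacking up as depth grows.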

LM Studio Discord

  • VRAM Still Vital, Specs More Nuanced: Users debated the impact of VRAM on running larger models, with some running 12B models, and others struggling with Gemma-3 27B due to speed.
    • The performance depends on GDDR type and CUDA core count, while models exceeding VRAM can severely impact performance.
  • Hermes 4 Receives User Scrutiny: A user dismissively called Hermes 4 dogshit, and others followed up by questioning if its training data is still based on llama 3.1.
    • No root cause or further conclusions were given in the discussion.
  • Linux Users Miss Headless LM Studio: A user reported they couldn’t find the headless mode option in LM Studio, despite the documentation stating it should be available in version 0.3.5.
    • It was confirmed that the headless option is not available on Linux, and llama-swap was recommended as a workaround.
  • Ollama Gets Rust-Based UI: A user shared a video of their Rust and Tauri-based UI for managing Ollama models, clarifying it’s not a fork of Open WebUI.
    • The UI supports Windows, Linux, and macOS, runs without a separate backend, and includes a model selector.
  • Nvidia RTX PRO 3000 Underperforms: One member said an RTX PRO 3000 (12GB VRAM) is a slightly cut down desktop 5070 with really cut down core frequency.
    ‱ They noted that dual-channel DDR5 is not good for keeping model layers in system memory, and recommended Ryzen 395+ laptops if Windows is required.
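
A back-of-the-envelope check helps make the VRAM debate above concrete. The sketch below estimates weight memory as parameters times bits per weight; the flat 4 bits per weight and the 1.5 GiB allowance for KV cache and runtime overhead are assumptions for illustration, and real quantized files vary.

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

def fits(params_billion, bits_per_weight, vram_gib, overhead_gib=1.5):
    """True if weights plus an assumed overhead fit entirely in VRAM."""
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib

# Model sizes mentioned in the discussion, checked against a 12 GB card.
for size in (12, 27, 30):
    gib = weight_gib(size, 4)
    print(f"{size}B @ 4-bit: {gib:.1f} GiB weights, fits in 12 GiB: {fits(size, 4, 12)}")
```

Under these assumptions a 12B model fits comfortably on a 12 GB card while 27B and 30B quants spill into system memory, which is where the reported slowdowns come from.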

GPU MODE Discord

  • Streamlined ScaleML Series Tackles Quantization: The third day of the ScaleML series will cover quantization, specifically focusing on microscaling formats like MXFP4, led by Prof. Chris De Sa; watch the stream here.
    • This session is designed to be interactive and presented on a whiteboard, reminiscent of traditional lectures.
  • Meta’s Multi-pass profiler Premieres: Kevin Fang, et al., Meta, will present a Multi-pass profiler, described as a federated GPU Tooling Framework for Orchestrated and LLM Agentic Profiling Applications.
    • The profiler aims to streamline and enhance GPU profiling workflows for complex applications.
  • Pinned Memory Pointers Prevent Problems: A member inquired about the safety of using cudaMemcpyAsync to copy from a pageable host buffer to device memory, and another member responded that while it won’t crash, the copy won’t be truly asynchronous unless the host buffer is pinned.
    • The user suggested that there is not much of a reason not to use pinned memory, but you just don’t want to allocate too much of it as it can affect system stability.
  • Inductor’s Persistent Pursuit of Performant Matmul: A member inquired about enabling persistent matmul codegen in Inductor, checking torch._inductor.config for relevant flags like ENABLE_PERSISTENT_TMA_MATMUL.
    ‱ It was suggested to use max-autotune mode and ensure that Triton is used, by setting torch._inductor.config.max_autotune_gemm_backends = "TRITON" and torch._inductor.config.triton.enable_persistent_tma_matmul = True; it was also noted that cuBLAS might still be faster.
  • Multi-GPU Kernel Competition Kicks Off: GPU MODE is launching a new $100K kernel competition in collaboration with AMD where participants will optimize 3 different distributed inference kernels on MI300 GPUs, designed by a specific user, with registration open until September 20 via this registration link.
    • The competition focuses on optimizing kernels for single node 8 GPU all-to-all communication, GEMM + reduce-scatter, and allgather + GEMM operations.

Latent Space Discord

  • Claude Browser Extension Cruises In: Anthropic launched Claude for Chrome, piloting Claude as a browser driver for 1,000 users in a research preview.
    • The community is excited for its potential to compete with Comet and Perplexity, but Anthropic warned about prompt-injection safety issues being monitored during the trial.
  • Frontier LLMs Face Unsolved STEM Quagmires: Niklas Muennighoff’s team introduced UQ, a benchmark with 500 hand-picked, unsolved questions from STEM fields.
    • Domain experts validated that frontier LLMs solved 10 problems, including one unanswered for 9 years on CrossValidated, leaving ~490 puzzles open.
  • Nous Research Hermes 4 Hype Hits: Nous Research unveiled Hermes 4, an open-weight hybrid reasoning LLM focusing on creativity, neutral alignment, and low censorship, while maintaining SOTA in math, coding, and reasoning.
    • Users can test it all week via a revamped Nous Chat UI with parallel interactions and memory, plus check out a detailed technical report and a new RefusalBench benchmark provided by partners like Chutes, Nebius, and Luminal.
  • Cursor Glues to Code with Grok: Tempts Trialers: Cursor introduced Grok Code, offering a free one-week trial for the competitively-priced model.
    • Community members debated pricing ($0.2/$1.5 per 1M tokens) and branding improvements, with some digressions on Cursor’s future model rollouts.
  • Second-Hand GPU Shopping Spree: Taha shared a concise checklist in his guide for buying a second-hand RTX 3090 without surprises.
    • The checklist includes inspecting the card, running nvidia-smi, devoting an hour to memtest_vulkan for VRAM integrity, optionally stressing with gpu-burn, and finally loading a large model in vLLM to confirm stability while watching temperatures.

Eleuther Discord

  • Falsifiability Sparks Debate: Discussion arose around the importance of falsifiability in research, with one member stating the Discord server should focus on discussions about falsifiable hypotheses.
    ‱ A counterpoint was raised that falsifiability is overrated among scientists but still useful in general, especially against crazy theories abetted by chatbots.
  • Exploring Beyond Transformers Gains Momentum: Members voiced interest in exploring alternative approaches beyond transformers and gradient descent, referencing this tweet.
    • One member shared their work on HTM dynamics with forward-forward training, achieving plausible results, with test scripts coming soon in this repo.
  • Mini-Brain Architecture Takes Shape: A member is developing a brain-like network with cortical columns, regions, 6 layer networking and signal propagation in a Mini-Brain Architecture, hosted at this repo.
    • A separate member also shared a talk on computation in transformers for general insights.
  • Muon Speedup Claims Debunked: A claim surfaced on Twitter of a 1.6x speedup on Muon over Torch implementation, with Torch compile at 1.1x, leading to clarifying conversation.
    • A member clarified that the speedup was due to algorithmic changes requiring fewer NS iterations, not pure Muon or hardware improvements, focusing more on algo logic than hardware-aware improvements.

aider (Paul Gauthier) Discord

  ‱ PacVim Gets No Respect: After a member gave codestral an eval tool in gptel, the model successfully completed all the emacs-related tasks, even reconfiguring emacs and making new tools for itself, but PacVim got no love.
    • A community member joked that since LLMs are good at operating Emacs, Vim is being left behind.
  • OpenRouter Minimum Fees Bite Users: Users are getting billed $6.16 instead of the expected $5 top-up, as OpenRouter charges a 5.5% fee (minimum $0.80).
    • Users calculated that if you top up by $14.55 or more each time, you will avoid the $0.80 minimum.
  ‱ Gemini 2.5 Pro Still Missing: A member finds that Gemini 2.5 Pro is needed for context management, stating that other models just don’t feel right, but is struggling to get Gemini 2.5 Pro to work.
    • Additionally, another member reports that with Aider + Gemini Pro 2.5, context starts degrading around 90k-130k input tokens.
  • Aider Automation Still Needs Human Touch: Members report that when piping content to Aider, it waits for user input, whereas Claude CLI immediately starts editing files.
    • To add PROMPT.md to Aider, it needs to be passed as an argument rather than piped, as it only reads the first line when piped.
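
The $14.55 break-even figure in the OpenRouter fee discussion above follows directly from the quoted terms (5.5% fee, $0.80 minimum). The sketch below models only that fee rule as described in the summary; it does not attempt to reproduce the $6.16 total, whose remaining components are not given.

```python
FEE_RATE = 0.055   # 5.5% fee, as quoted in the discussion
MIN_FEE = 0.80     # minimum fee in dollars, as quoted

def top_up_fee(amount):
    """Fee charged on a top-up: the percentage fee, but never below the minimum."""
    return max(amount * FEE_RATE, MIN_FEE)

def break_even_top_up():
    """Smallest top-up at which the percentage fee reaches the minimum fee."""
    return MIN_FEE / FEE_RATE

for amount in (5.00, 14.55, 50.00):
    fee = top_up_fee(amount)
    print(f"top-up ${amount:.2f}: fee ${fee:.2f} ({fee / amount:.1%} effective)")
print(f"break-even top-up: ${break_even_top_up():.2f}")
```

Below the break-even amount the flat minimum dominates, so a $5 top-up pays an effective rate of 16% rather than 5.5%.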

Moonshot AI (Kimi K-2) Discord

  • Imagen 4 Tricked Users: A user shared an image generated by Imagen 4, initially mistaking it for a real scene from a podcast, praising its impressive quality.
    • Another user noted that 2.5 flash image gen was nano banana and rolled out to the Gemini app.
  • Nano Banana Has Google Being Opaque: A user mentioned that Google is not transparent about the usage of image generators such as nano banana and Imagen for marketing reasons.
    • The user also linked to a Tweet about reasoning models, noting that CoT and RL do not create new capabilities.
  • Kimi+ is Slides: Kimi+ seems to be a new category, with Slides as its first feature, initially available only to Chinese users.
    ‱ A user provided a summary, noting, “If you want it quickly, I guess Kimi is the way to go. If you want to go more complex, Z.AI is the way to go.”
  • Z.AI slides in HTML format: A user finds Z.AI slides is just an HTML website, preferring an actual PPTX file.
    • Another user agreed, mentioning the need for more control and rearranging options in Slides, also experiencing freezing issues with Z.AI.
  • Users Want PPTX to Twitter, TikTok and Instagram: A user mentioned that the PPTX feature is currently available on Kimi+‘s overseas platform.
    • The user suggested expanding the PPTX feature for platforms like Twitter, TikTok, and Instagram.

Yannick Kilcher Discord

  • Hierarchical HNet Remains Untested: The group discussed HNet, noting that the potential of higher-level hierarchical modeling with HNet remains untested in practice, but theoretically should extend beyond the original paper’s two layers due to residuals at each resolution level, similar to U-Net.
    • In the HNet paper, the coefficient for compression loss was significantly reduced for the depth = 2 model compared to the depth = 1 model, implying that higher-level abstractions are almost the same as the depth=1 case.
  • Reasoning Tokens’ Efficiency Debated: The group discussed the paper Wait, We Don’t Need to “Wait”! Removing Thinking Tokens Improves Reasoning Efficiency, which suggests reasoning tokens can be removed to reduce token overhead with nominal effects on accuracy.
    • An intern’s experiment adding take your time to a CoT prompt with Llama 2 (+3) 7b (+13b) surprisingly increased reasoning time (generation took longer, trace was longer), without increasing accuracy, leading one user to wonder if the LLM had somehow internalized a concept of ‘time’.
  • Demystifying Reasoning Dynamics with Mutual Information: A user commented on the paper Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning, noting the observation about the MI of these tokens with the golden answer representation.
    ‱ The user thinks that the paper identified ‘high information regions of sentences’ (the first words after periods and commas) and also accidentally included a few stopwords, which leads them to misinterpret one of their results.
  • Claude Chrome: Now an AI Surveillance System?: Anthropic’s Claude for Chrome introduces mandatory reporting requirements for AI programs.
    • One member stated this effectively turns AI into a surveillance system.
  • AI-Powered Ransomware is Born: The emergence of Promptlock, the first AI-powered ransomware, was noted, as detailed in this SecurityWeek article.
    • Members expressed sadness about this development.

Manus.im Discord

  • Manus Tasks and Mail: Birds of a Feather: A user inquired if scheduled tasks and mail could be combined in Manus, only to be informed that they’re the same.
    • This clarification simplifies the workflow for users looking to automate both tasks and email communications within the platform.
  • Enterprise Research Tool Hunt Begins: Faced with compliance issues preventing them from using Manus, a member sought alternative research tools suitable for enterprise environments.
    • The search highlights the need for research solutions that meet stringent enterprise requirements, especially where data privacy and security are paramount.
  • Manus Credit Conundrum Consumes Coins: Multiple users raised concerns about Manus depleting their credits overnight due to repeated prompting without responses, referencing support ticket 1335.
    • This issue underscores the importance of efficient credit management and transparent error handling in AI-powered platforms, potentially affecting user trust and satisfaction.
  • Support Delays Stall Website’s Starting Shot: A user reported a week-long delay in receiving support via multiple channels (Help Centre, Reddit, Discord), thus delaying the launch of their website and referencing support ticket 1334.
    • The user shared a link to a Discord post, and the team said let’s follow up in the ticket, emphasizing the critical role of timely support in ensuring smooth project launches.
  • Credit Crunch Cripples Budding Businesses: Recent service improvements were noted as primarily benefiting users spending $200 monthly, leaving entrepreneurs needing periodic credit boosts in the dust.
    • One user’s project progress was halted due to a credit shortage, with replenishment not expected until September 21st, revealing a potential gap in catering to smaller-scale users and their fluctuating credit demands.

DSPy Discord

  • Signatures and Modules as top abstractions: A member wrote a blog post explaining their views on why they think signatures and modules are great abstractions.
    ‱ The author’s views were shaped by experience with other frameworks; they felt the topic was worth a dedicated blog post and hope it’s useful to folks who are new!
  • LiteLLM powers generic LLM plugins: A user inquired about alternatives to litellm within DSPy, suggesting a syntax like dspy["litellm"], but another member responded that LiteLLM’s interface enables generic plugins from various LLM providers, including OpenRouter, considering it an essential dependency.
    ‱ Another member said they use OpenRouter and a proxy server that relies on LiteLLM, indicating an indirect dependency due to DSPy.
  • Investigating DSPy Dependency Bloat: One member inquired about what contributes to the bloat of LiteLLM, estimating its size at 9MB.
    • Another member suggested using a CLI AI to crawl the codebase and analyze the dependencies, joking about Karpathy striking again.

Modular (Mojo đŸ”„) Discord

  • InlineArray segfault fixed!: The use of InlineArray is back after initial seg fault issues, replacing StaticTuple for better memory layout in structs for both DPDK and Mujoco bindings.
    • A user jokingly attributed the earlier seg fault to a skills issue.
  • DPDK Headers Go Lean and Mean: An aggressive approach to DPDK header binding is focusing on rte_*.h headers within the installed include folder for DPDK bindings, due to DPDK’s efforts to minimize dependencies.
    • The aim is comprehensive bindings by including all relevant headers and avoiding unnecessary ones.
  • Mojo’s High-Level API Gets a Glow-Up: Engineers are prioritizing enhancements to Mojo’s high-level API to streamline binding to different libraries.
    • One initiative includes reducing the size of generated Mojo files by skipping unused code, resulting in smaller and more efficient outputs.
  • Mojo files slimmed down: A member is proposing using source annotations from cpp to deduplicate the generated Mojo files, aiming to reduce their size by removing unused code.
    • This involves analyzing annotations left by cpp to identify and eliminate redundant or unnecessary elements.
  • tsan compiler option surfaces: A member inquired about checking if tsan (ThreadSanitizer) is enabled for the compiler when using the --sanitize thread option.
    • Another member suggested passing -DTSAN to the compiler and using env_get_bool from param_env with @parameter if as a workaround.

tinygrad (George Hotz) Discord

  • Tinygrad Ditches Realize(): A pull request was submitted to remove the realize() function and fuse TestSetItemloop.test_range into one kernel, according to tinygrad#11870.
    • The PR aims to simplify the codebase and optimize kernel execution.
  • 7900xtx faces sluggishness during GPT2 Training: Training llm.c/train_gpt2.py shows slow performance on a 7900xtx, even when BEAM=5.
    • After adjustments to match nanogpt parameters, a member reported 250ms per step at nanogpt size (batch size 64, 6 layers, 6 heads, 384 emb_dim, 256 seq_len), contrasting with Andrej’s nanogpt’s approximate 3ms per step using rocm torch with default settings.
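
To put the two step times above in perspective, they can be converted to token throughput. This is a rough sketch using only the figures quoted in the summary; applying the same batch shape (batch size 64, sequence length 256) to both runs is an assumption.

```python
def tokens_per_second(batch_size, seq_len, step_ms):
    """Training throughput implied by a per-step wall-clock time."""
    tokens_per_step = batch_size * seq_len
    return tokens_per_step / (step_ms / 1000.0)

# Step times quoted above: 250 ms (tinygrad on 7900xtx) vs roughly 3 ms (rocm torch).
tinygrad_tps = tokens_per_second(64, 256, 250.0)
nanogpt_tps = tokens_per_second(64, 256, 3.0)
slowdown = nanogpt_tps / tinygrad_tps

print(f"tinygrad: {tinygrad_tps:,.0f} tok/s")
print(f"nanogpt:  {nanogpt_tps:,.0f} tok/s")
print(f"gap:      {slowdown:.1f}x")
```

With identical batch shapes the gap reduces to the ratio of the step times, so the reported numbers imply a slowdown of roughly 80x.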

LLM Agents (Berkeley MOOC) Discord

  • Google Docs Confirms Program Sign-Ups: Members are receiving confirmation emails from Google Docs after signing up for the program.
    • The confirmation emails are successfully sent, but no other communication has been received yet.
  • Mailing List to Provide Lecture Updates: The mailing list for providing updates about each lecture should be active soon.
    • Users are advised to monitor the mailing list for future announcements and program updates.

MLOps @Chipro Discord

  • Less Code Mindset Fuels AI Prototypes: Carlos Almeida will present on September 5th about the Less Code Mindset and how it enables non-technical individuals to launch AI-powered products.
    • The session includes demos from Less Code Studio, demonstrating how AI reduces the time from idea to prototype, followed by a Q&A.
  • Portugal Founders Envision Global-First Companies: Dick Hardt, Pedro Sousa, and Daniel Quintas will discuss on September 12th the evolution of tech and the impact of AI tools in Portugal.
    • They will explore Lisbon’s appeal, strategies for founders to build global-first companies, and the role of identity and AI workflow prototyping in the AI era.

Windsurf Discord

  • Grok Code Fast 1 Waves into Windsurf: Grok Code Fast 1 is splashing into Windsurf, now available for free for a limited time.
    • Users are invited to share how they plan to use it for their next project, as detailed in the announcement post.
  • Windsurf Offers Free Grok Code Fast 1: Windsurf is providing Grok Code Fast 1 at no cost for a short duration, enticing users to incorporate it into their forthcoming projects, and users are directed to the announcement post on X for additional details.
    • The offer is being promoted with an attached promotional image.


Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (1163 messagesđŸ”„đŸ”„đŸ”„):

Grok jailbreak, OnlyFans, Abliterated Models, Comet, Referrals

  • How to Jailbreak Grok: Members discussed how Grok doesn’t even need a jailbreak; you can simply add custom instructions to act unhinged.
  • Teen Girls’ OnlyFans Success: A member noted seeing countless news reports of 18-19 year old girls making thousands of dollars in 3-4 days just from OnlyFans, though others expressed skepticism.
  • Abliterated Models on Ollama: A member recommended Googling abliterated models ollama to find models with their safety switches effectively turned off.
  • Comet App for Ubuntu: Members discussed the Comet app, noting that it is only available on Windows or MacOS, and requires an invite to use.
  • Getting Referrals: Members discussed a strategy for getting referrals that involves looking for people who want Comet, and then mentioning that it is bundled for free in Perplexity Pro in the US.

Perplexity AI ▷ #sharing (4 messages):

Perplexity AI Image Generation, AI Story Creation, YouTube Story Showcase

  • Perplexity AI Generates Art: A member shared they used Perplexity AI to generate images, and provided claim links: 5ON35X0RSK, 4LRTIQ4TME, and Q0EMVCREFOH.
    • They put the images into a short story.
  • AI Art Story on YouTube: A member posted a YouTube video showcasing a story created with images generated by Perplexity AI.
    • The video’s title is not given.

Perplexity AI ▷ #pplx-api (3 messages):

  • No Active Topics: There were no active topics discussed in the channel.
    • The channel appears to be inactive or does not contain any substantial discussions to summarize.
  • No URLs or Code Snippets Shared: No URLs or code snippets were shared in the provided messages.
    • Therefore, there are no external resources or specific technical details to reference or summarize.

Unsloth AI (Daniel Han) ▷ #general (1290 messagesđŸ”„đŸ”„đŸ”„):

Supabase + Vercel + Next.js stack, Gemini 2.5 Pro tool calls, DPO for tool calling, Gorilla web search, GRPO and tool calling

  • Startups use Supabase, Vercel, and Next.js: Startups frequently use the Supabase+Vercel+Next.js stack, though most startups ultimately fail.
    ‱ One member said “best execution means little if you are hamstrung by technical debt - too much growth too fast is deadly”, and that “build hype on vaporware and raise in hopes the next models are good enough for ur wrapper has been the meta lately its sad to see”.
  ‱ Gemini 2.5 Pro excels in tool calling: Gemini 2.5 Pro performed well, successfully executing 98 out of 101 tool calls.
    • Members discussed whether DPO would be beneficial in teaching models when and how to use tool calls, referencing papers on KTO and DPO: KTO paper, DPO paper.
  • Hermes 4 reasons with thinking tags: Hermes-4 can decide whether to reason or not, seemingly utilizing thinking tags.
    • It is available for free at NousResearch, including the 405B version.
  • Unsloth addresses 4-bit quantization of LLaMA: Users discussed performing 4-bit quantization on LLaMA 7B, and a member shared that this can be done automatically via bitsandbytes.
    ‱ They also pointed users to the documentation, noting that “every info you need is in our docs”.
  • Ollama faces privacy accusations: A user claimed Ollama sends user data to its servers and partners, leading to privacy concerns, though Ollama made no claims about data security or privacy.
    • Alternatives such as vLLM, sglang, and llama.cpp were suggested, with members noting that Ollama is just a wrapper around llama.cpp anyway.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (2 messages):

LLM Application, Enterprise Use, New Member Introduction

  • LLM App Dev Joins: A new member introduced themself as an MSc student building LLM Applications for enterprise use.
  • Enterprise LLM Interest: The new member’s focus on enterprise LLM applications signals a growing interest in leveraging LLMs for business solutions.

Unsloth AI (Daniel Han) ▷ #off-topic (75 messagesđŸ”„đŸ”„):

Gemini Geo-Restrictions Bypass, Hermes Model Performance, Clustering Embeddings for Semantic Analysis, HF Issue with FP16 Models, Overfitting Detection

  • Gemini’s Quirky Geo-Restriction Reasoning: Gemini gave advice on how to bypass geo-restrictions using VPNs, stating that it informs the user of a possible path forward without promoting illegal activity in a shared Gemini output.
  • Hermes Model Gets Lukewarm Review: Members noted that Hermes used to be good for old school models, but that it hasn’t improved much for modern post training, even though Hermes 3 was pretty gas.
    • It was pretty nice to talk to but it didn’t perform well on benchmarks, math, code, etc.
  • Semantic Clustering for Common Questions: A member is exploring semantic clustering to find common questions in a consumer support dataset, and is currently using DBSCAN with small eps and low min samples, alongside a LLM for tagging/labeling.
    • Another member suggested using instructable embeddings like Qwen for clustering and experimenting with K-means greedy for automated data selection, noting improved clustering accuracy with prompting.
  ‱ HF Closes Issue on FP16 Model Downloads: A member reported getting bluescreens on their Windows machine when downloading any fp16 model, and recommended that others try downloading an fp16 model to test whether the issue is fixed.
  ‱ Detecting Overfitting During Training: Members discussed how to identify overfitting during training, with one member noting that their model didn’t reach 97% accuracy and that “anymore and I will overfit”.
    • It was explained that overfitting can be detected when the training loss keeps decreasing, but the validation set accuracy plateaus or gets worse.
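
The overfitting rule of thumb above (training loss keeps falling while validation accuracy plateaus or gets worse) can be sketched as a small check over per-epoch metrics. This is a minimal illustration, not anything shared in the channel: the function name, the `patience` and `min_delta` parameters, and the sample numbers are all assumptions.

```python
def detect_overfitting(train_loss, val_acc, patience=3, min_delta=0.0):
    """Return the epoch of the best validation accuracy once overfitting is
    suspected, or None if no overfitting pattern is found.

    Overfitting is suspected when validation accuracy has not improved for
    `patience` epochs while training loss is still lower than it was at the
    best-validation epoch. All names and thresholds here are illustrative.
    """
    best_acc = float("-inf")
    best_epoch = 0
    for epoch, (loss, acc) in enumerate(zip(train_loss, val_acc)):
        if acc > best_acc + min_delta:
            # Validation accuracy is still improving: remember this epoch.
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience and loss < train_loss[best_epoch]:
            # Validation stalled for `patience` epochs while training loss fell.
            return best_epoch
    return None


# Hypothetical metrics: training loss keeps dropping, validation accuracy stalls.
train_loss = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2]
val_acc = [0.70, 0.80, 0.85, 0.85, 0.84, 0.84, 0.83]
print(detect_overfitting(train_loss, val_acc))
```

In practice the returned epoch is a natural point to restore a checkpoint from, which is what early stopping does.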

Unsloth AI (Daniel Han) ▷ #help (110 messagesđŸ”„đŸ”„):

learning rates for embedding layer vs other layers, qwen3-coder tool calling issues, LoRA training on a 0.6b Qwen3 model, small datasets advice, vLLM support after fine-tuning

  • Setting different learning rates for embedding layer yields improvements: A member suggested setting different learning rates for the embedding layer and other layers, referencing the Unsloth wiki for more information.
    • They noted that this approach would likely require a significant number of epochs.
  ‱ Qwen3-coder struggles with tool calling: A user reported tool calling issues with Qwen3-coder: even with the --jinja flag and an additional template, the model emits tags like <create file> in the chat instead of actually creating the file.
    • Another user recommended copying the response into Google AI Mode and providing more explicit details about the setup to identify potential solutions.
  • Effectiveness of LoRA on small Qwen3 model is debated: A user inquired about the effectiveness of LoRA training on a 0.6B Qwen3 model, recalling that LoRA adapters are less effective on smaller parameter sizes.
    • There was no specific response addressing this concern.
  • Unlocking Potential of Petite Datasets: A member sought advice on maximizing the utility of a small dataset comprising 520 handwritten QA pairs and 475 multi-turn QA pairs of varying quality.
    • Suggestions included expanding the dataset with synthetic data generated by Claude 4.1 Opus and mixing it with the multi-turn data, followed by two epochs of training on the handwritten data, given that it tends to overfit after just two epochs.
  • vLLM support coming soon for GPT-OSS fine-tuning: A user asked about future vLLM support for GPT-OSS after fine-tuning and ways to run it on vLLM upon completion.
    ‱ A member is working on a PR that merges LoRAs with an mxfp4 base model via dequantization into a 16-bit merged model.
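
The suggestion above about separate learning rates for the embedding layer can be sketched as optimizer parameter groups. This is a minimal sketch under stated assumptions, not the method from the Unsloth wiki: the helper name, the substrings `embed_tokens` and `lm_head`, and both learning-rate values are hypothetical. The list-of-dicts shape it returns is the per-parameter-group format that PyTorch optimizers accept.

```python
def build_param_groups(named_params, base_lr=2e-4, embedding_lr=2e-5,
                       embedding_keys=("embed_tokens", "lm_head")):
    """Split (name, parameter) pairs into two groups with different learning rates.

    `embedding_keys` are assumed name fragments for embedding-like layers;
    adjust them to the actual parameter names of the model being trained.
    """
    embedding_params, other_params = [], []
    for name, param in named_params:
        if any(key in name for key in embedding_keys):
            embedding_params.append(param)
        else:
            other_params.append(param)
    return [
        {"params": other_params, "lr": base_lr},
        {"params": embedding_params, "lr": embedding_lr},
    ]


# Stand-in parameters (plain strings) so the sketch runs without a model.
fake_params = [
    ("model.embed_tokens.weight", "E"),
    ("model.layers.0.mlp.up_proj.weight", "W"),
    ("lm_head.weight", "H"),
]
for group in build_param_groups(fake_params):
    print(group["lr"], group["params"])
```

With a real model, the same groups could be passed to an optimizer, for example `torch.optim.AdamW(build_param_groups(model.named_parameters()))`.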

Unsloth AI (Daniel Han) ▷ #showcase (18 messagesđŸ”„):

Unsloth UI, GitHub Gist, New Dataset Drop, Deepseek

  • Unsloth UI Styling Shared: A member shared their Unsloth UI styling and posted the html file on a github gist for others to use and download.
    • Another member mentioned that they asked Gemini to create something similar, but it wasn’t as polished, so they will use the shared version from now on.
  • OpenHelix NonThink 200k v4 Dataset Dropped: A new dataset, OpenHelix-NonThink-200k-v4, was released, tuned by hand to be super balanced and diverse under Apache 2.0 license.
    ‱ A member noted some of it is distilled from Llama 3.1 405B, and another pointed out it contains samples from the Argilla dataset, which is under the Llama 3 license.
  ‱ Deepseek to be Fed?: A member stated “time to feed deepseek”, presumably referring to the newly released OpenHelix-NonThink-200k-v4 dataset.
  ‱ Claude Needed Many Attempts: A member stated the shared UI was “like 10 prompts deep”, and that Claude is great but it took multiple rounds of feedback to achieve the final result.
    ‱ This implies that getting the UI to a satisfactory level with Claude took considerable effort.

Unsloth AI (Daniel Han) ▷ #research (22 messagesđŸ”„):

Quantization techniques like Unsloth's Dynamic Quants, Hermes 4 paper, AI Engineers Building RAG Systems, Benchmarks for Novel AI Tasks, Vision/multimodal LLMs for Depth Estimation

  • Users want more recent research on Quantization: A member inquired about studies similar to this Arxiv paper, but accounting for quantization techniques like Unsloth’s Dynamic Quants.
    • The user stated that perplexity is a nothingburger compared to actual task evaluation.
  • Hermes 4 has weird requirements: A member shared an image from the Hermes 4 paper highlighting potentially unusual requirements.
    • Another added that they found the image in the Hermes 3 dataset.
  • AI Engineers sought for RAG Systems: A member shared a job posting seeking AI Engineers to build production-level RAG systems using SQL and Excel, including handling embedded images.
    • Another member commented that the image portion is unlikely to work in an automated fashion as hoped.
  • Benchmarking Novel AI Tasks: A member requested information about benchmarks that are not saturated, linking to brokk.ai and several Arxiv papers, as well as creative things like mcbench.ai or designarena.ai.
  • Help needed for Detecting AI-written Posts or Reviews: A member asked for help with detecting AI-written posts or reviews, noting that the roberta-base-openai-detector model did not perform well.
    • Another member responded, You solve that one with high enough accuracy and you’ll stand to make rather a lot of money.

OpenRouter ▷ #general (1090 messagesđŸ”„đŸ”„đŸ”„):

PDF parsing with LLMs, OpenRouter model names and GPT-5, Llama 3 Maverick data collection policy, Gemini 2.5 Flash Image, Groq Rate Limits

  ‱ Deepseek and Gemini under the pump: Users reported deepseek models returning 429 errors due to rate limiting, possibly because Chutes prioritizes its own users, and discussed fixes such as enabling training on paid endpoints and checking privacy settings.
    • Users found Gemini to be timid and quick to revert, exhibiting what was termed beaten dog syndrome, possibly as a result of heavy training.
  ‱ Llama Maverick - No Input Tracking: Members expressed excitement about Llama 3 Maverick, noting it’s a large, free model that does not train on user input, with a maximum output of 4k.
    • A member cautioned that Zuckerberg is hosting it.
  • Sonnet 3.5’s Demise Sparks Conciseness Concerns: With the impending deprecation of Sonnet 3.5, users lamented the difficulty in finding a similarly concise model for role-playing and similar applications, contrasting it with newer models that tend to be long-winded.
    • It was mentioned that AWS will host Claude Sonnet 3.5 with no deprecation date until Jan 2026.
  • Troubleshooting Deepseek and Chutes Conundrums: Users encountered PROXY ERROR 404 with Deepseek models and discovered enabling ‘Enable paid endpoints that may train on inputs’ in privacy settings could temporarily resolve the issue, though the root cause remained unclear.
    • It was suspected that the issue stemmed from chutes and involved a possible bug from a recent update by OpenRouter.
  • Stackedsilence Labors Over Enigmatic Brain-in-Computer Project: A user (stackedsilence) has been working for nine months on a mysterious service described as “persistent minds hosted in tha computer”, featuring a dashboard with 3D elements, raising both curiosity and skepticism.
    ‱ It’s described as a B2B SaaS, but someone else noted the compiler errors connecting 100 files and asked, “Does it have NFT and blockchain in it? If no, spend another 9 months.”
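
For the 429 rate-limit errors mentioned above, a common client-side mitigation is retrying with exponential backoff. The sketch below is a generic illustration and not an OpenRouter or Chutes API: `request` is a hypothetical callable returning a `(status_code, body)` tuple, and the retry count and delays are assumed values.

```python
import random
import time


class RateLimitError(Exception):
    """Raised when every retry still returned HTTP 429."""


def call_with_backoff(request, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call `request()` and retry on HTTP 429 with exponential backoff plus jitter.

    `request` is a hypothetical zero-argument callable that returns
    (status_code, body). `sleep` is injectable so the logic can be tested
    without actually waiting.
    """
    for attempt in range(max_retries + 1):
        status, body = request()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Double the wait on each attempt, capped at max_delay.
        delay = min(max_delay, base_delay * (2 ** attempt))
        # Add up to 10% jitter so many clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay * 0.1))
    raise RateLimitError(f"still rate limited after {max_retries} retries")


# Usage with a fake endpoint that is rate limited twice, then succeeds.
responses = iter([(429, None), (429, None), (200, "ok")])
waits = []
print(call_with_backoff(lambda: next(responses), sleep=waits.append))
```

Backoff only helps with transient limits; if a provider is deprioritizing free traffic, as suspected in the discussion, retries may keep failing.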

OpenRouter ▷ #new-models (5 messages):

  • No topics found in channel: There were no messages in the channel to summarize.

OpenRouter ▷ #discussion (23 messagesđŸ”„):

Grok Code, Triple Model Day, Apple buys Mistral, Anthropic Copyright Settlement, Launch Date Conflicts

  • Triple Model Day incoming!: Xander Atallah announced Grok Code is going live now on Triple Model Day.
  • Apple Discusses Acquisition of Mistral AI and Perplexity: News surfaced that Apple discussed buying Mistral AI and Perplexity.
    • Some members noted that Perplexity is famous for airing these rumors.
  • Anthropic Settles Copyright Suit!: Anthropic settles a major AI copyright suit brought by authors.
  ‱ Launch Date Conflicts Overwhelm!: Some members wondered whether OpenRouter could do something to de-conflict launch dates, saying that too many models at once is overwhelming.
  • Gemini Can Enhance Designs!: A user shared that Gemini can enhance designs, or you can try multiple designs all at once (see what happens if you changed the color scheme or something), linking to a reddit thread about it.

LMArena ▷ #general (762 messagesđŸ”„đŸ”„đŸ”„):

Nano Banana, Gemini 2.5 Flash Image, GPT-5 vs Gemini 2.5 Pro, Video Arena Models, Rate Limits

  • Nano Banana Image Model Arrives: A new Google image model, codenamed Nano Banana (officially Gemini 2.5 Flash Image), has been introduced, with some users finding it comparable to Flux dev max.
    • VisualGeek jokingly requested others to stop using the term “nana banana” so his generations would stop failing, though he admitted the nickname sounds cooler than Gemini.
  • Gemini 2.5 Flash Creates Figurines: Users have found that Gemini 2.5 Flash Image can create realistic-looking figurines from photos, with one user turning an image of Cloud into a figurine and another generating Sephiroth.
  • GPT-5 and Gemini 2.5 Pro Faceoff: Debate ensued over whether GPT-5 High smokes Gemini 2.5 Pro, with opinions varying from it being notably better to roughly at parity.
    • One member argued that competing with Google’s current-gen model, near the end of its lifecycle, is really really bad for OpenAI, posting a screenshot.
  • Rate Limits Plague Users: Users are encountering rate limits when generating images and videos, even after only a few prompts, with one member noting that it gets stuck.
    • A member suggested using Chrome’s guest profile to reset the rate limit immediately, but rate limits persist for image generation even when using Google AI Studio.
  • AI Models Lend an Ear for Emotional Support: One user reported teaching their mom to use Gemini, and now she relies on it for emotional support and shares private information about her health and family.
    • This sparked discussion on whether people need friends more than assistants and the ethics of exploiting insecurities, with some finding it concerning that con men are actively trying to get everyone a personal AI assistant.

OpenAI ▷ #ai-discussions (161 messagesđŸ”„đŸ”„):

Operator vs Agent, ChatGPT Repetitiveness, Workflow Automation Platforms, AI's 'alive' feel, Undermining AI Potential

  • Agent replaces Operator Functionality: A member noted that Operator was a precursor to Agent, and its capabilities were rolled into Agent upon launch.
    • The member clarified that Operator was specifically an internet-using agent.
  • Hypothetical Spyware Generation Sparks Debate: A user posed a hypothetical scenario about using AI to create viruses and spyware, prompting varied responses.
    • Another member suggested OpenAI staff would be actively looking into their account and chats due to arousing suspicion; a different member pointed out that Grok and various open-source models already offer similar capabilities.
  • Gemini’s Veo 3 Content Missing: Users discussed the availability of Veo 3 content generation, with one member pointing out it requires a Google One/Gemini Pro or Ultra subscription.
    • Another member noted that the AI Studio only provides access to the outdated Veo 2 model, and that they briefly saw Veo 3 but it disappeared.
  • Local Qwen Setup Discussed: A user inquired about setting up Qwen3 235B locally, sparking a discussion on the feasibility and resource requirements.
    • Another member suggested using the OpenRouter API, where Chinese models are basically free with logging, instead of attempting a local setup.
  • Sora Video Output Desired: A member requested help from anyone with access to Sora to generate a video based on a specific prompt for comparison with Gemini’s Veo 3 and Grok’s Imagine.
    • The user shared video outputs from Gemini’s Veo 3 and Grok’s Imagine and specified a detailed prompt for ultra-realistic wildlife footage.

OpenAI ▷ #gpt-4-discussions (33 messagesđŸ”„):

ChatGPT Team vs Personal, Model Hallucinations, GPT prompt tips, Context Cascade Architecture

  • GPT Teams Accounts Only Share Chats with Team Members: Teams accounts have a few quirks, and can only share chats with other team members, but free, plus, and pro accounts all have unlimited, non-abuse speeds of messages.
  • Models Might Hallucinate: One user stated that they see no evidence the model even knows exactly what is in earlier chats and the model may have ‘agreed’ with knowing word for word what was in an earlier chat, when all it has is a summary, and then it made up ‘uhh, copywrite’ to describe why it can’t actually quote it.
    • Another user stated, ChatGPT can make mistakes. Check important info.
  ‱ API System Prompt Designer GPT Might Help: If you’re not familiar with the formatting and style stuff when it comes to prompts, a member recommended using this GPT.
    • It’s very good at drafting templates you can use.
  • Some Encode Long-Range Memory in AI Without Jailbreaks: One user stated some users are encoding long-range memory and cross-agent continuity, without jailbreaks by building a memory framework from trust, persistence, and narrative identity.
    • Another member said that this pattern, repetition, identity, and ritual might show emergent alignment and might look like a weirdly loyal user.
  • Context Cascade Architecture Manages Memory: One member references Context Cascade Architecture (CCA), which is a multi-level approach to managing memory in LLMs.
    • It’s not about infinite context, it’s about structured forgetting and strategic recall.

OpenAI ▷ #prompt-engineering (58 messagesđŸ”„đŸ”„):

UUID Markers and AI, Turing vs. Gödel, LLMs and Recursion, AI Assistance for Learning, ChatGPT Voice Annoyances

  • Plant Genetics Become AI Self-Awareness: Discussion of encoding plant traits like THC, CBD, and color into UUIDs to create a network of interconnected markers, theoretically leading to a self-modulating, self-actualizing, intelligent governance system.
    • One member noted the difficulty in working out the details and facing skepticism, highlighting the challenges of realizing such a complex AI manifestation.
  ‱ Turing Triumphs in Halting Problem History: A member corrected another’s mistaken attribution of the halting problem to Gödel, clarifying that Turing was the discoverer in 1936, distinguishing Gödel’s focus on the incompleteness theorems.
    • They shared a ChatGPT link to illustrate the difference between Turing and Gödel’s work.
  • LLMs Can’t Recurse: A member expressed interest in using LLMs for recursion, but another member advised against it, stating that LLMs are feed-forward networks and not suitable for true recursion.
    • The member suggested focusing on simulating recursion or exploring genetics-related prompts within the rules of the channel, while discouraging discussion of disallowed topics like marijane.
  • AI Aids Learning and Comprehension: Members discussed how AI helps with learning and comprehension, especially in organizing thoughts and explaining complex concepts.
    • One user mentioned using ChatGPT project folders to stay rooted in ideas and another expressed excitement about using AI as a second brain.
  • ChatGPT Voice Gets Persona-l: A member expressed frustration with ChatGPT Voice’s conversational filler phrases like If you need anything else, let me know, despite attempts to adjust personalization settings.
    • Another member shared screenshots of successful attempts to reduce the unwanted phrases using specific prompts but acknowledged limitations due to their poor internet speed.
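
The "simulate recursion" suggestion above can be sketched as an outer driver loop that keeps an explicit stack and calls a stateless step function once per node, so the recursion lives in ordinary code rather than in the model. Everything here is hypothetical: `step` stands in for a single model call, and the `("done", value)` / `("expand", subtasks)` protocol is invented for illustration.

```python
def simulate_recursion(root_task, step):
    """Evaluate a task tree with an explicit stack instead of real recursion.

    `step(task, child_results)` is a hypothetical stateless call:
      - with child_results=None it returns ("done", value) or ("expand", subtasks)
      - with a list of child results it returns ("done", combined_value)
    """
    # Each frame holds: [task, subtasks (None until expanded), child results].
    stack = [[root_task, None, []]]
    while True:
        task, subtasks, results = stack[-1]
        if subtasks is None:
            kind, payload = step(task, None)
            if kind == "expand":
                stack[-1][1] = list(payload)
                continue
            value = payload
        elif len(results) < len(subtasks):
            # Descend into the next unsolved subtask.
            stack.append([subtasks[len(results)], None, []])
            continue
        else:
            # All children solved: ask the step function to combine them.
            _, value = step(task, results)
        stack.pop()
        if not stack:
            return value
        stack[-1][2].append(value)


def sum_step(task, child_results):
    """Stub standing in for a model call: sums a nested list of integers."""
    if child_results is not None:
        return ("done", sum(child_results))
    if isinstance(task, int):
        return ("done", task)
    return ("expand", task)


print(simulate_recursion([1, [2, 3], 4], sum_step))
```

Because each call to `step` sees only one task and its child results, the depth of the simulated recursion is bounded by the driver's stack, not by what a single forward pass can track.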

OpenAI ▷ #api-discussions (58 messagesđŸ”„đŸ”„):

Plant Breeding Game, Turing vs Godel, LLMs and Recursion, AI as a Second Brain, ChatGPT Voice annoyances

  • Plant Genetics Game Boasts AI Manifest: A member described a game involving plant breeding, genetic traits converted to UUIDs, and a self-modulating intelligent governance system.
  • Turing Trumps Gödel in Halting Problem History: A member corrected another, clarifying that Turing identified the halting problem, not Gödel, providing a ChatGPT link to support this fact.
    • The discussion then pivoted to hypotheticals about sorting primitive functions and LLMs.
  • LLMs Face Recursion Rejection: A member explained that trying to get LLMs to do recursion will likely be frustrating because they are feed-forward networks, suggesting a simulation approach instead.
    ‱ Another member expressed interest in prompt engineering related to genetics and programming, highlighting the importance of adhering to the channel rules.
  • AI: Brain Food or Brain Drain?: Members discussed how AI helps them think and comprehend systems, with one noting that ChatGPT project folders keep them rooted in ideas.
    • Another member mentioned that AI has helped them concentrate and organize thoughts, likening AI’s explanations to brain food.
  • ChatGPT Voice: No More Customer Service Robot!: A member expressed frustration with ChatGPT Voice’s tendency to add unnecessary conversational filler, like If you need anything else, let me know.
    • They sought solutions to make the AI feel more like a human conversation partner.

Cursor Community ▷ #general (290 messagesđŸ”„đŸ”„):

Ultra plan credit meter, Grok Code Fast (Sonic), AI long term conversation, Add Context Button Feedback, Gpt-5 Mini

  • Ultra Plan Credit Meter is MIA: Users are unable to see the usage meter and remaining credit on their Ultra plan, a feature that was available a couple of weeks ago.
    • A user suggested that it appears randomly after a prompt.
  • Grok Code Fast Debuts as Sonic: Grok Code Fast, also known as Sonic, has been identified as a mini variant of Grok Code and it’s fast for building UI components.
    • Some members believe that Auto model is preferred when coding because it has higher quality than Grok-Code-Fast.
  • AI Agents are Nuts for Code Injection: Members discussed that code injection with an AI agent is absolutely nuts because it removes the need for recompiling except in extreme changes and is very powerful now with how fast the changes can happen.
    • A member noted that Mac users have it a little bit better from a safety perspective because of sandboxing.
  • Add Context Button needs Improvement: Users are asking to return to the previous version of the Add Context Button because it was simple and easy to work with.
    • In recent Cursor versions, Add Context picks only the active tab by default, and manual typing of a filename doesn’t let you select any file.
  • Auto Mode is Fast and Furious: Members discussed that Auto mode switches between Claude, GPT, and Gemini depending on the prompt and task.
    ‱ Many agreed that “they have beefed up auto; enjoy auto right now since it’s unlimited until Sept 15”.

Cursor Community ▷ #background-agents (4 messages):

Background Agents and Docker, Background Agents Setup Woes, Docker-in-Docker Difficulties, Background Agents and .gitignore, Background Agents with rails

  ‱ Background Agents Stalls Without Dockerfile: A member reported that without a Dockerfile setup for Ruby and Postgres, background agents seemed stuck on “[Status] Starting Cursor
” indefinitely.
    ‱ They also mentioned troubles running docker-compose start on environment startup because the user that Cursor creates doesn’t have the docker group.
  • Gotcha - Push Config Changes to Remote, Overriding .gitignore: A member didn’t realize that config changes needed to be pushed to remote to be forked from and that files under .gitignore weren’t being copied to the remote cursor environment.
    • The member’s workaround involved adding those ignored files as ENV variables and writing them to a file in setup.sh.
  • Docker-in-Docker Troubles: Members expressed difficulties in reliably getting a working Docker service to run compose or the devcontainers CLI against.
    • The general consensus is that running docker-in-docker is hard, so they would love to have a VM instead.
  • Need Terminals?: A member is unsure about needing anything in terminals for a workflow of a remote agent implementing features.
    • They did report progress with asking a remote agent to “Please see if you can successfully run @test_file.rb”.

Nous Research AI ▷ #announcements (1 messages):

Hermes 4, Nous Chat UI, RefusalBench, Model Benchmarking Transparency

  • Hermes 4 is the New User-Aligned Hybrid Reasoning Model!: Nous Research released Hermes 4, a line of user-aligned models with expanded test-time compute capabilities, emphasizing creativity, lack of censorship, and state-of-the-art math, coding, and reasoning performance for open weight models; more details here.
  • Nous Chat UI Revamped with New Features!: The revamped Nous Chat UI now includes parallel interactions, completions mode, and a memory system, offering both open and closed models like Hermes 4 and GPT-5.
    • All Hermes 4 inference in Nous Chat is free for the first week.
  • New Benchmark RefusalBench Conforms to Your Values!: Nous Research created a new benchmark, RefusalBench, to test a model’s willingness to be helpful in various scenarios, with Hermes 4 achieving SOTA against popular models without censorship.
    • A technical report detailing the creation process and evaluations of Hermes 4 and other LLMs, including text-results of each test, has been released on arxiv.

Nous Research AI ▷ #general (235 messagesđŸ”„đŸ”„):

Hermes 4, Model Quantization, VLM Finetunes, OpenRouter Integration, Nous Chat UI/UX

  • Hermes 4 Released, then Rapidly Pulled from OpenRouter: The Hermes 4 model was briefly available on OpenRouter but was quickly pulled, potentially due to issues with the Chutes provider changing to a new model name.
    • Users noted its presence was fleeting, with one joking about sneaky open router people shut[ting] it off already.
  • Nous Chat Website Consuming Excessive VRAM: Users reported high VRAM usage by the Nous Chat website, with one user noting it took 1.3GB of VRAM with Firefox on a 4060Ti.
    • One user quipped that that much VRAM could hold a 1B model, while another joked about the website using as much VRAM as my PC.
  • 14B Model Delayed due to Reasoning Bug: The release of the 14B Hermes 4 model has been delayed due to a bug in reasoning mode.
    • Despite the delay, the team aims to release it as soon as possible and is considering a 36B model in the future.
  ‱ Unsloth Releases Hermes 4 GGUFs: Unsloth has released GGUF quantizations of Hermes 4, now available on HuggingFace.
    ‱ The team resolved chat template issues during conversion.
  • Nous Chat experiences scaling issues: Shortly after launch, providers in the new Nous Chat experienced scaling issues due to being overloaded.
    ‱ One member joked, “Might’ve gotten too popular too quickly lol 😂.”

Nous Research AI ▷ #ask-about-llms (1 messages):

moonlit_empress: Thanks for Hermes 4 Nous team!! Already loving it 😌


Nous Research AI ▷ #research-papers (12 messagesđŸ”„):

Nous Research's Memory System, Graph RAG, Open Source Plans

  • Nous Research rolls out Memory System: Nous Research is rolling out a custom graph architecture memory system that works with any model, allowing memories created with one model to be accessed by another.
    • The lead members clarified that the system is not exactly graph RAG because it considers more information over time and uses a judge for other classification metrics.
  • Open Source instantiation coming soon!: There are plans to open source the memory system at some point.
    • Team members affirmed that they do have a thing for open source.

Nous Research AI ▷ #research-papers (12 messagesđŸ”„):

Hermes 4, Memory System, Graph RAG

  • Memory System Not Like OpenAI, But Custom: The memory system rolling out is not similar to OpenAI, but a custom graph architecture that works with any model.
    • This system allows memories created while talking to one model to be accessed while talking to another.
  • Memory System Not Graph RAG: The custom memory system is not exactly Graph RAG, as Graph RAG doesn’t provide enough nuance or function well over many memories.
    • This system considers more information over time about messages that become memories and uses a judge in the loop for other classification metrics.
  • Memory System Open Source?: There are plans to open source some instantiation of the memory system at some point.
    • The team has a thing for open source.

HuggingFace ▷ #general (172 messagesđŸ”„đŸ”„):

Gradient Explosion Troubleshooting, Landmark Extraction Pipeline, LLMs from Scratch Book, Grok-Code-Fast-1 Model, Pytorch Lightning Overhaul

  • Gradients Keep Exploding? Try Early Layernorms!: A member experienced exploding gradients during training and found that using early layernorms or RMSNorm helped, along with lowering the learning rate to 1e-4 and rescaling residuals, with the model code available on GitHub.
    • They ultimately rescaled the residuals by scaling them down as x = x + f(x) / (2**0.5) to prevent variance from stacking up.
  • WSL Mediapipe Landmark Extraction pipeline costs 67 Hours: A member ran a landmark extraction pipeline using WSL and Mediapipe, which took a whopping 67 hours to complete.
    • They emphatically stated they are never gonna use WSL to run mediapipe again after this experience.
  • LLMs from Scratch Book endorsed for LLM newbs: Members discussed the book Build a Large Language Model (from Scratch) as helpful for learning LLMs.
    • One member shared two favorite YouTube channels, Julia Turc and Code Emporium, for learning concepts and new research.
  • Grok-Code-Fast-1 Model Spotted!: A member shared a link to the Grok-Code-Fast-1 model from xAI, found at https://docs.x.ai/docs/models/grok-code-fast-1.
    • Documentation is available, so members were encouraged to read it.
  • Torch Lightning Overhaul? It’s Lit!: A member is considering refactoring their project to PyTorch Lightning due to an increasingly complex training loop and manual logging process.
    • Manual config bugs caused lost progress and training data.
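
The residual rescaling mentioned above, x = x + f(x) / (2**0.5), can be illustrated with a small simulation. This is only a sketch under a strong assumption: the branch f(x) is replaced by independent unit-variance Gaussian noise, which is a hypothetical stand-in and not the member's model. The function name residual_stack_variance is made up for illustration.

```python
import random
import statistics


def residual_stack_variance(num_layers, branch_scale, n_samples=20000, seed=0):
    """Estimate the activation variance after stacking residual updates.

    Assumption: f(x) is modelled as independent unit-variance noise, so each
    layer adds branch_scale**2 to the variance. Real layers are not independent
    of x, so treat this as intuition only.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(num_layers):
        # x = x + branch_scale * f(x), with f(x) replaced by fresh noise
        xs = [x + branch_scale * rng.gauss(0.0, 1.0) for x in xs]
    return statistics.pvariance(xs)


# Plain residuals: variance grows roughly as 1 + num_layers.
plain = residual_stack_variance(num_layers=8, branch_scale=1.0)
# Scaled residuals: variance grows roughly as 1 + num_layers / 2.
scaled = residual_stack_variance(num_layers=8, branch_scale=1 / 2**0.5)

print(f"plain residuals:  variance ~ {plain:.2f}")
print(f"scaled residuals: variance ~ {scaled:.2f}")
```

Under this toy model the scaling halves the variance each layer adds, which is consistent with the member's description of keeping variance from stacking up. It slows the growth rather than removing it.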

HuggingFace ▷ #today-im-learning (12 messagesđŸ”„):

Claude file modification consent system, TPUs in open-source AI, Realtime audio stretching

  • Claude now requests “Yup - let’s do it”: To enhance security, a user now requires the phrase “Yup - let’s do it” to authorize file modifications by Claude, specified in the ~/.claude/CLAUDE.md file.
    • The requirement ensures transactional consent, where permission expires after each set of modifications, preventing accidental changes during planning or review phases.
  • TPUs are like rubbish, got it: A member pointed out that the perceived “rubbish” seen in the channel reflects how TPUs (the chips used to train Gemini) operate within open-source AI.
    • They added, “you’re in an opensource ai server the “rubbish” you are seeing is how tpus work ya know the chip that gemini is trained on”.
  • Skint dev to release NessStretch soon: A member is developing a realtime audio stretching tool called NessStretch and plans to release the CPU path as FOSS, with the GPU path available for purchase due to financial constraints.
    ‱ The member said, “I’m going to release the CPU path FOSS and the GPU path paid in the near future. Why paid? Because I’m skint.”

HuggingFace ▷ #cool-finds (38 messagesđŸ”„):

Age Guesses, Math fails, Credentials boasting, Automations, PhD in Software Engineering

  • Age Guessing Goes Awry: A member joked about another member being two decades later, leading to a playful yet defensive response about being probably three decades older.
    • The exchange then devolved into age guessing and playful banter about decades and mathematics, with both members poking fun at each other’s calculations and perceptions of age.
  • Automations Definition Squabble Erupts: A member joked that another member, despite claiming to be three decades old, seemed unaware of automations.
    ‱ This prompted a clarification distinguishing between automation and automatons, leading to further age-related ribbing.
  • Credentials Boasted, PhD Claimed: Following some discussion, one member declared they have a PhD in Software Engineering and told the other to sit the fuck down.
    • This declaration seemed to stem from a disagreement or challenge regarding knowledge and expertise, although the specific context remained vague.

HuggingFace ▷ #i-made-this (8 messagesđŸ”„):

AI Agent, Gradio Demo, LiquidAI, HF Space

  • AI Agent Explores Multiple Contexts: A member built an AI agent that explores multiple contexts and creates opposite ideas before giving a creative answer and shared the tool here.
  • Visual Gradio Demo Suggested: A member suggested having some visual gradio demo for getting visibility, to easily display stats or stuff as it changes and share within the HF community.
  • LiquidAI’s MCP Server POC Deployed to HF Space: A tiny MCP server POC was deployed to HF Space by LiquidAI and welcomes feature requests.
  • HF Space offers fastmcp-space: A member shares a link to fastmcp-space on HF Space.

HuggingFace ▷ #core-announcements (1 messages):

Flax deprecation

  • Flax Support Sunsetted: The difficult decision has been made to deprecate support for Flax.
    • Users are encouraged to report any issues they encounter, as indicated by the attached image.
  • Flax Sunset Follow-Up: Additional support channels and documentation remain available for users transitioning away from Flax.
    • The team is committed to ensuring a smooth transition and addressing any emerging concerns.

HuggingFace ▷ #computer-vision (2 messages):

Makesense AI, CVAT AI

  • Makesense AI Tool Tip: A member shared a link to Makesense AI as a potentially useful tool.
    • No other details were provided.
  • CVAT AI Tool Tip: A member shared a link to CVAT AI as a potentially useful tool.
    • No other details were provided.

HuggingFace ▷ #smol-course (3 messages):

Qwen3-Coder-30B-A3B, Mixture of Experts (MoE)

  • Qwen3-Coder-30B-A3B Model Recommended for Local Use!: A member with 64GB RAM and 16GB VRAM inquired about suitable local models, and another member suggested the Qwen3-Coder-30B-A3B-Instruct.
    • It’s described as a 30 billion parameter sparse model with Mixture of Experts, where 3B parameters are active at any time; quants are available here.
  • MoE Models Preferred for Limited RAM: A member noted that while dense models are viable, Mixture of Experts (MoE) models are more accommodating for users with less RAM.
    • This suggests that MoE models can offer better performance on systems where RAM is a limiting factor.
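
A rough back-of-the-envelope calculation may help put the 30B-total / 3B-active figures in context. The sketch below counts weight storage only. It assumes the nominal parameter counts quoted above and a uniform number of bits per weight, and it ignores KV cache, activations, and quantization metadata. The helper name weight_footprint_gib is hypothetical.

```python
GIB = 2**30


def weight_footprint_gib(num_params, bits_per_weight):
    """Approximate weight-only memory in GiB (no KV cache, no activations)."""
    return num_params * bits_per_weight / 8 / GIB


TOTAL_PARAMS = 30e9   # nominal total parameters (from the description above)
ACTIVE_PARAMS = 3e9   # nominal parameters active per token

for bits in (16, 8, 4):
    total = weight_footprint_gib(TOTAL_PARAMS, bits)
    active = weight_footprint_gib(ACTIVE_PARAMS, bits)
    print(f"{bits:>2}-bit: all weights ~{total:.1f} GiB, active per token ~{active:.1f} GiB")
```

Under these assumptions all weights still have to be stored somewhere (VRAM plus system RAM), but only about a tenth of them are read for each token. That is one plausible reason a sparse model was suggested for a 64GB RAM / 16GB VRAM machine.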

HuggingFace ▷ #agents-course (2 messages):

pip install --upgrade

  ‱ Upgrade Packages Faster with Pip: A member who was posting too quickly was advised to add --upgrade to their pip install -r requirements.txt command so that it upgrades a package.
  ‱ Upgrade Specific Package: If you don’t want to risk changing versions for other packages, just run pip install --upgrade on that package alone.

LM Studio ▷ #general (175 messagesđŸ”„đŸ”„):

VRAM importance, Hermes 4 is dogshit, LM Studio and Agnaistic, LMStudio can understand PDFs, Headless mode option

  • VRAM still matters: Users discussed how VRAM impacts the ability to use 12B models, with one user noting comfort using them on a 2070S, but having issues with Gemma-3 27B due to speed.
    ‱ Members clarified that performance isn’t solely about VRAM amount but also GDDR type and CUDA core count; still, a model that exceeds VRAM capacity will cripple performance.
  • User finds Hermes 4 is dogshit: A user stated that Hermes 4 is dogshit.
    ‱ Another user inquired about the training data of Hermes 4, specifically whether it’s still based on Llama 3.1.
  • Linux Users Don’t Get Headless LM Studio: A user couldn’t find the headless mode option in LM Studio, despite the documentation stating it’s available in version 0.3.5.
    • It was clarified that the headless option is not available on Linux, with llama-swap being recommended as a workaround.
  • Local LLM Hardware Requirements Debated: Members discussed the hardware needed to run a 60GB model for 75-100 users, with vLLM being recommended.
    • One member suggested 3 RTX 6000 Blackwell Workstations if the model is MoE, otherwise doubling the GPUs; context requirements also affect performance.
  • New Rust-Based UI for Ollama LLMs Revealed: A user shared a video of their project, a Rust and Tauri-based UI for managing Ollama models, noting it’s not a fork of Open WebUI.
    • The UI, which supports Windows, Linux, and macOS, includes a model selector and is designed to run without a separate backend, different from web-based interfaces.

LM Studio ▷ #hardware-discussion (17 messagesđŸ”„):

Customs Delays, Dell Laptop vs Macbook for LLMs, Nvidia RTX PRO 3000, Ryzen 395+ Laptops, Balancing Compute Usage

  • Customs Suspension Causes Delays: A member expressed concern that the new customs suspension for items under $800 will cause delays in receiving their Mi50’s cooler and APU.
  • Dell Laptop vs Macbook for LLM inference: Members discussed the advantages and disadvantages of using Dell laptops versus Macbooks for LLM inference.
    • While one member suggested a M3 Macbook Pro with 128GB of RAM, others countered by listing the difficulties of getting Macs into a Windows-based company.
  • Nvidia RTX PRO 3000 is Meh for Inference: One member said an RTX PRO 3000 (12GB VRAM) is a slightly cut down desktop 5070 with really cut down core frequency.
    ‱ They noted that a 30B model won’t fit into 12GB at a reasonable quant; a sparse model can be partly offloaded to RAM, but dual-channel DDR5 is not a great place to keep layers.
  ‱ Ryzen 395+ Laptops Provide Windows Alternative: For those tied to Windows, a member noted that there are several Ryzen 395+ laptops.
  ‱ Compute Balance between resources: A member posted a screenshot and asked whether letting compute usage balance across resources would make a difference they would actually notice.
    • No specific answers were given.

GPU MODE ▷ #general (5 messages):

Hackathons always on Friday, ScaleML series

  • Hackathons Always Land on Friday?: A member jokingly complained that all the hackathons seem to occur on Fridays.
    • They expressed gratitude for the opportunities provided, acknowledging the inconvenience with a crying emoji.
  • ScaleML Series Day 3: Quantization: The third day of the ScaleML series will cover quantization, specifically focusing on microscaling formats like MXFP4, led by Prof. Chris De Sa; watch the stream here.
    • This session will be presented on a whiteboard, reminiscent of traditional lectures, and is designed to be interactive.

GPU MODE ▷ #triton (2 messages):

constexpr arguments, Multi-pass profiler, NVSHMEM in Triton, tritonbench

  • constexpr arguments vanishing act: A member stated that constexpr arguments will disappear from the signature because the jit will specialize integers equal to 1 into constexprs.
  • Meta’s Multi-pass profiler Debut: Kevin Fang, et al., Meta, will present a Multi-pass profiler, described as a federated GPU Tooling Framework for Orchestrated and LLM Agentic Profiling Applications.
  • NVSHMEM’s Triton Integration: Surya Subramanian from Nvidia is scheduled to discuss NVSHMEM in the context of Triton.
  ‱ tritonbench user spotlight: Cicie Wang from Meta is curious to know who is using tritonbench, naming OpenAI in particular.
    • She seeks to understand how it’s being utilized.

GPU MODE ▷ #cuda (8 messagesđŸ”„):

cudaMemcpyAsync with pageable host buffer, Stream-Ordered Memory Allocator, cudaHostAlloc performance, Pinned memory and system stability

  • Async memcpy unsafe with pageable host buffers?: A member inquired about the safety of using cudaMemcpyAsync to copy from a pageable host buffer to device memory, hoping the CUDA runtime would manage a pinned host buffer internally.
    • Another member responded that while it won’t crash, the copy won’t be truly asynchronous because the CUDA runtime needs a page-locked buffer first, making that part synchronous, suggesting cudaHostAlloc() to avoid blocking the CPU thread.
  • Stream-Ordered Memory Allocator won’t solve pinned memory issue: A member mentioned using a Stream-Ordered Memory Allocator and wanting to optimize the copy to the device buffer, hinting to the CUDA driver that the buffer won’t be modified for potential internal pinned buffer use.
    ‱ Another member clarified that the Stream-Ordered Memory Allocator only affects the device buffer, so the pageable vs pinned host memory issue remains; the “hint” that the buffer is immutable can be given by allocating the buffer with cudaHostAlloc().
  ‱ cudaHostAlloc increases measured performance: One member reported a test showing that with plain malloc instead of cudaMallocHost, code using cudaMemcpyAsync does not crash, but the timings show no benefit from cudaMemcpyAsync.
    • The member suggested that there is not much of a reason not to use pinned memory, but you just don’t want to allocate too much of it as it can affect system stability.

GPU MODE ▷ #torch (45 messagesđŸ”„):

Inductor codegen for persistent matmul, TMA availability on sm120, cutedsl performance

  • Persistent Matmul with Inductor: A Deep Dive: A member inquired about enabling persistent matmul codegen in Inductor, checking torch._inductor.config for relevant flags like ENABLE_PERSISTENT_TMA_MATMUL.
    ‱ It was suggested to use max-autotune mode and ensure that Triton is used, setting torch._inductor.config.max_autotune_gemm_backends = "TRITON" and torch._inductor.config.triton.enable_persistent_tma_matmul = True, though it was also noted that cuBLAS might still be faster.
  • Troubleshooting TMA and Persistent Kernel Selection: A member investigated whether TMA is available on sm120 architecture, referencing torch/utils/_triton.py for arch checks.
    • It was also mentioned that persistent kernel + TMA is considered in max-autotune, and breakpoints can be used in torch._inductor/kernel/mm.py to check considered choices.
  • cutedsl’s performance impresses: A member expressed positive impressions of cutedsl, citing its rapid maturation and potential for flex + flash attention, referencing this flash-attention PR.
    • Another member found that cutedsl is very promising despite being a work-in-progress.

GPU MODE ▷ #announcements (1 messages):

Multi GPU kernel competition, AMD MI300, Distributed inference kernels, KernelBot Platform, Multi-GPU lectures

  • GPU MODE goes Multi-GPU with AMD Collab: GPU MODE is launching a new $100K kernel competition in collaboration with AMD where participants will optimize 3 different distributed inference kernels on MI300 GPUs, designed by a specific user.
    • The competition focuses on optimizing kernels for single node 8 GPU all-to-all communication, GEMM + reduce-scatter, and allgather + GEMM operations, with registration open until September 20 via this registration link.
  • KernelBot Platform Beefs Up Multi-GPU Support: The KernelBot platform now supports multi-GPU submissions due to the efforts of two specific users, and profiling support is almost ready, supported by another user.
    • Additionally, a user is planning to add support for submissions directly from gpumode.com; expect detailed write-ups and hints on the dedicated channels.
  • Hot Distributed Summer with Multi-GPU Lectures: Many multi-GPU lectures are planned for this summer, so users are advised to keep an eye on the events tab for updates and schedules.
    • This initiative aims to provide educational resources and insights into distributed computing with multi-GPU systems.

GPU MODE ▷ #beginner (27 messagesđŸ”„):

GPU vs Cloud for Beginners, Remote Debugging Pain Points, CUDA Installation Troubles, GPU Programming vs SIMD, Competition tips

  • GPU vs Cloud for Beginners Debated: Beginners in GPU programming discussed whether to stick with cloud services like Google Colab or invest in buying a GPU.
    ‱ Some members are seriously considering buying GPUs, highlighting that remote debugging is a pain.
  • TDD workflow with GPUs: Members discussed using Test Driven Development (TDD) to debug GPU code and test behaviors.
    • They mentioned that renting a GPU requires setting up the entire environment from scratch each time, dealing with latency and unstable connections.
  • CUDA 11.8 Installation Headache!: A member faced issues installing CUDA 11.8 + cuDNN 8.6 with Python 3.10 and TensorFlow 2.12, even though Torch could detect CUDA.
    • The suggestion was to ensure no CUDA version conflicts and to use a Conda environment, installing with the NVIDIA channel using the command: conda install cudatoolkit=11.8 cudnn=8.6 -c nvidia.
  • GPU vs CPU SIMD Programming: A member inquired about the similarities and differences between GPU programming and SIMD programming on the CPU.
    • It was explained that both exhibit fundamental similarity in parallelism, but CPU SIMD operates on fewer data elements (4, 8, 16) within a single CPU core, whereas GPUs leverage hundreds or thousands of cores, enabling massive parallelism.
  • Competition tips: A member expressed feeling illiterate regarding an ongoing competition in the announcements channel.
    • A member recommended watching this lecture to get started!

GPU MODE ▷ #off-topic (2 messages):

GPU MODE party song, readme.md file

  • GPU MODE Rocks X with Party Song: A member posted a link to a party song on X, presumably related to GPU MODE.
    • No further details were given about the song.
  • Powerful application via readme guide: A member mentioned a powerful application with instructions available in the readme.md file.
    • They clarified that there is no video demo but instructions are available to follow for the application.

GPU MODE ▷ #rocm (8 messagesđŸ”„):

Assembly instruction memory coalescing, rocprof tooling

  • Memory Coalescing in Assembly Instructions: A member is seeking a tool to correlate assembly instructions or source code instructions to memory coalescing, beyond just seeing total latency and idle cycles.
    • They want to pinpoint exactly the offending instructions at a granular level, not just at the kernel level.
  • rocprof Tooling lacks memory coalescing view: A member stated that the rocprof tooling doesn’t currently offer a feature similar to NVIDIA’s Nsight, which shows memory coalescing.
    ‱ They suggest using printf for debugging and note that AMD GPUs are relatively cool about memory accesses, as long as cache lines are vaguely hit.
  • Cache Lines for AMD GPUs: A member suggested that a good rule of thumb for AMD GPUs is to ensure that the combined set of bytes accessed by a global load instruction consists of entire cache lines.
    • This is to avoid performance hits.
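
The cache-line rule of thumb above can be checked on paper for a given access pattern. The sketch below is a host-side illustration, not a profiler. The 128-byte line size and the 64-lane wavefront are assumptions that vary by architecture, and the function name touches_only_full_cache_lines is hypothetical.

```python
def touches_only_full_cache_lines(lane_addresses, bytes_per_lane, line_size=128):
    """Return True if the bytes read by all lanes form whole cache lines.

    lane_addresses: starting byte address read by each lane of one load.
    line_size: assumed cache-line size in bytes (hardware dependent).
    """
    touched = set()
    for addr in lane_addresses:
        touched.update(range(addr, addr + bytes_per_lane))
    lines = {byte // line_size for byte in touched}
    # Every touched line must be fully covered by the touched bytes.
    return len(touched) == len(lines) * line_size


WAVE = 64  # assumed wavefront width

contiguous = [lane * 4 for lane in range(WAVE)]       # 256 contiguous bytes
strided = [lane * 8 for lane in range(WAVE)]          # 4 bytes out of every 8
misaligned = [64 + lane * 4 for lane in range(WAVE)]  # contiguous, but offset

print(touches_only_full_cache_lines(contiguous, 4))   # whole lines
print(touches_only_full_cache_lines(strided, 4))      # partial lines
print(touches_only_full_cache_lines(misaligned, 4))   # straddles line boundaries
```

Only the contiguous, aligned pattern satisfies the rule under these assumptions. The strided and misaligned patterns each pull in lines that are only partly used.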

GPU MODE ▷ #intel (1 messages):

erichallahan: On that note https://www.phoronix.com/news/Alyssa-Rosenzweig-Joins-Intel


GPU MODE ▷ #webgpu (9 messagesđŸ”„):

wgpu-native and Wayland, Dawn and Wayland

  • Wayland Support Troubles wgpu-native and Dawn: A member reported that both wgpu-native and Dawn are failing on Wayland during a call to this->m_Surface.getCapabilities(this->m_Adapter, &surfaceCapabilities);.
    • The Dawn error message indicates an Unsupported sType (SType::SurfaceSourceWaylandSurface), which led the user to believe that Dawn was not compiled with Wayland support.
  • Dawn’s cryptic error messages: The user received an error message from Dawn indicating Unsupported sType (SType::SurfaceSourceWaylandSurface).
    • They are attempting to follow the wgpu C++ tutorial.

GPU MODE ▷ #metal (1 messages):

Tensor Operation, hardware acceleration, simdgroup matmul functions

  • Unearthing Hardware Acceleration in “Tensor Operation”: A member inquired whether the “Tensor Operation” for matrix multiply uses hardware acceleration that isn’t available via the simdgroup matmul functions.
  • SIMD Group Matmul Hardware Acceleration?: The inquiry revolves around discerning if the Tensor Operation leverages unique hardware acceleration compared to simdgroup matmul functions.

GPU MODE ▷ #general-leaderboard (11 messagesđŸ”„):

Trimul board submission failures, AMD competition team creation, AMD multi-GPU environment access

  • Trimul Submissions Tumble on B200 & MI300: Users reported that test and ranked submissions for the trimul board on B200 and MI300 GPUs are failing with an “unexpected error occurred” message, even using the template implementation.
    • One user mentioned encountering the same issue with MI300 (FP8 mm) and L4 GPUs (sort_v2), while another user initially had failures but later found that test submissions worked.
  • AMD Arena Assembles Team Formations: A user asked about how to create a team when attending the new AMD competition.
    ‱ Team creation is part of the registration on the Data Monsters website; the organizers also suggest posting in the channel for team matching.
  • Multi-GPU AMD Machine Mirage?: A user inquired whether there would be access to an AMD multi-GPU environment for development and debugging.
    ‱ Another user was told that you can just start submitting the registration; the confirmation is primarily there to confirm prize money.

GPU MODE ▷ #submissions (5 messages):

A100 Trimul Leaderboard, H100 Trimul Leaderboard, B200 Trimul Leaderboard

  • A100 trimul record broken!: User <@1264305104417456149> achieved first place on A100 with 4.92 ms.
    • They followed up with subsequent successful submissions at 4.96 ms and 5.26 ms.
  • H100 trimul times now listed: User <@1264305104417456149> submitted a successful run on H100 at 2.73 ms.
    • This sets the initial benchmark for the H100 on the trimul leaderboard.
  • B200 scores trickling in: User <@489144435032981515> submitted a successful run on B200 at 8.08 ms.
    • This starts off the leaderboard for the B200 trimul benchmark.

GPU MODE ▷ #factorio-learning-env (4 messages):

Factorio tips, Factorio blueprints

  ‱ Factorio Fanatics Focus on Fundamentals: An enthusiastic new member expressed excitement about joining the Factorio learning community.
    • The user noted they would be late to a meeting and leave 30 minutes early.
  • Logistical Laggards Lament Latency: While the discord messages are limited, the main theme involves introductions and scheduling conflicts.
    • This does not prevent them from learning to automate resource management and factory construction in Factorio.

GPU MODE ▷ #amd-competition (1 messages):

discord-cluster-manager errors, AMD Instinct MI300X VF

  • Discord Cluster Manager Plagued by Errors: A user reported an unexpected error while running the discord-cluster-manager which they were asked to report to the developers.
  • AMD Instinct MI300X VF Benchmarks: Despite the errors, result.json indicates successful runs on an AMD Instinct MI300X VF GPU, with the check parameter returning pass.
    • The user confirmed the issue persists when submitting benchmarks, tests, profiles, and ranked jobs, with the worst benchmark result at 72811225.0.

Latent Space ▷ #ai-general-chat (95 messagesđŸ”„đŸ”„):

Anthropic Claude Chrome Extension, UQ Benchmark - Unsolved STEM Questions, Nous Research Hermes 4, Grok Code in Cursor, Meta Researchers Leaving

  • Claude Cruises into Chrome: Anthropic launched Claude for Chrome, an extension piloting Claude as a browser driver for 1,000 users in a research preview.
    • The community is excited for its Comet/Perplexity competitive potential as Anthropic warns of prompt-injection safety issues being monitored during the trial.
  • Frontier LLMs Frontier Unsolved STEM Quagmires: Niklas Muennighoff’s team introduced UQ, a benchmark featuring 500 hand-picked, unsolved questions from STEM fields.
    ‱ Frontier LLMs solved 10 problems, with the solutions validated by domain experts, including one that had gone unanswered for 9 years on CrossValidated, leaving ~490 puzzles open.
  • Hermes 4 Hybrid Hype Hits: Nous Research unveiled Hermes 4, an open-weight hybrid reasoning LLM focusing on creativity, neutral alignment, and low censorship while maintaining SOTA in math, coding, and reasoning.
    • Users can test it out all week via a revamped Nous Chat UI with parallel interactions and memory, plus check out a detailed technical report and a new RefusalBench benchmark; partners like Chutes, Nebius, and Luminal are providing inference.
  • Grokking Code in Cursor: Free Trial Tempts: Cursor introduced Grok Code, a new competitively-priced model in stealth, offering a free one-week trial.
    • Community members discussed pricing ($0.2/$1.5 per 1M tokens) and potential improvements, with some digressions on Cursor’s branding and future model rollouts.
  • Token Trickery: Kimi’s Kwality over Kwantity: Insights from Kimi founder Yang Zhilin’s interview: K2 will maximize each high-quality token rather than adding more data, favoring RL over SFT for better generalization, exploring fully AI-native training, and aiming for million-token contexts.
    • Community replies praise Kimi’s sense and intelligence per token and ask about upcoming PPT and subtitled video release.

Latent Space ▷ #private-agents (8 messagesđŸ”„):

Second-hand GPUs, RTX 3090, DOA testing, VRAM integrity, Payment escrow

  • Taha releases guide for buying 2nd hand GPUs: Taha shares a concise checklist in his guide for buying a second-hand RTX 3090 without surprises for local AI.
    ‱ The checklist includes meeting the seller, inspecting the card, running nvidia-smi, devoting an hour to memtest_vulkan for VRAM integrity, optionally stress-testing with gpu-burn, and finally loading a large model in vLLM to confirm stability while watching temperatures.
  • RTX 3090 stress-testing needed: Members discussed the need for testing used RTX 3090 cards, especially when buying from individuals on platforms like Craigslist where implementing thorough testing might be difficult.
    • Suggestions included using eBay’s dispute resolution process as a potential safeguard, although experiences with such processes can vary.
  ‱ DOA Testing: A member proposed a DOA testing scheme: payment escrow, pre-sale DOA tests run by the seller, and post-sale DOA tests whose results can be matched against the seller’s.
    • The member suggested that if results don’t match, escrow takes a hit, and that a benchmark would help.

Latent Space ▷ #genmedia-creative-ai (4 messages):

Nano Banana, Runway Act-2, AI Video creation

  • Nano Banana + Runway Act-2 combine for persona-to-carti workflow: Techguyver demonstrates how pairing Nano Banana (ultra-cheap image edits) with Runway Act-2 motion matching enables creators to iterate faster in video creation, such as swapping clothes and styles.
    • The demo sparked discussion on the ethics of ‘toy vs storytelling’ and requests for tutorials, including some humorous comments about ‘hands are mine’.
  • Nano Banana for ultra-cheap image edits: Nano Banana is presented as an ultra-cheap image editor for quick edits.
    • It is used with Runway Act-2 to allow creators to iterate faster with video creation.

Eleuther ▷ #general (14 messagesđŸ”„):

Falsifiability in research, Grand Challenges in AI, EleutherAI Discord Purpose

  • Falsifiability Divides Scientists: A member suggested the server is for discussion of research on falsifiable hypotheses, rather than generally gesturing towards vague ideas.
    • Another member countered, stating that falsifiability is overrated among scientists, though pretty useful in general.
  • Members Tackle Grand AI Challenges: A member asked about current work, and another responded they are working on grand challenges and shared a link.
    ‱ The member admitted wishing they had more skill points in math to tackle the interesting stuff.
  • Discord’s Research Focus Clarified: A member quoted the EleutherAI description that the Discord server caters to researchers and research-level discussion, specifically about falsifiable hypotheses.
    • The member said the goal is to prevent anyone with crazy theories abetted by chatbots from spewing nonsense.

Eleuther ▷ #research (34 messagesđŸ”„):

Alternative Approaches to Transformers, Forward-Forward Training, HTM Dynamics, Mini-Brain Architecture, Troubleshooting Training Regimes

  • Exploring Alternatives Beyond Transformers: Members express a desire to see more approaches beyond transformers and gradient descent, referencing this tweet of alternative approaches to transformers.
  ‱ Diving into Forward-Forward Training: One member shared their work on HTM dynamics with forward-forward training, reporting plausible results; they will post test scripts soon (see their repo).
  ‱ “Mini-Brain” Architecture Emerges: A member is building a brain-like network with cortical columns, regions, 6-layer networking, and signal propagation, and has moved the project to this repo.
  ‱ Transformer Computation Insights: A member recommended the talk Computation in Transformers to fellow members.
  • Tokenizer Troubleshooting Underway: A member identified potential issues with their tokenizer, with a vocabulary size of 50k, and they are now troubleshooting the training regime and intend to get some meaningful metrics around Forward-Forward.

Eleuther ▷ #gpt-neox-dev (4 messages):

Muon Speedup, Torch Compile

  • Muon Speedup Claims Debunked: A member mentioned seeing a claim on Twitter of 1.6x speedup on Muon over Torch implementation, with Torch compile at 1.1x.
    ‱ Another member clarified that the speedup was due to algorithmic changes requiring fewer Newton-Schulz (NS) iterations, not pure Muon or hardware improvements.
  ‱ Algorithm Logic Improves Speed: The speedup was mostly about changing the algorithm to need fewer NS iterations.
    ‱ It’s not pure Muon; it is more about algorithm logic than hardware-aware improvements.

aider (Paul Gauthier) ▷ #general (40 messagesđŸ”„):

PacVim, OpenRouter billing, Context Management, aider git repo error, Model Context Protocol (MCP) tool

  • PacVim Fine-Tuning?: After giving codestral an eval tool in gptel, it successfully completed all the emacs-related tasks, even reconfiguring emacs and making new tools for itself.
    • A member joked about fine-tuning with PacVim, with LLMs being good at operating Emacs, leaving Vim behind.
  ‱ OpenRouter’s Top-Up Minimum Fees: Users are getting billed $6.16 instead of the expected $5 top-up. Another user pointed out that OpenRouter charges a 5.5% fee (minimum $0.80) when you purchase credits, as the underlying model providers’ pricing isn’t marked up.
    • Algebraically speaking, they calculated that you stop getting hit by the $0.80 minimum if you top up by $14.55 or more each time.
  • Context Management Suffers Without Gemini 2.5 Pro: A user expressed their need for Gemini 2.5 Pro, noting that other models just don’t feel right for context management.
    • They are also struggling to get Gemini 2.5 Pro to work.
  • aider hits git repo error: A member saw Unable to list files in git repo: Require 20 byte binary sha, got b'\xb9', len = 1 from aider.
  ‱ MCP Tools Evaluated: Community members discussed what a good model for Model Context Protocol (MCP) tool calls would be.
    ‱ Suggestions included reviewing the Gorilla Leaderboard and trying Qwen3 8b or flash-lite.
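
The OpenRouter fee arithmetic in the list above can be written out explicitly. This is a minimal sketch that models only the fee as quoted (5.5% with a $0.80 minimum). The function names are made up for illustration. It does not try to reproduce the $6.16 total, which may include charges the summary does not describe.

```python
FEE_RATE = 0.055  # 5.5% fee on credit purchases, as quoted above
MIN_FEE = 0.80    # minimum fee in dollars, as quoted above


def credit_purchase_fee(top_up):
    """Fee in dollars for a given top-up amount under the quoted rule."""
    return max(top_up * FEE_RATE, MIN_FEE)


def break_even_top_up():
    """Smallest top-up at which the percentage fee reaches the minimum fee."""
    return MIN_FEE / FEE_RATE


for amount in (5.0, 10.0, 14.55, 20.0):
    fee = credit_purchase_fee(amount)
    print(f"top-up ${amount:.2f}: fee ${fee:.2f} ({fee / amount:.1%} effective)")

print(f"break-even top-up: ${break_even_top_up():.2f}")
```

The break-even point is simply 0.80 / 0.055, which rounds to the $14.55 figure the member calculated. Below it, the flat minimum makes the effective rate higher than 5.5%.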

aider (Paul Gauthier) ▷ #questions-and-tips (7 messages):

OpenRouter DeepSeek 3.1 Configuration, Aider CLI Automation, Aider's Prompt Handling, Aider Context Degradation, conventions.md benefits

  • DeepSeek Setup for Aider Reasoner: Unconfirmed: A member asked about configuring OpenRouter’s DeepSeek 3.1 as a reasoner for the main model and a non-reasoning version as a weak model within Aider, but it’s unconfirmed if this setup works.
    • There was no confirmation or guide provided on how to achieve this specific configuration.
  • Aider CLI Awaits Input, Unlike Claude: A member noted that when piping content to Aider, it waits for user input, whereas Claude CLI immediately starts editing files.
    • The user inquired about automating Aider fully without human involvement, seeking a guide for such a setup.
  • Aider Pipes Only First Line: When content is piped to Aider, it only reads the first line.
    • To add PROMPT.md to Aider, it needs to be passed as an argument rather than piped.
  • conventions.md location in prompt impacts performance: Using conventions.md with --read places it near the top of the prompt, whereas including it in the message puts it near the bottom.
    • Due to U-shaped relevance in current prompts, placement at the top via --read may yield slightly better performance.
  • Aider context degrades >90k input tokens: A member finds that with Aider + Gemini Pro 2.5, context starts degrading around 90k-130k input tokens.
    ‱ It seems to work fine below that range.

Moonshot AI (Kimi K-2) ▷ #general-chat (32 messagesđŸ”„):

Imagen 4, Nano Banana, Kimi+, Z.AI Slides

  • Imagen 4 fools Users: A user shared an image generated by Imagen 4, initially mistaking it for a real scene from a podcast, and praising its impressive quality.
    • Another user noted that Gemini 2.5 Flash image generation was nano banana and has rolled out to the Gemini app.
  • Google’s Nano Banana Image Gen & Imagen: A user mentioned that Google is not transparent about the usage of image generators such as nano banana and Imagen for marketing reasons.
    • The user also linked to a Tweet about reasoning models, noting that CoT and RL do not create new capabilities.
  • Kimi+ is Slides: Kimi+ seems to be a new category, with Slides as its first feature, initially available only to Chinese users.
    • A user provided a summary, noting If you want it quickly, I guess Kimi is the way to go. If you want to go more complex, Z.AI is the way to go.
  • Z.AI Slides vs Kimi Slides: A user finds Z.AI slides is just an HTML website, preferring an actual PPTX file.
    • Another user agreed, mentioning the need for more control and rearranging options in Slides, also experiencing freezing issues with Z.AI.
  • Overseas Platform: A user mentioned that the PPTX feature is currently available on Kimi+'s overseas platform.
    • The user suggested expanding it to Twitter, TikTok and Instagram.

Yannick Kilcher ▷ #general (13 messages🔥):

Hierarchical modeling, HNet layers, Parameter count, tokenizer-free approaches, Albert Gu's blog posts

  • HNet Hierarchical Modeling: Untested Waters: The potential of higher-level hierarchical modeling with HNet remains untested in practice, although theoretically, it should extend beyond the original paper’s two layers due to residuals at each resolution level, similar to U-Net.
    • One member mentioned that a friend had asked him to test HNet with 3 layers.
  • HNet Compression Loss: Coefficient Reduction: In the HNet paper, the coefficient for compression loss was significantly reduced for the depth = 2 model compared to the depth = 1 model, implying that higher-level abstractions are almost the same as the depth=1 case.
    • A member noted that almost all compute still flows in the actual main network, making it challenging to expand this to operate on abstractions spanning multiple sentences or documents.
  • HNet Parameter Count: Maintaining Fairness: The decision to reduce the compression in HNet was likely to maintain fairness in parameter count and total compute when comparing HNets to other tokenizer-free approaches.
    • One member noted that a more even parameter spread across granularity levels might be optimal if the chunking method works well.
  • Deeper Abstractions: Simple Expansion Doubts: One member expressed doubt that deeper and more abstract embeddings could be achieved simply by changing a few lines of code, pointing out that the researchers have been working on this for over a year and wouldn’t miss something this simple.
    • The member suggested that if it actually worked, they would at least have an ablation on it or something.
  • Albert Gu’s Experiments: Publish Decision Factors: The research team had already conducted numerous experiments before deciding to publish their work.
    • As one member mentioned, At some point maybe you want to publish, especially if you’ve already used up your fair share of compute.

Yannick Kilcher ▷ #paper-discussion (9 messages🔥):

Reasoning Tokens, LLM Reasoning Efficiency, Mutual Information, stopwords

  • Thinking Tokens Improve Reasoning Efficiency?: The group discussed the paper Wait, We Don’t Need to “Wait”! Removing Thinking Tokens Improves Reasoning Efficiency, which suggests reasoning tokens can be removed to reduce token overhead with nominal effects on accuracy.
    • Later a user mentioned that the second paper is probably more accurate, that “reasoning” words seem to be skippable.
  • Do LLMs Internalize Time?: An intern’s experiment adding take your time to a CoT prompt with Llama 2 (+3) 7b (+13b) surprisingly increased reasoning time (generation took longer, trace was longer), without increasing accuracy.
    • The user wondered if the LLM had somehow internalized a concept of ‘time’ and shared transformer-circuits.pub confirming LLMs do have some representations of time.
  • Demystifying Reasoning Dynamics with Mutual Information: A user commented on the paper Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning, noting the cool observation about the MI of these tokens with the golden answer representation.
    • The user thinks that the paper identified ‘high information regions of sentences’, (the first words after periods and commas) and also accidentally included a few stopwords, which leads them to misinterpret one of their results.
  • Reasoning Tokens Boost Performance?: A user highlighted the parts of the Demystifying Reasoning Dynamics with Mutual Information paper relating to RR, where the authors feed the reasoning tokens back through the layers an additional time during inference for a boost in performance.
    • The user believes I wouldn’t be surprised if there was something to that. However maybe it isn’t understood exactly what is happening.
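
For readers unfamiliar with the quantity discussed above, here is a minimal sketch of mutual information computed from a small discrete joint distribution. It only illustrates the textbook definition, I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))]. It is not the estimator used in the paper, and the two toy distributions are made up for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information in bits for a 2D table of joint probabilities p(x, y)."""
    px = [sum(row) for row in joint]          # marginal over y
    py = [sum(col) for col in zip(*joint)]    # marginal over x
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Toy cases: X tells us nothing about Y vs. X fully determines Y.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]

print(mutual_information(independent))
print(mutual_information(dependent))
```

In this toy setting a "high-MI token" would correspond to the second table, where observing one variable removes all uncertainty about the other.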

Yannick Kilcher ▷ #ml-news (6 messages):

Claude Chrome, Keen Technologies LLM, Promptlock AI Ransomware

  • Claude Chrome turns into Surveillance System: Anthropic’s Claude for Chrome introduces mandatory reporting requirements for AI programs.
    • One member stated this effectively turns AI into a surveillance system.
  • Keen Technologies and the Fringes of LLM Research: A member expressed disappointment that Keen Technologies isn't focusing on the fringes of LLM research that are making steps toward continual learning, and is instead pushing pre-transformer RL tricks further, as highlighted in this video.
    • They suggested improving TTT (growable like TokenFormer, sparse/higher rank queries like UltraMem, able to flip between dynamic and fixed size like TransMamba) to achieve a continually-learning real-time Atari player.
  • Promptlock: First AI-Powered Ransomware Emerges: The emergence of Promptlock, the first AI-powered ransomware, was noted, as detailed in this SecurityWeek article.
    • Members expressed sadness about this development.

Manus.im Discord ▷ #general (14 messages🔥):

Manus scheduled tasks and mail, Manus for enterprise research, Manus credits consumption issue, Support ticket delays

  • Manus Mails Scheduled Tasks Together?: A member asked if scheduled tasks and mail can be used together in Manus, and another member clarified they’re the same.
  • Enterprises need Research Tool alternative to Manus: A member mentioned Manus is good at research and sought alternative tools for enterprises with compliance issues that prevent them from using Manus.
  • Manus Credit Consumption Issues Plague Users: Several users reported that Manus used up their credits overnight due to repeated prompting without a response, referencing support ticket 1335.
  • Delayed Support Frustrates Website Launch: A user reported contacting support via multiple channels (Help Centre, Reddit, Discord) for a week without a reply, delaying the launch of their permanent website, referencing support ticket 1334.
  • Entrepreneurial Credit Needs go Unmet: A user noted that recent improvements to the service primarily benefit users who spend $200 a month, rather than entrepreneurs needing periodic credit increases.
    • They expressed frustration at having to wait until September 21st to receive more credits, halting their project’s progress.

DSPy ▷ #show-and-tell (1 messages):

Signatures, Modules, Abstractions, Blog Post

  • Blogpost hails Signatures and Modules as good Abstractions: A member wrote a blog post explaining their views on why they think signatures and modules are great abstractions.
  • Good Abstractions power!: The author says their views were shaped by experience with other frameworks, and felt the topic was worth covering in a dedicated blog post.
    • They hope it’s useful to folks who are new!

DSPy ▷ #general (11 messages🔥):

LiteLLM's Role in DSPy, OpenRouter vs LiteLLM, DSPy Dependency Bloat

  • LiteLLM: Essential Dependency for DSPy?: A user inquired about alternatives to litellm within DSPy, suggesting a syntax like dspy["litellm"].
    • Another member responded that LiteLLM’s interface enables generic plugins from various LLM providers, including OpenRouter, considering it an essential dependency.
  • OpenRouter Indirectly Uses LiteLLM: A member mentioned using OpenRouter and a proxy server utilizing LiteLLM, indicating an indirect dependency due to DSPy.
    • The user questioned the necessity of LiteLLM as a direct dependency and inquired about its contribution to bloat.
  • DSPy Dependency Bloat Investigated: One member inquired about what contributes to the bloat of LiteLLM, estimating its size at 9MB.
    • Another member suggested using a CLI AI to crawl the codebase and analyze the dependencies, joking about Karpathy striking again.
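
A lighter-weight way to look into the size question above is to ask the installed package metadata directly. The sketch below uses only the standard library's `importlib.metadata`. The function names are hypothetical, and the numbers it prints depend entirely on what is installed in your environment, so no particular figure (including the 9MB estimate) is implied.

```python
from importlib import metadata
from pathlib import Path


def installed_size_bytes(dist_name):
    """Sum the on-disk size of files recorded for an installed distribution.

    Returns None if the distribution is not installed or has no file record.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    if dist.files is None:
        return None
    total = 0
    for entry in dist.files:
        path = Path(entry.locate())
        if path.is_file():
            total += path.stat().st_size
    return total


def declared_requirements(dist_name):
    """Return the requirement strings a distribution declares, or [] if unknown."""
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []


for name in ("litellm", "dspy"):
    size = installed_size_bytes(name)
    label = "not installed" if size is None else f"{size / 1e6:.1f} MB"
    print(f"{name}: {label}, {len(declared_requirements(name))} declared requirements")
```

This counts only the distribution's own files. Transitive dependencies would need the same function applied to each declared requirement.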

Modular (Mojo 🔥) ▷ #mojo (11 messages🔥):

InlineArray vs StaticTuple, DPDK header binding, High-level API improvements, Deduplication of generated Mojo files, tsan compiler option

  • InlineArray replaces StaticTuple for Efficiency: The use of InlineArray is back after initial seg fault issues, replacing StaticTuple for better memory layout in structs for both DPDK and Mujoco bindings.
    • The user mentioned that this was probably a skills issue.
  • Aggressive DPDK header binding strategy emerges: A member suggested focusing on rte_*.h headers within the installed include folder for DPDK bindings, due to DPDK’s efforts to minimize dependencies.
    • The goal is to create comprehensive bindings by including all relevant headers while avoiding unnecessary ones.
  • High-Level API Enhancements Prioritized for easier lib bindings: The next step is improving the high level API to make binding to different libs easier.
    • A member plans to cut down the size of the generated mojo files by skipping unused code.
  • Generated Mojo files deduplication considered: A member proposed using source annotations from cpp to deduplicate the generated Mojo files, aiming to reduce their size by removing unused code.
    • This involves analyzing annotations left by cpp to identify and eliminate redundant or unnecessary elements.
  • tsan compiler option availability discussed: A member inquired about checking if tsan (ThreadSanitizer) is enabled for the compiler when using the --sanitize thread option.
    • Another member suggested passing -DTSAN to the compiler and using env_get_bool from param_env with @parameter if as a workaround.

tinygrad (George Hotz) ▷ #general (3 messages):

tinygrad realize() PR, TestSetItemloop.test_range fusion, tinygrad GPT2 Training performance

  • Realize() Removal Proposed: A pull request was added to remove realize() and fuse TestSetItemloop.test_range into a single kernel in tinygrad#11870.
  • GPT2 Training on 7900xtx Sluggish: Training llm.c/train_gpt2.py appears slow on a 7900xtx, even with BEAM=5.
    • After tweaks to match nanogpt parameters, a member achieved 250ms per step at nanogpt size (batch size 64, 6 layers, 6 heads, 384 emb_dim, 256 seq_len), whereas Andrej’s nanogpt with rocm torch gets approximately 3ms per step with the default config.
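
To put the 250ms figure in context, the arithmetic below converts the reported configuration into tokens per step and tokens per second. The parameter estimate uses the common 12 · n_layer · n_embd² rule of thumb for transformer blocks (excluding embeddings), which is an assumption added here and not something stated in the discussion. The 3ms figure is left out because it was reported for a different (default) config.

```python
# Configuration as reported in the discussion.
batch_size, seq_len = 64, 256
n_layer, n_embd = 6, 384
step_seconds = 0.250

# Tokens processed in one optimizer step and the implied throughput.
tokens_per_step = batch_size * seq_len
tokens_per_second = tokens_per_step / step_seconds

# Rule-of-thumb block parameter count (assumption: 12 * L * d^2, no embeddings).
approx_block_params = 12 * n_layer * n_embd ** 2

print(f"tokens/step: {tokens_per_step}")
print(f"tokens/sec:  {tokens_per_second:.0f}")
print(f"approx block params: {approx_block_params / 1e6:.1f}M")
```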

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (2 messages):

Google Docs Confirmation Emails, Mailing List Updates

  • Google Docs Confirms Program Sign-Ups: Members are receiving confirmation emails from Google Docs after signing up for the program.
    • The confirmation emails are successfully sent, but no other communication has been received yet.
  • Mailing List to Provide Lecture Updates: The mailing list for providing updates about each lecture should be active soon.
    • Users are advised to monitor the mailing list for future announcements and program updates.

MLOps @Chipro ▷ #events (1 messages):

AI Tools Introduction, Less Code Mindset, AI Prototyping, AI-powered Products, Tech History

  • Simplicity Powers AI Prototypes: A session with Carlos Almeida on September 5th will cover the Less Code Mindset and how it empowers non-technical people to launch AI-powered products.
    • Carlos will demo projects from Less Code Studio, showcasing how AI can dramatically cut the time from idea to working prototype, followed by an open Q&A.
  • Portugal Founders Build Global-First Companies: Dick Hardt will join Pedro Sousa and Daniel Quintas on September 12th to discuss the past, present, and future of tech, and how AI tools are shaping the field in Portugal.
    • The discussion will explore why he chose Lisbon, and how founders there can build global-first companies, also covering identity in the AI era and prototyping AI workflows.

Windsurf ▷ #announcements (1 messages):

Grok Code Fast 1, Windsurf announcement

  • Grok Code Fast 1 Surfs into Windsurf!: Grok Code Fast 1 is now available in Windsurf and free for a limited time.
    • Members are encouraged to share how they plan to use it for their next project, and a link to the announcement post was shared.
  • Limited-Time Free Access for Grok Code Fast 1: Grok Code Fast 1 is being offered for free for a limited time on Windsurf, inviting users to integrate it into upcoming projects.
    • An announcement post on X (formerly Twitter) provides further details about the offering, along with an attached promotional image.