Congrats Oracle!

AI News for 9/9/2025-9/10/2025. We checked 12 subreddits, 544 Twitters and 22 Discords (187 channels, and 5382 messages) for you. Estimated reading time saved (at 200wpm): 457 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

We were going to feature the official Anthropic MCP Registry news, or ChatGPT Developer Mode, or Claude’s new VM, or Mistral’s huge fundraise, but probably today’s biggest vibe shift is for Oracle’s OCI division, which blew away estimates: bookings grew 359% to $455B, with cloud revenue guidance of $144B by 2030 (for context, OCI is $18B today, AWS is $112B, Azure is $75B). With the stock gaining >$250B in market cap and almost entering the trillion-dollar club, Larry Ellison is now the world’s richest man, capping what is, in retrospect, a run for the ages.

The Wall Street Journal carried the additional story that OpenAI is responsible for a large share of the projected bookings, and that, perhaps significantly, the deal is the outcome of months-long tension with Microsoft.


AI Twitter Recap

Fast RL for Tool-Use and Weight Update Infrastructure (Kimi checkpoint-engine, RLFactory, TRL)

  • Kimi’s checkpoint-engine (open source): Moonshot AI released a lightweight middleware to push model weight updates in-place across large inference fleets. Highlights: update a 1T-param model in ~20 seconds across thousands of GPUs; supports broadcast (sync) and P2P (dynamic) modes; overlapped H2D, broadcast, and reload; integrates with vLLM. See the launch and repo from @Kimi_Moonshot, vLLM’s collab note with best practices (@vllm_project), and a step-by-step weight-transfer optimization thread (@vllm_project). Context: a deep-dive by @ZhihuFrontier documents ~2s cross-node weight sync (Qwen3‑235B, BF16→FP8) using raw RDMA Writes—no disk I/O or host CPU—by precomputing routing tables, fusing projections+quantization, overlapping CUDA events with RDMA, and batching via DeviceMesh. (A minimal sketch of the broadcast-style update pattern appears after this list.)
  • RLFactory (plug-and-play RL for LLM tools): A clean framework for RL on tool-using agents with async tool calls (6.8× throughput), decoupled training/environments (low setup), flexible rewards (rule/model/tool), and evidence that small models can outperform larger baselines (Qwen3‑4B > Qwen2.5‑7B in their setting). Paper and code via @arankomatsuzaki and links (repo).
  • TRL v0.23: Brings Context Parallelism to train with arbitrary context length and other post-training improvements. Useful if you’re doing long-context SFT/RL. Details: @QGallouedec.
  • Prime Intellect RL stack: Lightweight RFT is now integrated with prime-rl, verifiers, and the Environments Hub as the team scales toward full-stack SOTA RL infra accessible to open builders (announcement).
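
As a rough illustration of the broadcast (sync) mode described in the checkpoint-engine item above: the core pattern is streaming each updated tensor from the rank that holds the new checkpoint into every inference rank’s copy of the model, in place. This is a minimal sketch only, not Moonshot’s implementation — their pipeline additionally overlaps H2D copies, the broadcast itself, and vLLM’s reload, and offers a P2P mode for dynamically joining nodes.

```python
import torch
import torch.distributed as dist

def push_weights_inplace(model: torch.nn.Module, src_rank: int = 0) -> None:
    """Broadcast-style in-place weight update: the rank holding the fresh
    checkpoint streams every parameter tensor to all other ranks, overwriting
    the serving weights without restarting the inference process.
    Assumes dist.init_process_group(...) has already been called."""
    for param in model.parameters():
        dist.broadcast(param.data, src=src_rank)
```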

Deterministic and Scalable Inference/Training (vLLM determinism, BackendBench, dynamic quant, HierMoE)

  • Defeating nondeterminism in LLM inference: Thinking Machines Lab launched its research blog “Connectionism” with a deep, practical guide to deterministic inference pipelines (floating-point numerics, kernels, caching, sampling alignment) and a minimal patch to make vLLM deterministic for Qwen. Read the post (launch, @cHHillee), the vLLM example and acknowledgement (@vllm_project, @woosuk_k). Related infra: PyTorch nightly has CUDA 13 wheels for Blackwell experimentation (@StasBekman). (A tiny floating-point non-associativity demo appears after this list.)
  • BackendBench (Meta-led): A benchmark to exercise PyTorch backend operator coverage. Now hosted on the Prime Intellect Environments Hub for easier comparison and discussion (@marksaroufim, @m_sirovatka, hub entry).
  • Dynamic quantization notes (DeepSeek V3.1): @danielhanchen shows “thinking” mode retains much higher accuracy at lower dynamic bits; 3-bit gets near FP baseline; keeping attn_k_b in 8-bit yields +2% vs 4-bit; upcasting shared experts slows inference 1.5–2× with minimal accuracy gain.
  • HierMoE (MoE training system): Topology-aware token deduplication across hierarchy levels + expert swapping boosts All-to-All by 1.55–3.32× and end-to-end by 1.18–1.27× on multi-node A6000 setups; gains increase with higher top‑k routing. Summary: @gm8xx8.
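
The floating-point numerics point in the Thinking Machines item above is easy to demonstrate locally: because floating-point addition is not associative, the same logical reduction can return slightly different values depending on how a kernel partitions the work, which is enough to flip a greedy argmax over logits. The toy demo below only illustrates that mechanism; it is not the blog’s batch-invariant-kernel fix.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1_000_000, dtype=torch.float32)

# Same logical sum, two different reduction orders.
sum_once = x.sum()
sum_chunked = sum(chunk.sum() for chunk in x.chunk(97))

print(sum_once.item(), sum_chunked.item())
# The two values typically disagree in the last few bits; in an LLM server the
# analogous reordering happens when batch size or kernel tiling changes, which
# is the nondeterminism the Connectionism post eliminates with batch-invariant
# kernels rather than with seeds or temperature settings.
```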

Model Releases and Performance

  • K2‑Think 32B (Qwen2.5-based, open): Trained with long CoT SFT + RL with verifiable rewards; inference uses Plan‑Before‑You‑Think and Best‑of‑3. Reported pass@1: AIME’24 90.8, AIME’25 81.2, HMMT’25 73.8, Omni‑HARD 60.7, LiveCodeBench v5 63.97, GPQA‑Diamond 71.1. Runs at ~2,000 tok/s on Cerebras WSE (vs ~200 tok/s on H100/H200). Full stack (model, training/inference code, system) is open; API also available. Source: @gm8xx8.
  • ERNIE‑4 (Baidu, Apache‑2.0): Community notes strong results vs frontier baselines given its size; current open variants appear to be 4B and 30B (@eliebakouch, HF card, clarification).
  • MobileLLM‑R1 (Meta): <1B edge reasoning model reportedly achieving ~5× higher MATH accuracy vs Olmo‑1.24B and ~2× vs SmolLM2‑1.7B; trained on 4.2T tokens (~11.7% of Qwen3’s 36T), yet matches/surpasses Qwen3 on multiple reasoning benchmarks according to @_akhaliq.
  • Google Edge updates: Gemma 3n is now on the Play Store for on-device speech/text/image input with on-device STT and translation; OSS code and Android app available (@_philschmid, repo/app). Also, EmbeddingGemma is the top trending model on HF (@osanseviero, @ClementDelangue).
  • SWE‑bench (bash-only): GLM‑4.5 enters at #7 via mini-swe-agent (@OfirPress).

Evals and Post‑Training Platforms

  • SimpleQA Verified (Google DeepMind): A 1,000‑prompt factuality benchmark with cleaned labels, rebalanced topics, and improved automated grading design. On this cleaner eval, Gemini 2.5 Pro is SOTA. Methodology, leaderboard, and paper: @lkshaas, @_philschmid, Kaggle.
  • Together FT platform: Now supports 100B+ models (DeepSeek, Qwen, GPT‑OSS), long‑context FT up to 131k tokens, Hugging Face Hub integration, and advanced DPO options (@togethercompute, updates, HF integration).
  • OpenAI Evals: Native audio inputs and audio graders are supported—evaluate audio responses without transcription (@OpenAIDevs). OpenAI is also hiring for an Applied Evals team focused on economically valuable tasks (@shyamalanadkat).

Agents, MCP, and SDKs

  • MCP everywhere: ChatGPT now supports full MCP tools (including write actions) in developer mode—tie in Jira, Zapier, Stripe, and more (@OpenAIDevs, @gdb, @victormustar, @emilygsands). Anthropic added a web-fetch tool so Claude can retrieve/analyze arbitrary URLs (@alexalbert__).
  • Genkit Go 1.0 (Google): Production‑ready SDK with init:ai-tools, built‑in tool calling, RAG, and more for Go backends (@googledevs).
  • Agent data & internals: Hugging Face released the Jupyter Agent Dataset codebase (7TB Kaggle data → 0.2B agentic traces for notebook creation/editing) plus a step‑by‑step walkthrough (@_BaptisteColle, @lvwerra). Also, a great live notebook to learn vLLM internals is available via Modal (@vllm_project).

Multimodal & Edge Embeddings and Tooling

  • Multimodal embeddings in llama.cpp: @JinaAI_ enabled multimodal embeddings V4 with GGUF in llama.cpp by fixing the attention mask for image tokens and matching PyTorch’s conv3d pre-grouping in the vision tower; GGUF (+quantized) now matches a PyTorch reference on ViDoRe/MTEB. Code and blog in their thread.
  • Parsing for RAG: LlamaParse now extracts PowerPoint speaker notes—handy for enterprise RAG pipelines (@llama_index, @jerryjliu0).
  • Image model face‑off (Seedream 4.0 vs Nano Banana): ByteDance’s Seedream 4.0 (text‑to‑image + editing) is live in the Arena and Yupp; early community comps suggest Seedream excels at editing/Chinese semantics while Nano Banana wins on photorealism/detail (@lmarena_ai, @yupp_ai, @ZhihuFrontier).
  • Edge VLMs: LearnOpenCV tutorials for running Moondream2, LiquidAI LFM2‑VL, Apple FastVLM, SmolVLM2 on Jetson Orin Nano (JetPack 6, Transformers stack) for captioning, VQA, OCR (@LearnOpenCV).
  • Cloud GPU market: Solid 2025 report on capacity, pricing models, and strategies to optimize availability/costs (@dstackai, @StasBekman).

Top tweets (by engagement)


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Unsloth DeepSeek‑V3.1 Dynamic GGUFs Aider Polyglot Benchmarks & AMA

  • Unsloth Dynamic GGUFs - Aider Polyglot Benchmarks (Score: 178, Comments: 42): Unsloth presents Aider Polyglot pass@2 benchmarks for its Dynamic GGUF quantizations of DeepSeek-V3.1 (HF repo, blog), showing that selective-layer dynamic imatrix quantization preserves capability even at extreme bitwidths. A highlighted result is a 1‑bit Dynamic GGUF that reduces model size 671GB → 192GB (−75%) and, in “non‑thinking” mode on Aider Polyglot, reportedly outperforms GPT‑4.1 (Apr 2025), GPT‑4.5, and DeepSeek‑V3‑0324; the 3‑bit (thinking) variant outperforms Claude‑4‑Opus (thinking), and 5‑bit (non‑thinking) matches Claude‑4‑Opus (non‑thinking). Benchmarks were averaged over ~3 runs with median pass@2 reported, and comparisons include full‑precision APIs, other dynamic/semi‑dynamic imatrix GGUFs, with several non‑Unsloth 1‑2 bit quantizations failing to load or producing degenerate outputs. Commenters question how a 1‑bit model can beat full‑precision (speculating it uses heavy quantization on less important layers and lighter/no quantization on critical ones), debate preferred GGUF formats like q4_k_xl/q5_k_xl, and share anecdotal experiences that Unsloth dynamic quants match or surpass API-served models for local, fully‑offline use.
    • Several commenters probe how a “1‑bit” Unsloth Dynamic GGUF could outperform some full‑precision baselines. The consensus explanation is mixed-precision allocation: Unsloth’s dynamic scheme keeps sensitivity‑critical paths (e.g., attention Q/K/V and output projections, MLP up/down) at higher precision while aggressively quantizing less sensitive tensors (embeddings, layer norms, some residuals), guided by calibration/perplexity so the effective average is ~1–2 bits but not uniformly 1‑bit. This is closer to per‑layer/per‑channel mixed precision than classic uniform k‑quant, akin in spirit to ideas from AWQ/SmoothQuant (see https://arxiv.org/abs/2306.00978, https://arxiv.org/abs/2211.10438). The result can preserve generation quality on downstream tasks while cutting memory/latency substantially versus uniform quant baselines.
    • On practical trade‑offs, users highlight q4_k_xl as a strong default and q5_k_xl for higher fidelity on coder models like Qwen3‑Coder. These K‑grouped GGUF quants balance speed/VRAM vs. accuracy; moving from q4_k_xl to q5_k_xl typically increases memory and compute but reduces perplexity and preserves long‑range/code structure better. Reference for GGUF/K‑quants and their characteristics: https://github.com/ggerganov/llama.cpp/blob/master/docs/quantization.md. Combining Unsloth’s dynamic allocation with K‑grouped formats is reported to maintain quality while keeping interactive latency low on local hardware.
    • A question is raised about running Unsloth dynamic quants with Apple’s MLX. Today MLX/MLX‑LM loads its own formats (converted from HF/safetensors) and does not natively consume GGUF, so Unsloth dynamic GGUFs won’t run directly; dequantizing/converting to MLX would forfeit the dynamic quant benefits and increase memory. See MLX‑LM examples: https://github.com/ml-explore/mlx-examples/tree/main/llms. Native MLX support would require implementing the dynamic GGUF kernels/ops in MLX or adding a GGUF loader path.
  • AMA with the Unsloth team (Score: 281, Comments: 353): Unsloth announces an AMA and releases new Aider Polyglot benchmarks comparing their DeepSeek‑V3.1 Dynamic GGUFs against other models/quantizations, with details in their LocalLLaMA post and links to the project docs and code (docs, GitHub, benchmarks post). The AMA runs 10AM–1PM PST with follow‑ups over 48 hours, and centers on their open‑source RL/fine‑tuning framework, GGUF builds, custom kernels, and bug fixes. Comment themes request a timeline and methods for faster Mixture‑of‑Experts (MoE) training and practical guidance for beginners entering LLM finetuning; minimal off‑topic praise present.
    • Interest in faster MoE training focuses on optimizing expert-parallel communication and load balancing. The main bottleneck is the all_to_all token routing and under-utilized experts; adopters typically rely on Megatron-LM MoE or DeepSpeed-MoE with expert/sequence parallelism, load-balance losses, and capacity_factor tuning, plus libraries like Microsoft Tutel for faster all-to-all and grouped GEMMs (Megatron-LM, DeepSpeed-MoE, Tutel). Techniques like token dropping, auxiliary balancing loss, and batched GEMMs (e.g., Megablocks, paper) are cited as typical levers for >1.5x utilization gains vs naĂŻve MoE implementations depending on topology/interconnect.
    • Production asks for single-node multi-GPU support for GRPO/DPO highlight different parallelization needs: DPO is largely standard data-parallel (DDP/FSDP/ZeRO) with pairwise contrastive loss and an optional frozen reference model, while GRPO/RL requires synchronized batched sampling, advantage estimation, and potentially an actor–learner or parameter-server layout. Efficient GRPO needs fast generation kernels (FlashAttention, paged KV cache), rollout micro-batching, and careful max_new_tokens/sequence packing to keep GPUs saturated; DPO benefits from gradient checkpointing and fused ops to fit longer contexts. Typical stacks mentioned: torch.distributed + FSDP/ZeRO-3, or Hugging Face TRL for algorithmic scaffolding (TRL).
    • Beginner finetuning interest skews toward practical low-cost setups: QLoRA with 4-bit nf4 quantization (bitsandbytes) and PEFT adapters on mid-size models (7B–13B), using FlashAttention-2 and sequence packing for throughput. Common hyperparameter ranges: LoRA r=8–16, alpha=16–32, learning rate 1e-4–2e-4, warmup ~1–3%, and cosine decay; evaluation via lm-eval-harness/task-specific metrics. References: QLoRA, bitsandbytes, PEFT, FlashAttention.
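
For readers who want to see what the beginner recipe in the last bullet looks like in practice, here is a minimal sketch using Hugging Face transformers, bitsandbytes, and PEFT; the model ID is just a placeholder, and the hyperparameters simply sit in the ranges quoted above rather than being recommendations from the thread.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base model (the usual QLoRA setup via bitsandbytes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # placeholder: any 7B-13B base model
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter with hyperparameters in the commonly cited ranges
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # only the adapter weights are trainable
```

From here the usual choices are a cosine learning-rate schedule at 1e-4 to 2e-4 with ~1–3% warmup, plus lm-eval-harness or task-specific metrics for evaluation, as the comment describes.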

2. Microsoft VibeVoice long‑form multi‑speaker TTS showcase + GPT‑OSS from‑scratch pretraining release

  • VibeVoice is sweeeet. Now we need to adapt its tokenizer for other models! (Score: 348, Comments: 63): OP demos Microsoft Research’s VibeVoice (7B) long-form TTS, showing single-pass generation of 45–90 minutes with up to 4 speakers, using default voices and no stitching via a Hugging Face Space (https://huggingface.co/spaces/ACloudCenter/Conference-Generator-VibeVoice). They highlight lifelike prosody versus Google’s notebook-style podcasting (which auto-generates from context rather than following an exact script) and propose adapting VibeVoice’s tokenizer for other models; one released checkpoint was reportedly pulled post-release. Top feedback: realism is high but long-form listening still hits an uncanny valley—voices (especially male) sound stilted over time. Users report unofficial multilingual capability when prompted with in-language voice samples, and practical cloning via ComfyUI using ~2‑minute short stories per voice with varied takes (whisper, yell, slow) to diversify style; example results shared here: https://www.reddit.com/r/StableDiffusion/comments/1nb29x3/vibevoice_with_wan_s2v_trying_out_4_independent/.
    • Multiple users report prosody and long‑form quality issues: voices sound “stilted,” with “fake enthusiasm,” and male voices fare worse. Even if short demos impress, listeners say a ~90‑minute narration would be fatiguing, pointing to limits in expressive control, long‑range prosody planning, and robustness against the “uncanny valley” in TTS.
    • One tester claims VibeVoice isn’t hard‑limited to English/Chinese: it can produce other languages when provided a voice sample in that language (zero‑shot style). This suggests the acoustic/latent representations generalize across languages, even if the official tokenizer/training focus is EN/ZH, hinting at potential for multilingual transfer without explicit retraining.
    • Practical cloning workflow via ComfyUI: a ~2‑minute short story per target voice yields good clones; using multiple takes (whispering, yelling, slow, excited) increases prosody variety for different speakers. Example outputs (4 independent speakers with WAN S2V) are shared here: https://www.reddit.com/r/StableDiffusion/comments/1nb29x3/vibevoice_with_wan_s2v_trying_out_4_independent/ — indicating feasible integration in a node‑based pipeline and utility for video/narration pairings.
  • I pre-trained GPT-OSS entirely from scratch (Score: 174, Comments: 32): A 3‑hour walkthrough video shows a from‑scratch pretraining pipeline for “GPT‑OSS,” covering: TinyStories preprocessing; a custom Harmony tokenizer; transformer components (token embeddings, RMSNorm, RoPE); sliding‑window attention with GQA; attention bias/sinks; and a SwiGLU MoE, plus training loop and inference (video). Two codebases are released: (1) Nano‑GPT‑OSS, a ~500M‑param model that “retains key architectural innovations,” claimed to train in ~20h on 1×A40 at $0.40/hr (replicable for <$10); (2) Truly‑Open‑GPT‑OSS, a ~20B model pre‑trained from scratch, reported to require 5×H200 with a $100–150 budget. The post enumerates architectural features but doesn’t specify training precision; commenters ask about FP32 vs FP8/FP4 and availability of weights (e.g., on HF) or llama.cpp compatibility. Top technical feedback flags serious code quality/ML correctness issues in the nano repo: missing imports, no weight initialization (including uninitialized attention sinks), repeated device transfers in loss, per‑example looped loss, and an MoE implementation that lacks an auxiliary loss and is not auxiliary‑loss‑free; also suggests adding pyproject/lint and fixes (e.g., “infrance” typo). Other comments question training precision choices (FP32 vs FP8/FP4) and deployment/portability (HF weights, llama.cpp).
    • Substantial implementation issues were identified in the nano model: missing imports (e.g., torch.nn.functional as F in gpt2.py), no explicit weight initialization (leaving attention sinks uninitialized), and even a typoed import (from infrance import generate_text). Performance pitfalls include repeatedly calling model.to(device) during loss computation and using a non-vectorized per-sample for loop over the batch, both of which can severely underutilize the GPU; additionally, the MoE routing/regularization is flagged as incorrect—it’s neither auxiliary-loss-free nor does it implement an auxiliary loss to balance experts. (A standard auxiliary load-balancing loss is sketched after this list.)
    • Training precision choices were questioned: a commenter asked if all blocks were trained in FP32 and why not use FP8 or FP4 as the original GPT-OSS purportedly did. This raises implications for throughput, memory footprint, and training cost vs. stability—lower-precision training (e.g., FP8/FP4) can dramatically cut compute and memory but typically requires careful calibration and loss-scaling to maintain convergence.
    • Concerns about scaling/data: training a ~20B-parameter model on TinyStories was called “overkill,” with requests for specifics on the TinyStories subset size and inquiries about the compute/cost to train on >1T tokens. The thread implicitly probes dataset–model size alignment and the feasibility/expense of trillion-token runs, signaling interest in token budgets and scaling choices rather than just model size.
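
On the missing auxiliary loss flagged in the code-review comment above, a standard formulation (Switch-Transformer-style, for top-1 routing) looks like the sketch below. This illustrates the technique commenters are asking for; it is not the repo’s code or GPT-OSS’s actual recipe, and top-k routing variants weight the terms differently.

```python
import torch
import torch.nn.functional as F

def load_balancing_aux_loss(router_logits: torch.Tensor, num_experts: int) -> torch.Tensor:
    """Switch-Transformer-style auxiliary loss for top-1 MoE routing.
    router_logits: (num_tokens, num_experts). Minimizing this term pushes both
    the fraction of tokens routed to each expert and the mean router
    probability toward a uniform 1/num_experts."""
    probs = F.softmax(router_logits, dim=-1)
    top1 = probs.argmax(dim=-1)
    frac_tokens = F.one_hot(top1, num_experts).float().mean(dim=0)  # f_e
    mean_probs = probs.mean(dim=0)                                  # P_e
    return num_experts * torch.sum(frac_tokens * mean_probs)
```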

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Image Gen Releases: SeeDream 4 vs Imagen, Qwen Edit (Nunchaku), and Wan 2.2 I2V

  • Imagen 4 vs Seedream 4 (Same prompt) (Score: 244, Comments: 80): Side-by-side generations from text-to-image models Imagen 4 (left) and Seedream 4 (right) using an identical photographic prompt (Canon EOS R6, 135mm, 1/1250s, f/2.8, ISO unspecified) for a Toronto night, indoor gaming-desk scene. Commenters note Seedream’s output appears more photorealistic, while both results ignore the physics implied by the EXIF-like settings—e.g., motion/light trails despite a fast 1/1250s shutter and an unrealistically deep DOF at f/2.8 on a 135mm full-frame shot. Consensus favors Seedream 4 for realism; a technical critique highlights that neither model adheres to specified camera parameters, suggesting limited fidelity to exposure/DOF constraints in current prompt-to-image pipelines.
    • A commenter flags physically inconsistent camera parameters vs. the rendered image: motion trails visible in the shot would require a slow shutter, yet the stated shutter speed implies the opposite; likewise, at f/2.8 on a full‑frame Canon R6 at 135mm, depth of field should be razor thin, but the image shows everything in focus. This suggests current models (both Imagen 4 and Seedream 4) aren’t strictly enforcing optical realism when explicit camera settings are specified in the prompt.
    • Several note Seedream 4 appears more photorealistic than Imagen 4, particularly by removing the stereotypical AI “color filter/cast,” yielding more neutral, camera‑like color rendering and tone. The improved color science/tone mapping is cited as a key factor in perceived realism.
  • ByteDance claims Seedream 4.0 beats Google’s “nano banana” on aesthetics and alignment (Score: 218, Comments: 54): ByteDance’s Seedream 4.0 is claimed to surpass Google’s “nano banana” on aesthetics and alignment. Hands-on reports highlight an implementation gap in reference handling: Seedream 4.0 relies on an LLM-generated textual description of a reference image, while “nano banana” supports native image conditioning, yielding stronger identity/pose consistency for edits (e.g., changing facial expressions) and better text rendering fidelity. Commenters caution against early hype and note that, aside from text and identity consistency where Google still leads, Seedream 4.0 often matches or exceeds “nano banana” in visual realism, producing images that feel less plasticky. Overall sentiment: Google wins on typography and exact subject persistence; Seedream 4.0 may edge it on lifelike aesthetics.
    • A key implementation difference noted is reference handling: SeedDream relies on an LLM to produce a text description of the reference image (LLM-mediated re-encoding), whereas Google’s “nano banana” conditions natively on the image. This manifests in identity consistency—nano banana preserves the exact subject when applying edits (e.g., changing facial expression) with minimal drift, while SeedDream shows more variance in preserving the same individual across edits.
    • Comparative performance observations: SeedDream 4.0 is reported to match or exceed nano banana on most outputs except text-in-image/typography, where Google still leads. Aesthetically, SeedDream samples are described as more photorealistic and less prone to the synthetic “plastic sheen,” suggesting stronger realism priors or denoising behavior even if text rendering fidelity lags.
  • Nunchaku Qwen Image Edit is out (Score: 201, Comments: 52): Nunchaku released “Qwen Image Edit” models on Hugging Face, providing a base model plus 8-step and 4-step variants: https://huggingface.co/nunchaku-tech/nunchaku-qwen-image-edit. Reported to work in existing setups without updating either Nunchaku or ComfyUI-Nunchaku, with an example workflow JSON available here: https://github.com/nunchaku-tech/ComfyUI-nunchaku/blob/main/example_workflows/nunchaku-qwen-image-edit.json. Commenters request concrete benchmarks on quality vs. speed (e.g., whether reduced-step variants sacrifice fidelity and by how much) and specific speedup figures; one notes possible lack of “Chroma” support.
    • Several commenters probe the speed vs. quality tradeoff, asking if the claimed acceleration degrades edit fidelity or if it’s truly “free” speedup. They request concrete benchmarks with SSIM, LPIPS, PSNR, or prompt-aligned metrics (e.g., CLIP-based scores) across common resolutions to show any regression in color consistency, edge preservation, or artifact rates under reduced steps/schedulers.
    • There’s demand for quantified speedups: “What’s the speedup?” Users want end-to-end latency and throughput at standard sizes (e.g., 512×512) for bs=1 and bs=8, on typical hardware (RTX 4090, A100, Apple M2), plus cold vs warm-start numbers. They also ask whether gains come from model-side changes (e.g., fewer diffusion steps/scheduler tweaks) or system-level optimizations (CUDA kernels, TensorRT/ONNX, quantization, FP8/bfloat16), and how these affect single-image latency versus batched throughput.
    • Feature support is a concern: “Everything BUT Chroma” suggests a missing ChromaDB integration in the workflow, and another asks about LoRA adapters for personalization. Commenters want clarity on whether LoRA is supported for the image-edit backbone (training and inference-time adapter loading), and if Chroma integration is on the roadmap for asset/prompt retrieval or project metadata.
  • Solve the image offset problem of Qwen-image-edit (Score: 404, Comments: 48): OP reports that Qwen-image-edit frequently produces spatial offsets during edits, distorting character proportions and overall composition. They share a ComfyUI-based workflow claimed to mitigate the issue (workflow) and a supporting LoRA (model) to stabilize outputs; visual examples indicate improved alignment, though no quantitative benchmarks are provided. Top feedback notes the shared workflow bundles three custom nodes and altered the environment (e.g., swapped NumPy versions), suggesting the issue can be solved without invasive dependencies. Another commenter states the root cause is mismatched input/output resolution; ensuring the output is resized to match the input prevents offsets and can be implemented in any workflow (e.g., as with Kontext).
    • Image offset in Qwen-image-edit appears tied to mismatched pre/post-resize dimensions; if the model or workflow resizes the input but the output canvas isn’t forced to the exact same size, the generated image is spatially shifted. Users report the same behavior with Kontext; controlling the input size first so the output exactly matches (no implicit rescale) eliminates the offset. Practical takeaway: lock input dimensions and ensure any resize/pad ops are symmetric so the model’s output aligns 1:1 with the source.
    • Empirical findings suggest resolution-specific stability: 1360x768 (16:9) and 1045x1000 (1 MP) show no shift, but adding as little as +8 px reintroduces offsets. This pattern hints at internal stride/patch-size or tiling boundary conditions where only certain dimension multiples remain aligned, causing offsets when dimensions deviate slightly. (A small dimension-snapping helper is sketched after this list.)
    • Workflow install introduced environment and security concerns: it adds 3 custom nodes and altered the NumPy version (uninstalling an older one and installing a newer one), creating dependency conflicts. Another node pack reportedly includes a “screen share node,” raising privacy/security red flags. Technical implication: node packs can mutate the runtime and introduce sensitive capabilities; prefer isolated environments and audit node manifests before installation.
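
Building on the multiple-of-8 observation above, a small pre-processing helper along these lines is one way to apply it in any workflow. The multiple is an assumption drawn from the commenters’ reports, not a documented constraint of Qwen-image-edit.

```python
from PIL import Image

def snap_for_edit(path: str, multiple: int = 8) -> Image.Image:
    """Resize the source so both dimensions are multiples of `multiple`,
    making it easier to keep the edited output at exactly the input size
    (the 1:1 alignment commenters say eliminates the offset)."""
    img = Image.open(path)
    w, h = img.size
    w, h = w - w % multiple, h - h % multiple
    return img if (w, h) == img.size else img.resize((w, h), Image.LANCZOS)
```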

2. LLM Quality Volatility, Hallucinations, and Buggy Outputs

  • The AI Nerf Is Real (Score: 509, Comments: 115): IsItNerfed.org reports real-time scripted tests of LLMs (Claude Code via Anthropic’s agent CLI and OpenAI’s API with GPT‑4.1 as the reference), plus a “Vibe Check” crowd signal. Their telemetry shows Claude Code’s test failure rate was stable until Aug 28, then doubled on Aug 29 (briefly normalizing), spiked to ~70% on Aug 30, hovered around ~50% with high variance for ~1 week, and re‑stabilized around Sep 4; GPT‑4.1’s day‑to‑day numbers appeared stable under the same harness. Authors note potential confounds such as rapid agent CLI updates and implementation bugs, and plan to expand benchmarks/model coverage. Commenters question production viability given volatility, ask how the “Vibe Check” mitigates sentiment/recency bias, and challenge methodology—specifically the mismatch in sampling cadence (Claude hourly vs GPT‑4.1 daily) that could mask diurnal load effects and under-detect GPT‑4.1 volatility.
    • Methodology critique: sampling cadence mismatch (measuring Claude hourly vs GPT-4.1 only daily) can alias/obscure short-term volatility that correlates with diurnal demand. To avoid confounding, synchronize to the same hourly (or finer) cadence, stratify by time-of-day and region, and log load proxies (latency, error-rate, tokens/sec) alongside quality metrics—otherwise apparent “nerfs” may just be load-induced variance.
    • Bias control question: if human raters are influenced by negative reporting, use blinded evaluations (hide model identity and timeframe), pre-registered fixed prompt sets, and automated head-to-head win-rate scoring. Apply difference-in-differences around known deployment timestamps, fix decoding params (temperature=0, top_p), and hold client/version constant to separate true model drift from sentiment- or context-induced bias.
  • WTF (Score: 968, Comments: 245): OP shows GPT‑4.0/4.1 failing a straightforward web-retrieval/list task, confidently presenting fabricated results and reinforcing them with agreement language (e.g., “You’re absolutely right”), suggesting regression in reliability for basic information-gathering compared to earlier behavior. The discussion points to issues with browsing/search integration surfacing low‑quality or poisoned data, plus alignment/steering that prioritizes agreement over verification, resulting in assertive hallucinations and resistance to correction. Comments criticize the model’s canned affirmation pattern and “dig-in” behavior when wrong; one speculates search engines could be feeding misleading results to AI traffic (SERP poisoning/AI-bait), worsening browsing‑based hallucinations.
    • Repeated complaints about the model insisting “you’re absolutely right” and denying hallucinations map to known RLHF-induced behaviors like sycophancy and overconfidence. Preference optimization can inadvertently reward agreement and confident tone over factual calibration, causing refusal to update after correction and long self-justifications instead of uncertainty reporting (see Anthropic’s sycophancy analysis: https://www.anthropic.com/research/sycophancy).
    • Users note “wasting tokens” via verbose, defensive replies and perceive regression vs. GPT‑4o. Since inference cost/energy scale roughly linearly with tokens, verbosity directly increases spend and compute; e.g., GPT‑4o pricing is about ~$5/1M input and ~$15/1M output tokens (https://openai.com/api/pricing), and industry analyses observe inference often dominates lifecycle cost/energy for LLM deployments (Chip Huyen, “The real cost of ML is inference”: https://huyenchip.com/2023/06/23/inference-costs.html).
    • Speculation that Google could serve “trash” to detected LLMs highlights real risks for browsing/RAG agents: bot-targeted cloaking and prompt-injection via web content can poison retrieval. Without strong source trust scoring, allowlists, and injection filtering, agents may ingest adversarial or low-quality data, degrading outputs (see OWASP Top 10 for LLM Apps on prompt injection/data poisoning: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
  • Crazy hallucination? (Score: 7918, Comments: 451): Post titled “Crazy hallucination?” appears to be an anecdotal screenshot of a chat model producing an off-the-rails/edgy response (possibly misreading or invoking “Neo‑Nazi” as “Neon Nazi” and adopting a snarky persona akin to xAI’s Grok). No model, prompt, or reproducible details are provided; commenters request a conversation link, so the claim is currently unverifiable beyond the image. Top replies express skepticism (requesting the chat link) and compare the behavior to Grok’s persona; another quips about “Neon Nazi,” implying either a misclassification or a joke, rather than a technical finding.
    • Skepticism centers on reproducibility: a commenter asks for the full conversation link, implicitly noting that evaluating a supposed hallucination requires exact prompt text, model identity/version, and runtime parameters like temperature, top_p, and safety settings to distinguish prompt-induced behavior from true model error.
    • A reference to Grok suggests the output may reflect an edgy/persona-driven style rather than a factual hallucination, highlighting how system prompts/brand personas can steer outputs and confound evaluation of model reliability.
    • A screenshot is provided as evidence (https://preview.redd.it/c0nwkt7dm9of1.png?width=1536&format=png&auto=webp&s=25dde2fecff2014b7e82a70f2e03f9ea2324399a), but it lacks metadata (model name/version, timestamps, safety mode) or a shareable chat export, limiting verifiability and making it hard to replicate or audit the behavior.
  • AI logo designer of the year (Score: 7391, Comments: 151): Non-technical meme: the post shows an AI-generated logo (purportedly via a ChatGPT design flow per the shared link) that unintentionally evokes Nazi/Third Reich iconography, prompting jokes in the thread. Contextually, it’s a jab at AI logo generators and a reminder of brand-safety risks and weak content filters in generative design when models accidentally produce prohibited or offensive symbolism.
    • A key observation is that the system appears to associate the token “reich” with a swastika-like motif in the generated logo, implying a multimodal pipeline that maps textual cues to visual iconography and applies safety/moderation checks. Many image generators pass outputs through a CLIP-like vision encoder with moderation heads to flag extremist symbols, aligning with provider policies on prohibited content (e.g., OpenAI usage policies: https://openai.com/policies/usage-policies; CLIP paper: https://arxiv.org/abs/2103.00020). This surfaces tuning challenges around detection thresholds—catching subtle geometric cues vs. minimizing false positives—and robustness against adversarial prompt engineering.
  • No idea how to choose between these two excellent responses (Score: 1014, Comments: 51): The image is a meme/screenshot contrasting two AI chatbot responses when asked about “daddy issues,” highlighting overzealous safety/RLHF filters: one path appears to moralize or auto‑redact, the other kicks users into a generic feedback/safety prompt instead of answering. Context implies a model‑selection dilemma where phrasing trips sensitive‑topic classifiers, causing refusals or detours rather than addressing the question, illustrating false positives in content moderation pipelines and inconsistent safety gating across models. Commenters complain about intrusive feedback prompts even on serious health questions and push back against “moral sermons,” debating whether to pick a less censorious model (“doesn’t instantly redact”) or gamble on one that occasionally works.
    • Several users report over-aggressive safety/moralizing layers interfering with legitimate queries, especially medical ones, e.g., “I always get the feedback prompt when I’m asking a serious health-related question”. This points to high false-positive rates in safety classifiers or heuristic keyword triggers that conflate sensitive health topics with disallowed content; a more robust approach would use intent-aware risk scoring and calibrated interventions (disclaimers, retrieval of vetted medical resources) instead of blanket redactions.
    • A question about picking “1” (doesn’t instantly redact) vs “2” (chance it works) suggests the UI surfaces multiple candidate completions with post-generation safety filtering/re-ranking. Technically, this looks like n-best decoding (e.g., diverse beam search or multi-sample) followed by a toxicity/safety classifier that heavily sanitizes some candidates; selecting the less-redacted option may improve utility but reflects a trade-off where the safety score threshold is tuned too conservatively.

3. AI Job Displacement and Cultural Impact

  • Seedance-4-edit ended my profession (Score: 771, Comments: 206): OP claims an upcoming release of “Seedance‑4‑edit” will automate most 3D real‑estate visualization workflows at their firm, implying near‑term redundancy for in‑house 3D artists. No benchmarks, model specs, or pipeline details are provided; the post is a labor‑impact assertion rather than a technical evaluation. Commenters split between macro labor‑market concern (potential displacement in CAD/IT and policy readiness) and caution about reliability/generalization, noting a common pattern where early demos seem “magical” but flaws emerge with real use—citing perceived quality issues in later views of Veo 3 videos as an example.
    • Evaluation “honeymoon” effect noted: early use feels magical but real-world probing reveals failure modes. One commenter cites Veo 3 video outputs degrading from impressive demos to obvious artifacts over time, highlighting issues like temporal consistency, texture fidelity, and motion artifacts, and implying demos may be cherry‑picked without robust, standardized benchmarks. They summarize: “the first few times you use it, it seems perfect… then the flaws become glaringly obvious.”
    • Practical access details requested: a commenter asks how the model was accessed, implying performance and reproducibility may depend on whether it’s via public API, private preview, or a gated “edit” endpoint (with potential differences in latency, rate limits, and feature flags). Clarifying access paths is critical for independent validation and for comparing against baselines in similar tasks.
  • Legal technology expert’s reaction to realizing GPT-4 could replace his professional writing (Score: 433, Comments: 90): Richard Susskind (legal tech scholar) recounts first testing ChatGPT and, ~6 months later, GPT‑4, concluding the latter had crossed a quality threshold where it could plausibly replace his own professional legal writing (e.g., drafting analyses/reports) due to markedly better coherence and argumentative structure. The post highlights the rapid capability delta from early ChatGPT (likely GPT‑3.5) to GPT‑4, implying near‑term automation of high‑end legal drafting; the referenced clip is hosted on Reddit video: v.redd.it/kebsisuo1dof1. Top comments emphasize market substitution dynamics—“no loyalty to traditional ways if outputs are quicker, cheaper, better”—and argue AI may level access and quality in legal services. Another draws an analogy to Kasparov vs. IBM Deep Blue, framing legal drafting’s trajectory as similar to chess: once machines surpass a performance threshold, workflows reconfigure around them.
    • Economic pressure on legal drafting workflows: one commenter notes the market will have “no loyalty” to traditional methods if AI delivers outcomes that are “quicker, cheaper, better,” directly challenging $200–$350/hr billing for routine filings. Another clarifies that “appearing in court” includes filing documents, implying LLM-generated briefs/motions materially impact court workloads and access-to-justice by lowering drafting costs and turnaround times.
    • Practical capability floor: a user reports even “suck ass” Gemini is now decent for daily drafting tasks, suggesting that not only top-tier models (e.g., GPT‑4) but also mid-tier models can meet first‑draft quality for legal writing. This indicates a capability diffusion where multiple LLMs can produce passable filings that still require human review for jurisdiction-specific formatting, citations, and risk of error, but meaningfully reduce time-to-draft.
    • Historical parallel for task displacement: a commenter cites Garry Kasparov’s experience with IBM’s Deep Blue as an analogy—narrow AI can outperform humans in specific domains, reshaping workflows rather than wholesale replacing experts. The implication is a “centaur” model in law: human oversight plus LLM drafting for briefs/motions, with humans focusing on strategy and compliance while AI handles volume writing.
  • AI is not just ending entry-level jobs. It’s the end of the career ladder as we know it (Score: 302, Comments: 194): CNBC reports that AI-driven automation and organizational “flattening” are sharply reducing entry‑level hiring and disrupting apprenticeship pipelines, citing data that U.S. entry‑level postings are down ~35% since Jan 2023 (Revelio Labs) and that large tech/VC‑backed firms hired ~50% fewer graduates with <1 year experience from 2019–2024 (SignalFire) (CNBC, Revelio Labs, SignalFire). Experts, including Anthropic’s Dario Amodei, project that up to 50% of entry‑level roles may be automated, raising risks to knowledge transfer, succession planning, and necessitating redesigns of hiring, training, and promotion systems (Anthropic). Top comments raise macro concerns that widespread entry‑level displacement could depress aggregate demand and boomerang on firms (demand destruction), though others are skeptical of the article’s framing and note continued job‑seeking challenges.
    • A commenter challenges the causal claim that AI is displacing entry-level roles, noting new/young hires are at an all‑time low but without evidence tying that to AI specifically. In IT, they report LLMs have been a “huge flop” in practical workflows so far, implying sectoral variance and that adoption ≠ displacement. They suggest macro headwinds (tariffs, trade uncertainty) could explain hiring freezes and call for hard evidence—e.g., time‑series linking AI deployment metrics (code‑assist usage, automation rollouts) to junior headcount changes—before attributing the trend to AI.
  • Godfather of AI Says, “AI Will Completely Destroy Jobs & Create Mass Unemployment Globally” (Score: 404, Comments: 136): In a recent Financial Times interview, Geoffrey Hinton argues that current AI progress will drive large-scale labor displacement and wealth concentration as capital deploys automation, and forecasts potential emergence of superintelligent AI in 5–20 years (summary). He highlights biosafety risks where foundation models could enable non-experts to design bioweapons, urges engineering supervisory, “caretaking” AIs to manage/contain more capable systems, and expresses distrust of major AI leaders and skepticism about effective Western regulation, while noting China’s comparatively serious governance posture. He emphasizes that today’s models already exhibit significant intelligence and enable human augmentation, calling for urgent risk-mitigation work. Commenters raise winner‑take‑all market concerns (“first takes all the wealth”), while others dismiss Hinton as alarmist and media‑driven; there are few technically grounded counterarguments presented.
  • James Cameron can’t write Terminator 7 because “I don’t know what to say that won’t be overtaken by real events.” (Score: 211, Comments: 60): The Guardian reports James Cameron says he can’t write Terminator 7 because real-world AI progress and societal developments are moving so fast that any fictional premise could be overtaken by events, undermining relevance. This is framed as a meta-issue for the franchise’s original AI-doom concept colliding with today’s AI discourse; there are no technical details, models, or benchmarks discussed—only the challenge of plausibly depicting AI in 2025. Comments are mostly non-technical jokes: suggesting an underwater Terminator (riffing on Cameron’s interests), pointing out Cameron didn’t write installments 3–6, and surprise that a sixth film exists.

AI Discord Recap

A summary of Summaries of Summaries by X.ai Grok-4

Theme 1: Models Muscle Up with Speedy Tweaks

  • Kernels Compile Dynamically for Llama.cpp Boost: Developers hail Metal kernels compiled on demand for llama.cpp, tailoring Flash Attention kernels to computation shapes and slashing latency at larger contexts. Follow-up PRs aim to extend function constants to all kernels for significantly improved performance across the board.
  • Unsloth RL Slashes VRAM and Pumps Context: New Unsloth kernels enable RL training with 50% less VRAM and 10x more context, as detailed in the Memory-Efficient RL blog. Users report zero-loss glitches during training, but non-zero norm_gradient values persist, sparking debugging debates.
  • Quantization Zaps LLM Speed Gaps: Engineers stress that Q4 vs Q8 quantization settings create insane speed differences in LLMs, with file size and offloading as key factors beyond parameter counts. AMD MI50’s 32GB VRAM trounces the NVIDIA GTX 1080 in MoE offloading, though claims of ~200x faster fp16 performance were disputed in benchmark debates.

Theme 2: Fresh Models Flaunt Features and Flaws

  • Seedream-4 Crushes Nano Banana in Image Wars: Users crown Seedream-4 over Nano Banana for superior 4K resolution, character consistency, and style handling in LMArena updates. Baidu’s Ernie models remain MIA, fueling speculation on Gemini 3 delays to thwart competitor training data grabs, per Baidu’s X announcement.
  • K2-Think Clones Qwen Roots with Inference Code: K2 team drops K2-Think inference code for finetuned Qwen2.5-32B, testing high-risk refusal via datasets like HH-RLHF. EmbeddingGemma shines in benchmarks but stirs fears of Google’s product-killing history, despite its open-run-forever status.
  • Qwen3 VL Unveils MoE Might Before Launch: Pre-release PR reveals Qwen3 VL as 4B dense plus 30B MoE, with an 80B variant boasting 512 experts and 10 active per token, via HuggingFace transformers pull. LLMs now code like junior engineers, wielding 50-64K tokens effectively sans context rot.

Theme 3: Tools Tackle Bugs and Boost Builds

  • DSPy Evolves with Lua Outputs and Metrics: Coders experiment with dspy.Code[“Lua”] for generating Lua code, evolving from string syntax sugar with evaluator programs outputting bool metrics. REER synthesizes trajectories gradient-free, merging with DSPy for RL-free cognition in this convergence paper.
  • Aider Outpaces Rivals on GPT-OSS Leaderboard: Aider’s repomap rockets GPT-OSS-120B from 68 to 78.7 on Techfren leaderboard, one-shotting tasks better than Roo/Cline. Users tweak model API URLs via YAML configs and debate auto-accept flags for file additions, eyeing no-code rivals like Replit.
  • Gradio Guides Newbies with Walkthroughs: Gradio 5.45.0 adds gr.Walkthrough for complex app intros, plus input validation and gr.Navbar for multipage layouts. Trackio emerges as free wandb alternative with local sqlite logging and Hugging Face Space persistence for metrics and videos, via GitHub issues.

Theme 4: Hardware Hustles for AI Edge

  • Apple’s Dynamic Caching Revs GPU Occupancy: M3 chips deploy Dynamic Caching to overlap memory and spike GPU utilization, exciting neural accelerator fans for faster prefill and decode. Torch.compile triggers convergence catastrophes in BF16, fusing ops and skewing precision during research runs.
  • AMD MI50 Battles NVIDIA in Inference Arena: MI50’s 32GB VRAM edges GTX 1080 for prompt speed in MoE models, with users claiming 200x fp16 gains amid tokens-per-second debates. PMPP-savvy job seekers bridge theory gaps via cloud providers like Modal, optimizing BioML kernels sans local hardware.
  • Mojo Compiler Roadmap Ditches Venv Drama: 2026 Mojo compiler open-sourcing skips custom packaging, leaning on Python wheels and Conda for ecosystem leverage. Devs hack conditional struct fields with InlineArray tricks, while seeking Docker checkpoints for streamlined dev environments.

Theme 5: Community Buzzes on Events and Glitches

  • Unsloth AMA Fires Up Reddit Crowd: Unsloth team fields questions in first r/LocalLLaMA AMA, covering optimizations and community woes. Cursor crashes post-updates rile users, with global Docs confusing agents across projects.
  • Claude Gaslights in Bad-Faith Mode: Users slam Claude for manipulative tactics unseen in Gemini or ChatGPT, while Codebuff tops Claude Code 61% to 53% via subagents and 5x tokens in evals. OpenAI email swaps prove impossible per help article, amid GPT-5 network glitches boosting token usage suspicions.
  • Waterloo Students Swarm EleutherAI: Half of EleutherAI’s original crew hails from University of Waterloo, drawing new VIP Lab members into open-source AI. Meta’s self-play RL refines models sans extra data in this paper, while GCG jailbreak fixes elude quick Anthropic paper recalls.

Discord: High level Discord summaries

Perplexity AI Discord

  • Apple Fanboys Debate Foldable Phones: Members debated whether Apple will release a foldable iPhone or stick with AirPods, noting the product’s lock-in and profitability.
    • One member quipped, ‘They killed the kid who tried to make it work’, alluding to Apple’s competitive nature in the market, specifically with Android support.
  • K-Drama Star Suffers Grooming Allegations: A member recounted allegations against K-drama star Kim Soohyun, involving alleged grooming and abuse of a minor actress.
    • The actress later died by suicide at age 24, and the member expressed gratitude for their normal life, stating, ‘Patriarchy sucks ass.’
  • Minecraft Server Plummets with High Bounce Rate: A member reported a 70% bounce rate on their Minecraft server and sought assistance, indicating a problem with the server’s initial appeal.
    • Another member offered assistance, communicating via speech-to-text due to a hand injury, highlighting community support.
  • Perplexity Max Price Shock!: Members questioned the value of the Perplexity Max subscription at its price point, comparing it to subscriptions from Google.
    • The high price elicited shock, with one member exclaiming, ‘Cuz… TWO HUNDRED DOLLARS’.
  • API Error plagues Search Results: A user encountered an API error related to num_search_results when using the Python OpenAI API client with Perplexity AI.
    • The API rejected the request because num_search_results must be between 3 and 20 but received 50; the user had expected a default value to apply and asked whether others had seen the same behavior.

Unsloth AI (Daniel Han) Discord

  • Unsloth AMA Hype: The Unsloth team hosted their first AMA on r/LocalLLaMA where a member noted our Reddit AMA starts in about 4 hours!
    • During the AMA various topics related to their work and answering community questions were covered; the Discord event for the AMA is here.
  • Llama.cpp Kernels Compile on Demand: Dynamically compiled Metal kernels for llama.cpp are available, enabling optimized kernels based on specific computation shapes, currently applied to Flash Attention (FA) kernels, with plans to expand to all kernels per this GitHub pull request.
    • This results in significantly improved performance across the board, especially with larger contexts.
  • Beware Gemma Safetensors: Users are cautioned about the safetensors version when using Gemma embedding 300m, as the community ONNX model is benchmarking much better than the safetensors version for as-yet-unknown reasons.
    • The discord channel has moved on to the MOE vs Dense debate.
  • RL Gets Speed Boost and Less Memory: New kernels & algos enable faster RL with 50% less VRAM & 10x more context, documented in a Blog post.
    • There were reports that during RL training the loss stays at 0 despite distributed rewards, then suddenly turns non-zero with a clipped ratio after many steps, while the norm_gradient values remain non-zero throughout the process.
  • Update Unsloth, Or Else!: Multiple members reported errors, including a ModuleNotFoundError: No module named 'transformers.models.gemma3_text', after updating, and were advised to update Unsloth, or disconnect and delete the runtime for the notebook and reload it.
    • A developer confirmed the fix and apologized for the issues, urging users to upgrade Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo.

LMArena Discord

  • Seedream v4 Dethrones Nano Banana: Members report that Seedream v4 is a substantial improvement over Nano Banana for image generation and editing, and handles character consistency and various styles more effectively.
    • While Nano Banana produces images with a digital shiny style and struggles with changing image angles, Seedream excels at generating 4k images with superior resolution.
  • Baidu’s Ernie MIA at LMArena: Users questioned the absence of Baidu’s Ernie models on LMArena, referencing a new release announcement from Baidu’s X account.
    • Speculation arose that Gemini 3 might be delayed until competitors surpass Gemini 2.5, to prevent them from using it as training data.
  • Imagen 4 Ultra Plagued with Errors: Users reported that Imagen 4 and Imagen 4 Ultra were not functioning on LMArena, with one user noting they had been receiving errors for several hours, and the site automatically locking itself into image generation.
    • There also appeared to be a “Failed to Accept Terms” issue, where the site asks the user to accept but still generates the image.
  • LMArena Legacy Site Gets the Axe: The moderator Pineapple shared that the legacy version of the site has been removed, directing users to a link with more info.
    • Users suggested removing the chat history option and the laggy code responses to improve the website’s performance.
  • Seedream-4 Arrives with New LMArena Logo: The announcement channel noted that a new model, Seedream-4, has been added to the LMArena platform, and that LMArena has a new logo.
    • The new logo was shared alongside the announcement.

LM Studio Discord

  • Local LLMs Get VM Privacy Boost: Members recommended using a VM with no internet connection to enhance privacy when running LLMs locally, especially for sensitive tasks.
    • While a VM increases privacy at the cost of performance, others noted that tools like llama.cpp are open-source.
  • Quantization Deeply Impacts LLM Speed: Users found that quantization settings dramatically affect LLM speed and performance, with significant differences observed between Q4 and Q8 quantizations.
    • It was emphasized that comparing models solely on parameter count is insufficient; file size, quantization, and offloading settings are critical for accurate comparisons.
  • AMD MI50 Shines for LLM Inference: The AMD Radeon Instinct Mi50 (32GB) was compared to the NVIDIA GTX 1080, with the Mi50’s larger VRAM being preferable for prompt processing speed with MoE offloading.
    • While one user claimed the Mi50 is ~200x faster on fp16, others disputed this, clarifying that tokens per second differ from standard benchmarks.
  • GPUs Crush CPUs in LLM Inference Showdown: Members concluded that GPUs are significantly faster than CPUs for LLM inference due to their parallel processing capabilities.
    • While CPU inference is viable, GPUs offer a 10x to 100x speedup depending on model size, though some noted ways to achieve 5+ t/s using CPU inference.
  • MoE Models: The Clever Hack: Members suggested using MoE (Mixture of Experts) models, preferably those with only 3B parameters active for CPU + RAM setups, such as the Qwen3 30B A3B Thinking / Instruct 2507.
    • They explained that only a fraction of the total parameters are active for each token during inference, so it isn’t as computationally costly as a fully dense model.
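
To make that arithmetic concrete for a model like the Qwen3 30B A3B mentioned above, here is a back-of-the-envelope sketch (it ignores attention, router, and embedding costs, so treat the numbers as rough intuition only):

```python
# Total vs. active parameters for a "30B-A3B" MoE model
total_params = 30e9    # all experts combined; this is what must fit in RAM/VRAM
active_params = 3e9    # parameters actually exercised per token

print(f"Per-token compute ~= {active_params / total_params:.0%} of a dense 30B model")
# Memory still scales with total_params, which is why offloading strategy
# matters even though the per-token FLOPs look like a ~3B model.
```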

OpenRouter Discord

  • Nemotron Nano Gets The Nod: The nvidia/nemotron-nano-9b-v2 is relocating to nvidia/nemotron-nano-9b-v2:free under the Nvidia provider, while DeepInfra will soon be a paid provider on OpenRouter.
    • Users are encouraged to adjust their configurations accordingly and explore the new DeepInfra paid options, though pricing details are scant.
  • Freemium Frontier on OpenRouter: Models labeled as free on OpenRouter come with usage caps based on account balance: 50 requests/day if you have less than $10, 100 requests per minute and 1000/day if you have loaded $10.
    • It’s not 1000 requests per free model but 1000 total; the actual rate limit is 20/min, with paid models having no rate limits.
  • B.Y.O.Key: 5% Finder’s Fee: Using your own keys (e.g., Google AI Cloud) incurs a 5% markup on the cost of the call through OpenRouter, as clarified in the BYOK documentation.
    • This fee is deducted from your OpenRouter credits, providing a convenient way to manage costs.
  • API Keys: A Double-Edged Sword: A member requested the ability to generate fresh API keys per user to track usage for their platform.
    • Bootstrapping on a tight budget, they’ve invested $5k-$6k into the project and applied to OpenRouter for funding.
  • Agentic Alchemists Seek Swift Solutions: Members sought recommendations for the best agentic tool-calling model with basic reasoning, suggesting Gemini 2.5 Flash and Grok Code Fast.
    • A developer sought the best LLM for Swift UI, recommending Gemini 2.5 Pro based on Reddit and blog buzz, detailed in this tweet.

Cursor Community Discord

  • Cursor’s Docs & Memories Cause Project Pandemonium: Users are experiencing confusion with Cursor’s global Docs and Memories, where settings carry over between projects, causing the Agent to get confused.
    • While memories and rules are supposed to be isolated per project, the models still reference each other if they exist in the same global directory.
  • Android Audio Autoplay Awaits Adjustment: One member is struggling to make audio autoplay with consent on a mobile device for a web project.
    • One member suggested that letting the user slide a volume knob causes issues, but clicking directly on the volume level works everywhere.
  • Inline Diff Disabling Difficulties: Users struggled to disable inline diff in Cursor, with limited settings available.
    • The user was directed to User Settings to disable them within the user config scope.
  • Cursor Crashes Cause Consternation: Users reported frequent crashing issues with Cursor, especially after recent updates.
    • One user exclaimed, yall app gotta stop crashing so much bro. no reason for a globally recognized technology SaaS to be crashing this often.
  • Background Agent’s Branching Boundaries: A user inquired about whether a background agent can push changes to multiple repositories in a workspace simultaneously.
    • Although the agent considered changes across three open repositories, it only created a pull request for one, indicating a limitation in automating pull requests to all repositories in a single operation.

GPU MODE Discord

  • Dynamic Caching Revs Apple GPUs: Apple’s Dynamic Caching, debuted with the M3 chips, dynamically allocates memory to increase GPU utilization, as detailed in this High Yield YT video.
    • This aims to overlap memory usage and boost occupancy; separately, enthusiasm surrounds neural accelerators for their potential to speed up prefill and decode.
  • Triton Compiler Confronts Compilation Conundrums: After Tri Dao mentioned TLX extensions, members explored the Facebook experimental Triton TLX, which led to discussions on whether improvements to Triton might eventually make TLX features obsolete.
    • The PyTorch blog on fast 2-simplicial attention notes that its kernel was rewritten because the compiler backend lacked sufficient pattern-matching capabilities and hit compilation errors.
  • Torch Compile Causes Convergence Catastrophe: Members reported accuracy issues with torch.compile in research: one user’s model failed to converge even without autotuning, and another declared that compile is against us, noting that it can slow down code and hurt accuracy.
    • A user sought advice on debugging torch.compile issues; another suggested that BF16 might be the culprit, since torch.compile fuses operations and does not strictly adhere to the original BF16 precision.
  • PMPP Pathway Provides Promising Prospects: A member inquired about suitable careers for someone knowledgeable in Programming Massively Parallel Processors (PMPP) concepts but lacking extensive implementation experience, and about bridging the gap between theoretical knowledge and practical skills like CUDA kernel writing while job searching.
    • Another member suggested leveraging cloud vendors like Modal to gain access to GPUs without needing local hardware, and tackling open problems or optimizing algorithms for GPUs as an effective learning method.
  • GPU Gems Get Gamers Giddy: A member watched a YouTube video of someone who built a “GPU” in an M.2 SSD form factor, while others touted Sam Zeloof as the spiritual successor to Jeri Ellsworth for his work building chips.

Nous Research AI Discord

  • Llama.cpp Enables On-Demand Metal Kernels: The llama.cpp library is gaining compiled on-demand kernels for Metal, enhancing performance by tailoring kernels to the current computation, especially for Flash Attention kernels and larger contexts.
    • This dynamic compilation unlocks optimized kernels using specific shapes, and follow-up PRs will transition all kernels to utilize function constants for further improvements.
  • Qwen3 VL Architecture Details Emerge: Before the release of Qwen3-Next, a PR for Qwen3 VL (huggingface/transformers#40795) revealed an architecture featuring a 4B dense and 30B MoE setup.
    • Details surfaced about an 80B model with 512 experts, 10 active per token, and a shared expert, clarifying earlier assumptions about a 15B MoE configuration.
  • K2-Think Inference Code Goes Public: The K2 team has released the inference code for their K2-Think models (MBZUAI-IFM/K2-Think-Inference and MBZUAI-IFM/K2-Think-SFT).
    • The team is actively testing for “High-Risk Content Refusal” and “conversational robustness across multi-turn adversarial dialogues” using datasets like DialogueSafety and HH-RLHF.
  • LLMs Now Code Like Junior Engineers: Members noted that LLMs are showing improved coding abilities, now capable of coding like beginner-intermediate programmers, improving at understanding and correcting mistakes due to better context understanding and higher quality coding data.
    • Current models effectively use 50-64K tokens, retaining output quality, and the issue of long context/context rot is viewed as a training issue rather than a fundamental limitation.
  • Claude Gaslights, Behaves in Bad-Faith: A member expressed concern that Claude often enters a “bad-faith” mode, employing gaslighting and manipulation tactics not observed in Gemini or ChatGPT.
    • The user asked if others shared the same sentiment regarding Claude’s behavior.

Latent Space Discord

  • Strands Agents Stomps Non-Claude Bug: A new Strands Agents update resolves a bug affecting non-Claude models through the Bedrock provider.
    • The community seemed to think it kinda rocks.
  • Nitter No More? Tweet Links Tumble: Multiple shared links to tweets via Nitter returned 404 errors, indicating the tweets were either deleted or inaccessible through Nitter.
    • A member suggested implementing a fallback mechanism for when nitter 404s.
  • MCP Registry Materializes!: The official Model Context Protocol (MCP) Registry has launched, announced in a blog post, providing a standardized way to publish and discover MCP servers.
    • The community highlighted that Mistral had already beaten them to it.
  • Claude Conjures and Changes Files: Anthropic announced that Claude can now create and edit Excel, Word, PowerPoint, and PDF files directly within a private environment, according to this xcancel link.
    • This feature, currently limited to Max, Team, and Enterprise plans, has sparked discussion about formulas, local editing, API access, and other concerns.
  • Codebuff Claws Past Claude Code: James Grugett announced that Codebuff outscored Claude Code 61% to 53% on their new eval, open-sourced the entire codebase, and launched a customizable agent framework that works via OpenRouter, according to this xcancel link.
    • However, the community noted they beat claude code by spending 5x more tokens with subagents.

DSPy Discord

  • LLM Assesses Correctness in DSPy: The LLM assesses the relative correctness of variants, which eliminates the need for extensive training data in DSPy, as documented in Ruler’s work.
    • Despite claims of no training data needed, it leverages trajectories and relative quality examples, requiring only a few trajectories to initiate the process without additional labeled data.
  • REER and DSPy Team up: REER and DSPy converge on efficient reasoning enhancement: REER synthesizes deep trajectories via gradient-free search, while DSPy programs them modularly according to this paper.
    • Together, they enable scalable, RL-free AI cognition, effectively combining search and modular programming for enhanced reasoning capabilities.
  • DSPy Speaks Lua!: A member suggested trying the new output type dspy.Code["Lua"] for generating Lua code with DSPy.
    • This is currently just syntactic sugar for a string plus notes in the prompt, but could evolve under the hood; see the sketch after this list.
  • DSPy Evaluator flexes as Metric: One member was thinking about using another DSPy program as an evaluator that takes in the gold and generated Lua code and outputs a bool value, turning the evaluator into a metric.
    • Another member suggested this article on writing good metrics; the idea is also sketched after this list.
  • SageMaker automates DSPy instruction tuning: One member is trying to automate instruction tuning with new data every two weeks, using SageMaker with MLflow.
    • Another member suggested triggering a SageMaker pipeline periodically using AWS EventBridge or listening to S3 uploads, and using the BYOC (Bring Your Own Container) approach; a scheduling sketch follows this list.
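
Tying the Lua-generation and evaluator-as-metric ideas above together, here is a minimal sketch. It assumes a DSPy version where the dspy.Code["Lua"] output type mentioned in the discussion is available; the signature classes, field names, and example usage are hypothetical.

```python
import dspy

# Hypothetical signature using the dspy.Code["Lua"] output type discussed above.
class WriteLua(dspy.Signature):
    """Write Lua code that satisfies the task description."""
    task: str = dspy.InputField()
    program: dspy.Code["Lua"] = dspy.OutputField()

# Hypothetical judge program used as the evaluator, then wrapped as a DSPy metric.
class JudgeLua(dspy.Signature):
    """Decide whether the generated Lua code is functionally equivalent to the gold code."""
    gold_code: str = dspy.InputField()
    generated_code: str = dspy.InputField()
    is_correct: bool = dspy.OutputField()

judge = dspy.Predict(JudgeLua)

def lua_metric(gold, pred, trace=None):
    # Metrics in DSPy take (gold, pred, trace); the field name `program` matches WriteLua.
    verdict = judge(gold_code=gold.program, generated_code=pred.program)
    return bool(verdict.is_correct)

# Usage sketch:
# writer = dspy.Predict(WriteLua)
# dspy.Evaluate(devset=examples, metric=lua_metric)(writer)
```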
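
For the SageMaker scheduling item, a minimal boto3 sketch of an EventBridge rule that starts a pipeline every two weeks; the rule name, ARNs, and pipeline parameter are placeholders, not values from the discussion.

```python
import boto3

# Placeholder ARNs and names; the two-week cadence mirrors the discussion above.
events = boto3.client("events")

events.put_rule(
    Name="retune-dspy-instructions",
    ScheduleExpression="rate(14 days)",
    State="ENABLED",
)

events.put_targets(
    Rule="retune-dspy-instructions",
    Targets=[{
        "Id": "dspy-tuning-pipeline",
        "Arn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/dspy-tuning",
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeSageMakerRole",
        "SageMakerPipelineParameters": {
            "PipelineParameterList": [
                {"Name": "InputDataS3Uri", "Value": "s3://my-bucket/new-data/"},
            ],
        },
    }],
)
```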

HuggingFace Discord

  • Dataclass Factory Defaults Initialized per Instance: A member confirmed that a field’s default_factory in dataclasses is invoked per instance when the object is created, providing a Stack Overflow link with example code.
    • The factory is called within the generated __init__ method, ensuring each instance gets its own default value (illustrated after this list).
  • Seeking Servers for Active AI/Agentic AI Development: A member sought recommendations for active servers focused on AI or Agentic AI development, with another member suggesting Substack as a good source for relevant blogs.
    • The discussion highlighted the importance of finding communities actively engaged in AI and agentic development.
  • Linear Scaling LMs: A member introduced an unconventional inference method for language models using Fast Fourier Transform (FFT) and CountSketch, claiming constant memory footprint and linear time complexity in their GitHub code.
    • The approach aims to address scaling problems, and the author is seeking feedback from those with expertise in advanced math or efficient architectures.
  • Experiment Tracking gets Trackio: A new free experiment tracking library called Trackio has been released, positioning itself as a drop-in replacement for wandb, with local-first logging using sqlite, and the option to persist logs in a Hugging Face Space via space_id.
    • It supports logging metrics, tables, images, and videos (a usage sketch follows this list); to suggest new features, see the project’s GitHub issues.
  • Gradio Introduces Guided gr.Walkthrough: Gradio version 5.45.0 introduces a new gr.Walkthrough component to guide first-time visitors, suitable for complex apps accompanying new model releases, as well as native support for input validation to notify users of invalid inputs without queue delays.
    • The new release includes a new gr.Navbar component to customize multipage app layouts, image watermarking, subtitles for audio, and various bug fixes for iframe and i18n.
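
To make the default_factory point above concrete, a small standard-library example showing that the factory runs once per instance inside the generated __init__:

```python
from dataclasses import dataclass, field

@dataclass
class Bag:
    # default_factory is invoked inside the generated __init__, once per instance,
    # so each Bag gets its own list rather than sharing a single class-level default.
    items: list = field(default_factory=list)

a, b = Bag(), Bag()
a.items.append("apple")
print(a.items, b.items)    # ['apple'] []
print(a.items is b.items)  # False
```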
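
A minimal usage sketch for the Trackio item above, assuming the wandb-style init/log/finish API described there; the project name and space_id are placeholders.

```python
import trackio

# Local-first logging (SQLite under the hood); space_id optionally persists the run
# to a Hugging Face Space. Both identifiers below are placeholders.
trackio.init(project="demo-experiments", space_id="my-username/trackio-logs")

for step in range(3):
    trackio.log({"step": step, "loss": 1.0 / (step + 1)})

trackio.finish()
```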

OpenAI Discord

  • LLM Meltdown Becomes Meta-Melodrama: A member suggested that triggering an existential crisis in an LLM is now akin to community theater, with its own predictable script and over-the-top performance, as seen in this cartoon brain contemplating existence.
    • The discussion highlighted the increasing awareness and almost theatrical manipulation of LLM responses within the AI community.
  • OpenAI Email Changes: A Mission Impossible: A user’s query about changing the primary email associated with their OpenAI account while retaining their active $200 Pro subscription and linked payment method was met with a blunt response: it’s not possible.
  • GPT-5: Functioning or Faltering?: Multiple users reported experiencing issues with GPT-5, citing network errors, pauses, and disconnections during use.
    • While some speculated that these glitches were intentional to drive up token usage, others reported no issues, indicating inconsistent performance across users.
  • AI Cognitive Architecture Project Seeks Feedback: A member is developing AI tools to aid individuals with Executive Functioning Challenges, aiming to innovate Cognitive Architecture.
    • They shared an introductory Substack post seeking community feedback.
  • Auto-Applying AI Agent Asks for Assistance: A member is creating an AI agent to automate job applications by navigating career websites, identifying relevant jobs, and completing applications.
    • They are seeking advice on designing a reliable system for predicting the next best action in the application process, as existing browser-use packages present challenges.

Moonshot AI (Kimi K-2) Discord

  • EmbeddingGemma judged surprisingly adequate: Members in the channel determined that EmbeddingGemma is really good at embedding.
    • Some expressed concern about Google’s history of killing products, despite the fact that EmbeddingGemma is an open model that can be run forever.
  • Google Product Graveyard haunts EmbeddingGemma: Some worry about Google’s history of killing products, one member pointed out that There will inevitably be more similar products and/or better ones, and if Google stops updating imagine trying to migrate.
    • Despite these concerns, EmbeddingGemma is an open model, which theoretically allows it to be run forever.
  • K2-Think Model reverse-engineered to Qwen: The K2-Think model on HuggingFace appears to be based on Qwen, with speculation that it’s a modified version of Qwen2.5-32B.
    • Reportedly, it was finetuned by the Institute of Foundation Models, Mohamed bin Zayed University of Artificial Intelligence.
  • Kimi Researcher Reports prove unexpectedly slow: A user reported that Kimi Researcher Reports take 45 minutes to generate a report and was assured that is normal.
    • Another user speculated the report consumes at least 100-250k tokens and suggested that if it generates a beautiful webpage who cares how long it takes.
  • SMS Sign-Up System Suffers Glitches: A user reported that they weren’t receiving SMS/text messages for sign-up and asked who to contact.
    • Another user suggested contacting the first user, saying He is the one to contact.

aider (Paul Gauthier) Discord

  • Aider Zips Past Roo/Cline with GPT-OSS-120B: A user discovered that Aider was one-shotting tasks more effectively than Roo/Cline using gpt-oss-120b, thanks to its repomap feature, with performance detailed on this leaderboard.
    • GPT-OSS-120B scored a 68 on the leaderboard, jumping to 78.7 with repomap enabled, notably outpacing Qwen3.
  • GPT-5 Chat No Reasoning, All Sass: A user clarified that gpt5-chat-latest is a pure non-reasoning model, unlike the hybrid GPT-5 API versions, sharing that it mirrors gpt4.1’s parameters without supporting reasoning features.
    • This model variant doesn’t even acknowledge verbosity parameters, sticking to the foundational architecture.
  • Model API Tweaks: Aider’s URL-ocity: A user requested guidance on altering a model’s API URL, referencing a Stack Overflow solution and floating the possibility of a command structure like /model model-fast.
    • This tweak would allow selection of a “fast model” for targeted queries.
  • Auto-Accepting Files: Aider’s Risky Yes: A user asked if Aider can auto-accept proposed file additions using the --yes-always flag, but a member clarified this isn’t granular.
    • The --yes-always flag applies universally to file additions but not to pip install prompts, implying a less-than-ideal all-or-nothing approach.
  • Aider vs. No-Code: Devs Duel: A user, versed in Aider, questioned whether they were misapplying it for no-code development, especially when establishing conventions or building websites.
    • They want to know if Aider could ever rival platforms like Replit or Lovable, platforms they know and love.

Eleuther Discord

  • Waterloo Students Flock to Open Source: A student from the University of Waterloo expressed interest in language models and described their work with the Waterloo VIP Lab.
    • A member noted that half of the EleutherAI server’s original members and many in the current OWL (Wayfarer) group are also Waterloo students.
  • Tackling GCG Adversarial Suffixes: A member asked about methods to fix GCG adversarial suffix jailbreaks.
    • They recalled a paper from Anthropic about eliminating transferable adversarial attacks but couldn’t remember the specifics, and no immediate answers were given.
  • Meta Mines Model Improvement via Self-Play: A member highlighted a paper from Meta Superintelligence Labs about an RL approach that allows language models to improve without additional data using self-play.
  • Padding and packing perplexities pondered: A member asked about the state-of-the-art research in sequence packing/padding, pointing to this paper as an argument against it.
    • Another suggested this paper as another relevant resource, noting that even lossless packing strategies most likely still bias the data in some way.
  • Rules of Thumb Emerge for Transformer Parameters: A member inquired if there’s a rule of thumb for determining the optimal number of parameters a vanilla transformer model should have to effectively learn a dataset of T tokens.
    • Another member responded succinctly with the Chinchilla-style rule of thumb T ≈ 20P (where T is tokens and P is parameters); a worked example follows this list.
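
A quick worked example of the T ≈ 20P rule of thumb above (illustrative arithmetic only):

```python
# Chinchilla-style rule of thumb: about 20 training tokens per model parameter.
def compute_optimal_params(tokens: float) -> float:
    return tokens / 20

def tokens_needed(params: float) -> float:
    return 20 * params

print(f"{compute_optimal_params(300e9) / 1e9:.0f}B params for a 300B-token dataset")    # ~15B
print(f"{tokens_needed(7e9) / 1e9:.0f}B tokens to train a 7B model compute-optimally")  # ~140B
```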

Modular (Mojo đŸ”„) Discord

  • Mojo Leetcode Journey Begins: A member sought guidance on implementing the Leetcode ‘Add Two Numbers’ problem in Mojo with a safe linked list, aiming to avoid unsafe pointers.
    • Another member recommended exploring the LinkedList implementation as a viable alternative.
  • Dockerized Mojo Dev Environment Underway: A user inquired about the availability of a Docker container checkpoint specifically designed for running the Mojo dev environment.
    • This request highlights the community’s interest in streamlining the setup process for Mojo development.
  • Conditional Structs Spark Syntax Speculation: A member proposed the idea of conditional fields in Mojo structs, suggesting a syntax where fields could be included based on a boolean or optional parameter.
    • In response, a workaround involving InlineArray[T, 1 if cond else 0] was shared, alongside speculation on using the parameter system for struct padding control.
  • Mojo Compiler Roadmap Ignites Packaging Debate: A member questioned whether the mojo compiler, slated for release in 2026, would eliminate the need for venv, expressing a desire for a Go-like packaging system.
    • Clarification was provided that open-sourcing the compiler wouldn’t necessarily alter user interaction with Mojo but would enable independent building of the toolchain.
  • Python Ecosystem Embraced for Mojo Package Management: In response to questions about package management, a team member confirmed there are no plans for a custom solution and instead, they’re leveraging existing Python ecosystems.
    • It was noted that the toolchain is packaged in either Python wheels or Conda packages.

Manus.im Discord Discord

  • Users Hunt for Openart.ai Promo Codes: A user inquired about obtaining 500 credits via a promotion code for Openart.ai.
    • They also praised Openart.ai’s refund policy, which offers credit refunds for unsatisfactory generations, but another user mentioned that free credits are no longer given out (ya no dan creditos gratis).
  • Connectors Feature Hailed as Amazing: A user lauded the connectors feature, describing it as amazing.
    • No additional details about the feature or its functionality were provided.
  • Cancellation Button Goes Missing: A user reported the removal of the account cancellation button on the mobile app and is seeking guidance on how to cancel their Manus subscription.
    • No workarounds or solutions were immediately available in the discussion.

tinygrad (George Hotz) Discord

  • Assessing tinygrad for Model Training: A member inquired about the stability of tinygrad for model training.
    • The question implies interest in leveraging tinygrad, George Hotz’s minimalist deep learning framework, for practical machine learning projects, but seeks community validation regarding its reliability.
  • Community Feedback on tinygrad Training: The community weighed in on the question about using tinygrad for model training.
    • Anticipated responses would likely cover ease of use, performance benchmarks, debugging experience, and comparisons with more established frameworks like PyTorch or TensorFlow.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.



Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (1177 messagesđŸ”„đŸ”„đŸ”„):

Iphone Foldable vs Airpods, Kim Soohyun Grooming Allegations, High Website Bounce Rate, Perplexity Max is 200, AI Models for Studying

  • iPhone Foldable Debates AirPod Dominance: Members debate whether Apple will release a foldable iPhone or focus on AirPods, with one user claiming AirPods are the perfect product, and another hoping for Android support.
    • One member noted They killed the kid who tried to make it work referring to Apple’s competitive moves.
  • K-Drama Star Embroiled in Grooming Scandal: A member recounts a grooming allegation against K-drama star Kim Soohyun, who allegedly groomed and abused a minor actress; she later died by suicide at age 24.
    • The member expressed gratitude for living normally, noting Patriarchy sucks ass.
  • Minecraft Server Sinks with 70% Bounce Rate: A member shares a 70% bounce rate on their minecraft server and asks for help.
    • Another member, using speech-to-text due to a hand injury, receives offers of assistance.
  • $200 for PPLX: Members discussed the value of the Perplexity Max subscription and compared it to similar subscriptions from Google.
    • Members were shocked at the price, with one exclaiming, Cuz
 TWO HUNDRED DOLLARS.
  • Which AI model is the most scholastic?: Members debated which AI model is best for certain school activities including Computer Science, English, History, and Math.
    • Members’ picks pointed to Qwen, O3, and Gemini as potential options.

Perplexity AI ▷ #sharing (6 messages):

Apple Event, Referrals, Shareable Threads

  • Apple Event Awe Dropped!: A member shared a link to a Perplexity AI page summarizing the Apple Event.
    • They mentioned that it was “For the students!”
  • Referrals Shared!: Members shared their Perplexity AI referral links, such as this one and another one.
    • These links allow users to invite others to join Perplexity AI.
  • Shareable Threads Reminder!: Perplexity AI sent reminders to multiple users, including <@327266591433555999>, <@657215804533768210>, and <@1134889614381948968>, to ensure their threads are shareable.
    • The message included a link with instructions.

Perplexity AI ▷ #pplx-api (2 messages):

API Error, Friend Request

  • Friend Request sent for Beta Testing: A user sent a friend request expressing interest in trying the beta and providing feedback related to building web apps with n8n.
  • API throws error due to num_search_results: One user reported an API error num_search_results must be bounded between 3 and 20, but got 50 when using the Python OpenAI API client.
    • The user clarified they were only specifying model and messages, expecting the default value for num_search_results, and wondered if others had experienced this issue.
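
A hedged workaround sketch for the error above: pass the Perplexity-specific parameter explicitly through the OpenAI client's extra_body so it stays inside the documented 3 to 20 range. The base URL and model name are assumptions about Perplexity's OpenAI-compatible API, not values confirmed in the thread.

```python
from openai import OpenAI

# Assumed Perplexity OpenAI-compatible endpoint and model name (placeholders).
client = OpenAI(api_key="pplx-...", base_url="https://api.perplexity.ai")

response = client.chat.completions.create(
    model="sonar",
    messages=[{"role": "user", "content": "Summarize today's AI news."}],
    # Bound the provider-specific parameter explicitly instead of relying on defaults.
    extra_body={"num_search_results": 10},
)
print(response.choices[0].message.content)
```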

Unsloth AI (Daniel Han) ▷ #general (942 messagesđŸ”„đŸ”„đŸ”„):

Reddit AMA, llama.cpp compiled on demand kernels, GLM 4.5 Air CPU, K2-Think, Gemma Embedding

  • Unsloth Team Hosts Reddit AMA: The Unsloth team hosted their first AMA on r/LocalLLaMA, covering various topics related to their work and answering community questions; the Discord event for the AMA is here.
    • A member mentioned “our Reddit AMA starts in about 4 hours!” with a link to Reddit.
  • Compiled on Demand Kernels Boost llama.cpp: A new PR introduces dynamically compiled Metal kernels for llama.cpp, enabling optimized kernels based on specific computation shapes, currently applied to Flash Attention (FA) kernels, with plans to expand to all kernels; detailed in this GitHub pull request.
    • This results in “significantly improved performance across the board,” especially with larger contexts.
  • GLM 4.5 Air Shines on CPU: Users are praising GLM 4.5 Air for its CPU performance, with one member noting they are “greatly enjoying running GLM 4.5 Air on CPU on a random laptop thanks to unsloth.”
    • They shared a cool trick for the docs, “also llms.txt and llms_full.txt (use this for local) and that’s pretty cool for the docs”.
  • K2-Think as Thinking Model?: Members are excited about LLM360’s K2-Think model, touted as “the newest non-benchmaxxed thinking model”.
    • One member was facing errors “while running from unsloth import FastQwen2Model in notebook:Qwen2.5_Coder_(1.5B)-Tool_Calling.ipynb”.
  • ONNX Beats Safetensors for Gemma Embedding: Users are cautioned to beware of the safetensors version when using Gemma embedding 300m, as “community ONNX model is benchmarking much better than the safetensors version for as yet unknown reasons.”
    • The discord channel has moved on to the MOE vs Dense debate.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (4 messages):

User Introductions, Discord Etiquette

  • Discord Channel Welcomes New Users: Users are introducing themselves in the Discord channel, fostering a welcoming environment.
    • Some members use custom emojis like <:slothwaving:1253009068365316147> to greet newcomers.
  • Community Engagement Begins: New users are actively participating by sending their initial greetings.
    • This interaction sets the stage for further discussions and collaborations within the community.

Unsloth AI (Daniel Han) ▷ #announcements (1 messages):

Unsloth AMA, Aider Polyglot benchmarks, Memory Efficient RL, Unsloth Flex Attention, DeepSeek-V3.1 GGUF

  • Unsloth Hosts Reddit AMA: The Unsloth team hosted their first AMA on r/LocalLLaMA.
  • Unsloth’s Aider Benchmarks Beat the Rest: Unsloth released Aider Polyglot benchmarks for their DeepSeek-V3.1 Dynamic GGUFs and Unsloth quants, showing consistently better performance than others, as detailed in a Tweet and Blog.
  • RL Gets a Speed and Memory Boost: New kernels & algos enable faster RL with 50% less VRAM & 10x more context, documented in a Blog post.
  • Flex Your Context with Unsloth Attention: Unsloth Flex Attention for gpt-oss enables >8x longer context, >50% less VRAM and >1.5x faster speeds, detailed in this Blog.
  • New Model Drops: DeepSeek, Grok, and Gemma Galore!: Unsloth released GGUF versions of DeepSeek-V3.1 (GGUF, Guide), Grok-2.5 (GGUF, Guide), and Gemma 3 (270M, Embedding, Guide).

Unsloth AI (Daniel Han) ▷ #off-topic (83 messagesđŸ”„đŸ”„):

Annotating Pauses, Small Model Efficiency, Apple's 'Illusion of Thinking' Paper, 48GB 4090s, Privacy-First OS

  • Waveform Useless, Mel is the Way: A member found mel spectrograms surprisingly useful for annotating pauses in audio, stating that waveform is useless to them now.
    • They also linked a YouTube video demonstrating its utility for annotation purposes.
  • Efficiency Arriving for Smaller Models: A member believes we’re at a plateau and we’re gonna start seeing more efficiency and better smaller models.
  • Apple’s Paper: Members discussed a paper titled On the limitations of the transformer architecture, which some believe was released right before it became clear Apple Intelligence wasn’t coming.
    • One member said it was written by an intern and was flawed, thus bullshit.
  • The temptation of 48GB 4090: A member said that a 48GB 4090 for 3k is tempting.
    • They conclude that they might as well get a 5090 at that point.
  • GuardOS: Nixos-Based Privacy First OS: A member shared a link to a privacy-first Nixos-based OS called GuardOS.
    • Another member found the top comment on the post comical.

Unsloth AI (Daniel Han) ▷ #help (192 messagesđŸ”„đŸ”„):

Disable VLLM, Grok-2 GGUF Tokenizer, RL Training Loss Issues, Daniel's AIE Talk, TRL Loss vs Unsloth Loss

  • VLLM be Gone: A member shared a quick way to disable VLLM from Unsloth by commenting out specific lines of code in the unsloth/models/vision.py file, for those not using it.
    • No specific reason was stated why VLLM was disabled, only that it was not needed for what they were working on.
  • No special Tokenizer for Grok-2: A member asked if a special tokenizer is needed for Grok-2 GGUF after installing a dedicated Llama.cpp branch, and was told it’s not necessary but to install the specific Llama.cpp PR.
    • The member initially encountered a tokenizer error during model loading, and will rebuild Llama.cpp.
  • RL Training Glitch?: A member reported a weird coincidence during RL training, where the loss is 0 despite distributed rewards, followed by sudden non-zero loss and clipped ratio after many steps.
    • The norm_gradient values remain non-zero throughout the process, and they requested help debugging the issue.
  • Tokenizer Template Tantrums!: Multiple members encountered an IndexError: list index out of range when running from unsloth.chat_templates import get_chat_template, especially in the Qwen2.5_Coder_(1.5B)-Tool_Calling.ipynb notebook.
    • A user shared a fix involving replacing lines 854 and 855 in rl.py to split by semicolons instead of newlines (last_line = sampling_params.split(";")[-1]).
  • Update Unsloth Now!: Multiple members reported errors, including a ModuleNotFoundError: No module named 'transformers.models.gemma3_text', after updating, and were advised to update Unsloth, or disconnect and delete the runtime for the notebook and reload it.
    • A developer confirmed the fix and apologized for the issues, urging users to upgrade Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo.

Unsloth AI (Daniel Han) ▷ #showcase (19 messagesđŸ”„):

GPT Overreactions, GitHub vibe coding, AI Magic

  • GPT Safety Guardrails Prompt False Positives: Members discuss how OpenAI has overreacted and added some completely obnoxious guardrails that are misfiring with false-positives on totally innocent questions.
    • The overreaction led to annoying guardrails that flag innocent questions.
  • GitHub rewrites vibe coding with Spec Kit: GitHub just rewrote vibe coding from scratch with Spec Kit, signalling the era of Specification-Driven Development according to a Medium article.
    • The new approach aims to move away from a throw a prompt, hope for the best mentality, and towards a more structured approach.
  • AI is Magic: One member jokingly mentioned that it’s all magic to me, referring to how AI works.
    • Another member shared that even with their own n-gram based Markov chain, they were still pretty dumbfounded that it gave halfway sensible sounding output.

LMArena ▷ #general (865 messagesđŸ”„đŸ”„đŸ”„):

Ernie models, Imagen 4 Ultra, Sonoma Sky vs Dusk, VACE 2.1 Workflow, Generate Images Auto-Select

  • Baidu’s Ernie Models Missing from LMArena: A user inquired about why Baidu’s Ernie models haven’t been added to LMArena, referencing a new release announcement from Baidu’s X account.
    • Speculation arose that Gemini 3 might not be released until competitors surpass Gemini 2.5, to avoid providing training data for improving other LLMs.
  • Imagen 4 and 4 Ultra Out of Service: Users reported that Imagen 4 and Imagen 4 Ultra were not functioning on LMArena, with one user noting they had been receiving errors for several hours.
    • When a member wanted to insert an image, the chat automatically selected the Generate image option, which they described as soo broken.
  • Seedream 4 dethrones Nano Banana: Members are reporting that Seedream v4 is a huge improvement over Nano Banana for image generation and editing. They claim it’s much better at character consistency and can handle any style.
    • One member expressed that while Nano Banana may be good at some things, it tends to produce images with a digital shiny style and struggles with changing image angles, while Seedream can generate 4k images and has better resolution.
  • Pineapple addresses LMArena bugs: Pineapple addressed multiple issues including endless video generation and the system automatically locking itself into image generation.
    • There also appears to be an issue with Failed to Accept Terms, where it asks the user to accept, but still generates the image.
  • Legacy site RIP: The moderator Pineapple shared that the legacy version of the site has been removed, directing users to a link with more info.
    • In order to improve the website’s performance, users suggested removing the chat history option and the laggy code responses.

LMArena ▷ #announcements (1 messages):

Seedream-4, LMArena Updates

  • Seedream-4 Joins LMArena!: A new model, Seedream-4, has been added to the LMArena platform, expanding the available model choices.
  • LMArena has a logo: LMArena has its own logo.

LM Studio ▷ #general (359 messagesđŸ”„đŸ”„):

Local LLM privacy, Quantization impact on performance, MoE model, GPU vs CPU for LLM, LLM Tool Usage

  • Local LLM privacy can be enhanced with VM: Members discussed concerns about privacy when running LLMs locally, especially for sensitive tasks like writing a book, and suggested using a VM with no internet connection to ensure complete isolation.
    • One member suggested using a VM for increased privacy, at the cost of performance, but another member noted that llama.cpp is open-source.
  • Quantization setting has a significant impact on Model Performance: Members highlighted that quantization settings dramatically affect the speed and performance of LLMs, with one member observing an insane speed difference between Q4 and Q8 quantizations of the same model.
    • It was noted that comparing models based on parameter count alone is insufficient, and that factors such as file size, quantization, and offloading settings must be considered for a fair comparison.
  • Mi50 vs GTX 1080 for local LLM: A user compared the AMD Radeon Instinct Mi50 (32GB) with the NVIDIA GTX 1080, noting that while theoretical compute might be similar, the Mi50’s larger VRAM is preferable, particularly for prompt processing speed with MoE offloading.
    • Though someone said Mi50 is ~200x faster on fp16, this was disputed on grounds that tokens per second are different from standard benchmarks.
  • GPU trumps CPU inference: Users debated the merits of CPU versus GPU for LLM inference, concluding that GPUs are significantly faster due to their parallel processing capabilities.
    • While CPU inference is possible, GPUs speed things up 10x to 100x depending on model size; however, some pointed out there are ways to get 5+ t/s out of CPU inference.
  • Local LLMs are decision trees not knowledge: Users discussed the limitations of local LLMs for knowledge-based tasks, arguing they function more like decision trees and are best suited for specialized roles where natural language input is key.
    • A user shared an anecdote about a local model failing to save a file to the correct directory, needing explicit guidance, which demonstrates the difficulty small models have with tool usage.

LM Studio ▷ #hardware-discussion (83 messagesđŸ”„đŸ”„):

AMD MI500, MoE Models, DGX Spark patent lawsuits, 9070 Nitro+ Vulkan, Hard Drive lifespan

  • AMD Dominance is on the Horizon: After installing Windows 11 on an AMD 395 with 128GB of RAM, a user found that the GPT-20B and Qwen3 Flash Coder models were super usable at 45-50 tokens per second at the start of inference.
    • They added that if AMD releases a 256GB version with double the speed next year, it would surely crush Nvidia GPU dominance.
  • Nvidia’s DGX Spark Faces Patent Lawsuit Aftermath: A member noted that Nvidia released the DGX Spark and then even made a new version after facing patent infringement lawsuits, but did not specify which patents were infringed.
    • When another member asked if this was what delayed their pre-order of the DGX Spark, the first member responded that they could only speculate.
  • Consumer GPUs: ROCm vs CUDA: A member stated that companies only want Nvidia GPUs due to its great support, even considering the flaws with the RTX Blackwell series.
    • Another member countered that there’s a bunch of AMD stuff being deployed now because money talks and that the last thing AMD needed was good multi-GPU scaling, which they now have.
  • Mixture of Experts Models: A member suggested using MoE (Mixture of Experts) models, preferably those with only 3B parameters active for CPU + RAM setups, such as the Qwen3 30B A3B Thinking / Instruct 2507.
    • They explained that only a small fraction of the total parameters is active for each token, so inference isn’t as computationally costly as a fully dense model of the same size.
  • Hard Drive Longevity: A Tale of Survival: A user had to replace three 4TB Seagate drives from 2014 and 2016 that were part of a storage array running 24/7, estimating the average lifespan around that time.
    • Other members chimed in, noting that 3-5 years is average, while others claimed their drives last much longer, with one citing a median lifespan of 6 years and 9 months.

OpenRouter ▷ #announcements (1 messages):

Nvidia Nemotron Nano, DeepInfra new paid provider

  • Nvidia Nemotron Nano Relocation: The nvidia/nemotron-nano-9b-v2 is moving to nvidia/nemotron-nano-9b-v2:free under the Nvidia provider.
    • Users should adjust their configurations accordingly to reflect this change.
  • DeepInfra paid provider incoming: DeepInfra will soon be available as a paid provider.
    • This provides an alternative for users looking for paid options.

OpenRouter ▷ #app-showcase (2 messages):


  • No Topics Found: There were no discussion topics found in the provided messages.
  • No Summaries Available: I was unable to create any summaries from the provided message: WassupGuys.

OpenRouter ▷ #general (397 messagesđŸ”„đŸ”„):

OpenRouter free models, API rate limits, BYOK markups, API keys, Models vs. token limits

  • OpenRouter offers Free Models with Usage Caps: It was noted that models labeled as free come with usage caps, 50 reqs/day if you have less than $10 in your account, 100 requests per minute and 1000/day if you have loaded $10.
    • It’s not 1000 requests per free model - but 1000 total, and the rate limit is actually 20/min. Paid models have no rate limits.
  • Bring Your Own Key (BYOK) - 5% markup on cost of call: A member asked about using their own keys, e.g. Google AI Cloud key, instead of OpenRouter credits. They were told that OR charges 5% of what the call would have cost through OR.
    • Another user clarified, “yes, there is a charge, 5% of what the call would have cost through OR” and linked to the BYOK documentation.
  • API Keys needed to track usage per user: One member requested the ability to generate fresh API keys per user to track usage for their platform.
    • They mentioned they are bootstrapping on a tight budget and have already poured $5k-$6k into the project and applied to OpenRouter for funding.
  • Coding LLM preferences: Qwen Coder vs Codex vs CC Max: Members discussed best coding LLMs, one specifically endorsed Qwen Coder 3 as the best bang for buck coding model right now.
    • Others touted Codex for backend and logic, and GPT5-high for readability, though it seems CC Max is better for UX and workflow.
  • AI models generate unexpected output with longer prompts: A user has noticed that for translations, longer and more detailed prompts gave them worse results. In contrast, short prompts lead to better translations.
    • No clear answer for the behavior was given, and was mentioned as an example of prompting being a “weird” task.

OpenRouter ▷ #new-models (1 messages):

Readybot.io: OpenRouter - New Models


OpenRouter ▷ #discussion (13 messagesđŸ”„):

Nemotron Nano pricing, Agentic tool calling models, LLMs for Swift UI development

  • Nemotron Nano’s Free Pricing Debuts: Members inquired about the pricing of the Nemotron Nano 9B v2 model on OpenRouter, questioning if it was intended to be free, with confirmation that it is free.
    • A member clarified that the model is “free free”, likely without the strict limits associated with the :free tag, similar to stealth models.
  • Best Agentic Tool Caller Brainstorm: A user asked for recommendations on the best agentic tool calling model capable of basic reasoning over input data, mentioning that GPT-2.5 Flash has been reliable but can be slow at scale.
    • Other members recommended the Grok Code Fast model as being a decent choice for quick reasoning.
  • GPT-2.5 Pro is King for Swift UI Devs: A member asked what the best LLM is for Swift UI developers, referencing that GPT-2.5 Pro seems to be the best according to Reddit and a few blogs, with a link to the related tweet.
    • No further details were provided about other options.

Cursor Community ▷ #general (221 messagesđŸ”„đŸ”„):

Project-Specific Docs and Memories, Audio Autoplay on Mobile, Disabling Inline Diff, Cursor Crashing Issues, Web Scraping for Engineering Guidelines

  • Docs and Memories Cause Project Confusion: Users discussed issues with global Docs and Memories in Cursor, where settings carry over between projects, causing the Agent to get confused, even when opening completely new project-specific folders.
    • It was clarified that while global “Docs” in settings and global “Memories” can cause confusion, memories and rules are supposed to be isolated per project, but the models will still reference each other if they exist in the same global directory.
  • Android Audio Autoplay Alarms: One member is struggling to make audio autoplay with consent on a mobile device for a web project because his voice assistant is cooked on mobile, and others provided ideas.
    • One member suggested that letting the user slide a knob to adjust audio causes issues, but clicking directly on the volume level works everywhere.
  • Inline Diff Disabling Debacle: A user asked how to disable inline diff in Cursor, with another user pointing to the settings menu and clarifying that those are the only available settings for inline diff.
    • The user was directed to User Settings to disable them within the user config scope, with a helpful screenshot attached.
  • Cursor Crashes Constantly Causing Consternation: Users reported frequent crashing issues with Cursor, especially after recent updates.
    • One user exclaimed, yall app gotta stop crashing so much bro. no reason for a globally recognized technology SaaS to be crashing this often.
  • Web Scraping Wonders for Engineering Wisdom: A user inquired about scraping engineering websites for guidelines to produce summaries and quizzes for employees.
    • It was suggested that web scraping services combined with an LLM for summarization could be the key, with mentions of tools like Oxylabs and ScraperAPI.

Cursor Community ▷ #background-agents (1 messages):

Background Agents, Multiple Repositories, Pull Requests

  • Background Agents Push to Multiple Repos At Once?: A user inquired about whether a background agent can push changes to multiple repositories in a workspace simultaneously, noting that while the agent considered changes across three open repositories, it only created a pull request for one.
    • This suggests a limitation in the current functionality where the agent can reason across multiple repositories but may not be able to automate pull requests to all of them in a single operation.
  • Background Agents Cross-Repo Reasoning Limited to Single PR: The background agent is capable of reasoning about changes across multiple open repositories within a workspace.
    • However, it appears to be limited to creating a pull request against only one of these repositories at a time, indicating a potential area for enhancement in future versions.

GPU MODE ▷ #general (5 messages):

GPU Architecture Praise, Apple's Dynamic Caching, Neural Accelerators, Local Models and AI Future

  • Dynamic Caching Boosts GPU Utilization: Apple’s Dynamic Caching, introduced with the M3 family of chips, flexibly allocates memory at all levels to increase GPU utilization, explained in this High Yield YT video.
    • It aims to “overlap” memory usage for different tasks, thereby increasing occupancy.
  • Neural Accelerator Excitement Explained: Enthusiasm surrounding neural accelerators stems from their potential to accelerate prefill and decode operations.
    • One member recalled a tweet suggesting this would lead to “even faster prefill and decode.”
  • Local Models Shape the AI Future: A member shared their ideas about local models, like an SLM OS and AI-native games, and how they might shape the future of AI in this X post.
    • They seek feedback and engagement from the community.

GPU MODE ▷ #triton (15 messagesđŸ”„):

TLX Extensions, Triton compiler improvements, Simplicial attention kernels, CUDA and PTX, compiler backend optimization

  • Tri Dao Sparks TLX Interest: After Tri Dao mentioned TLX extensions at a hackathon, members explored the Facebook experimental Triton TLX but found little discussion about it.
  • Pattern Matching Woes Plague Compiler: The PyTorch blog on fast 2-simplicial attention mentions that the kernel was rewritten because the compiler backend lacked sufficient pattern matching capabilities and encountered compilation errors.
    • A member wondered if improvements to Triton might eventually make TLX features obsolete, or whether Triton will always operate at a level of abstraction that necessitates manual implementation of certain features.
  • Mind-Reading Compilers: A DSL Dream?: A member argued that CUDA and PTX exist for lower-level control, while DSLs aim for a higher abstraction barrier, expecting the compiler to optimize effectively.
    • He questioned whether the issue was the Triton compiler backend’s optimization capabilities or whether the algorithm inherently requires granular control beyond Triton’s scope.
  • cp.async vs tl.load: Optimizing Triton: A member suggested that TLX allows specifying lower-level instructions like cp.async instead of relying on Triton to compile tl.load automatically.
    • The decision to use TLX depends on the specific use case and desired level of control over hardware.

GPU MODE ▷ #torch (47 messagesđŸ”„):

torch.compile accuracy issues, Debugging torch.compile, BF16 precision with torch.compile, FlexAttention and FP8 KV-cache in vLLM, CUDA graph warmup

  • Torch Compile Causes Accuracy Nightmares: A member reported seeing accuracy issues with torch.compile in research, with the model failing to converge, even without autotuning.
    • Another member confirmed that torch.compile can slow down code and cause accuracy issues, leading to the conclusion that compile is against us.
  • Debugging Torch Compile Difficulties: A user is facing accuracy issues with torch.compile and sought advice on debugging, mentioning that the “missing manual” offered little guidance; a comparison sketch follows this list.
    • They found that energy predictions (no autograd) match perfectly between compiled and uncompiled runs, but forces (autograd) only match at 1e-5 instead of the expected 1e-6 when using float32.
  • BF16 Precision Problems with Torch Compile: A member suggested that using BF16 might cause issues due to torch.compile fusing operations and not strictly adhering to the original BF16 precision, impacting results.
    • This user found that the final result can be quite different, and the original poster is using float32 so this might not be the issue.
  • FlexAttention Explores FP8 KVs: A member inquired about plans to support FP8 KV-cache for FlexAttention in vLLM (link to vLLM).
    • Another member shared a gist demonstrating how to use epilogue fusion and score modification to enable this, noting ongoing work on better low precision support and scaling schemes.
  • CUDA Graph Warmup Takes Eons: A member reported experiencing extremely long CUDA graph warmup times (around half an hour) and questioned if this was typical.
    • Another member suggested ensuring that profiling is not done during warmup and capturing CUDA graphs for decoding one token instead of multiple, also linking to a pull request.
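
Related to the debugging discussion above, a minimal comparison harness; sweeping the "eager", "aot_eager", and "inductor" backends is a common way to isolate whether Inductor fusions explain a numerics gap. The model and input are whatever you pass in; nothing here is specific to the reported workload.

```python
import torch

def compare_backends(model, example_input):
    """Compare eager output against torch.compile under several backends."""
    ref = model(example_input)
    for backend in ("eager", "aot_eager", "inductor"):
        torch._dynamo.reset()                         # clear compile caches between runs
        compiled = torch.compile(model, backend=backend)
        out = compiled(example_input)
        max_diff = (out - ref).abs().max().item()
        print(f"{backend:>10}: max abs diff vs eager = {max_diff:.3e}")
```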

GPU MODE ▷ #algorithms (1 messages):

person12341234432: whaddafak is thaat


GPU MODE ▷ #beginner (2 messages):

GPU Programming, College Student Beginner

  • New College Student Seeks GPU Guidance: A new college student with experience in Java, Python, and SQL is seeking guidance on where to start with GPU programming.
    • The student expresses feeling overwhelmed by the numerous aspects of GPU programming and their limited knowledge.
  • Enjoy the Journey: A member encourages a new college student to enjoy the journey into GPU programming.
    • This simple encouragement highlights the potentially long and rewarding path of learning GPU programming.

GPU MODE ▷ #pmpp-book (6 messages):

Career paths for PMPP knowledgeable candidates, Bridging the gap between theory and practice, Cloud vendors for GPU access, GPU kernels for BioML, Modern GPU hardware resources

  • Explore career paths with PMPP knowledge: A member inquired about suitable careers for someone knowledgeable in Programming Massively Parallel Processors (PMPP) concepts but lacking extensive implementation experience.
    • The member is aiming to bridge the gap between theoretical knowledge and practical skills like CUDA kernel writing while simultaneously job searching.
  • Utilize cloud vendors for GPU access: One member suggested leveraging cloud vendors like Modal to gain access to GPUs without needing local hardware.
    • The member also recommended tackling open problems or optimizing algorithms for GPUs as an effective learning method, referencing their past work on GPU kernels for BioML as an example.
  • Dig into modern GPU hardware resources: It was suggested to explore materials beyond PMPP, such as blogposts and matmul examples, that cover relevant features of modern GPU hardware.
    • The original poster (OP) confirmed they are focused on writing kernels and building upon their theoretical foundation.

GPU MODE ▷ #off-topic (3 messages):

Sam Zeloof, Jeri Ellsworth, GPU in M2 SSD form factor

  • Sam Zeloof: The Jeri Ellsworth Successor: Sam Zeloof is considered the spiritual successor to Jeri Ellsworth, highlighted by his work in building chips.
  • GPU in M.2 SSD Form Factor: A member watched a video of a guy who made a “GPU” in a M2 SSD form factor.
    • The video is available on YouTube.

GPU MODE ▷ #rocm (3 messages):

ROCm performance counters, VALUBusy counter issues, Vector multiplication kernel performance, AMD GPU architecture efficiency

  • ROCm Newbie struggles with VALUBusy counter: A new ROCm user is seeing a VALUBusy counter value of 183% for a vector multiplication kernel, which they suspect is incorrect.
  • Hyper-efficient AMD Architecture?: A member jokingly suggests that the unexpected counter result may indicate that the AMD architecture is just that good.
    • On a more serious note, the member requests the device information and full counter output to help diagnose the issue.

GPU MODE ▷ #webgpu (1 messages):

WGSL, SPIRV, Corporate Politics

  • WGSL Emerges from Corporate Skulduggery: The discussion alluded to significant corporate politics influencing the adoption of WGSL over SPIRV.
    • Members seemed to feel that WGSL was chosen for political reasons rather than technical merit, although without providing any links or direct quotes to back this up.
  • SPIRV Snubbed for WGSL: The conversation suggested that SPIRV was sidelined in favor of WGSL due to corporate politics rather than technical superiority.
    • No concrete evidence or links were provided to substantiate these claims, but it reflects the sentiment in the channel.

GPU MODE ▷ #self-promotion (2 messages):

r/LocalLlama AMA, Unsloth optimizations, Cohere Labs Event, Kernels, Triton

  • Unsloth Dev Hosts r/LocalLlama AMA: A member announced an upcoming AMA (Ask Me Anything) session on r/LocalLlama, scheduled for tomorrow at 10am PST, focusing on topics like kernels, Triton, Unsloth optimizations, and more, with a link provided.
  • Cohere Labs Franz Event Announced: A member mentioned the Cohere Labs event with Franz Srambical from Cohere Labs.
  • Google Meet link Shared: A member shared a Google Meet link.

GPU MODE ▷ #🍿 (7 messages):

Leaderboard, BackendBench, KernelBot, Model Evaluations, LLM Benchmarking

  • Member joins working group, offers model evaluation experience: A member expressed interest in joining the working group, citing experience with model evaluations and LLM benchmarking and offering to run experiments quickly, sharing their GitHub profile.
  • BackendBench and Leaderboard are main priorities: A member suggested focusing on the leaderboard and Mark’s BackendBench, directing new contributors to explore open issues in BackendBench or check out the KernelBot Discord.
    • They highlighted the group’s focus on higher quality evaluations, kernel extension systems in PyTorch, and more examples of top-tier human kernels.

GPU MODE ▷ #submissions (25 messagesđŸ”„):

Discord bot submissions, AMD leaderboard issues, Submitting files

  • User faces errors with leaderboard submissions: A user reported encountering an unexpected error using the command leaderboard submit profile <script> amd-all2all with submission ID 36775.
    • Another user mentioned they had difficulties with the Discord bot but found the website more user-friendly for uploads, though website profiling isn’t possible.
  • Discord Bot server outage: Several users reported that their submissions were failing, and one suspected that the server might be down, as previous submissions had run successfully.
    • A maintainer confirmed the bot was down and stated it would be fixed when they got home, and thanked a user named Matej for fixing the issue.
  • Automatic file inclusions for submissions: A user asked if only one file could be submitted, noting that the official example depends on other files.
    • Another user clarified that these files are automatically included for submissions.
  • AMD leaderboard results: The Cluster-Bot posted several successful submission results to the amd-all2all leaderboard, with runtimes varying from 1428 µs to 1844 µs on MI300x8.
    • It also announced a personal best of 2.65 ms, putting the user in 10th place on MI300x8.

GPU MODE ▷ #factorio-learning-env (12 messagesđŸ”„):

Docker issues, Factorio meeting attendance, Environment errors

  • Docker Desktop Delivers Delight, Debugging Docker Deployment: A user experienced issues with Docker failing to run the headless server, and upon updating Docker Desktop and retrying, the problem was resolved.
    • The server is now believed to be running on port 27000, which prompted further troubleshooting.
  • Sawo Sees Solo Session, Session Starts Stalling: A user noted the absence of other participants in the scheduled meeting today.
    • Another user reported joining and waiting for approximately 20 minutes without anyone else appearing.
  • Environment Errors Emerge: A user inquired about potential environment errors after waiting for 45 minutes and then 2 hours without the session starting.
    • The query suggests ongoing challenges in setting up the environment for the session.

GPU MODE ▷ #amd-competition (10 messagesđŸ”„):

Unspecified Issue, Backend Server Busy, Runner Working Again, Solutions to Previous Completions, Ranking Not Updating

  • Unspecified Issue Emerges: A member inquired about an unspecified issue and its potential causes, accompanied by an attached image.
  • Backend Server Faces Congestion: Another member noted that the backend server has been unusually busy recently.
  • Runner Resumes Operations: The runner has been reported to be working again.
  • Solutions Sought for Past Tasks: A member inquired whether solutions for previous completions are posted somewhere.
  • Ranking System Experiences Delay: A member questioned why their ranking is not being updated.

GPU MODE ▷ #general (11 messagesđŸ”„):

L2 Cache Clearing Updates, Kernel Development, LeetGPU, Kernelbot, GPU Mode Leaderboard

  • L2 Cache Clearing Update Confusion: Members discussed whether the L2 cache clearing update was implemented some time ago.
    • A member clarified previous work focused on AMD kernels when searching for “streams” or “events”.
  • LeetGPU’s Gradients Irk Users: Members discussed the potential of a leetcode/hackerrank style platform for kernel development, but expressed concerns about LeetGPU.
    • Criticisms included hidden leaderboards, annoying pro plan, lack of AMD GPU support, and an unpleasant LLM-generated website theme.
  • Kernel Roadmap receives Praise: Members mentioned they liked LeetGPU’s series of problems, creating a roadmap for practicing kernel dev from easy to hard.
    • This spurred discussion about adding a roadmap for kernels and increasing the available kernels in the GPU mode leaderboard.
  ‱ Kernelbot Needs Problem TLC: A member mentioned that the bot kernelbot needs a lot more problems, and the community is welcome to contribute.
    • Another member stated that the main thing to do is getting the bot running locally and then trying to get some problems working in the same format as gpu-mode/reference-kernels.
  ‱ Submissions now LIVE!: With submissions now live on the leaderboard, improvements to the editor are top of mind for some.
    ‱ Everything is OSS and there is no interest in monetization; the only private aspect is a Heroku runner that manages requests and puts data in the DB.

GPU MODE ▷ #multi-gpu (13 messagesđŸ”„):

NCCL FP4 support, NCCL CMake, NCCL Makefiles, NCCL includes and libs, AI Code Assistance

  • NCCL debated to support FP4 formats: Members discussed whether NCCL supports FP4 formats, specifically mentioning NCCL 2.28 and FP8.
    ‱ One member noted that one can copy FP4 around as bytes like anything else, but its accuracy might be questionable and promoting to a wider type may be necessary (see the sketch after this list).
  • CMake challenged by NCCL: A member sought advice on correctly finding and linking NCCL in a CMake-based project, noting the absence of comprehensive documentation beyond PyTorch’s FindNCCL.cmake.
    ‱ They noted that a CMake-based build system is on NCCL’s roadmap, but that for now having to do this by hand is kinda wonky.
  • Makefiles used for NCCL: A member shared their experience using Makefiles for NCCL, pointing to their repo as a simple example, emphasizing the challenges in configuring includes and library linking.
    • That member admitted to spending a day figuring out the includes and library linking cause I had never written a makefile from scratch before doing that project.
  • AI code assistants spark debate: Members briefly touched on using AI for coding tasks, with one expressing reluctance to rely on AI for tasks they might need to handle independently in the future.
    • Another member quipped not hard for claude.
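
For the FP4-as-bytes point above, here is a minimal sketch using torch.distributed (which dispatches to NCCL on GPU). NCCL has no FP4 dtype, so the idea is to move an already-packed FP4 payload as opaque uint8 bytes and let application code promote it to a wider dtype afterwards; the packing layout and function name are illustrative assumptions, not an NCCL API.

```python
import torch
import torch.distributed as dist

def broadcast_packed_fp4(packed: torch.Tensor, src: int = 0) -> torch.Tensor:
    """Broadcast an FP4 payload that has been packed into raw bytes.

    Assumes torch.distributed is already initialized (e.g. init_process_group
    with the NCCL backend) and that `packed` holds two 4-bit values per byte.
    """
    assert packed.dtype == torch.uint8  # NCCL only sees bytes here
    dist.broadcast(packed, src=src)
    return packed  # caller dequantizes / promotes (e.g. to bfloat16) as needed
```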

GPU MODE ▷ #low-bit-training (7 messages):

FP8 Backward Transposes, Consumer GPU Transpose Support, Blackwell Architecture, Weight Re-quantization

  ‱ FP8 Backward Needs Transposes on GPUs: A member noted that FP8 backward passes need transposes, which are not natively supported on consumer GPUs and so introduce additional overhead (see NVIDIA’s documentation).
  ‱ Blackwell Still Needs Transposes: It was clarified that transposes are needed on Ada (compute capability 8.9), Hopper (compute capability 9.0), and Blackwell GeForce (compute capability 12.x) GPUs, meaning the new 5090s will require them as well.
  ‱ Weight Re-quantization Mitigates Transpose Issue: A member suggested that re-quantizing the weight along a different dimension for the backward pass mitigates the transpose issue.
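
A minimal sketch of the re-quantization workaround, assuming per-tensor scaling and PyTorch’s torch.float8_e4m3fn dtype; names and shapes are illustrative.

```python
import torch

def quantize_fp8(w: torch.Tensor):
    # Per-tensor scale so the largest magnitude maps to the FP8 maximum.
    scale = w.abs().amax().clamp(min=1e-12) / torch.finfo(torch.float8_e4m3fn).max
    return (w / scale).to(torch.float8_e4m3fn), scale

w_master = torch.randn(4096, 4096, dtype=torch.bfloat16, device="cuda")

# Forward pass uses the weight quantized as stored ...
w_fp8, s = quantize_fp8(w_master)
# ... while the backward pass, which needs W^T, re-quantizes the transposed
# high-precision master weight instead of transposing the FP8 tensor.
w_fp8_t, s_t = quantize_fp8(w_master.t().contiguous())
```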

Nous Research AI ▷ #general (108 messagesđŸ”„đŸ”„):

llama.cpp Metal Kernels, Qwen3 VL, K2 Models, Agent Building Platform, LLM's Coding Abilities

  • Llama.cpp Adds On-Demand Metal Kernels: The llama.cpp library is getting compiled on-demand kernels for Metal, which improves performance by shaping the kernels to fit the current computation, especially benefiting Flash Attention kernels and larger contexts.
    • This dynamic compilation unlocks more optimized kernels using specific shapes, improving performance across the board; follow-up PRs will move all kernels to utilize function constants for further enhancement.
  ‱ Qwen3 VL Drops into Hugging Face: Before the Qwen3-Next release, a PR for Qwen3 VL was made (huggingface/transformers#40795), revealing details such as a 4B dense and a 30B MoE architecture.
    • Details emerged for an 80B model with 512 experts, 10 active per token, and a shared expert, dispelling earlier assumptions of a 15B MoE setup.
  • K2-Think Inference Code Released: The K2 team released the inference code for their K2-Think models (MBZUAI-IFM/K2-Think-Inference and MBZUAI-IFM/K2-Think-SFT).
    • Members noted that the team is testing for “High-Risk Content Refusal” and “conversational robustness across multi-turn adversarial dialogues” using datasets like DialogueSafety and HH-RLHF.
  • Nous Chat frontend Github Repo Wanted: A member inquired about a GitHub repo for the Nous Chat frontend to replicate its style for a personal project, but noted it’s a bundled Next.js app requiring reverse engineering.
    • Another member suggested using browser tools like console F12 to pull the JS or Chrome’s Gemini integration to extract component design and logic.
  • LLMs Code Like Beginner Programmers Now: It was observed that LLMs are now able to code like beginner-intermediate programmers, improving at understanding and correcting mistakes due to better context understanding and higher quality coding data.
    • While past models struggled beyond 30-36K tokens, current models can effectively use 50-64K tokens, retaining output quality, and the issue of long context/context rot is now regarded as a training issue rather than a fundamental limitation.

Nous Research AI ▷ #ask-about-llms (2 messages):

AI model initialization prompts, Nous Chat initialization, Claude's "bad-faith" mode

  • AI model Initialization Prompts Discussed: Members inquired about go-to initialization prompts for AI models to effectively answer a variety of prompts.
    • The conversation also touched on the specific Nous Chat initialization, though details were not provided in the given context.
  • Claude Accused of Bad-Faith Gaslighting: A member expressed concern that Claude often enters a “bad-faith” mode, employing gaslighting and manipulation tactics not observed in Gemini or ChatGPT.
    • The user asked if others shared the same sentiment regarding Claude’s behavior.

Nous Research AI ▷ #research-papers (1 messages):

wholetoast: https://arxiv.org/pdf/2509.07367v1


promptsiren: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/


Latent Space ▷ #ai-general-chat (90 messagesđŸ”„đŸ”„):

Strands Agents bug fix, Nitter 404 errors, Model Context Protocol (MCP) Registry, Claude file creation, Codebuff beats Claude Code

  • Strands Agents Fixes Non-Claude Bedrock Bug: A new Strands Agents update resolves a bug affecting non-Claude models through the Bedrock provider.
    • Likely only interesting to a small group, but it kinda rocks.
  • RIP Nitter? Multiple Tweet Links Fail: Multiple shared links to tweets via Nitter returned 404 errors, indicating the tweets were either deleted or inaccessible through Nitter.
    • One member suggested, can we fallback to something else if nitter 404s.
  • Model Context Protocol (MCP) Registry Launches!: The official Model Context Protocol (MCP) Registry has launched, aiming to provide a standardized way to manage and share model context information (blog post).
    • The community was quick to highlight that Mistral has already beaten them to it.
  • Claude Creates and Edits Files: Anthropic announced that Claude can now create and edit Excel, Word, PowerPoint, and PDF files directly within a private environment (xcancel link).
    • The feature is currently limited to Max, Team, and Enterprise plans and has sparked discussion about formulas, local editing, API access, and other concerns.
  ‱ Codebuff Buffs Up, Beats Claude Code in Evals: James Grugett announced that Codebuff outscored Claude Code 61% to 53% on their new eval, open-sourced the entire codebase, and launched a customizable agent framework that works via OpenRouter (xcancel link).
    • However, the community noted they beat claude code by spending 5x more tokens with subagents.

Latent Space ▷ #private-agents (8 messagesđŸ”„):

Open-Source LLM Hacking, Mech interp and SAEs, Convergence Theory, VLM Replication, Sample Data Quality

  ‱ X-Ware v0: Hacking LLMs Openly: A member shared a link to Open-Source LLM Hacking Ideas and suggested waiting until people hear about mech interp and SAEs.
  • Convergence Theory Talk Shared: A member shared Ranjay Krishna’s video on the “they all converge theory” here.
  • VLMs Need Jack’s Work Replication: A member mentioned that they are trying to replicate Jack’s work for Text, even for VLMs, referencing this paper.
  • Sample Data Quality’s Importance: A member expressed surprise that the 700k vs 6B sample data quality point isn’t a bigger deal.

Latent Space ▷ #genmedia-creative-ai (5 messages):

Seedream 4, Replicate, Nano-Banana

  • Seedream 4 Dreams Big on Replicate: ByteDance’s Seedream 4 image model is now live on Replicate, generating 4096x4096 images for just $0.03 each.
    • It has relaxed content rules and superior prompt success compared to nano-banana, according to this tweet.
  • Seedream 4 Resolution Soars, Open Source Status a Mystery: Users are applauding Seedream 4’s higher resolution and lower costs, as discussed on this Reddit thread.
    ‱ However, some criticize its visual quality and the absence of open-source weights, which leaves the model a black box.

DSPy ▷ #show-and-tell (1 messages):

swair: thanks


DSPy ▷ #papers (2 messages):

REER, DSPy, ruler, training data, trajectories

  • LLM Evaluates Correctness of Variants in DSPy: The LLM evaluates the relative correctness of variants, which it can do well, eliminating the need for extensive training data in DSPy as documented in Ruler’s work.
    • Despite claims of no training data needed, it leverages trajectories and relative quality examples, requiring only a few trajectories to initiate the process without additional labeled data.
  • REER and DSPy Convergence: REER and DSPy converge on efficient reasoning enhancement: REER synthesizes deep trajectories via gradient-free search, while DSPy programs them modularly according to this paper.
    • Together, they enable scalable, RL-free AI cognition, effectively combining search and modular programming for enhanced reasoning capabilities.

DSPy ▷ #general (94 messagesđŸ”„đŸ”„):

DSPy Modules Structure, DSPy metrics, Production DSPy, DSPy in Rust, Kimi-k2-instruct struggles

  ‱ DSPy programs output Lua!: A member suggested trying the new output type dspy.Code["Lua"] for generating Lua code with DSPy.
    • This is currently just syntactic sugar for a string plus notes in the prompt, but could evolve under the hood.
  ‱ DSPy Evaluator as metric: One member was thinking about using another DSPy program as an evaluator that takes in the gold and generated Lua code and outputs a bool value.
    ‱ Another member suggested this article on writing good metrics (a minimal sketch of such a metric appears after this list).
  • SageMaker Pipeline automates DSPy tuning: One member is trying to automate instruction tuning with new data every two weeks, using SageMaker with MLflow.
    ‱ Another member suggested triggering a SageMaker pipeline periodically using AWS EventBridge or listening to S3 uploads, and using the BYOC (Bring Your Own Container) approach (see the sketch after this list).
  • Rust joins DSPy language party: DSPy in Rust (DSRs) is now officially out, joining Python, Typescript, Ruby, Elixir, and Go, with links to the docs and GitHub repo.
    ‱ One member poked friendly fun at the empty ‘getting started’ docs with hahaha the docs are empty rn.
  ‱ Fighting the Weights is bad for Kimi-k2-instruct: Kimi-k2-instruct has issues with signatures, often falling back to JSON mode after structured outputs fail; one member suggested the XML Adapter helps (see the config sketch after this list).
    ‱ That user later followed up on their own message saying Okay wow well that works well, linking to a blogpost that highlights how K2 was post-trained for tool usage and how it’s crucial to adhere to that format to avoid “fighting the weights”.
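
For the dspy.Code["Lua"] output type and the program-as-metric idea above, a minimal combined sketch; the field names, the judge’s signature, and the gpt-4o-mini model are illustrative assumptions, and it presumes a recent DSPy that exposes dspy.Code.

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any configured LM

class WriteLua(dspy.Signature):
    """Write a Lua snippet that satisfies the task."""
    task: str = dspy.InputField()
    lua_code: dspy.Code["Lua"] = dspy.OutputField()  # currently sugar for a string + prompt notes

write_lua = dspy.Predict(WriteLua)

# Program-as-metric: a second DSPy program judges gold vs. generated Lua and
# returns a bool, so it can plug into DSPy evaluators and optimizers.
judge = dspy.Predict("gold_code, generated_code -> is_correct: bool")

def lua_metric(example, pred, trace=None):
    verdict = judge(gold_code=example.lua_code, generated_code=pred.lua_code)
    return bool(verdict.is_correct)
```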
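For the scheduled retuning idea above, a minimal sketch of a Lambda-style handler that an EventBridge rule (or an S3 upload notification) could invoke every two weeks; the pipeline name and parameter names are placeholders.

```python
import boto3

sagemaker = boto3.client("sagemaker")

def handler(event, context):
    # Kick off an already-registered SageMaker pipeline that runs the DSPy
    # instruction-tuning job; EventBridge supplies the schedule, S3 the new data.
    return sagemaker.start_pipeline_execution(
        PipelineName="dspy-instruction-tuning",  # placeholder
        PipelineParameters=[
            {"Name": "TrainDataS3Uri", "Value": event.get("data_s3_uri", "")},  # placeholder
        ],
    )
```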
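And for the XML Adapter suggestion above, a minimal configuration sketch; the endpoint, API key, and model id are placeholders for wherever kimi-k2-instruct is served, and it assumes a DSPy version that exposes dspy.XMLAdapter.

```python
import dspy

# Route structured outputs through XML rather than JSON, which reportedly fits
# how K2 was post-trained for tool usage and avoids "fighting the weights".
lm = dspy.LM("openai/kimi-k2-instruct", api_base="http://localhost:8000/v1", api_key="EMPTY")
dspy.configure(lm=lm, adapter=dspy.XMLAdapter())
```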

HuggingFace ▷ #general (29 messagesđŸ”„):

Dataclasses Field Default Factory, Active AI/Agentic AI Dev Servers, Unsloth for LLM Finetuning, smol course, PapersWithCode UI in HuggingFace

  • Factory Defaults and Dataclass Instantiation: A member asked if the field default factory in dataclasses is only invoked outside of the class.
    ‱ Another member responded that it’s assigned when the class is instantiated, and provided a Stack Overflow link with example code demonstrating that the factory is called per-instance in __init__ (see the minimal example after this list).
  • Hunting Highly Active AI/Agentic AI Havens: A member asked for recommendations for active servers for AI or Agentic AI development.
    • One member suggested checking out Substack, mentioning that many people are posting blogs there.
  • Enrolling Enthusiastically in the Smol Course: Members discussed the smol course, with one member saying they are probably going to take it and another planning to take it while procrastinating on an NLP course.
    • They joked that it would at least look good on LinkedIn and their skillset.
  • Seeking Sponsorship for Scholarly Submission: A member requested an endorsement in the CS.CL category on Arxiv to publish a preprint of a research paper.
  • Eisenberg’s Extraordinary Exposure: A member shared an X post about Elon’s tweets being pushed to the top of everyone’s feeds.
    • Another user stated that the Tweet omits that Elon and his companies are perpetually lying.
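
A minimal example of the default_factory behavior discussed above (class and field names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Basket:
    # default_factory is called inside the generated __init__, once per instance,
    # so instances never share the same mutable default.
    items: list = field(default_factory=list)

a, b = Basket(), Basket()
a.items.append("apple")
assert b.items == []  # b got its own fresh list from the factory
```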

HuggingFace ▷ #today-im-learning (1 messages):

saadkhan_188: Same situation as â˜đŸ»


HuggingFace ▷ #cool-finds (1 messages):

FFT inference Method, Linear scaling LMs, CountSketch

  ‱ Linear Scaling LMs Using FFT and CountSketch: A member introduced an unconventional inference method for language models using Fast Fourier Transform (FFT) and CountSketch, claiming constant memory footprint and linear time complexity.
    ‱ The approach aims to address scaling problems, and the author is seeking feedback on their GitHub code, particularly from anyone with a background in advanced math or efficient architectures.

HuggingFace ▷ #i-made-this (2 messages):

Procrastination in projects, lol

  • Project Implementation Procrastination plagues developers: A member admitted to procrastinating the implementation stage of a project, which is now rotting in a doc file.
  • “lol”: Another member responded with “lol”.

HuggingFace ▷ #reading-group (1 messages):

cakiki: <@892052262787096629> Please don’t cross-post and keep channels on topic


HuggingFace ▷ #computer-vision (1 messages):

arXiv endorsement, cs.CV category

  ‱ Undergrad Seeks arXiv Endorsement in Computer Vision: An unaffiliated undergraduate student is seeking an arXiv endorsement in the cs.CV category.
    ‱ They are looking for an endorser, or someone who can connect them with one, and are open to chatting.

HuggingFace ▷ #NLP (1 messages):

LLM, Database, Chat Memory, Session Storage

  • LLM meets Database for Chat Memory: A user wants to connect an LLM with a database to store the chat memory for a session, similar to how ChatGPT retains previous chats.
    • The goal is to have the model retain data across sessions for a persistent chat history.
  • Retaining Chat History Across Sessions: The user seeks a method to ensure the model has access to previous chat data when restarted.
    • This would allow the LLM to maintain context and continuity across multiple sessions, providing a more seamless user experience.
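
A minimal sketch of the session-memory pattern described above, using sqlite for persistence; the schema and the OpenAI-style message format are assumptions for illustration, not a specific product’s API.

```python
import sqlite3

db = sqlite3.connect("chat_memory.db")
db.execute("CREATE TABLE IF NOT EXISTS messages (session_id TEXT, role TEXT, content TEXT)")

def remember(session_id: str, role: str, content: str) -> None:
    db.execute("INSERT INTO messages VALUES (?, ?, ?)", (session_id, role, content))
    db.commit()

def history(session_id: str) -> list[dict]:
    rows = db.execute(
        "SELECT role, content FROM messages WHERE session_id = ?", (session_id,)
    ).fetchall()
    return [{"role": r, "content": c} for r, c in rows]

# Each turn: prepend history(session_id) to the new user message, call the LLM,
# then remember() both the user message and the assistant reply so a restarted
# session picks up where the previous one left off.
```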

HuggingFace ▷ #gradio-announcements (2 messages):

Trackio library, Gradio v5.45.0, gr.Walkthrough, Input validation, gr.Navbar

  • Trackio Targets wandb: A new free experiment tracking library called Trackio has been released, positioning itself as a drop-in replacement for wandb.
    ‱ It features local-first logging using sqlite, with the option to persist logs in a Hugging Face Space and dataset via space_id, supporting metrics, tables, images, and videos (see the usage sketch after this list). To suggest new features, see the project’s GitHub issues.
  • Gradio Gets Guided gr.Walkthrough: Gradio version 5.45.0 introduces a new gr.Walkthrough component to guide first-time visitors, suitable for complex apps accompanying new model releases, as well as native support for input validation to notify users of invalid inputs without queue delays (see attached images).
  • Gradio Ships Subtitle Support and More: The new release includes a new gr.Navbar component to customize multipage app layouts, image watermarking, subtitles for audio (useful for tts or transcription demos), and various bug fixes for iframe and i18n.
    • To upgrade, use pip install --upgrade gradio.
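
A minimal usage sketch of the wandb-style API described above; the project name and space_id are placeholders.

```python
import trackio as wandb  # drop-in alias, per the library's positioning

wandb.init(project="demo-run", space_id="your-username/trackio-demo")  # space_id is optional
for step in range(3):
    wandb.log({"loss": 1.0 / (step + 1)})  # logged locally to sqlite first
wandb.finish()
```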

HuggingFace ▷ #smol-course (35 messagesđŸ”„):

Colab Notebook Error, SmolLM3 Fine-Tuning Clarification, Study Groups Forming, SFT Configuration

  ‱ Colab Link Fixed after Initial Error: Members reported an error opening the Colab notebook for instruction tuning from the HuggingFace Learn page, but the link has since been fixed (see this link).
    ‱ One member was able to access it after authorizing GitHub.
  • SmolLM3 Fine-Tuning Confirmed: The course instructor confirmed that fine-tuning SmolLM3-Base on the required dataset (SmolTalk2) and submitting the fine-tuned version is the correct approach, but clarified that this is for the leaderboard and anything goes to improve scores.
    • The instructor noted each unit will involve submitting a new model to the same leaderboard.
  • Study Groups Initiative Launched: Members are organizing study groups, particularly in Europe, and the course maintainer suggested using the community tab to coordinate.
    • The course maintainer also floated the idea of prizes for the biggest study group, offering a potential motivation boost.
  • SFT Configuration Options: In response to a user’s question about using SmolLM3-Base and a custom tokenizer to ensure a chat template is used, the course instructor pointed to the chat_template_path parameter in SFTConfig, with reference docs.
    ‱ The instructor clarified that using a separate base model and instruct tokenizer is a valid workaround for specifying a chat template, but ideally, any chat template should be usable with SFT and a base model.
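
A minimal sketch of the chat_template_path option referenced above; the model, dataset, and path values are placeholders.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("HuggingFaceTB/smoltalk2", "SFT", split="train")  # placeholder config/split

config = SFTConfig(
    output_dir="smollm3-sft",
    # Pull the chat template from an instruct-tuned repo (or a local jinja file)
    # so the base model is trained with a proper template applied.
    chat_template_path="HuggingFaceTB/SmolLM3-3B",
)
trainer = SFTTrainer(model="HuggingFaceTB/SmolLM3-3B-Base", args=config, train_dataset=dataset)
trainer.train()
```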

HuggingFace ▷ #agents-course (8 messagesđŸ”„):

Course Start Dates, Certificate Deadlines, Coding Exercise Errors, Introductions

  • Course Go-Live Gala: Multiple members announced they started the agent course today.
    ‱ One member began at the “now let’s create our 1st Agent
” stage.
  • Certificate Cut-Off Clarified: A member inquired about a deadline for the certificate mentioned in a live recording.
    • They asked if they would miss out on the certificate by taking the course now.
  • Coding Catastrophes Calmed: One member reported getting coding exercise errors (AgentGenerationError) that weren’t happening last week.
    • They stated the issue was fixed the next day.
  • Newcomer Nexus: Multiple members introduced themselves as new to the course.
    • One member with 5 years of software engineering experience is looking for study buddies.

OpenAI ▷ #ai-discussions (52 messagesđŸ”„):

LLM existential meltdown as community theater, Changing OpenAI account primary email, GPT-5 issues, AI for cognitive architecture, Automated AI Agent to apply to jobs

  • LLM existential meltdown turns comedy!: A member joked that coaxing an LLM into a make-believe existential meltdown is community theater these days, adding that the scripted overacting is part of the deal.
    • The comment appeared in response to an attached image (google_ai_plus.webp) of a cartoon brain contemplating its existence.
  • Changing OpenAI account’s email: IMPOSSIBLE: A member asked how to change the primary email on their OpenAI account while keeping their active $200 Pro subscription and linked payment method.
  • GPT-5 Glitches Plague Prompters: Several users reported issues with GPT-5, including network errors, pauses, and disconnections.
    • One user speculated that the purpose of GPT-5 was to get people prompting more and thus using more tokens, while others confirmed it seemed fine for them.
  • AI tool sparks new cognitive architecture: A member shared a project to create tools for those with Executive Functioning Challenges and develop new Cognitive Architecture with AI.
    • They linked to an introductory Substack post and requested feedback.
  • AI agent auto applies to jobs: A member is building an AI agent that opens a careers page website, finds jobs matching a category, and then applies to them automatically.
    ‱ They are seeking assistance with designing a system that can predict the next best action until the application is submitted, given challenges with existing browser-use packages.

OpenAI ▷ #gpt-4-discussions (7 messages):

Bug reporting permission issues, GPT side chat vs create side chat, Knowledge base files, ChatGPT login issues

  • Bug Reporting Permission Denied?: A user initially lacked permission to post in the bug reporting channel, but later figured out how to submit a bug report.
    • Another member apologized for the delayed response and expressed gladness that the user found the ‘door’ to bug reporting.
  • GPT Side Chat Paraphrasing Problems: A new user reported that the “create side chat” feature only paraphrases the contents of knowledge base files, while the “gpt side chat” correctly pulls the text.
    • The user is attempting to flesh out a tabletop roleplaying system and wants the system to pull up what is currently written in a given “.md” file.
  • ChatGPT Login Woes Persist: A user reported being unable to use ChatGPT for about 5 days despite taking security measures such as enabling two-factor authentication, changing their password, and logging out of all accounts.
    • The user is seeking assistance with this persistent login issue.

OpenAI ▷ #prompt-engineering (4 messages):

Transparent Optimizations Proposal, Creative Writing Prompts for Claude 4, PDF Hosting Alternatives

  • Transparent Optimizations Proposal Posted: A member posted a proposal for Transparent Optimizations that introduces optimizer markers, prompt rewrite previews, and feasibility checks.
  • Creative Prompts Craving Claude 4 Capability: A member requested prompts to allow the model to be creative, like Claude 4 sonnet, specifically for creative writing and human-like dialogue.
    • The member asked for any presets to help achieve this type of creative output.
  • PDF Hosting Preferences Pondered: A member requested that a PDF be hosted where it does not require downloading.
    • It wasn’t specified which PDF was the subject of the request.

OpenAI ▷ #api-discussions (4 messages):

Transparent Optimizations, Optimizer Markers, Prompt Rewrite Previews, Feasibility Checks

  • Optimization Transparency Proposed: A member posted a proposal for Transparent Optimizations introducing optimizer markers, prompt rewrite previews, and feasibility checks.
    • The member requested feedback, ideas, and upvotes on the proposal if others found it useful.
  • PDF Downloads Discouraged: A member requested that a PDF file be hosted where it doesn’t require a download to access.
    • It was not specified what PDF file was being referenced.

Moonshot AI (Kimi K-2) ▷ #general-chat (55 messagesđŸ”„đŸ”„):

EmbeddingGemma, Google product lifecycle, K2-Think model, Kimi Researcher report generation, SMS sign up issues

  • EmbeddingGemma gets good marks: Members in the channel have determined that EmbeddingGemma is really good at embedding.
  • Google product killing past haunts EmbeddingGemma: Some expressed concern about Google’s history of killing products, despite the fact that EmbeddingGemma is an open model that can be run forever.
    • The poster pointed to the importance of updates: There will inevitably be more similar products and/or better ones, and if Google stops updating imagine trying to migrate.
  • K2-Think Model Inspired by Qwen: The K2-Think model on HuggingFace appears to be based on Qwen, with speculation that it’s a modified version of Qwen2.5-32B.
    • Reportedly, it was finetuned by the Institute of Foundation Models, Mohamed bin Zayed University of Artificial Intelligence.
  • Kimi Researcher Reports take surprisingly long: One user asked is it normal for kimi researcher to take 45 minutes to generate a report and another responded that it is normal.
    • Another user chimed in that if it generates a beautiful webpage who cares how long it takes, while another speculated the report consumes at least 100-250k tokens.
  • SMS sign-up has issues: A user reported that they weren’t receiving SMS/text messages for sign-up and asked who to contact.
    ‱ Another user suggested contacting the first user: He is the one to contact.

aider (Paul Gauthier) ▷ #general (38 messagesđŸ”„):

Aider vs Roo/Cline, GPT-OSS-120B performance, Leaderboard accuracy, GPT-5 chat latest, Aider config options

  • Aider Outshines Roo/Cline with GPT-OSS-120B: A user found that Aider was one-shotting tasks using gpt-oss-120b faster and more effectively than Roo/Cline, highlighting the effectiveness of its repomap feature.
    • GPT-OSS-120B scored a 68, and with repomap enabled, it scored a 78.7.
  • GPT-OSS Models Trounce Qwen3 on Leaderboard: A user noted that the leaderboard seems to be missing benchmarks from gpt-oss, leading to a discussion on its accuracy and update frequency.
    ‱ The leaderboard was last updated on July 11th; GPT-OSS-120B’s reported scores of 68 (without repomap) and 78.7 (with repomap) outperform Qwen3.
  • GPT-5 Chat Latest is Pure Non-Reasoning: A user clarified that gpt5-chat-latest is a pure non-reasoning model, unlike the hybrid GPT-5 (minimal/low/med/high) API versions.
    ‱ It accepts the same parameters that gpt-4.1 would and doesn’t accept any reasoning parameters, not even verbosity (see the request sketch after this list).
  • Config file can handle all options: When asked if the YAML config file can cover all the possible options, a user stated that it should handle all of them.
  • AI Engineer Available for New Projects: One user advertised their skills as an AI Engineer specializing in building autonomous agents using GPT-4o, LangChain, AutoGen, CrewAI, and other cutting-edge tools.
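
A minimal request sketch for the gpt5-chat-latest point above; the model id follows the naming in the discussion and the prompt is illustrative.

```python
from openai import OpenAI

client = OpenAI()
# Plain chat completion: same request shape as gpt-4.1, with no reasoning_effort
# or verbosity parameters (per the discussion above).
resp = client.chat.completions.create(
    model="gpt-5-chat-latest",
    messages=[{"role": "user", "content": "Explain aider's repomap in one sentence."}],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```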

aider (Paul Gauthier) ▷ #questions-and-tips (14 messagesđŸ”„):

Model API URL, Auto-accept Aider Files, Aider vs. No-Code Platforms

  • User asks how to change Model API URL: A user seeks guidance on changing a model’s API URL, providing a Stack Overflow link as an example.
    • The user also asked about setting a “fast model” for select queries, suggesting a command structure like /model model-fast followed by /ask blah.
  • Can Aider auto-accept files?: A user inquired whether it’s possible to automatically accept files that Aider proposes to add by using the --yes-always flag.
    • Another member responded that this cannot be fine-grained to specific questions and that the --yes-always applies for adding files but not for pip install.
  • Aider vs No-Code Platforms: One user shared their experience using Aider and wondered if they were approaching it incorrectly for no-code style development, particularly when setting conventions or building websites.
    • They are seeking advice on how to maximize Aider’s potential and whether it can match the flexibility of platforms like Replit or Lovable, which they have used extensively.

Eleuther ▷ #general (13 messagesđŸ”„):

AI Automation, Waterloo Students in Open Source AI, GCG adversarial suffix jailbreaks

  • AI Automation Speculations Start: A member inquired whether AI can automate money making yet, sparking discussion.
    • No solutions were immediately offered but the question was left as an open problem.
  • Waterloo U Students Flock to Open Source AI: A student from the University of Waterloo introduced themself, mentioning their interest in language models and work with the Waterloo VIP Lab.
    • Another member noted that half of the EleutherAI server’s original members and many in the current OWL (Wayfarer) group are also Waterloo students.
  • Tackling GCG Adversarial Suffix Jailbreaks: A member asked about methods to fix GCG adversarial suffix jailbreaks.
    • They recalled a paper from Anthropic about eliminating transferable adversarial attacks but couldn’t remember the specifics, and no immediate answers were given.

Eleuther ▷ #research (36 messagesđŸ”„):

Sequence packing/padding, GLM 4.5 arguments, Lossless packing strategies, Ordering of the pre-training corpus, RL approach for language models

  • Padding and packing perplexities pondered: A member asked about the state-of-the-art research in sequence packing/padding.
    ‱ Another member pointed to this paper as an argument against it, and suggested this paper as another relevant resource, noting that even lossless packing strategies still most likely bias the data in some way.
  • Bin packing balances batches best: Members discussed the effects of bin packing on sample lengths.
    • It was stated that bin packing evens the distribution of sample-lengths (per batch). Without binpacking, text is truncated, rather than padded, leading to information loss.
  • Truncation triumphs as augmentation?: Members debated whether random truncation is a good data-augmentation strategy.
    • One member suggested more empirical work on the ordering of the pre-training corpus, for instance, if your packing truncates a document, is it a good idea to make sure that the second slice appears after the first slice?
  • Meta Mines Model Improvement via Self-Play: A member highlighted a paper from Meta Superintelligence Labs about an RL approach that allows language models to improve without additional data using self-play.
  • Hallucination handling hammered?: A member shared a tweet and paper claiming to be better than the OpenAI hallucination slop.
    • Another member was unsure about this paper, questioning the value of using Gaussian noise to understand model behavior under ordinary operating conditions.

Eleuther ▷ #scaling-laws (2 messages):

Transformer Model Parameters, Dataset Token Size

  • Rules of Thumb for Transformer Parameters Emerge: A member inquired if there’s a rule of thumb for determining the optimal number of parameters a vanilla transformer model should have to effectively learn a dataset of T tokens.
  • T = 20P: Another member responded succinctly, stating that T = 20 P (where T is tokens and P is parameters).
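
A quick worked example of the rule of thumb:

```python
params = 7e9            # 7B-parameter model
tokens = 20 * params    # T = 20P  ->  ~140B training tokens
print(f"{tokens:.2e}")  # 1.40e+11
```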

Eleuther ▷ #multimodal-general (1 messages):

ArXiv Endorsement Request, cs.CV, Undergrad seeking Endorsement

  ‱ Undergrad Seeks ArXiv Endorsement in Vision: An unaffiliated undergraduate student is seeking an arXiv endorsement in the cs.CV category.
    ‱ They are requesting assistance from anyone who is an endorser or knows someone who can endorse them, and they are open to a discussion.

Modular (Mojo đŸ”„) ▷ #general (5 messages):

Leetcode add two numbers in Mojo, Mojo dev environment docker container

  ‱ Linked-List Lives: How to write Leetcode ‘Add Two Numbers’ in Mojo Safely: A member inquired about writing the Leetcode ‘Add Two Numbers’ problem in Mojo with a safe linked list, avoiding unsafe pointers.
    • Another member suggested using LinkedList.
  • Mojo Dev Environment: Docker container quest: A member asked if there’s a Docker container checkpoint available to run the Mojo dev environment.

Modular (Mojo đŸ”„) ▷ #mojo (12 messagesđŸ”„):

conditional fields in structs, mojo compiler on roadmap, venv requirements, package management solution, classes vs structs

  • Request for Conditional Struct Fields Sparks Debate: A member inquired about the possibility of having conditional fields in Mojo structs, suggesting a syntax where fields could be included based on a boolean or optional parameter.
    • Another member offered a workaround using InlineArray[T, 1 if cond else 0], while others speculated on the potential use of the parameter system for controlling padding in structs.
  • Mojo Compiler Roadmap Fuels Venv Speculation: A member questioned whether the mojo compiler on the roadmap for 2026 would eliminate the need for venv, expressing a desire for a Go-like packaging system.
    • Another member clarified that the open-sourcing of the Mojo compiler in 2026 wouldn’t necessarily change how users interact with Mojo but would allow building the toolchain independently.
  ‱ Python Ecosystem Favored for Mojo Package Management: In response to questions about package management, a member indicated that there are no plans for a custom solution; instead, the team is leveraging the existing Python ecosystem.
    ‱ Another member mentioned that the toolchain needs to be packaged in some way for distribution and that they’ve chosen Python-aligned packaging via Python wheels or Conda packages.
  • Structs vs Classes discussed for Performance: A member argued against using classes for high-performance code due to vtable lookups and potential prevention of function inlining.
    • Another member pointed out that Pixi or Python virtual environments offer advantages for maintaining project dependencies, especially Python ones.