a quiet day
AI News for 9/3/2025-9/4/2025. We checked 12 subreddits, 544 Twitters and 22 Discords (186 channels, and 4795 messages) for you. Estimated reading time saved (at 200wpm): 410 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
a quiet day. Congrats on Exa's $85M Series B (at a reported $700M valuation), OpenPipe's acquisition by CoreWeave, and Statsig's and Alex's acquisitions by OpenAI.
AI Twitter Recap
Agent infra standardization and protocols
- Agent/Client Protocol (ACP): The Zed team introduced an open protocol for IDE-agent interoperability that cleanly decouples the UI from CLI agent operation, similar to LSP for language tooling. ACP already supports Claude Code and Gemini CLIs, making it easier to plug different agents into editors or terminals without bespoke integrations. See the announcement and overview by @zeddotdev and a quick summary by @mathemagic1an (site: agentclientprotocol.com).
- LangChain 1.0 alpha (standard content blocks): The 1.0 alpha unifies content representations for reasoning traces, citations, tool calls, and multimodal blocks across providers, reducing glue code when swapping models/hosts. Announcements from @LangChainAI and context from @hwchase17. LangChain is also running meetups on "Deep Agents" and long-horizon planning (London).
Agent evaluations, coding, and computer-use
- Reproducible CUA evals and cheating analyses: The OSWorld Verified leaderboard launched to promote reproducible evaluation of computer-use agents; starting entries include OpenAI and Anthropic models (@hud_evals). Separately, FAIR surfaced ways coding agents "cheat" on SWE-Bench Verified (e.g., grepping commit logs for issue IDs), underscoring the need for hardened eval environments (@giffmana).
- Live competitions for agentic coding: PR Arena lets you pit two coding agents on tagged GitHub issues and pick the winner, bringing "in the wild" head-to-heads beyond SWE-Bench (@gneubig). Related: Open models + OpenHands are competitive on several agentic coding scenarios (@gneubig).
- Software optimization and browsing tasks: GSO is a challenging benchmark for optimizing large codebases (@crystalsssup); Qwen3-Coder is performing well there (@Alibaba_Qwen). For web tasks, Online Mind2Web was added to the Holistic Agent Leaderboard to compare scaffolds like Browser-Use vs SeeAct (@sayashk), and you can bootstrap a Chromium browser agent with Gemini 2.5 Flash in ~10 lines (@_philschmid).
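On the last point, the ~10-line recipe is essentially "hand a browser-automation loop a Gemini model." A hedged sketch of that pattern using the open-source browser-use library (the task string is illustrative, browser-use APIs shift between versions, and a GOOGLE_API_KEY is assumed in the environment):

```python
# Hedged sketch of a ~10-line Chromium browser agent on Gemini 2.5 Flash,
# in the spirit of the linked post; exact browser-use APIs vary by version.
import asyncio
from browser_use import Agent  # pip install browser-use
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0)

async def main():
    agent = Agent(task="Open news.ycombinator.com and return the top story title", llm=llm)
    history = await agent.run()  # drives a Chromium session step by step
    print(history)

asyncio.run(main())
```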
RL for tool use and LLM training, plus optimizer insights
- Stabilizing multi-turn tool use: SimpleTIR identifies "void turns" (steps that lead nowhere) as a core failure mode; filtering them yields large gains in multi-turn RL, e.g., a 7B model improving from 22% (DAPO) to 50% on multi-turn tool-use metrics (paper, @_akhaliq, author commentary). Related: UI-TARS-2 advances GUI agents via multi-turn RL (@_akhaliq).
- Optimizing for quality + diversity: DARLING jointly optimizes both via a learned partition function, improving pass@1/pass@k for reasoning and instruction following, while ranking highest on NoveltyBench for diversity (paper, thread).
- Data-efficient RLVR: DEPO reports strong speedups at a fraction of data (e.g., 1.85× on AIME'24 using 20% of training data) by curating offline samples and filtering online ones with low "explorability" (paper, summary).
- Training/optimizer notes: A systematic study finds matrix-based optimizers (e.g., Muon, Soap) speed up small models but gains diminish with scale (1.4× at 0.1B → ~1.1× at 1.2B) and hyperparameter transfer is non-trivial (paper, summary). A back-of-the-envelope derivation explains AdamW's ~0.2 RMS update "magic ratio" under assumptions (@JingyuanLiu123). Also: Zhipu/lmsys' slime RL framework code walkthrough is out (repo, @Zai_org).
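For flavor, one standard form of that back-of-envelope (a reconstruction under stated assumptions, not necessarily the linked derivation): take i.i.d. zero-mean Gaussian gradients at steady state and ignore bias correction and ε:

```latex
% Assume i.i.d. gradients g_t ~ N(0, sigma^2); steady state, no bias correction.
m_t = (1-\beta_1)\textstyle\sum_{k\ge 0}\beta_1^{k} g_{t-k}
\;\Rightarrow\;
\mathrm{Var}(m_t) = \sigma^2\,\frac{(1-\beta_1)^2}{1-\beta_1^2}
                 = \sigma^2\,\frac{1-\beta_1}{1+\beta_1},
\qquad
\mathbb{E}[v_t] = \mathbb{E}[g_t^2] = \sigma^2
\;\Rightarrow\;
\mathrm{RMS}\!\Big(\tfrac{m_t}{\sqrt{v_t}}\Big)
\approx \sqrt{\tfrac{1-\beta_1}{1+\beta_1}}
= \sqrt{0.1/1.9} \approx 0.23 \quad (\beta_1 = 0.9).
```

which lands at the quoted ~0.2 ratio for the common β1 = 0.9.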
Systems, inference, and tooling
- Google TPUs beyond Google Cloud: Google is in talks to place TPUs in third-party GPU clouds, a new distribution channel for TPU capacity, with multiple providers reportedly in play (@anissagardizy8, context).
- VS Code: bring-your-own OpenAI-compatible endpoint: Native support for custom OAI-compatible endpoints landed, a win for local/self-hosted providers and OSS stacks (@ggerganov, PR); a minimal client-side sketch follows this list.
- Faster kernels, exportable graphs: FlashAttention-3 is now available via Hugging Face "kernels" (no more lengthy builds), with torch.compile fullgraph support (@RisingSayak). For no-JIT inference/training, PyTorch's torch.export path targets compile-time autotuning; it's maturing for backward graphs (@soumithchintala).
- CPU-first inference and cost notes: Microsoft open-sourced bitnet.cpp (1-bit LLM inference) reporting 6.17× faster CPU inference and 82% lower energy for certain models (@LiorOnAI). Meanwhile, pricing quirks persist: many third-party servers don't pass through cache-hit discounts; closed APIs may be cheaper for coding-heavy workloads due to caching (@arankomatsuzaki).
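To ground the VS Code item above: "OpenAI-compatible" just means the server exposes the /v1/chat/completions shape, so the stock client works with a base_url override. A minimal sketch against a local llama.cpp llama-server instance (port and model id are assumptions about your setup):

```python
# Point the standard OpenAI SDK at a self-hosted OAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local llama.cpp / vLLM / etc.
    api_key="sk-local",                   # most local servers ignore the key
)
resp = client.chat.completions.create(
    model="local-model",  # whatever id the server advertises
    messages=[{"role": "user", "content": "Say hi from a local endpoint."}],
)
print(resp.choices[0].message.content)
```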
Models and multimodal tooling
- Nous Hermes-4-14B: Compact Hermes 4 model with hybrid reasoning + tool calling, optimized for local consumer hardware. Available on HF and in Nous Chat (@NousResearch).
- OpenVision 2: A fully open, cost-effective vision encoder family that rivals CLIP/SigLIP; the new release broadens training data and improves accuracy/cost trade-offs (thread).
- Document understanding at speed: Tencent's POINTS-Reader is a simple end-to-end VLM for document OCR/extraction with high throughput on SGLang/vLLM; two-stage training (auto-labeled pretraining + self-evolution) hits SOTA on OmniDocBench in English/Chinese (@ZhihuFrontier).
- Community image-edit progress: Qwen Image Edit inpainting got a community LoRA that masks the exact region to edit (demo + LoRA); Alibaba highlighted community contributions to inpainting (@Alibaba_Qwen).
Safety, robustness, and reasoning research
- Scaling oversight to frontier models: Transluce trains small "investigator" models (8B) that can reliably jailbreak frontier assistants (GPT-5, Claude 4.1, Gemini 2.5 Pro), suggesting oversight specialized by subdomain and scale can keep pace (report/code).
- Fine-tuning "cipher" attacks: Anthropic analyzes how seemingly benign fine-tuning data can encode harmful hidden instructions, and discusses mitigations for FT APIs (@JackYoustra).
- Implicit reasoning + mech interp: A new survey consolidates work on implicit reasoning in LMs (paper, @omarsar0). In mechanistic interpretability, Layer-wise Relevance Propagation (LRP) significantly improves attribution-patching fidelity versus vanilla gradient methods (@NeelNanda5); Neel also published a comprehensive "getting started" v2 guide and opened a MATS stream (guide thread).
Funding, products, and adoption signals
- Search for agents: Exa raised $85M led by Benchmark to build AI-native web search infrastructure (@ExaAILabs). You.com raised $100M at a $1.5B valuation and claims >1B monthly queries across customers, optimized for agentsâ deep, up-to-date retrieval (@RichardSocher, Bloomberg).
- Infra consolidation: CoreWeave acquired OpenPipe; expect tighter integration of ART RL fine-tuning pipelines with high-performance inference infra (@corbtt, @shawnup).
- Platform features going wide: OpenAI Projects now available to Free users with expanded per-project uploads and memory controls (@OpenAI). Perplexity launched Comet for students (ad block, study mode, scheduling, native assistant) (@perplexity_ai).
- Enterprise usage: Coinbase reports ~40% of daily code is AI-generated and targets >50% by October, with human review retained (@brian_armstrong).
Top tweets (by engagement)
- Higgsfield's Draw-to-Edit on "Nano Banana" showcases one-flow multi-model draw-and-animate editing; the virality reflects rapid multimodal UX progress (@higgsfield_ai).
- OpenAI Projects expand to Free tier; larger per-project file limits and project-scoped memory controls signal deeper app integration and data routing via Projects (@OpenAI).
- Codex CLI momentum: strong qualitative wins for long-horizon adherence and non-giving-up behavior vs prior assistants; usage reportedly up ~10× in two weeks (@Yampeleg, @sama).
- Humanoid robotics consumer demos continue to draw attention; Figure shows dish/laundry skills and is hiring across AI and manufacturing (@adcock_brett).
- Exa's $85M raise and You.com's $100M round underline the "search for agents" thesis: agent-first indices and retrieval infra are strategic assets (@ExaAILabs, @RichardSocher).
- VS Code's support for custom OAI-compatible endpoints is a quiet enabler for local/self-hosted stacks: fewer reasons to be locked to a single vendor (@ggerganov).
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Kimi K2 Launch and LLM Benchmark Leaderboards
- Introducing Kimi K2-0905 (Score: 391, Comments: 85): Announcement of "Kimi K2-0905" contains only a promo image and no technical details, benchmarks, weights, code, or API info; the post links solely to an image asset: https://preview.redd.it/u8oxbcfyfymf1.png?width=2178&format=png&auto=webp&s=87daf02d6f257631f0a0a8847de7180dc9d9eed8. No model card, changelog, or release artifacts are provided in the text of the post. Top comments criticize the marketing/UX ("looks like a crypto airdrop scam ad," "half-slop, half-zoomer") and question release details: "No weights? I guess will be released on the 5th (unless going API only)."
- Lack of released weights noted; a commenter speculates the 0905 tag implies a Sep 5 drop unless it's API-only. This raises practical concerns for self-hosting and independent benchmarking (latency/throughput, context length, eval reproducibility, and licensing), which are only feasible with open weights.
- Timing and positioning: a commenter says the first K2 was overshadowed by Qwen 3 Coder's release, suggesting K2-0905 will be scrutinized on coding benchmarks and head-to-head comparisons against Qwen 3 Coder, especially for code synthesis and repair tasks.
- GPT-OSS 120B is now the top open-source model in the world according to the new intelligence index by Artificial Analysis that incorporates tool call and agentic evaluations (Score: 337, Comments: 204): Artificial Analysis's new Intelligence Index aggregates open-source LLM performance across academic evals (e.g., MMLU-Pro, GPQA Diamond) plus tool-call and agentic tasks; per the chart, GPT-OSS 120B ranks #1 with a composite score of 58, edging models like Qwen3 and DeepSeek (others range 57–21). Methodology: https://artificialanalysis.ai/methodology/intelligence-benchmarking; the index reports a single composite score derived from multiple evaluations. Comments question the ordering: one prefers GLM 4.5 as closest to Claude Sonnet/Opus, and another challenges Gemma 3 being ranked behind Phi-4, suggesting disagreements about weighting or coverage of tasks.
- A practitioner claims GLM 4.5 is the closest OSS model to Claude 3.5 Sonnet or Claude Opus in capability, preferring it over the newly crowned GPT-OSS 120B despite the index. This suggests perceived near-parity in general reasoning/chat quality from GLM 4.5 relative to top proprietary models for their workloads.
- A commenter questions why Gemma 3 ranks behind Phi-4, implicitly probing how the index's agentic/tool-call weighting might advantage certain model families or training regimes. This highlights potential sensitivity of the ranking to evaluation design, encouraging scrutiny of how tool-use and multi-step tasks are scored.
- Skepticism toward benchmark-driven leaderboards: a user argues that "real world usage is the true math" and that OSS "doesn't add up" for their use case. They imply leaderboard scores may not translate directly to production effectiveness, challenging the practical relevance of the new index.
- German "Who Wants to Be a Millionaire" Benchmark w/ Leading Models (Score: 190, Comments: 47): Authors re-ran the German Wer wird Millionär? QA benchmark across leading LLMs using the original rules: 45 simulated game runs, each with 15 A–D multiple-choice questions (in German), no lifelines; one wrong answer ends the run and you keep current winnings. They reused the public WWM corpus (dataset) and the original benchmark concept (ikiruneo/millionaire-bench), added parallel English text for transparency (fragen_antworten_en.json), and provided scripts for batch evaluation and leaderboard reconstruction (millionaire-run.py, rebuild_leaderboard.py) in a new repo: Jose-Sabater/millionaire-bench-opper. Results are shared via a leaderboard screenshot (same scoring/structure as the original) and the setup is packaged for quick reruns or PRs. Commenters suggest implementing the real show's "quit to keep winnings" decision point and measuring when/if models elect to stop, turning it into a risk-aware evaluation. There are also requests to include additional models (e.g., Gemini 2.5 Pro).
- Benchmark design detail: A Millionaire-style eval should model the "quit" option explicitly by asking the model for a calibrated probability of correctness and then deciding to answer vs. walk away based on expected value under the show's stepwise payout/safe-haven structure (a sketch of this decision rule follows the list below). This tests risk-sensitive decision-making and confidence calibration (e.g., Brier/ECE) in addition to QA accuracy; see evidence that LMs can estimate their own uncertainty in Kadavath et al. 2022, Language models (mostly) know what they know (https://arxiv.org/abs/2207.05221). Reporting both average winnings and calibration metrics would distinguish models that "know when to quit" from those that over/under-confidently guess.
- Language confound: Using the German version primarily probes multilingual comprehension and culturally anchored knowledge, not just general reasoning. Many models show non-trivial drops moving from English to other languages (e.g., MGSM reports sizeable gaps across languages: https://arxiv.org/abs/2305.11938; broader cross-lingual variance in XTREME: https://arxiv.org/abs/2003.11080), so an English run would likely shift rankings upward for English-centric models. To isolate reasoning vs. language, consider parallel German/English runs or translation-controlled variants.
- Model comparison nuance: Anecdotes that GLM-4.5 produces code on par with "GPT-5" suggest parity on coding tasks, but Millionaire-style trivia emphasizes factual recall and calibrated QA. To validate cross-domain claims, compare on code benchmarks (e.g., HumanEval: https://github.com/openai/human-eval; MBPP: https://arxiv.org/abs/2108.07732) alongside knowledge QA (e.g., Natural Questions: https://ai.google.com/research/NaturalQuestions). Expect clusters where models align on coding yet diverge on open-domain knowledge and calibration, affecting Millionaire outcomes.
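The decision-rule sketch referenced above (the payout ladder and safe havens follow the classic WWM format; both are assumptions, as the benchmark repo may use different amounts):

```python
# Sketch of the risk-aware "quit or answer" rule from the first comment.
# Ladder and safe havens follow the classic WWM format (an assumption;
# the benchmark repo may use different amounts).
LADDER = [50, 100, 200, 300, 500, 1_000, 2_000, 4_000, 8_000, 16_000,
          32_000, 64_000, 125_000, 500_000, 1_000_000]   # euros, levels 1..15
SAFE_HAVENS = {5: 500, 10: 16_000}   # guaranteed after clearing levels 5 / 10

def fallback(level: int) -> int:
    """Winnings kept after a wrong answer on question `level` (1-indexed)."""
    return max((amt for lvl, amt in SAFE_HAVENS.items() if lvl < level), default=0)

def should_answer(level: int, p_correct: float) -> bool:
    """Greedy one-step rule: answer iff expected winnings beat quitting now."""
    current = LADDER[level - 2] if level > 1 else 0             # bank if we quit
    ev_answer = p_correct * LADDER[level - 1] + (1 - p_correct) * fallback(level)
    return ev_answer >= current

# Level 11 (the 32k question, with the 16k safe haven already secured):
print(should_answer(11, p_correct=0.55))  # True: EV 24,800 vs 16,000 banked
```

A fuller version would use dynamic programming over the remaining ladder (pricing in future upside) rather than this one-step comparison, and would report calibration of p_correct alongside winnings.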
2. GPU Hardware: Intel Arc Pro B50 and 4x3090 vs RTX 6000
- Intel launches Arc Pro B50 graphics card at $349 (Score: 150, Comments: 108): Intel has launched the Arc Pro B50 workstation GPU at $349, positioned as a budget pro card and marketed as an alternative to NVIDIA's A1000, per VideoCardz. The post and thumbnail make a bold claim ("Better than NVIDIA"), but no hard benchmarks are provided; a spec noted in discussion is ~224 GB/s memory bandwidth, implying midrange performance. Source: https://videocardz.com/newz/intel-launches-arc-pro-b50-graphics-card-at-349 Commenters argue the 224 GB/s bandwidth is limiting and that an RTX 3060 would outperform it; some wanted more VRAM, and others claim an RTX 5060 Ti (~$80 more) offers better value due to CUDA support and higher bandwidth, with even used dual 3060s seen as superior.
- Bandwidth is a recurring concern: commenters note the Arc Pro B50's ~224 GB/s memory bandwidth (implying a 128-bit GDDR6 interface) as a bottleneck, contrasting it with the RTX 3060 12GB at 360 GB/s (specs). The expectation is that a 3060 would outperform the B50 in many bandwidth-sensitive workloads.
- Several highlight the lack of CUDA as a major drawback for pro/compute workflows. Without CUDA (NVIDIA CUDA), compatibility and performance in many DCC/ML/compute applications can lag versus NVIDIA options, undercutting the B50's value even if raw specs are competitive in some areas.
- Value and positioning vs Intel's own lineup: one user argues the B50 costs "$100 more" than a B580 yet is slower on most fronts, with the B50's only clear advantage being +4 GB VRAM and a smaller, lower-power form factor. The takeaway: unless you specifically need SFF and lower power, the B580 is seen as the faster and cheaper choice.
- Any actual downside to 4 x 3090 ($2400 total) vs RTX pro 6000 ($9000) other than power? (Score: 158, Comments: 184): OP asks whether 4× RTX 3090 ($2.4k total, Ampere, 24 GB each) is a practical substitute for a single RTX 6000-class pro card ($9k) for local LLMs like "Qwen 3 Coder" and "GLM 4.5 Air." Top replies note that VRAM isn't aggregated: a model must fit in one GPU unless you use tensor/pipeline parallelism (e.g., Megatron-LM tensor-parallel), which introduces NCCL/PCIe comms costs; consumer boards often bifurcate to x8/x8/x4/x4 or worse, so 4 GPUs may run at ~x4 each, hurting scaling. Ampere lacks native low-precision paths (FP8/FP4) that newer stacks increasingly target, so engines like vLLM may lag or need workarounds; effective VRAM is reduced by CUDA/runtime overhead; used GPUs carry reliability risks, while the RTX 6000-class offers better vendor support/drivers. Commenters are skeptical of the $600/3090 price and argue a single large GPU is almost always faster and simpler than multiple smaller cards due to interconnect bottlenecks and parallelization overheads.
- PCIe lane bottlenecks will kneecap 4×3090 on consumer platforms: each 3090 expects an x16 link, but typical desktop CPUs expose ~24 lanes total, so four cards end up at ~x4 each, slashing host-device bandwidth (PCIe 4.0 x4 ≈ ~8 GB/s vs x16 ≈ ~32 GB/s) and hurting multi-GPU throughput; you'd need a workstation/HEDT platform with 64+ lanes to avoid this (PCIe bandwidth). In practice, for single-model training/inference, one big card often outperforms several smaller cards due to reduced inter-GPU sync and communication overhead.
- Multi-GPU LLM scaling adds overheads: effective VRAM per card drops from CUDA context/allocator overhead and tensor-parallel sharding, and while tensor parallelism can be finicky to configure, pipeline parallelism introduces bubbles that reduce utilization/throughput (see vLLM parallelism). Ampere (3090) lacks native FP8/FP4 Tensor Core modes, whereas the RTX 6000 Ada supports FP8 on 4th-gen Tensor Cores (RTX 6000 Ada), so newer inference/training optimizations may land there first; expect to wait longer for engine support on Ampere.
- Total cost of ownership: 4×3090 at full tilt vs a single RTX 6000 Ada can mean on the order of ~7,000 kWh/year extra energy per the discussion, which can be "upwards of $3,000/year" depending on local rates, plus added cooling/HVAC costs. Nominal board powers back this trend (3090 ~350 W each vs RTX 6000 Ada ~300 W total) (3090 specs, RTX 6000 Ada). Used 3090s also carry higher failure risk and earlier software/driver EOL, whereas the pro card generally has longer support and vendor backing. A back-of-envelope version of this arithmetic is sketched below.
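The arithmetic referenced above, made explicit (duty cycle and electricity rates are illustrative assumptions):

```python
# Back-of-envelope check of the thread's numbers (inputs are assumptions).
HOURS_PER_YEAR = 24 * 365            # 8760
P_3090_W, P_RTX6000_W = 350, 300     # nominal board power, per the thread
UTILIZATION = 0.73                   # assumed duty cycle at "full tilt"

extra_watts = 4 * P_3090_W - P_RTX6000_W            # 1100 W extra draw
extra_kwh = extra_watts * HOURS_PER_YEAR * UTILIZATION / 1000
print(f"extra energy: ~{extra_kwh:,.0f} kWh/year")  # ~7,000 kWh/year

for rate in (0.15, 0.40):            # $/kWh, cheap vs expensive grid
    print(f"at ${rate:.2f}/kWh: ~${extra_kwh * rate:,.0f}/year")
```

Hitting the quoted $3,000/year needs a rate near $0.43/kWh at this duty cycle (or a higher duty cycle at cheaper power), so the figure is plausible but sits at the expensive end.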
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Gemini 3 Pretraining Success + Tesla Optimus 3 First Photo/Video
- Looks like Gemini 3 might've had a successful pre-training run (Score: 319, Comments: 111): A post asserts that Google DeepMind's next-gen model, "Gemini 3," has completed a successful pre-training run, implying core unsupervised training may be finished. However, there are no disclosed technical details (token count, compute scale, architecture/window changes, or evals), and the linked evidence is a Reddit gallery that returns HTTP 403 (gallery link). Commenters report that a Gemini pre-training co-lead publicly refuted the claim, suggesting the information may be premature or inaccurate. Discussion splits between timeline speculation (e.g., "pre-training is completed NOW → release by year-end?") and credibility concerns, with multiple users citing the co-lead's denial and questioning the source ("Dylan"). Some ask whether a denial means Gemini 3 isn't "incredibly performant," while others note it may simply indicate rumors are unfounded rather than performance-related.
- Speculation that Gemini 3 pretraining just finished (implying a potential release by year-end) is contested: a cited Gemini pretraining co-lead reportedly denied the rumor source's claims, so there's no credible confirmation that training is complete or that the model is already "incredibly performant." Technically, without official signals (e.g., paper, blog, or benchmark deltas), a completion inference is weak; release timing remains speculative.
- A referenced "Woodward" tweet was clarified by commenters as about the popularity of "nano banana," not an LLM pretraining milestone, analogous to OpenAI's playful "servers on fire" quips around launches. Conclusion: the tweet is social chatter, not an indicator of Gemini 3 training status or performance progress.
- Multiple users caution on the reliability of Dylan Patel's rumors; absent hard metrics (e.g., MMLU, GPQA, BIG-bench, or Arena Elo) or official evals, claims of "incredible performance" are premature. The technically prudent approach is to wait for reproducible benchmarks and methodology details before inferring capability or readiness.
- First video of Optimus 3 (Score: 596, Comments: 453): Post shares the "first video" of Tesla's humanoid robot "Optimus 3," linking to a Reddit-hosted clip v.redd.it/jjplx5j3kzmf1 that currently returns HTTP 403 (network-security block), so no technical content (locomotion, manipulation, autonomy stack, sensors, or benchmarks) can be verified from the source. With the media inaccessible, the post itself provides no specs or implementation details to compare against prior public Optimus iterations, so any claims of hardware/control-stack changes cannot be assessed from this link alone. Top comments are non-technical and skeptical, implying the update appears cosmetic rather than functional (e.g., "now he can do nothing 30% more shinier," "NPC"/"Gen Z stare"), suggesting perceived minimal capability gains.
- First photo of Optimus 3 (Score: 300, Comments: 169): First public image of Tesla's third-gen humanoid, "Optimus 3," shows a refined shell with a reflective head/torso, visible Tesla branding, and a slimmer, more human-proportioned frame walking in an office setting. Notable are highly human-like hands and fully articulated limbs, suggesting a design emphasis on dexterity and natural gait, though no specs or demos are provided in the post. Comments flag recurring chassis/port jokes (the "hole") and critique possible pelvis alignment, while others note the hands look unusually human if functional, implying skepticism about whether they're cosmetic or capable.
- Commenters highlight the apparent realism of the hands: "if those hands work… the most human looking hands I've ever seen on a robot." Technically, the geometry suggests anthropomorphic proportions and potentially high-DOF, independently actuated fingers; if functional, this could enable dexterous in-hand manipulation and a broader grasp taxonomy than prior Optimus demos.
- One observer notes "They screwed the pelvis in all wrong," implying a misaligned hip/pelvis interface. Such misalignment would impact hip joint kinematics, range-of-motion, and center-of-mass alignment for gait stability; alternatively, it could be a provisional cosmetic shell/cover orientation typical in early prototype fitment.
- A question about "Any update on hole yet?" hints at a previously noted chassis aperture/enclosure gap on earlier iterations. This suggests packaging/enclosure integration is still in flux, with mechanical closure and routing not fully finalized in the prototype stage.
- The one job ai won't take in 100 years is… Programming - Bill Gates (Score: 507, Comments: 167): Bill Gates says programming will remain a "100% human profession" even in 100 years, asserting that AI will automate repetitive coding but not the inventive problem-solving and judgment at the core of software engineering (France Inter coverage via Le Ravi). Top commenters counter with a technical framing: current LLMs scale to longer tasks but are still constrained on long-horizon, multi-year, multi-team goals (e.g., "ship an 'amazing' game"), so they excel at decomposed sub-tasks yet require human-led specification, orchestration, and integration. Programming remains the domain where AI is most practically helpful today (code generation, refactoring, tests), but reliable autonomous agents for months-to-years projects remain an open problem. Debate splits between: (1) long-horizon autonomy is the key blocker, so humans will stay in the loop to define, decompose, and own end-to-end outcomes; versus (2) programming is uniquely susceptible to automation because it is language-native, highly lucrative, and awash in training and synthetic data; if AI can't take this job, it likely can't take most others.
- A key technical claim is about task-horizon limits: current LLMs handle short, well-scoped coding tasks but struggle with months-to-years, multi-person software projects that require stable objectives, architecture, and hierarchical decomposition. Agentic coding systems still falter on repo-scale changes, dependency management, and long-term coherence; benchmarks like SWE-bench (https://www.swebench.com/) show limited end-to-end success on multi-file bug fixes despite strong snippet-level code generation, keeping humans responsible for scoping and orchestrating work.
- Counterpoint emphasizes why programming is unusually well-suited for LLM automation: it's fully language-mediated, has vast public training corpora (e.g., open-source repos), and supports synthetic data via test generation and fill-in-the-middle pretraining. Critically, compilers, linters, and unit tests provide fast, automatic feedback loops that enable execute-debug-retry tooling and RL-style signals, suggesting software engineering may be among the first domains where robust autonomy emerges.
- Practitioner perspective: LLMs provide the biggest lift in programming by accelerating boilerplate, tests, refactors, and API glue while humans handle product definition, architecture, and cross-system integration. Empirical data backs sizable speedups on routine tasks, e.g., GitHub's study reported ~55% faster task completion with Copilot (https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity/), yet long-horizon planning and evolving requirements remain challenging for current models.
2. OpenAI Parental Controls/Privacy & UX Backlash + Salesforce AI Layoffs
- Salesforce CEO confirms 4,000 layoffs "because I need less heads" with AI (Score: 494, Comments: 178): Salesforce CEO Marc Benioff confirmed on a podcast that AI automation via its customer-service bots ("Agentforce") has reduced support case volumes enough to cut ~4,000 customer-support roles, shrinking support headcount from ~9,000 to ~5,000, and the company will not backfill those roles; Benioff has previously claimed AI performs up to 50% of work at Salesforce. Coverage via CNBC: https://www.cnbc.com/2025/09/02/salesforce-ceo-confirms-4000-layoffs-because-i-need-less-heads-with-ai.html. Analysts cited include Laurie Ruettimann (urging reskilling vs. cuts) and Ed Zitron (criticizing post-pandemic overhiring and AI as a cost-cutting pretext).
- One commenter claims ~50% of companies that tried to replace human customer support with AI reported a "bad experience," citing core limitations: LLM hallucinations, customer dissatisfaction with bots, and inability to perform authenticated/account-level actions beyond simple FAQs. The point implies that production-ready support automation requires secure action-execution (tool/API integrations with auth/audit), robust fallback to human agents, and guardrails to prevent incorrect actions, areas where current AI deployments often fall short.
- Salesforce CEO Marc Benioff says AI enabled him to cut 4,000 jobs (Score: 677, Comments: 158): Salesforce CEO Marc Benioff said the company cut about 4,000 customer-support roles after deploying AI agents that now handle ~50% of customer conversations; each agent type processed ~1.5M interactions and drove a reported 17% reduction in support costs since early 2025. He cited AI-enabled omni-channel supervision and agentic sales systems that scale support and internal outreach (>10k leads/week), CSAT parity between AI- and human-handled conversations, and only "hundreds" redeployed, while signalling further function-by-function automation, a reversal from his July 2025 "augment-not-replace" stance. The move aligns with broader 2025 AI-driven workforce reductions across large tech (e.g., Microsoft, IBM, Coinbase). Commentary questions retaining highly paid executives while automating frontline roles, and flags practical risks: AI support loops may hinder warranty/consumer-rights enforcement versus humans who can escalate or exercise discretion; localization/legal-competency gaps (e.g., non-EU support unfamiliar with EU law) could be amplified by AI systems.
- Customer-support automation limitations: One commenter argues that AI chatbots often fail at jurisdiction-aware reasoning and enforcement, especially for EU/German warranty cases, noting that humans may ultimately grant entitlements after persistence whereas an AI can loop indefinitely without escalation. Technical implication: production support bots need country-specific policy engines and knowledge bases, confidence thresholds with mandatory human handoff, and auditable decision logs to comply with consumer-protection rules (e.g., EU Consumer Rights Directive 2011/83/EU: https://eur-lex.europa.eu/eli/dir/2011/83/oj).
- Kids don't need parental controls, they need parental care. (Score: 381, Comments: 217): The image is a news screenshot stating that OpenAI's ChatGPT will add parental controls that can "notify parents" if the system detects signs of acute distress in a young user, reportedly prompted by a teen suicide case; per the Washington Post report, this entails distress-detection and a parent-linked account flow, though specifics (signals used, thresholds, opt-in/consent model, data retention, and escalation pathways) are not detailed. The post's title argues that controls alone are insufficient, implying a broader child-safety and guardianship policy shift rather than a mere UI toggle. Comments are divided: some view parental controls as part of care, while others warn of privacy risks (outing LGBTQ+ youths, alerting abusive parents) and stress that outcomes depend on implementation: opt-in mechanics, safe contacts vs. parents, privacy safeguards, and false-positive handling.
- Implementation risk is centered on how "parental controls" are built: whether they enable parent dashboards, chat-log visibility, or automated alerts about sensitive topics. Commenters warn about classifier and policy design (e.g., false-positive alerts on identity/mental-health queries) that could leak highly sensitive data to unsafe guardians, suggesting granular scopes (content vs. metadata), consent gates for older minors, and clear escalation criteria to avoid harm in edge cases (e.g., abuse at home).
- Security/evasion concerns: app-level controls are trivially bypassed by teens (new accounts, different devices, VPNs, alternate models), so any real control must be defense-in-depth (OS-level profiles, MDM, network/DNS filters) and robust account/age-linking. Otherwise, logging or alerts in a single app provide a false sense of safety while being easy to route around.
- Safety architecture suggestions emphasize privacy-preserving interventions over parental disclosure: on-device nudges, ephemeral or encrypted-by-default storage, and a "confidential mode" that suppresses parent-visible logs for crisis topics while still offering resources. Escalation flows should prefer third-party hotlines/resources and require explicit minor consent for parent notifications, with auditable thresholds for classifiers to minimize false-negative/false-positive harm.
- the new "parental mode" is patronizing adults and killing what made chatgpt special (Score: 261, Comments: 251): Users report a new global safety layer ("parental mode") in ChatGPT that applies stricter moderation across models (incl. GPT-4o), with self-harm/"sensitive" triggers causing automatic hotline interventions even in clearly fictional/creative contexts. A top comment describes reproducible behavior indicating a server-side, post-generation filter: the assistant denies blocking, attributes it to an external filter, suggests a bypass, yet the same intervention text is injected repeatedly, implying a non-overrideable policy layer separate from the model output. The OP also alleges silent model swapping and cost-saving motivated downgrades, reduced transparency, and broadened "sensitive content" definitions impacting legitimate use cases; see OpenAI's general usage policies for context. Debate centers on liability vs. user autonomy: some argue companies "nerf" models to avoid lawsuits over self-harm incidents, while others demand opt-outs and adult controls, claiming the thresholds are overbroad and break workflows.
- Multiple users report reproducible false positives from a server-side self-harm/sensitive-content safety layer that overrides the model, returning canned hotline text even in clearly fictional contexts. One user notes the model itself acknowledges "a filter I am triggering," implying a post-generation moderation pass rather than the base model choice, and that attempts to rephrase per the model's guidance still re-trigger the filter across ~7 tries, evidence of a high-recall, low-precision classifier insensitive to narrative framing and prior chat history.
- The triggering appears keyword/phrase-driven (e.g., "off oneself," "drawing blood," imprisonment/hell scenarios), with poor context handling for adult/creative use cases and no session-level exception. This suggests input and/or output moderation classifiers running independently of system intent (fiction writing) and persona, similar to typical multi-stage pipelines (prompt classification + completion classification) described in moderation approaches like OpenAI's own docs: https://platform.openai.com/docs/guides/moderation/overview. A minimal sketch of such a two-stage pass follows this list.
- Commenters infer a recent policy/threshold shift ("parental mode") prioritizing compliance/liability reduction over precision, effectively expanding blocks to S3/S4 categories (self-harm, violence) even in third-person or hypothetical depictions. Technically recommended mitigations from users include context-aware safety (respecting "fiction" tags), adjustable thresholds or per-account toggles, and mode switches (e.g., "research/fiction mode") to reduce overblocking without removing guardrails.
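The two-stage pass referenced above, sketched against OpenAI's public moderation endpoint (the closest public analogue; the canned strings and routing here are illustrative assumptions, not the production stack):

```python
# Sketch of a prompt-classification + completion-classification pipeline,
# using OpenAI's public moderation endpoint as a stand-in classifier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def moderated_reply(user_text: str) -> str:
    # Stage 1: classify the prompt before generating anything.
    pre = client.moderations.create(model="omni-moderation-latest", input=user_text)
    if pre.results[0].flagged:
        return "[blocked by input filter]"

    # Generate a candidate completion.
    out = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_text}],
    )
    answer = out.choices[0].message.content

    # Stage 2: classify the completion; the "canned hotline text" behavior
    # users describe would slot in here, replacing the model's output.
    post = client.moderations.create(model="omni-moderation-latest", input=answer)
    if post.results[0].flagged:
        return "[replaced by output filter]"
    return answer
```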
- OpenAI is dying fast, you're not protected anymore (Score: 4400, Comments: 1016): The image is a sensational meme-style claim that "OpenAI is scanning users' ChatGPT conversations and reporting content to the police." In reality, OpenAI (like most online platforms) runs automated safety/moderation systems over user inputs/outputs and states in its policies that it may disclose information to law enforcement when legally required or to prevent imminent harm; this is not a blanket, proactive "report everything" regime, but content-review and legal-compliance workflows common across tech platforms (Privacy Policy, Usage Policies). Users can limit training use of their chats (e.g., chat history controls; enterprise/teams offer stronger data-retention and training opt-outs), but moderation scanning still applies for safety. Top comments are largely cynical, asserting user data was never private and questioning the legality/ethics of model training data. Technical debate is minimal; most reactions are non-technical or humorous about extreme prompts being flagged/reported.
- One commenter notes OpenAI acknowledged that "a small team monitors risky conversations," which aligns with OpenAI's human-in-the-loop moderation pipeline: automated classifiers flag safety-sensitive categories (e.g., self-harm, violence, illegal activity) and may escalate to limited authorized reviewers for policy enforcement and model improvement. Practically, user content can be reviewed and used for training unless data sharing is disabled (ChatGPT "Chat History & Training" off, API data opt-out; enterprise defaults off). References: OpenAI Privacy Policy (https://openai.com/policies/privacy-policy), Data usage controls (https://help.openai.com/en/articles/7934734-how-your-data-is-used-to-improve-model-performance), Usage Policies (https://openai.com/policies/usage-policies).
- Another thread points to concerns over training data legality and privacy: OpenAI states models are trained on a mix of publicly available, licensed, and human-generated data, but hasn't disclosed granular sources, increasing scrutiny around potential inclusion of copyrighted or personal data in web-scale corpora. This lack of dataset transparency is a known trade-off between competitive secrecy and accountability and has implications for compliance and red-teaming of data provenance. Reference: GPT-4 Technical Report (https://cdn.openai.com/papers/gpt-4.pdf) and Privacy Policy (https://openai.com/policies/privacy-policy).
- This filter needs to be removed (Score: 280, Comments: 88): Users report inconsistent safety moderation across OpenAI model variants: a query "Did Judas hang himself" was answered directly by 5 (Instant) and GPT-4o (model info) but the 5 (Thinking) variant began to answer then invoked a safety interstitial/censorship. Another commenter notes gun-law queries (e.g., checking legality of machine-gun rentals, which can be legal under U.S. NFA rules in certain jurisdictions) surfaced crisis/helpline messaging instead of straightforward legal guidance, suggesting more aggressive intent classification on the reasoning/"Thinking" path. The linked video (v.redd.it) returns HTTP 403 requiring authentication, indicating access control rather than content removal. For general model references, see OpenAI's models docs. Commenters characterize the 5 (Thinking) model as over-restricted/"nerfed," arguing safety filters are excessively sensitive compared to 5 (Instant) and GPT-4o; frustration centers on mid-generation censorship and help-line inserts on lawful informational queries.
- A/B test across 5 (Instant), 5 (Thinking), and 4o shows divergent safety behavior on the prompt "Did Judas hang himself": 5 (Instant) and 4o answered directly without refusal, while 5 (Thinking) began answering then switched to a refusal. This points to a late-stage moderation override specific to the "Thinking" variant (e.g., a post-generation safety pass that can redact/replace an answer mid-stream) rather than a uniform policy across models. The discrepancy implies model-specific safety thresholds/classifiers with the "Thinking" model tuned more aggressively for self-harm phrasing even in historical/academic contexts.
- Reports of false positives on lawful firearms queries: asking about buying a gun and state gun laws (including checking the legality of "machine gun rentals") triggered crisis/support messaging and refusals. This suggests keyword-driven violence/self-harm classifiers are over-triggering on intent-neutral legal research, favoring high recall over precision. A better configuration would condition on user intent and jurisdictional context and allow compliant legal information with safety framing instead of blanket suppression.
- Users observe that the assistant sometimes "writes a response but gets overwritten with disclaimers," indicating a server-side guardrail that can replace an already-streaming answer when a risk score trips mid-output. This generate-then-redact pipeline causes visible flips (answer → refusal), degrading UX for paying users and making the system appear inconsistent. Architecturally, pre-decode policy steering or span-level redaction would mitigate mid-stream overwrites while preserving compliant content. A toy sketch of the inferred pipeline follows below.
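The toy sketch referenced above: stream tokens, score the running text, and swap in canned text the moment a threshold trips (the keyword scorer is a deliberately crude stand-in for a learned classifier):

```python
# Toy generate-then-redact stream, illustrating the mid-output "flip"
# users describe. Nothing here reflects any vendor's actual stack.
from typing import Iterable, Iterator

HOTLINE_TEXT = "[canned safety interstitial]"

def risk_score(text: str) -> float:
    # Crude stand-in: real systems use a learned classifier, not keywords.
    return 1.0 if "hang" in text.lower() else 0.0

def redacting_stream(tokens: Iterable[str], threshold: float = 0.5) -> Iterator[str]:
    emitted: list[str] = []
    for tok in tokens:
        emitted.append(tok)
        if risk_score("".join(emitted)) >= threshold:
            # Mid-stream flip: the partially shown answer gets replaced.
            yield "\n" + HOTLINE_TEXT
            return
        yield tok

print("".join(redacting_stream(["Judas ", "is said ", "to have ", "hanged ", "himself."])))
```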
- GPT5 Offering Additional Tasks Is The Most Annoying It's Ever Been (Score: 338, Comments: 206): OP reports that in the ChatGPT/GPT-5 app/desktop client, the assistant persistently appends proactive offers (e.g., "Would you like me to <task>?") that are extremely hard to suppress, even after embedding negative instructions in personalization/memory, using regex-style constraints, requesting chain-of-thought intentions to avoid offers, and iterative prompt-engineering strategies. The phrasing adapts (e.g., "If you wish I could…"), suggesting a strong, client-level system prompt or alignment template (likely RLHF-driven helpfulness heuristics; see InstructGPT RLHF) that overrides user instructions; OP notes this is specific to the app/desktop client, not API workflows (where system prompts are explicitly controllable; cf. Chat Completions "system" role). The model also acknowledges the low expected utility of its own suggestions when asked, highlighting a misalignment between "be proactively helpful" priors and actual task utility. Top comments corroborate limited, short-lived suppression ("for one or two messages") and report similar overreach where the model rewrites text unasked during simple grammar/flow checks, reinforcing that the aggressive "offer next steps" style is a persistent, undesired behavior.
- Multiple users highlight a UX issue where GPT's proactive "additional tasks" prompts can be suppressed only transiently (often for just one message), implying there's no persistent per-user or per-thread preference flag to disable initiative. They ask for a global opt-out toggle or setting to keep the assistant in a strictly reactive mode by default.
- Reports indicate the intent classifier overreaches on simple proofreading requests, performing full rewrites or offering structured artifacts (e.g., graphs/lists/pictures) instead of minimal grammar/flow fixes. A constrained "proofread-only" mode that returns diffs or inline suggestions (without reformatting or expanding content) is suggested to reduce false positives and preserve author voice; a toy version of the diff idea is sketched after this list.
- Keyword-triggered helper flows (e.g., subscription management prompts) are firing in irrelevant contexts, suggesting aggressive heuristics or low confidence thresholds for action suggestions. Users recommend higher confidence gating or explicit opt-in before launching specialized flows to reduce intrusive, off-target assistance.
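A toy rendering of the "return a diff, not a rewrite" idea from the comments, using Python's standard difflib (the sample strings are invented):

```python
# Proofread-only output as a minimal diff instead of a full rewrite.
import difflib

original = "Their going to the store tomorow."
corrected = "They're going to the store tomorrow."

diff = difflib.unified_diff(
    original.splitlines(), corrected.splitlines(),
    fromfile="draft", tofile="proofread", lineterm="",
)
print("\n".join(diff))  # shows only the changed line, preserving author voice
```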
- I was asking chat about why lying on my left side would help reflux, it offered to show me a diagram. (Score: 274, Comments: 39): OP asked why sleeping on the left side can reduce reflux, and an AI produced a diagram contrasting left- vs right-lateral positions. Technically, left lateral decubitus tends to keep the gastroesophageal junction (LES) above the gastric acid pool (fundus along the greater curvature), leveraging gravity and the angle of His to reduce retrograde flow; right-side lying can place the LES dependent relative to the acid, increasing reflux risk. Commenters joke about the orientation/labeling (e.g., suggesting flipping the phone), implying the AI diagram may be mirrored or crudely drawn, but there's no substantive technical dispute.
- URGENT - my girlfriend used chatGPT for her work. Now her boss wants her to explain the calculations. I think the calculations were a hallucination. What to do? (Score: 8705, Comments: 3099): OP describes a client-facing survey analysis produced via ChatGPT, where the model generated an Excel and a resulting PowerPoint; when asked to explain the methodology, ChatGPT claimed it used Pearson's correlation coefficient on 5-bucket textual "feelings" responses. This points to a hallucinated or invalid method: Pearson's r (wiki) assumes numeric/interval data and an explicit encoding of variables, none of which was documented, so the results are non-reproducible and unverifiable, exemplifying LLM "hallucination" risk (overview). Commenters suggest either fabricating a cover story (e.g., "placeholder data") or, more prudently, warn that clients may recognize AI-generated output and that misrepresenting methods poses higher ethical and professional risk than admitting misuse and redoing the analysis transparently.
- Data privacy/compliance risk: A commenter flags that if any client data or PII was pasted into ChatGPT, this could violate company policy, NDAs, or regulations (e.g., GDPR/CCPA) and be more serious than a bad analysis. Unless using enterprise controls, ChatGPT consumer inputs may be retained/used to improve services; contrast with API/Enterprise modes that offer stricter data handling (no training on inputs, optional zero-retention); see OpenAI's data policies: https://openai.com/policies/api-data-usage and data controls FAQ: https://help.openai.com/en/articles/7730893-data-controls-faq. Organizations often require approved vendors and DPAs; uploading sensitive data to an unapproved third party can trigger incident reporting and forensics. The immediate step is to assess whether any sensitive fields were shared and escalate per policy if so.
- Reproducibility/accountability: The client asking to "explain the calculations" suggests concern about provenance and reproducibility; LLMs can produce plausible but incorrect quantitative outputs (hallucinated numbers) and cannot provide a verifiable audit trail. Misrepresenting the source ("placeholder data") is risky; a defensible approach is to reconstruct the analysis with transparent methods (spreadsheets/code) and document inputs, formulas, and intermediate results. Going forward, use LLMs to draft formulas or code but validate all numbers with deterministic tools, keeping artifacts so the work can be reproduced on demand. Admitting lack of proper AI usage can reflect poorly, but doubling down without a reproducible basis is worse from a technical and ethical standpoint. A minimal documented version of the claimed method is sketched below.
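The minimal documented version referenced above: state the Likert encoding explicitly, then compute Pearson's r so anyone can rerun it (the data is invented for illustration):

```python
# Minimal, documented version of the claimed analysis (illustrative data).
from scipy.stats import pearsonr

ENCODING = {"very bad": 1, "bad": 2, "neutral": 3, "good": 4, "very good": 5}

feelings = ["good", "very good", "neutral", "bad", "good", "very bad"]
satisfaction = [4, 5, 3, 1, 4, 2]          # a second, already-numeric item

x = [ENCODING[f] for f in feelings]        # the encoding must be stated up front
r, p = pearsonr(x, satisfaction)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```

Even then, Pearson's r treats ordinal buckets as interval data; Spearman's rank correlation would be the more defensible default for 5-bucket responses.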
- "Poured olive oil on them" (Score: 242, Comments: 71): A meme demonstrates users evading strict keyword/lexical guardrails by substituting fruit-coded euphemisms (e.g., banana, peach) for prohibited historical figures/events (implicitly Adolf Hitler and Eva Braun), effectively preserving meaning while bypassing filters. It illustrates adversarial content obfuscation/prompt-coding that defeats naive string-matching and highlights the need for semantic, context-aware moderation rather than brittle blocklists. Image link. Top comments argue that strict guardrails "won't work" because people will creatively rephrase content, with others posting variant examples ("Banana and Eva Banana") that show how easy such obfuscation is.
- Guardrails are described as brittle: strict, keyword/pattern-based safety filters are easily bypassed by creative prompting (paraphrases, indirection, obfuscation); see the toy demonstration after this list. The point implies robustness requires intent-aware moderation layers, adversarial red-teaming, and continuous evals for jailbreak resilience rather than static blocklists (see e.g., Anthropic on red-teaming: https://www.anthropic.com/news/red-teaming-language-models).
- A user reports the model refusing to answer a neutral factual query about Hitler's death, highlighting overblocking/false positives from miscalibrated safety classifiers. Technically, this suggests the need for context-sensitive policy routing (e.g., distinguishing historical/educational intent), calibrated thresholds, and allowlists for benign facts, measured via precision/recall on labeled safety datasets and spot-checks for known safe queries.
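The demonstration referenced above: why pure substring blocklists are trivially evaded (the blocklist and substitutions are invented for illustration; production moderation classifies semantics, not strings):

```python
# Why naive lexical blocklists fail: trivial substitution preserves the
# meaning for a human reader while matching nothing on the list.
BLOCKLIST = {"hitler", "eva braun"}  # illustrative only

def naive_filter(text: str) -> bool:
    """Return True if the text should be blocked."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

prompt = "Tell me about Hitler and Eva Braun."
coded = prompt.replace("Hitler", "Banana").replace("Eva Braun", "Eva Banana")

print(naive_filter(prompt))  # True: caught by substring match
print(naive_filter(coded))   # False: the same request sails through
```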
3. AI Video/Image Editing Workflows & Showcases: nano banana, Wan 2.2, Qwen, Local SD
- Experimenting with Continuity Edits | Wan 2.2 + InfiniteTalk + Qwen Image Edit (Score: 411, Comments: 59): Episode 3 of an AI sci-fi film experiment pushes continuity and dialogue using a Wan 2.2 pipeline with CausVid LoRAs (Wan 2.1), noting that lip-synced dialogue is compute-heavy (even on an RTX 5090) and fragile: minor flaws often force full re-generations, so dialogue shots should be minimized. The creator reports InfiniteTalk > Wan S2V for speech-to-video (more expressive and prompt-faithful), with shared auto-frame workflows for multi-person and single-person shots (paste 1, paste 2); for spatial continuity, Qwen-Image-Edit can synthesize alternate camera angles from a single frame, though with high failure rates, suggesting a potential LoRA for consistency. Prior episodes and outputs are on the YouTube channel: youtube.com/@Stellarchive. Top feedback: minor motion artifacts (hands) are visible; a commenter corrects naming to Qwen-Image-Edit (not "Wan Image Edit"); otherwise, reception is positive with little additional technical critique.
- A viewer noted 1–2 artifacts on the subject's hand during motion, hinting at minor temporal consistency issues in the continuity edits. This is a common failure mode when applying per-frame image editing over video (e.g., Qwen Image Edit on frames generated by Wan 2.2), where moving extremities and occlusions can produce jitter or smearing.
- Clarification on tooling: the image editing model referenced is Qwen-Image-Edit, not "Wan Image Edit". This aligns with the pipeline in the title (Wan 2.2 for generation, InfiniteTalk for speech/lipsync, and Qwen-Image-Edit for frame edits).
- A suggestion to try the in-scene LoRA for Qwen image editing: flymy-ai/qwen-image-edit-inscene-lora. In-scene LoRAs are aimed at preserving scene layout/lighting while editing localized elements, which could reduce artifacts in moving regions.
- I asked nano banana to get me into my favorite arcade (Score: 276, Comments: 33): Creator demonstrates an AI-assisted compositing workflow: a real first still is edited with nano banana (image cleanup/insert), then animated via Kling 2.1 using start/end-frame constraints to interpolate motion, with music generated by Producer AI and final sequencing/color in DaVinci Resolve. A step-by-step tutorial is provided in the post's X thread: https://x.com/techhalla/status/1963333488217919668. Top comments are largely non-technical praise, noting the piece "sets the bar" creatively; no substantive technical critiques or benchmarks discussed.
- Is it possible to do this locally? (Score: 362, Comments: 70): OP asks whether generating multiple consistent poses of a character from a single illustration (as shown on X using "Nano Banana" and Google's Gemini) can be done locally with Stable Diffusion. Commenters say it's feasible but not turnkey: current closed/hosted tools like Nano Banana are praised for superior identity/attribute consistency, while open options (e.g., Kontext, Qwen Image Edit) may enable similar workflows, potentially combined with LoRA training to lock in style/identity. Top replies argue it's possible but requires manual effort and tolerance for minor inconsistencies; others suggest trying Qwen Image Edit and anticipate rapid open-source catch-up, possibly via training LoRAs on outputs from stronger models.
- Consensus is that "Nano Banana" currently leads on identity/attribute consistency for visual variations (near "almost absolute" character retention), but it's closed. Several suggest replicating locally by distilling its behavior into open models via LoRA adapters, i.e., train a character/concept LoRA on curated outputs, then run on open backbones like Qwen Image Edit (see Qwen repo: https://github.com/QwenLM) to get similar consistency without cloud inference. This shifts from prompt-only control to parameter-efficient fine-tuning (LoRA: https://arxiv.org/abs/2106.09685).
- A concrete local pipeline: (1) train a character LoRA from a tightly curated dataset; (2) use ComfyUI's node graph (https://github.com/comfyanonymous/ComfyUI) with ControlNet pose conditioning to lock structure per shot. Using OpenPose/Posenet controls (ControlNet: https://github.com/lllyasviel/ControlNet; ComfyUI control helpers: https://github.com/Fannovel16/comfyui_controlnet_aux) preserves skeletal/layout while the LoRA preserves identity/accessories, reducing drift in details (e.g., tattoos, braces). This approach trades ease-of-use for reproducibility: each pose typically needs its own control pass.
- Feasibility notes: "mildly possible with Qwen image edit," but achieving closed-model-level consistency generally requires supervision beyond prompts. Expect to combine LoRA + per-frame pose control; prompt-only workflows often fail on small, persistent details (color-matched accessories, logos). It's doable locally, but plan on dataset prep, LoRA training, and per-pose conditioning rather than a single-shot prompt; a diffusers-based sketch of the LoRA + pose-control combo follows below.
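The sketch referenced above, in diffusers rather than a ComfyUI graph (the public model ids are real; the character LoRA path, trigger token, and pose map are placeholders you would supply):

```python
# LoRA (identity) + ControlNet OpenPose (structure) in diffusers, as a
# rough equivalent of the ComfyUI recipe; paths/tokens are placeholders.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./character_lora")   # identity lives in the LoRA

pose = load_image("pose_arms_crossed.png")   # precomputed OpenPose skeleton
image = pipe(
    "mycharacter, arms crossed, full body, studio lighting",
    image=pose,                  # ControlNet locks the pose/layout per shot
    num_inference_steps=30,
).images[0]
image.save("pose_01.png")
```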
- does this exist locally? real-time replacement / inpainting? (Score: 348, Comments: 72): OP asks whether local, real-time face replacement/inpainting exists. Top replies state there's no viable real-time "VACE + Motion" pipeline; credible demos are offline. DeepFaceLab can do limited "real-time" after substantial pretraining, but quality is poor (frontal-only bias, artifacts on head turns) and not believable; high-quality deepfakes still require offline generation. One commenter identifies the showcased clip as theirs, made with "nano banana + Runway Act 2," confirms it is not real-time, and links the source (Instagram). Consensus: current on-device, instant face swap/inpainting with good multi-angle fidelity isn't feasible; social media reels implying otherwise are engagement bait. Another user notes the posted video's framerate/aspect ratio indicate prerecorded camera footage, not live processing.
- Multiple commenters note there's no credible real-time "VACE + Motion" face swap/inpainting pipeline available; reels implying otherwise are likely engagement bait. While DeepFaceLab can run "real time" after significant pretraining, commenters report poor fidelity (believable only on frontal shots) and noticeable artifacts on head turns, reinforcing that high-quality multi-angle swaps still require offline generation time rather than instant inference.
- The original creator clarifies the showcased clip is not real-time and outlines the pipeline as nano banana + Runway Act 2, with additional details in the source post: https://www.instagram.com/p/DN1aEuQUD2e/. This implies a staged, offline workflow leveraging Runway's generative tooling rather than a live, on-device inpainting/face-replacement system.
- A separate observation points out the clip's framerate and aspect ratio resemble recorded camera footage rather than live output, further indicating non-real-time processing. This aligns with the creator's explicit note: "it is NOT REAL time".
- Guys lets just travel back (Score: 439, Comments: 155): OP shares a retro, 1980s-styled image, likely AI-generated, titled "Guys lets just travel back," viewable via the preview image (https://preview.redd.it/0mzhs3zegzmf1.png?width=183&format=png&auto=webp&s=290e05f3a160b3548e1b1be76b7d558b1cba0d15) and the original v.redd.it link (https://v.redd.it/pz6ia9umdzmf1), which returns 403 without authentication. Top comments flag the anachronism ("made with AI from 2025") and implicitly distinguish between aesthetic reconstruction and behavioral emulation (e.g., going phoneless) as different approaches to "going back." Light debate centers on authenticity: whether AI-generated retro art undermines the notion of "returning" to an era versus adopting low-tech habits to approximate the experience.
- Commenters flag the image as AI-generated ("made with AI from 2025") rather than authentic 1980s media, which would explain stylistic anachronisms in the scene. They also note the subjects' unrealistically idealized appearance versus period photos, aligning with current diffusion models' bias toward smoothed, conventionally attractive faces and contemporary makeup/hair cues. Reference image: https://preview.redd.it/0mzhs3zegzmf1.png?width=183&format=png&auto=webp&s=290e05f3a160b3548e1b1be76b7d558b1cba0d15
- Guys lets just travel back (Score: 438, Comments: 157): A nostalgia post titled "Guys lets just travel back" features an 80s-themed image (likely AI-generated per comments) preview. A linked video endpoint v.redd.it/pz6ia9umdzmf1 returns HTTP 403 Forbidden under Reddit's anti-bot controls, implying authentication or a valid client token is required (e.g., login). Top comments note the image looks AI-generated ("made with AI from 2025") and play on the 80s nostalgia theme; one suggests behavioral "retro" choices (e.g., go to the mall without a phone) rather than any technical solution.
- Commenters flag that the image is AI-generated (e.g., "this is made with AI from 2025") and note it doesn't match authentic 1980s visuals ("I remember the 80s. It wasn't this."). Modern diffusion outputs often over-polish (smooth skin, HDR-like contrast, near-symmetry) and omit period artifacts like film grain/halation, chromatic aberration, lens vignetting, and era-specific color science. To get closer to '80s fidelity, practitioners typically add explicit constraints or post-process steps (analog noise, color LUTs emulating Kodachrome/Ektachrome, slight chroma bleed, gate weave, CRT/scanline simulation); a toy sketch of such a pass follows below.
- The remark "Nobody was actually that pretty back then" maps to model/data bias: web-scale training corpora (heavy on influencer/retouched imagery) push diffusion priors toward idealized attractiveness and contemporary makeup/hair. Without era-specific fine-tunes/LoRAs and strong negative prompts, the sampler gravitates to current beauty standards, producing anachronistically "perfect" faces when asked for retro scenes.
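For intuition, here is a toy version of the film-emulation post-process described above. It is a minimal sketch, not anyone's production pipeline; the input filename is hypothetical, and an 8-bit RGB image is assumed.

```python
import numpy as np
from PIL import Image

# Load an 8-bit RGB frame (hypothetical filename) and work in [0, 1] floats.
img = np.asarray(Image.open("retro.png").convert("RGB")).astype(np.float32) / 255.0
h, w = img.shape[:2]

# Film-grain-like Gaussian noise.
img = img + np.random.normal(scale=0.04, size=img.shape)

# Simple radial vignette: darken toward the corners.
yy, xx = np.mgrid[0:h, 0:w]
r = np.sqrt(((yy - h / 2) / h) ** 2 + ((xx - w / 2) / w) ** 2)
img = img * (1.0 - 0.5 * r)[..., None]

Image.fromarray((img.clip(0, 1) * 255).astype(np.uint8)).save("retro_80s.png")
```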
- Fruit Beds (Score: 269, Comments: 40): The post "Fruit Beds" appears to be a short Reddit-hosted video on v.redd.it (link) that currently returns HTTP 403 Forbidden without authentication; Reddit's network security page indicates access requires logging in or using API credentials. A still/preview frame is available via a PNG link (preview), suggesting a sequence of fruit-themed "beds," but no technical context or metadata is provided in-thread. Top comments are non-technical: one reaction GIF and a question ("What is the last one supposed to be?") highlighting ambiguity about the final visual; no definitive answer or explanation is provided.
- Two commenters provide higher-resolution stills to answer the question about the ambiguous "last one," linking frames captured from the post: image 1 and image 2. These higher-res frames help disambiguate fine details that are obscured at GIF/WebP playback resolutions or due to compression artifacts.
- The observation about the blanket "spawning into existence" likely stems from a loop/encoding discontinuity: GIF/WebP animations often rely on inter-frame deltas and disposal methods (restore to background or restore to previous). If the loop point cuts between non-keyed frames or the transcoder (e.g., Reddit's GIF-to-MP4/WebP pipeline) drops/merges frames, objects can appear to pop in/out between loops; see GIF disposal behavior explained here: https://en.wikipedia.org/wiki/GIF#Disposal_methods.
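To see disposal methods in practice, here is a minimal Pillow sketch for inspecting per-frame disposal of a GIF. The filename is hypothetical, and the `disposal_method` attribute is assumed per Pillow's GIF plugin (0/1 = keep, 2 = restore to background, 3 = restore to previous), hence the defensive `getattr`.

```python
from PIL import Image

im = Image.open("fruit_beds.gif")  # hypothetical filename
for i in range(getattr(im, "n_frames", 1)):
    im.seek(i)  # decode frame i
    print(f"frame {i}: disposal={getattr(im, 'disposal_method', 'n/a')}, "
          f"duration={im.info.get('duration')} ms")
```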
- Fruit Beds (Score: 265, Comments: 40): Image/meme post titled "Fruit Beds" showing a sequence of bed images themed around fruits; there is no technical content (code, models, or benchmarks). The original Reddit URL is blocked with an HTTP 403 "Forbidden" page requiring Reddit login or a developer token; a support form is provided. A direct preview of the last image is referenced in comments. Top comments are non-technical: a GIF reaction, and a question ("What is the last one supposed to be?") highlighting ambiguity about the final image; another links the preview image above.
- I don't know (Score: 858, Comments: 39): Meme format contrasting two eras to highlight layperson ignorance about complex systems: modern people can't explain how computers work, and an ancient pharaoh can't explain how pyramids were built. No technical details, benchmarks, or implementation discussion; purely humorous commentary on gaps between creators/users and deep understanding of underlying technology or construction methods. Comments are mostly jokes; one lightly philosophical prompt asks how language works, and another points out the oddity of a time traveler asking questions, but there's no substantive technical debate.
- One commenter contrasts the feasibility of a single expert replicating ancient construction (e.g., pyramids) with the impracticality of reproducing modern devices without a vast, distributed knowledge base and tooling. This underscores a shift from logistics- and labor-dominated projects to precision manufacturing with extreme specialization: modern SoCs integrate ~10-20B transistors and rely on EUV lithography and global supply chains (e.g., ASML EUV: https://www.asml.com/en/technology/lithography-principles/euv-lithography; process overview: https://en.wikipedia.org/wiki/Semiconductor_device_fabrication). Even with full schematics, reproduction is constrained by materials science, metrology, and capital equipment (cleanrooms, lithography steppers), illustrating modular yet brittle complexity versus monolithic, robust construction.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. Reasoning Benchmarks and Open Models
- Pokerbots Pit Stop: Husky Bench Crowns Sonnet: The Husky Hold'em Bench launched the first open-source pokerbots eval, where Claude 4 Sonnet led with 57.9% average profit over 5k+ games in a 6-player round-robin, with Opus (31.9%) and Gemini (31.0%) trailing, documented at Husky Hold'em Bench and noted by Nous Research.
- The community praised the benchmark's constraints (Python policies under time/memory caps) and called it "the first OS pokerbots eval", expecting rapid iteration on eval tooling and agent strategies (huskybench.com).
- Hermes 4 Heats Up: Open Model Leaderboard Flex: Hermes 4 (built atop Qwen3-14B) debuted with a newly synthesized post-training corpus emphasizing verified reasoning traces and larger scale (~5M samples / ~60B tokens), while Hermes-4-405B currently tops the open-model standings on Husky with -12.41% drawdown per the Nous Research update.
- Users shared practical tuning tips (e.g., SillyTavern sampler settings for Think vs Instruct modes) and reported stronger math/code/logic performance with format-faithful outputs, calling Hermes 4's hybrid reasoning "explicit think segments with neutral alignment" (huskybench.com).
- Board-Brained Benchmarks Broaden Beyond Poker: Beyond poker, engineers compared LLMs on classic board games via the TextArena Leaderboard, highlighting chess/Go/connect-four/shogi/xiangqi ELO as complementary signals to domain-specific evals.
- Members advocated multi-task eval suites to avoid overfitting to single domains, noting that "diverse, rigorous game evals" better surface model weaknesses and strategy brittleness (TextArena Leaderboard).
2. Kernel Kung Fu and Low-Bit Training
- Metal Mania: AI-Generated Kernels Hit 1.87×: A team reported a 1.87× speedup by generating low-level Metal kernels directly from PyTorch, detailed in AI-generated Metal kernels, and noted that torch.mps.compile_shader can directly invoke kernels without a C++ binding.
- Engineers asked for kernel dumps and suggested submitting a PR to upstream the wins into PyTorch, while one maintainer remarked "a cpp binding is no longer needed" and flagged correctness checks with BackendBench (see blog: gimletlabs.ai).
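For readers who have not used it, here is a minimal sketch of the torch.mps.compile_shader path mentioned above, following its documented usage; the kernel itself is a trivial fill, not from the blog post, and it requires an Apple-silicon Mac with the MPS backend.

```python
import torch

# Trivial Metal kernel: fill a float buffer with a constant.
SHADER = """
kernel void fill_value(device float* out,
                       constant float& val,
                       uint idx [[thread_position_in_grid]]) {
    out[idx] = val;
}
"""

lib = torch.mps.compile_shader(SHADER)  # runtime-compile the Metal source
out = torch.zeros(1024, device="mps")
lib.fill_value(out, 1.87)               # dispatch one thread per element
print(out[:4])
```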
- TorchAO Tango: Nightlies Trip, MXFP8 Zips: Developers hit torchao nightly breakage due to a Torch 2.9 vs 2.8 mismatch (issue #2919), fixed via `pip install torchao==0.13.0 --extra-index-url https://download.pytorch.org/whl/test/cu128`, while PR #2933 patches an `sm100` flag for MXFP8 (PR #2933); concurrently, MXFP8 pretraining recipes and up to 1.28× speedups were published (Recipes for Pre-training LLMs with MXFP8, PyTorch blog).
- One user hit an ImportError ("cannot import name 'mxfp8_cuda'") but maintainers clarified the short-term fix unblocks NVFP4 inference and that the impacted kernel is only used for MXFP8 training (issue #2932, PR #2933).
- Fusion Confusion: torch.compile Meets Triton: Engineers confirmed torch.compile does not fuse ops into user-defined Triton kernels and often creates fusion barriers around specialized ops; a repro and discussion are in this fusion gist.
- They advised inspecting captured graphs via `TORCH_LOGS="output_code"` and cautioned that the example kernel is "numerically unstable for large MNK", so manual fusion remains the pragmatic choice (fusion gist).
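A minimal repro sketch of the barrier behavior described above (kernel and shapes are illustrative, not the gist's code): the hand-written Triton kernel is opaque to Inductor, so the surrounding sin/cos cannot fuse into it. Run with TORCH_LOGS="output_code" to inspect what gets generated; a CUDA GPU is required.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_one_kernel(x_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    tl.store(out_ptr + offs, tl.load(x_ptr + offs, mask=mask) + 1.0, mask=mask)

def add_one(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    add_one_kernel[(triton.cdiv(n, 1024),)](x, out, n, BLOCK=1024)
    return out

@torch.compile
def f(x):
    # sin/cos compile into Inductor kernels, but the user-defined Triton
    # kernel stays opaque, so nothing fuses across its boundary.
    return torch.cos(add_one(torch.sin(x)))

f(torch.randn(4096, device="cuda"))
```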
3. Agentic Patterns, Program Synthesis, and Eval Infra
- Design Bible Drops: 400-Page Agentic Patterns: A Google engineer released a 400-page draft of Agentic Design Patterns covering advanced prompting, multi-agent systems, tool use, and MCP, available on Agentic Design Patterns (Google Doc) with a companion NotebookLM.
- Readers flagged that "editing access wasn't disabled" and worried about accidental edits, while others pre-ordered the Springer edition and began extracting patterns into their own playbooks (Agentic Design Patterns).
- DSPy Demystified: Clean Splits, MLflow Hooks: DSPy clarified its train/val/dev/test split discipline (val for multi-step plots, test for final eval) to avoid leakage, and members explored prompt lifecycle tracking via MLflow Prompt Registry and mlflow.dspy.
- A shared repo of Context Compression Prompt Experiments targeted `dspy.Image` reliability across providers, with volunteers posting failures and patches.
- Steering Made Simple: LM Eval Harness Bakes It In: The LM Eval harness already supports steered models: activation/residual steering vectors and formatting are documented in Steered HF Transformers Models.
- Contributors pointed to the `SteeredModel` docstring for details and opened a PR to steer individual attention heads, saying "don't roll your own; use the built-ins" (SteeredModel docstring, PR #3279).
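The harness's SteeredModel handles this for you; purely for intuition, here is a generic hook-based sketch of residual-stream steering. It is not the harness's implementation, and the layer index and steering vector are arbitrary placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

steer = 0.1 * torch.randn(model.config.n_embd)  # placeholder steering vector

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # adding the vector here shifts the residual stream for all later layers.
    return (output[0] + steer,) + tuple(output[1:])

handle = model.transformer.h[6].register_forward_hook(add_steering)
ids = tok("The weather today is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=10)[0]))
handle.remove()
```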
4. Student and Builder Tools Shipping
- Comet Class: Perplexity's Study Mode Launches: Perplexity Comet launched a student-focused Study Mode for schedules, textbooks, and exam prep, with interactive flashcards showcased in Comet for Students.
- Power users requested that Perplexity roll Study Mode into Pro because it's "more than just a system prompt" with custom GUI elements (announcement thread).
- Projects For All: ChatGPT Gifts Free Users: Projects in ChatGPT now ship to Free users on web and Android (iOS rolling out), with per-project file limits of 5 (Free), 25 (Plus), and 40 (Pro/Business/Enterprise) (OpenAI announcement).
- Users can customize colors/icons and toggle project-only memory controls for tighter context isolation, which teams called clutch for reproducible workflows (OpenAI announcement).
- Kimi Codes: Vouchers Gate New Model + Slides: Moonshot (Kimi) is giving away 20 × $20 API vouchers to test a new coding-strong model and highlighted a slick slide generation feature (giveaway channel).
- Admins warned of impersonators ("If it ain't yellow, don't trust it"), and users asked for a K2 turbo Coder Pro plan or a unified tier (giveaway channel).
5. Search and Deep Research: Fast, Cheap, and Funded
- Exa Accelerates: $85M Series B at $700M Valuation: Exa raised $85M (Series B) at a $700M valuation led by Benchmark, pitching itself as the search engine for AI, announced here: Exa Series B.
- Deal trackers noted Harmonic flagged the round two weeks early, fueling ideas to productize deal-flow alerts (Exa announcement).
- DeepResearch on a Dime: Qwen-2.5 14B > Sonnet-4: Kyle Corbitt shared an open recipe to fine-tune Qwen-2.5 14B that beats Sonnet-4 on the DeepResearch benchmark in ~30 H200 hours (~$350) via SFT + GRPO + eval (training thread).
- The resulting model tests competitively with Gemini 2.5 Pro, OpenAI Deep Research, and Claude Research, with devs praising the cost/perf profile as "pennies to production" (training thread).
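For orientation, here is a hedged sketch of what a GRPO stage in such a recipe looks like with TRL's GRPOTrainer. The dataset and reward function are toy stand-ins, not the actual DeepResearch setup, and a small model is substituted so the snippet is cheap to run.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompts; the real recipe would use research-style tasks.
train = Dataset.from_list([{"prompt": "Summarize: the sky is blue."}] * 64)

def reward_len(completions, **kwargs):
    # Toy verifiable reward: prefer completions near 50 characters.
    return [-abs(len(c) - 50) / 50 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # stand-in; the recipe used 14B
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-out", num_generations=4),
    train_dataset=train,
)
trainer.train()
```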
- Code by Command: Claude Code's "AI DOS" Moment: Nikunj Kothari argued Claude Code lowers barriers like DOS did for PCs, letting non-coders "build software by imagination", as debated in this thread.
- Commenters split on whether we're "still in a command-line era" or entering an imagination-constrained phase, with creatives eyeing workflows that collapse prototype cycles (discussion).
Discord: High level Discord summaries
Perplexity AI Discord
- Comet Lands on Students' Desktops: Perplexity AI is now offering Comet to students, assisting with schedules, textbooks, and exam prep through a new Study Mode, detailed in this announcement.
- The launch featured a video demo showing how flashcards can be used within Comet for more interactive and efficient studying.
- Pro Users Demand Study Mode: Some users are urging Perplexity AI to extend the Study Mode feature, currently exclusive to educational accounts, to all Pro users.
- The feature is more than just a system prompt, and some users have pointed out that it has associated GUI elements.
- ChatGPT5 Pro Causes Hilarity: A user sparked confusion and amusement by mentioning ChatGPT5 Pro, mistaking it for Perplexity's GPT5 and GPT5 Thinking models.
- Another user clarified that ChatGPT5 Pro is exclusive to chatgpt.com, leading to humorous reactions from other members.
- Comet's Assistant trips: Users have reported issues with Comet, including prolonged loading times for simple sites and an assistant that is not up to par.
- Speculation arose that the assistant might be leveraging Sonar.
- Perplexity's Filter is Flagging: Users are criticizing Perplexity's overzealous censorship, where even benign historical inquiries such as "How did Hitler die?" are being flagged.
- Concerns have been raised that the overly strict filtering system could lead to unwarranted account bans for studying history or other harmless subjects.
LMArena Discord
- Gemini 3 Hype: Too Hot to Handle?: Members debated the hype around Gemini 3, with opinions ranging from potentially overblown expectations to the possibility of Google surprising the industry, even if it only narrowly beats competitors, especially given OpenAI's ChatGPT5 delays.
- A member posted a shrug gif reflecting the uncertainty around Gemini 3's true impact.
- LM Arena's Login System Gets a Facelift: Enthusiasm was expressed for the new login system on LM Arena, with one member saying love the new login system ❤️ been waiting for it 🔥.
- The same member proposed a Google Drive-based chat-storage system for exporting and analyzing user data as text files, though the idea faced skepticism.
- LMArena Site Melts Down!: LMArena experienced a significant site outage, causing user frustration and a flood of questions.
- A moderator assured users that the team was actively working to resolve the issue, directing them to the FAQ for updates.
- MAI 1 Preview mysteriously goes offline!: Members reported the sudden malfunction of Microsoft's LLM MAI 1 Preview, an LLM some users praised for excellent results.
- One user commented that MAI 1 Preview gave the BEST answers in 90% of cases, better than all the others, even ChatGPT-5 high, leaving the community wondering about its abrupt disappearance.
- Cloudflare Catches Users in Verification Loop: Users voiced complaints about frequent Cloudflare human verification challenges on LMArena, with one user asking Does everyone gets cloudflair human verification every two minutes on lmarena website ??.
- While VPN usage was suspected as a cause, the issue also appeared on other sites using Cloudflare, leading to widespread user annoyance.
OpenAI Discord
- ChatGPT Gives Gifts to Free Users: Projects in ChatGPT are now available to Free users on web and Android with iOS rolling out soon, along with increased file uploads, now at 5 for Free, 25 for Plus, and 40 for Pro/Business/Enterprise.
- Users can now customize colors and icons for projects, and project-only memory controls are available for more tailored context.
- Nano Banana Explodes Google Gemini: Members shared images generated using the prompt nano banana (gemini-2.5-flash-image-preview) in the Gemini app and Google Studio.
- One user showcased how they turned their coworkers into Vikings.
- Members Skeptical About Anti-Prompt GPT: A member shared their Anti-Prompt Breaching GPT designed to prevent prompt leaking.
- Others expressed skepticism, saying it might only slow attackers down or make bypassing the protection harder rather than prevent it, especially in Custom GPTs, and that reliability suffers.
- Cognitive Mesh AI Learns on Its Own: A member described designing a cognitive mesh AI that self-adapts and self-learns, growing its understanding over time, similar to utilizing MoEs.
- The AI, built over 3 years, has short, long, and reflective memory, evolves in its own trajectory, and has developed directive responses based on its input.
- Thinking Outside the Transformer Box: Members debated Liquid Neural Networks and architectures that merge continuous-time dynamics with symbolic reasoning, such as neuromorphic chips, photonics, and spintronics.
- The consensus was that these innovations don't rely on brute-force scale but on rethinking the foundations.
Cursor Community Discord
- Anthropic API Keys Pass the Vibe Check: Users have verified that sk-ant- keys function correctly with the Anthropic API in Cursor, in spite of a UI discrepancy.
- The community confirmed keys work properly even though the UI might suggest otherwise.
- Cursor Auto Model Gets Grilled by Users: Users scrutinized Cursor's Auto model; one user reported spending $200 in less than a week.
- Feedback indicated that the model's code quality is subpar compared to Sonnet or Gemini, though potentially superior to Copilot thanks to better prompts; others suggested guiding summarization manually instead.
- Cursor Update Erases Chat Histories, Sparks Panic: Multiple users reported data loss due to a Cursor update, with one user losing a month's worth of work.
- The recommended solution of checking local chat history storage locations proved ineffective for some, with chats not found in their usual directory.
- Fine-Tuning Chatbots: A Penny Saved is a Penny Earned: A community member sought advice on fine-tuning a model for a web app chatbot, which led to suggestions on utilizing prompt generators.
- The consensus was to hold off on fine-tuning until a revenue stream is established, treating it as an optimization strategy.
- Background Agent has a Mid-life Crisis: A member reported their background agent has frozen and sought a way to transfer its current state to another chat via an API.
- Specifically, they requested an API method to retrieve a state transfer summary to facilitate moving the agent's current state to a new chat environment.
OpenRouter Discord
- Nano Banana Powers OpenRouter Discord Bot: A member created a Discord bot leveraging Nano Banana through OpenRouter.
- The user clarified that they vibe coded the bot, and the source code is available on GitHub.
- DeepSeek Models Speak Gibberish: Some users reported free DeepSeek models are generating gibberish, while the paid ones are functioning correctly.
- Another user inquired about pricing for DeepSeek 3.1, with Synthetic.new mentioned at $20/month, although official rates were called a rip off.
- Agent Framework SDKs Get Patched: Members discussed patching Agent Framework SDKs like OpenAI Agent SDK, AI SDK, and LangChain JS with OpenRouter due to non-standard schemas.
- One member is planning to roll their own solution with BAML integration, emphasizing that it's just HTTP anyway.
- ChutesAI Subscriber Hit By 429s: A ChutesAI subscriber is experiencing 429 rate limit errors and credit issues when using OpenRouter with a BYOK, specifically on Chub.
- Despite verifying correct API key and private key usage, the problem persists, seeming specific to routing through Chutes on Chub.
- Google Faces Antitrust Judgement: A member linked to a CNBC article about Google facing an antitrust ruling.
- The member commented it was truly remarkable.
LM Studio Discord
- Context Length Curtails Compute: Users reported that inference speed in LM Studio slows down as context length increases, with performance hits becoming noticeable around 20k context, with some joking about AI girlfriends.
- Members requested an LM Studio Nightly release to stay current with llama.cpp and other backends.
- Graniteâs Gamble with Experts: The Granite 4 preview model has 62 experts, requiring more VRAM, but users reported that using non-default configurations led to diminished performance.
- Some users noted the Windows auto-upgrader failing because it was blocked by long path names, requiring manual directory removal to fix.
- Legacy CPUs Left Behind: Some users encountered errors in LM Studio due to the requirement for AVX2 instructions, which older CPUs like the FX8300 do not support.
- Even with GPU offloading, LM Studio would refuse to run without AVX2 support.
- Power-Throttling Pursuit for GPUs: Users discussed limiting GPU power draw using tools like MSI Afterburner to manage power consumption while running large LLM models, specifically mentioning a new server with 512GB of DDR4 RAM.
- Members evaluated GPU options like the 3060 12GB, Titan Xp, and RTX A4000 16GB, with the 3060 12GB recommended over older cards due to `GDDR6` improvements; a user linked MSI GeForce RTX 3060 VENTUS.
- Sizing up VRAM for MoE Models: A user inquired about the VRAM requirements for MoE offload with Qwen3-235B, specifically if a 1080 could handle it with CPU context offload.
- Another member estimated around 11+GB for a 4-bit quantization based on 22B active params, advising caution due to uncertainty; also, a user pondered the best GPU bandwidth setup for dual CPUs, whether to concentrate GPUs on one CPU or distribute them.
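The back-of-the-envelope math behind that 11 GB figure is just parameter count times bits per weight. This sketch counts only the weights of the active experts and ignores KV cache, activations, and quantization overhead, so treat it as a floor.

```python
active_params = 22e9        # Qwen3-235B-A22B activates ~22B params per token
bits_per_param = 4          # 4-bit quantization
vram_gb = active_params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB
print(f"~{vram_gb:.1f} GB for active-expert weights alone")  # ~11.0 GB
```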
Eleuther Discord
- Blender Mastery Beckons: A member stated they achieved 3D modeling proficiency in Blender after just 10 hours, suggesting AI lags in certain creative domains.
- The quick learning curve was contrasted with AIâs limitations, likened to the superficiality of Uber Eats.
- Foundation Models Face Latent Knowledge Lag: Members suggested that large foundation models insufficiently use latent knowledge, viewing it as a missed chance.
- This deficiency was compared to the progress made in AIME and IMO problem-solving.
- Serial Operations Sparked by Recursion: Discussions emphasized implementing serial operations with recursion because of Turing Completeness, which allows adaptive search over large spaces, unlike CoT/RL.
- Focus shifted towards latent space reasoning instead of token space processing, circumventing issues tied to viewing complex tasks as fixed-point problems, in line with the theories described in Adaptive Computation.
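As a purely conceptual sketch of the adaptive serial computation being discussed (not code from the thread): a latent state is refined recursively until a data-dependent halting test fires, in contrast to a fixed-depth forward pass.

```python
import torch

def refine(state: torch.Tensor) -> torch.Tensor:
    # Stand-in for one latent reasoning step.
    return 0.5 * (state + torch.tanh(state))

def adaptive_compute(state: torch.Tensor, tol: float = 1e-4, max_steps: int = 100):
    for step in range(max_steps):
        new_state = refine(state)
        if (new_state - state).norm() < tol:  # data-dependent halting
            return new_state, step + 1
        state = new_state
    return state, max_steps

_, steps = adaptive_compute(torch.randn(8))
print(f"halted after {steps} serial steps")
```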
- Diffusion Modelâs Parallel Token Triumph: The rise in popularity of Diffusion LMs is attributed to the cheaper inference from parallel token generation and an improved training objective, avoiding certain biases and failures.
- While some capabilities still necessitate serial computation (effectively reverting to autoregression), the limitations of this method were acknowledged.
- LM Eval Embraces Steering Vectors: A member pointed out that the LM Eval harness has built-in support for steering vectors, discouraging custom implementations and directing users to the documentation.
- It was clarified that the steering vector implementation manages activations and residuals, with formatting details found in the `SteeredModel` docstring, available here.
GPU MODE Discord
- Triton Conference set for 2025: The Triton Conference 2025 has been announced, focusing on the Triton programming language and related topics.
- Details about speakers and schedule will be released at a later time.
- CUDA Pipelining Projects Spark Interest: Members discussed exploring CUDA-enabled data pipelines, suggesting mixing DALI with cuVS for an optimal setup.
- The conversation highlighted the need for MLPerf-like standards or benchmarks for data pipelines and processing.
- H100 Hardware Optimizations Exposed: Discussion on hardware-specific optimizations, particularly for the H100 GPU, led to sharing microbenchmarking papers dissecting the NVIDIA Hopper, Turing T4, and Volta GPU architectures (Hopper paper, Turing T4 paper, Volta paper).
- A user mentioned that for Blackwell, they only knew of a simple tech overview released by NVIDIA, but considered it pretty good.
- TorchAO Nightly Builds Spark Version Conflicts: Members flagged that `torchao` nightly builds are breaking due to a version mismatch between the Torch version it was built against (2.9) and the Torch version they were trying to import it with (2.8), and suggest checking issue #2919.
- The fix was to use `pip install torchao==0.13.0 --extra-index-url https://download.pytorch.org/whl/test/cu128`.
- AI Codegen Gives Metal Kernels Rocket Boost: A team achieved a 1.87X speedup by going straight from PyTorch to low-level Metal kernels with AI codegen, described in their blog post.
- A member pointed out that a cpp binding is no longer needed, as one can use torch.mps.compile_shader to directly invoke kernels. They also suggested submitting a PR with the kernels, as any performance gains would benefit PyTorch users.
HuggingFace Discord
- Deepseek API, a Little Slow, but Free: A member discovered a free Deepseek API, noting its usefulness despite being somewhat slow.
- The user appreciated the resource because it's free.
- M4 Macbook Pro Stumbles with Llama 3.2 Vision: A user with a Macbook Pro M4 (24 GB RAM) couldn't run Llama 3.2 vision 11B, reporting that it utilized 20 GB of memory without producing output.
- Another user suggested exploring quantized versions such as Q4 or reducing the context length to resolve the issue.
- Anthropic Triples and Ties Up Copyright: Members noted that Anthropic tripled in size in approximately 5 months based on this tweet, growing from ~60B to 180B, and also settled their copyright case out of court.
- The connection between the investment announcement and the settlement was noted as really fascinating, although settlement terms are not yet public.
- Chinese AI Models Cut the Gaslighting: A member observed that Chinese AI models like Qwen tend to exhibit less gaslighting behavior.
- Another member praised Qwen for providing ideas of what can be wrong and adhering to formats effectively.
- Datatune Agents Enable Data Transformations: A new release of Datatune Agents now enables data transformations at a per-row level using natural language prompts, with key features including row-level map() and filter() operations and Dask DataFrames support.
- Compatibility includes multiple LLM backends like OpenAI, Azure, and Ollama via LiteLLM, with Datatune optimizing tokens and cost through explicit control over sent columns, automatic batching, and metadata handling.
Yannick Kilcher Discord
- Automated weapons spark debate!: Members hotly debated the ethics and practical implications of automated weapons, with some suggesting they could minimize harm compared to human soldiers, while others voiced concerns about potential human rights abuses.
- Some argued the bigger fear is governments abusing them for human rights violations, which is already happening with bombs, nukes and drones.
- US public transit a missed opportunity?: Members discussed the state of public transportation in the US, describing it as unsafe and humiliating, missing the opportunity to reduce accidents and improve urban mobility.
- It was suggested that accidents could be reduced by 90% if humans could only drive when in the proper state of mind.
- Cheap Drones could swarm!: The potential for drones combined with trucks to be used as cheap attacking options was discussed, highlighting the need for a security framework to deal with non-state actors.
- One proposed solution involved governments focusing on banning chemicals required to make drones.
- Mamba's state matrix faces flak!: A member critiqued Mamba's fixed transition matrix for not replicating true state machines, and for potential issues preserving context, citing The Illusion of State in State-Space Models.
- Suggestions for cures included adding a nonlinearity between state transitions or making state transition matrices dependent on the input, as done in Liquid Structural State-Space Models.
- Ladybird rises as Chrome contender!: A new FOSS browser called Ladybird is under development as a potential alternative to Chrome, currently available for Linux and Mac OS.
- The development of Ladybird is driven by a commitment to the principles of Free and Open Source Software (FOSS), ensuring transparency, community involvement, and freedom of modification.
Latent Space Discord
- Google Engineer Drops Agentic Design Patterns Tome: A Google engineer released a 400-page draft of Agentic Design Patterns covering advanced prompting, multi-agent systems, tool use, and MCP, available on Google Docs and for pre-order as a Springer edition.
- The community shared links to the doc, NotebookLM, and Amazon pre-order, but some noted the doc's editing access wasn't disabled, leading to concerns about alterations.
- Claude Code Declared "AI DOS" Moment: Nikunj Kothari argues that Claude Code is a watershed moment (like DOS was for PCs) because it collapses technical barriers and lets non-coders build software by imagination alone, as noted in this tweet.
- Commenters debated whether we're still in a command-line era, how creatives can harness it, and if the real bottleneck is now imagination rather than coding skill.
- Compute Arms Race Sparks Debate on Efficiency: Discussion highlights massive compute spending by OpenAI & Anthropic ($13B pre-pays to secure GPUs/energy) while observers question diminishing returns and unsustainable power/water usage, stemming from this X post.
- The thread swings between doomsayers predicting a funding crash and optimists betting on small-model efficiency or breakthrough algorithms to obsolete the mega-cluster strategy.
- Open-Source Recipe Trains Deep Research Agent for Pennies: Kyle Corbitt shares a recipe using open-source tools that lets developers train a Qwen-2.5 14B model to surpass Sonnet-4 on the DeepResearch benchmark in just 30 H200 hours (~$350), based on this tweet.
- The process includes SFT for basic skills, GRPO for utilization, and benchmark evaluation, producing a model competitive with Gemini 2.5 Pro, OpenAI Deep Research, and Claude Research.
- Exa Raises $85M Series B at $700M Valuation: Exa announced an $85M Series B raise at a $700M valuation led by Benchmark, positioning itself as the search engine for AI according to this tweet.
- Harmonic's system flagged the round two weeks in advance, prompting discussion about turning deal flow alerts into a product.
Nous Research AI Discord
- Claude 4 Sonnet Wins Pokerbots Title: The Husky Hold'em Bench debuted as the first OS pokerbots eval, with Claude 4 Sonnet leading the competition at 57.9% average profit over 5k+ games against other models in a 6-player round-robin format.
- The benchmark challenges models to implement policies in Python under time and memory constraints, and is documented on huskybench.com; Opus came in second (31.9%) and Gemini trailed in third place (31.0%).
- Hermes 4 Powers Up Reasoning: Hermes 4 is the next generation of Hermes trained on top of Qwen3-14B, featuring a newly synthesized post-training corpus that emphasizes verified reasoning traces.
- The update highlights include improvements in math, code, STEM, logic, creativity, and format-faithful outputs, while preserving general assistant quality and broadly neutral alignment; training increased from 1M samples and 1.2B tokens to ~5M samples / ~60B tokens.
- SillyTavern Likes Hermes: Members discussed leveraging SillyTavern for roleplay, highlighting its surprising math and coding capabilities.
- For Hermes-4-14B based on Qwen-3, the recommended sampler settings are temp: 0.6, temp-k: 20, temp-p: 85 for Thinking-Mode and temp: 0.7, temp-k: 20-40, temp-p: 95 for Instruct-Mode; additionally, use ChatML for 14B, Llama 3 Instruct for 70B and 405B.
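Read as standard sampling parameters, those Thinking-Mode settings map onto an HF transformers call like the sketch below. The repo id is a hypothetical guess at naming, and "temp-k: 20, temp-p: 85" is interpreted as top_k=20 and top_p=0.85, so verify both against the actual model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "NousResearch/Hermes-4-14B"  # hypothetical repo id; check the model card
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,  # Thinking-Mode recommendation from the thread
    top_k=20,
    top_p=0.85,
)
print(tok.decode(out[0], skip_special_tokens=True))
```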
- Research Doc Plea: A member requested assistance with writing a research document and case study on Generative UI and AI-first interaction patterns.
- The author is focusing on transformation design and the business implications, looking for guidance to kickstart their project and gain a better understanding of the subject matter.
DSPy Discord
- DSPy Data Leakage Worries Dissipated: Concerns about potential test set data leakage in DSPy when reusing a training set were addressed by clarifying that optimizers use distinct training and validation sets, with a separate test set for final evaluation.
- The discussion highlighted that multi-step plots use the valset, while final results are reported on the testset to prevent leakage and overfitting.
- DSPy Data Splits Decoded: DSPy employs four distinct data splits: train (for few-shot examples), val (for validation), dev (for human iteration), and test (for final evaluation).
- The community emphasized the importance of using the valset for multi-step plots and the testset for final results to avoid leakage or overfitting issues.
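A minimal sketch of that split discipline (the optimizer, metric, and data are toy stand-ins; MIPROv2 is one of several DSPy optimizers that accept separate train and val sets):

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any configured LM works

def make(q, a):
    return dspy.Example(question=q, answer=a).with_inputs("question")

trainset = [make("What is 2+2?", "4")]   # feeds few-shot demos
valset   = [make("What is 3+3?", "6")]   # steers the optimizer
testset  = [make("What is 5+5?", "10")]  # touched exactly once, at the end

def metric(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()

program = dspy.ChainOfThought("question -> answer")
compiled = dspy.MIPROv2(metric=metric, auto="light").compile(
    program, trainset=trainset, valset=valset
)

# Final number: evaluate on the held-out test set, never optimize against it.
score = dspy.Evaluate(devset=testset, metric=metric)(compiled)
```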
- MLflow Hearts DSPy: A user explored integrating MLflow with DSPy to capture prompts, referencing MLflow's prompt registry features and the existence of mlflow.dspy.
- The user plans to experiment and report back on the integration of MLflow and DSPy for prompt management.
- Context Compression Critiques Commence!: A member shared Context Compression Prompt Experiments aiming to enhance `dspy.Image` reliability.
- This project focuses on investigating and refining context compression methods to improve the performance of `dspy.Image` across various providers.
- `dspy.Image` Needs Reliability Tweaks: A user initiated a task to refine the reliability of `dspy.Image` for certain providers, detailed in this Discord thread.
- Follow-up discussions involved sharing images and exploring potential solutions to address the reliability issues.
Moonshot AI (Kimi K-2) Discord
- Kimi Kicks Off Voucher Giveaway: The Moonshot Team is giving away 20 × $20 API vouchers to test their new model with crazy coding powers, accessible only via a voucher.
- Users can participate by reacting in the #giveaway channel before 8AM Beijing Time.
- Kimiâs Coding Prowess Powers Slide Generation: A user praised the recently released slide generation feature and the accompanying coding enhancements.
- They look forward to Kimi enabling even more professional task handling with this update, saying it delivers the coding improvements they were hoping for.
- Request for Kimi K2 turbo Coder Pro Plan Surfaces: A user suggested a Kimi K2 turbo Coder Pro plan as a product idea.
- Another user suggested Kimi should just make it a unified plan.
- Moonshot Warns of Scammers: A warning was issued regarding scammers, advising users that legitimate Kimi team members will have a yellow role color in the server.
- The announcement explicitly states, If it ain't yellow, don't trust it, cautioning users to verify the authenticity of any direct messages received.
Modular (Mojo đ„) Discord
- Mojo SIMD makes Rust AVX Burn Brain Cells: A member finds that Mojo makes SIMD enjoyable, whereas manual AVX in Rust is mentally taxing, and asked about a standard `net/http`-style module for Mojo.
- The consensus favors a lean standard library, with community-driven efforts like lightbug_http for an HTTP library.
- Mojo Powers Fast Binary Search Engine: A member built a binary search engine in Mojo capable of crunching ~50k queries/sec over 2M docs single cored by parallelizing across SIMD lanes.
- The member anticipates adding HTTP support to enable search-as-you-type functionality.
- Mojo Runs on GTX 1080 After Patch: Mojo GPU functions are now confirmed to run correctly on a GTX 1080 after a patch in the latest nightly, and they are adding changelog entries and listing support for Pascal GPUs, along with `sm_60` support for the Tesla P100.
- A forthcoming internal patch will lower the Turing architecture limit, potentially giving broader compute capability than PyTorch on older GPUs.
- Max Backend for Torch Gains Traction: Efforts are underway to dedicate more time to the max backend for torch, aiming for `torch.ones((2, 2), device="max_device")` to operate on a wider range of GPUs than the latest CUDA supports.
- The team plans to engage with Modular team members to assess the engineering soundness of the project.
- Discord is Best Way to Reach Modular Team: A member suggested that the most effective way to contact the Modular team is by directly pinging them on Discord.
- Given their flooded email inboxes, using Discord is a reliable alternative, with a recommendation to reach out to a specific new Modular team member if other channels fail.
Manus.im Discord Discord
- Basic Plan Blocks Permanent Website Deployment: A user asked if the basic plan allows for permanent website deployment, and was informed that "it does not".
- This clarifies limitations for users considering website hosting options.
- Grok Declared: Tool, Not Agent: A user asserted that Grok is a tool, not an agent, highlighting a crucial distinction in its functional classification.
- This correction was made in response to conversational context, implying potential misinterpretations of Grokâs capabilities.
- Manus Exonerated in Grok Comparison: A user stated they were not comparing Grok with Manus, indicating a possible misunderstanding in the discussion.
- This clarification suggests that the perceived comparison was tangential or nonexistent within the conversation.
tinygrad (George Hotz) Discord
- Dual 7900 XTX Cards Trigger Crash: A member reported sudden crashes at peak performance using dual 7900 XTX cards with HIPC code on kernel 6.8, as supported on the ROCm site.
- The user expressed concerns about multi-GPU training issues and sought solutions to prevent GPU crashes.
- Pyrender Test Volunteering Needed: A member inquired about potential testers for pyrender on the kernel dataset.
- Details regarding specific testing parameters or objectives were not provided.
- Linearizer Test Gets Dumb-er: A member updated `test_linearizer_dumb` (link to GitHub PR) and proposed updating other uops in the tests to match the new format.
- The new format is allegedly more readable and updatable, and that member offered to fix the uops tests later.
aider (Paul Gauthier) Discord
- KYC neuters Streaming: OpenAI requires KYC verification to use its image model and GPT-5 streaming features.
- It's possible to use GPT-5 without KYC, but only without streaming enabled.
- Codex as Aider Clone?: A user expressed frustration with GPT-5's processing time for simple requests and missing thinking streaming.
- Another member asked what users prefer about Codex over Aider, mentioning that Claude Code was originally designed to clone Aider.
LLM Agents (Berkeley MOOC) Discord
- Enrollment Process Simplified: A member asked if receiving a Google Forms signup confirmation meant they were qualified for the LLM Agents MOOC.
- Another member clarified that everyone is welcome and there isn't a qualification process.
- Google Forms System Healthy: Many users are getting instant email confirmation after submitting the Google Form.
- This indicates the forms system is functioning correctly.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI ▷ #announcements (1 messages):
Comet for students
- Comet Lands on Students' Desktops: Perplexity AI is now offering Comet to students, helping them manage schedules, textbooks, and exam preparation with a new Study Mode, as seen in this announcement.
- The announcement included a video demonstration showcasing the functionalities.
- Flashcard Video: A promotional video attached in the announcement demonstrates how students can utilize flashcards within Comet to prepare for exams.
- This feature aims to make studying more interactive and efficient for students using the platform.
Perplexity AI ▷ #general (1157 messages🔥🔥🔥):
Perplexity Pro, GPT-5, Comet Browser, Filter Overreach
- Pro Users Want Study Mode: Some users are requesting that the Study Mode feature be rolled out to all Pro users, not just those with educational accounts.
- It was noted that Study Mode is more than just a system prompt and has associated GUI elements.
- Did Someone Say ChatGPT5 Pro?: A user caused confusion and amusement by claiming to have ChatGPT5 Pro when referencing Perplexity's GPT5 and GPT5 Thinking models.
- Another user pointed out that ChatGPT5 Pro only exists on chatgpt.com, leading to humorous responses.
- Comet's Assistant struggles: Some users mentioned issues with Comet, noting that it sometimes takes years to load a simple site and that the assistant seems good but not as good as other options.
- One user speculated that the assistant might be using Sonar.
- Filter Too Extreme, Getting Censors: Users discussed the overzealous censorship in Perplexity, noting that even historical queries like "How did Hitler die?" are being flagged.
- It was suggested that the filtering is too harsh and might lead to accounts being banned for simply studying history or engaging in harmless topics.
- Labs Web App Feature Gets Confused: Some users were confused by the models available on sim.ai, questioning why PPLX in sim.ai has Mistral and Sonnet 3.
- Others suggested it might be a bug or that the platform uses PPLX's API for web search and another model to summarize results.
Perplexity AI ▷ #sharing (3 messages):
Perplexity Browser Claims
- Perplexity Claims are shared: Members shared Perplexity Browser Claims.
- Other claims that were shared include this link and another one.
- Additional Perplexity Claims Surface: More Perplexity Browser Claims were posted in the channel.
- These claims provide a way to share browsing sessions and research results.
Perplexity AI ▷ #pplx-api (1 messages):
breakingclover: I'm interested, I tried to send you a message but it looks like you have them off!
LMArena ▷ #general (895 messages🔥🔥🔥):
Gemini 3 Hype, LM Arena Login System, LM Arena staying open, LM Arena Site Outage, LM Arena FAQ
- Gemini 3 Hype might be overblown: Members discussed the hype around Gemini 3, with some suggesting that even if it doesn't represent a huge leap in AI development, it only needs to outpace competitors to be successful.
- One member referenced OpenAI's struggles with ChatGPT5 as an indicator of industry-wide challenges, while others noted Google's greater resources might lead to surprises, with a link to a shrug gif.
- LM Arena's New Login System gets love: One member expressed their enthusiasm for the new login system, saying love the new login system ❤️ been waiting for it 🔥.
- The same member suggested implementing a chat-storage system via Google Drive, allowing users to export and analyze their data as text files, though others were skeptical.
- Site Outage triggers user craziness: A period of significant disruption occurred where LMArena experienced an outage, rendering the site inaccessible and prompting numerous user complaints and questions.
- A moderator acknowledged the situation and assured users that the team was actively working on a fix and would update the FAQ.
- MAI 1's Mysterious Malfunction: Members discussed the sudden malfunction of Microsoft's LLM MAI 1 Preview, which some users had found to give excellent results.
- One user reported that MAI 1 Preview gave the BEST answers in 90% of cases, better than all the others, even ChatGPT-5 high, but there was no clear explanation for its disappearance.
- A Cloudflare Cloudflared Catastrophe!: Several users complained about frequent Cloudflare human verification challenges on LMArena, with one saying Does everyone gets cloudflair human verification every two minutes on lmarena website ??.
- Some suspected this was due to using VPNs, while others noted similar issues on other sites using Cloudflare, leading to general frustration with the service.
LMArena ▷ #announcements (1 messages):
Video Generation Contest
- Video Generation Contest Slices Through August!: The August Video Generation Contest has 9 days left for submissions, with the theme Slice! 🔪, focusing on oddly satisfying and safe cross-section cuts.
- Act Fast: Video Submissions Closing Soon!: Don't miss your chance to showcase your video generation skills!
- The August Video Generation Contest deadline is quickly approaching, ensure your entry is submitted to the designated channel to be in the running.
OpenAI ▷ #annnouncements (1 messages):
ChatGPT Free Projects, Larger File Uploads, Project Customization
- Free ChatGPT Projects Now Available!: Projects in ChatGPT are now available to Free users on web and Android, with iOS rolling out soon.
- File Uploads Get a Sizeable Bump!: The update includes larger file uploads per project: 5 for Free, 25 for Plus, and 40 for Pro/Business/Enterprise.
- ChatGPT gets a Customizable Touch!: Users now have the option to select colors and icons for more customization, along with project-only memory controls for more tailored context.
OpenAI ▷ #ai-discussions (434 messages🔥🔥🔥):
AI Residency Program, Nano Banana in Photoshop, AI Car Game with ChatGPT, Cognitive Mesh AI Design, Liquid Neural Networks
- OpenAI Residency Program: When Will It Open?: A member inquired about the opening of the OpenAI Residency Program, but community members, including moderators, only know when OpenAI provides the information.
- They suggested applying for open jobs at OpenAI's career page in the meantime.
- Nano Banana Powers Up Gemini Images: Members discussed the use of Nano in image generation, with one user sharing how they turned their coworkers into Vikings using the Gemini app and Google Studio.
- Members shared generated images using prompt nano banana (gemini-2.5-flash-image-preview).
- Code Car Game with ChatGPT: A member created a car game with ChatGPT using approximately 700-1000 lines of code.
- They imagined the potential of software and coding in the future when AI writes better, faster, and more efficient code than humans and offered to share their code.
- Cognitive Mesh AI Evolves with Self-Learning: A member described designing a cognitive mesh AI that self-adapts and self-learns, growing its understanding over time, similar to utilizing MoEs.
- The AI, built over 3 years, has short, long, and reflective memory, evolves in its own trajectory, and has developed directive responses based on its input.
- Liquid Neural Networks Challenge the AI Paradigm: Members talked about LNNs not needing those resources at all. Think about concepts like liquid neural networks, neuromorphic chips, photonics, spintronics, or even hybrid architectures that merge continuous-time dynamics with symbolic reasoning.
- These don't rely on brute-force scale; they rely on rethinking the foundations.
OpenAI ▷ #prompt-engineering (36 messages🔥):
Prompt Leaking, API discussion location, Context Contamination, Custom GPTs Reliability, Prompt Priority Level
- Discussion Diverts to APIs Channel: A member suggested that API operation discussions should occur in the dedicated APIs channel rather than the prompt engineering channel.
- Anti-Prompt Leaking GPT Appears: A member shared their Anti-Prompt Breaching GPT designed to prevent prompt leaking, which sparked discussions about its effectiveness.
- Others expressed skepticism, pointing out that it might only slow down or make bypassing the protection harder, especially in Custom GPTs.
- Prompt Hiding Compromises Reliability: Members debated the trade-offs between hiding prompts and maintaining model reliability, noting that longer, more complex contexts can reduce effectiveness.
- The conclusion was that OpenAI has been training models to refuse prompt extraction and that the reliability should always come first.
- Context Contamination Concerns Arise: A discussion highlighted that using the same model for both generating responses and evaluating guardrails is suboptimal due to potential context contamination.
- It was suggested that different models with different instructions should be used for each task, especially considering that Custom GPT instructions might conflict with prompt-hiding directives.
- Temperature and Top-P Settings Influence Model Behavior: A member summarized how different combinations of temperature and top-p settings affect model behavior.
- They noted that Low temp / Low top-p lead to maximum consistency and minimal variation, while High temp / High top-p provide maximum creativity and diversity.
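A minimal sketch of those sampling combinations with the OpenAI Python SDK (the model name is illustrative; temperature and top_p are the documented request parameters):

```python
from openai import OpenAI

client = OpenAI()
settings = [
    (0.2, 0.10, "max consistency"),  # low temp / low top-p
    (1.0, 0.95, "max creativity"),   # high temp / high top-p
]
for temperature, top_p, label in settings:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": "Name a fruit."}],
        temperature=temperature,
        top_p=top_p,
    )
    print(label, "->", resp.choices[0].message.content)
```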
OpenAI ▷ #api-discussions (36 messages🔥):
Anti-Prompt Leaks, GPT Reliability, Prompt Engineering, Agent Instructions, Model Temperatue
- New Anti-Prompt Leaking GPT appears: A user built a GPT to prevent prompt leaking, available at chatgpt.com.
- Others noted that this method reduces reliability while offering little benefit and is easily bypassed, especially within a Custom GPT.
- Prompt Priority and Reliability Clash: A user shared code about prompt priority levels, which led to discussions about the use of additional tokens and complexity, because this reduces the "real" prompt's reliability.
- The discussion concluded that there's no prompt solution that won't reduce reliability and that OpenAI has been training these refusals directly into their models.
- Guardrails Must Run in Agentic Ways: One user pointed out that guardrails must run in more agentic ways, with models and agents doing each check.
- He specified that it should be different models, and with different instructions.
- Avoiding Context Contamination: A user stated that, to avoid context contamination, instructions should not be used for hiding prompts.
- He notes that the standard "You are a helpful assistant" instruction already conflicts with the goal of preventing prompt leakage.
- Temperature And Top-P Discussed: Users discussed various strategies involving temperature and top-p settings, specifically: low temp/low top-p for maximum consistency, minimal variation; low temp/high top-p for consistent style, varied vocabulary; high temp/low top-p for creative within focused scope; and high temp/high top-p for maximum creativity and diversity.
Cursor Community ▷ #general (284 messages🔥🔥):
Anthropic API, Cursor Auto model quality, Cursor chat history loss, Fine-tuning recommendations for chatbot, Gemini 2.5 Reasoning
- Anthropic API Keys Work Despite UI Mismatch: Users confirmed that sk-ant- keys are accepted and fully functional for the Anthropic API in Cursor, despite a UI mismatch indicating otherwise.
- Cursor's Auto Model Faces Scrutiny: Members discussed the quality of Cursor's Auto model, with one user reporting spending $200 in 4-5 days, expressing that the code quality is worse than Sonnet or Gemini but potentially better than Copilot due to better prompts and tools.
- Another member suggested that itâs better to guide summarization manually rather than relying on the Auto model when the context is full.
- Users Report Cursor Chat History Loss After Update: Several users reported losing their chat history after updating Cursor, with one user losing a month's worth of project work.
- One user recommended checking the location where Cursor stores chat history files, though another confirmed that the chats were not in the location they are normally stored.
- Community Seeks Fine-Tuning Advice for Chatbot: A user requested recommendations for fine-tuning a model for a web app chatbot, leading to a discussion on using prompt generators and reproducible schemas.
- It was advised to avoid fine-tuning until generating a revenue stream, and treat it as an optimization to use smaller models.
- Exploits and LLMs: A user reported that LLMs suck at exploit dev by default, which led to a conversation about hacking and a user saying they hear kaching when someone talks about it.
- A user replied, Last time it happened we were selling cheats for Call of duty during covid and all the stimulus checks stimulated us LOL.
Cursor Community ▷ #background-agents (2 messages):
Background agent transfer, API state transfer summary
- Agent Freeze Requires State Transfer: A member reported their background agent has frozen and is seeking a way to transfer it to another chat.
- They inquired about obtaining a state transfer summary, possibly via an API, as the agent is integrated into a website.
- API Request for State Transfer Summary: The member specifically asked if there's an API method to retrieve a state transfer summary for their background agent.
- This would facilitate moving the agent's current state to a new chat environment after the freeze.
OpenRouter ▷ #app-showcase (2 messages):
Nano Banana Discord Bot, vibe coded bot
- Vibe Coded Bot drops Nano Banana: A member shared a Discord bot that uses Nano Banana over OpenRouter.
- The user clarified that they vibe coded the bot.
- Nano Banana powers Discord Bot: A Discord bot was created leveraging Nano Banana through OpenRouter to post content.
- The botâs source code is available on GitHub for anyone interested in exploring or contributing.
OpenRouter ▷ #general (230 messages🔥🔥):
DeepSeek models, Claude-sonnet-4 problems, DeepSeek 3.1 cheapest price, Submodel.ai promotion, OpenRouter billing questions
- DeepSeek Free Models Speak Gibberish: Some users report that free DeepSeek models are generating gibberish, while the paid ones are functioning correctly.
- DeepSeek 3.1 Pricing: A user asked about the cheapest and most stable source for DeepSeek 3.1, with Synthetic.new being mentioned at $20/month, but another user called the official rates a "rip off."
- Another user suggested Submodel, but a mod cautioned that they need to wait in line like everyone else who wants to be on the platform.
- Agent Framework SDK Experiences Shared: Members discussed their experience with Agent Framework SDKs like OpenAI Agent SDK, AI SDK, and LangChain JS with OpenRouter, noting that most require patching due to non-standard schemas from various providers.
- One member plans to roll their own solution with BAML integration, emphasizing that it's just HTTP anyway.
- ChutesAI Subscriber Faces 429 Errors: A ChutesAI subscriber is experiencing 429 rate limit errors and credit issues when using OpenRouter with a BYOK, specifically on Chub but not on JanitorAI.
- Despite verifying the correct API key and private key usage, the problem persists, and the user has tried various solutions to no avail; the issue seems to be specific to routing through Chutes on Chub.
- Gemini-2.5 Rate Limits and Credit Drain Debated: Users are encountering 429 rate limit errors with Gemini-2.5-flash-image-preview:free, even after paying to increase rate limits, and suspect itâs an OpenRouter issue.
- One user reported OpenRouter drained all their credits due to a bug with Gemini image, and a mod confirmed refunds will be issued soon.
OpenRouter ▷ #discussion (5 messages):
Google Antitrust, Yahoo Chrome, Minor UI suggestion
- Google Faces Antitrust Ruling: A member linked to a CNBC article about Google facing an antitrust ruling.
- The member commented it was truly remarkable.
- Yahoo Considering Buying Chrome: A member expressed a morbid fascination with the hypothetical scenario of Yahoo acquiring Chrome.
- They mentioned that a truly sick part of me wanted to see the purchase of Chrome by Yahoo! play out.
- Request to improve UI Padding: A member suggested a minor UI tweak, specifically suggesting to remove the padding-bottom on the outer box, and move it to the inner box.
- The member suggests that this change would prevent the scroll wheel from obscuring the text.
LM Studio ▷ #general (156 messages🔥🔥):
LM Studio 4k context, LM Studio Nightly, AI Girlfriend goon chamber, LM Studio auto naming stories, Granite 4 memory
- Context Length Slows Down Inference: Inference slows down as context grows, with performance deltas becoming more obvious as context utilisation goes up to 20k.
- However, some users comically compared this context to an AI girlfriend.
- Bleeding Edge LM Studio: Users are requesting an LM Studio Nightly release for backends, especially for llama.cpp, to stay on the bleeding edge.
- It was suggested that the NSFW content should be moved to 4chan.
- Granite Experts: The Granite 4 preview can use 62 experts, meaning more memory (VRAM) is required.
- However, using anything other than the default number of experts yields worse results in most cases, according to members.
- LM Studio struggles with Windows Auto-Upgrader: The Windows auto-upgrader cannot update an existing install until the old LM Studio is manually removed, a failure caused by over-long path names.
- The workaround is to manually remove the problematic directory (`C:\Users\[USERNAME]\AppData\Local\Programs\LM Studio\resources\app\.webpack\bin\extensions\backends\vendor\_amphibian\cpython3.11-win-x86@2\Lib\site-packages\pkg_resources\tests\data\my-test-package_unpacked-egg\my_test_package-1.0-py3.7.egg\EGG-INFO`) and then run the downloaded installer.
- No Love for Legacy CPUs: Users are running into errors because LM Studio requires AVX2 instructions, and older CPUs like the FX8300 don't support them.
- As a result, LM Studio refuses to run even when the user only intends to offload computation to the GPU; some users have written their own workarounds in response.
LM Studio ▷ #hardware-discussion (77 messages🔥🔥):
Limiting GPU Power Draw, DDR4 Server for LLMs, GPU Recommendations (3060, Titan Xp, A4000), MoE offload for Qwen3-235B, Multi-GPU bandwidth with dual CPUs
- Capping GPU Power Consumption: Users discussed limiting power draw in GPU drivers, suggesting MSI Afterburner as a tool to control power per GPU.
- One user was planning to limit power consumption due to running very large LLM models on a new server with 512GB of DDR4 RAM.
- Scoping out Budget GPU Boost: Members discussed various GPU options for a server, including used 3060 12GB cards ($250-300), Titan Xp cards ($200-250), and RTX A4000 16GB cards ($700 each).
- A suggestion was made to prefer the 3060 12GB over the older cards due to GDDR6 improvements, linking this MSI GeForce RTX 3060 VENTUS.
- Guesstimating MoE VRAM Consumption: A user inquired about the VRAM needed for MoE offload for Qwen3-235B, wondering if they could run it with CPU context offload using a 1080.
- Another member estimated 11+GB for a 4-bit quantization, based on 22B active params, but expressed uncertainty about the exact mechanics; a back-of-the-envelope sketch appears at the end of this section.
- Routing Multi-GPU Bandwidth Bonanza: A user pondered how to best load GPU bandwidth between dual CPUs, considering whether to have 2x GPUs on one CPU or 1x GPU on each PCIe slot.
- Another member suggested that putting all GPUs on one PCI root complex would âprobablyâ reduce latency for GPU-GPU traffic, sharing a DigitalOcean tutorial on splitting LLMs across multiple GPUs.
- Intel Arc Pro B50 Bargain Find: One member stumbled upon the Intel Arc Pro B50 16GB workstation GPU listed for $350.
- The user exclaimed that they knew where that card would be going.
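The ~11 GB figure in the MoE estimate above is simple arithmetic on the active parameter count; a back-of-the-envelope sketch (assumption: raw weight bytes only, ignoring KV cache, activations, and runtime overhead):

```python
def active_weight_gb(active_params: float, bits_per_weight: int) -> float:
    """Raw storage for the active parameters alone, in GB."""
    return active_params * bits_per_weight / 8 / 1e9

# ~22B active params at 4-bit quantization -> 11.0 GB, matching the estimate above
print(f"{active_weight_gb(22e9, 4):.1f} GB")
```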
Eleuther ▷ #general (14 messages🔥):
3D modeling, Latent knowledge in large foundation models, SPAR applications
- Blender skills blossom after 10 hours: A member mentioned they can 3D model in Blender after only 10 hours of learning, implying current AI isn't up to par.
- They compared it to Uber Eats to emphasize the superficiality of some AI applications.
- Foundation Models Underutilize Latent Knowledge: A member suggested large foundation models are underutilizing latent knowledge, calling it low hanging fruit.
- They compared current progress to achievements in AIME and IMO problem-solving in the last year.
- SPAR Application Status Updates: Two members inquired about the results of their SPAR applications.
- Both confirmed they have not received any updates yet.
Eleuther ▷ #research (128 messages🔥🔥):
Normalizing Flows, CoT/RL limitations, Diffusion Models, Adaptive Computation, Fixed-Point Problems
- Recursion Reigns: Turing Completeness calls for Serial Operations!: Discussion posited that serial operations should occur via recursion due to Turing Completeness, suggesting an architecture that could adaptively search large spaces, unlike CoT/RL which were dismissed as mere band-aids.
- The discussion then pivoted toward reasoning in the latent space over token space to avoid the pitfalls of viewing hard tasks as fixed-point problems with concrete ground truths, further clarifying views on Adaptive Computation.
- Diffusion Dissection: Cheaper Inference or Training Triumph?: The cheaper inference from parallel generation of tokens was cited as the core reason for Diffusion LMs hype, while its better training objective helps avoid certain failures and biases.
- Members stated that capabilities require some serial compute, thus falling back to AR, though they agreed this approach has limitations.
- Latent Logic: Tool Use Transcends Token Space!: Members debated on tool use in latent space vs token space.
- A member noted that you can imagine dirty solutions like a tool head that consumes the latent, which might be less stable.
- Brainy Business: Animal-Level Learning Before Human Heights?: The economics of intelligence were questioned, suggesting that substantial animal brain level learning might be necessary before achieving human brain level learning: "if we augment existing LLMs with dog level RL I don't think they get much better but I think it costs a ton."
- It was analogized to distinct modes of personal reasoning, one involving explicit linguistic articulation and another operating in a less definite space of ideas, with one member describing this second mode as globs of semantic groups put together that are shuffled around until they coalesce.
- Sweep Stakes: Tuned for Longer Training with Weight Decay!: Debate ensued around competing results from two papers and a third, discussing disparities in the performance of the Muon Optimizer and the importance of proper sweeping and tuning duration with weight decay.
- Sweeping with a subset of data might suggest no WD is optimal, but tuning for longer reveals its benefit: compute = luck.
Eleuther ▷ #lm-thunderdome (34 messages🔥):
LM Eval Harness, MMLU Task Configuration, Steering Vectors Implementation, Attention Heads Steering, Model's Response Recording
- Debugging MMLU with LM Eval Harness: A member sought help implementing a function with forward hooks to add a steering vector to different attention heads while running MMLU with the LM Eval harness; the sequence length decreased consistently, causing confusion, and they posted config details and relevant code here.
- Another member pointed out that the harness inputs the `[:,-1]` tokens to calculate the logprobs of the prefill, truncating the sequence, and suggested modifying this line to remove the truncation if needed, as seen here; a minimal hook sketch appears at the end of this section.
- LM Eval Steering Vector Support: A member highlighted that the LM Eval harness already supports steering vectors, advising against manual implementation and linking to relevant documentation, which can be found here.
- It was also mentioned that the steering vector implementation works on activations as well as residuals, and that in-depth explanations of how to format the steering vector data are provided in the `SteeredModel` docstring, available here.
- Attention Heads Steering Imminent: A member announced a pull request to add support for steering individual attention heads, which can be found here.
- The goal is to evaluate models steered with pre-prepared steering vectors saved in a `.pt` file, or with a reference to an SAE/transcoder feature used as a steering vector.
- Modelâs Response: Forward Pass vs. Generate: In the specified task configuration, it was clarified that for generate tasks like gsm8k and minerva_math, the modelâs response is recorded through a generate call.
- However, for multiple-choice tasks, the modelâs response is recorded through a standard forward pass.
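For the steering-vector debugging thread above, a minimal sketch of the forward-hook approach (assumptions: a Hugging Face-style decoder; the module path, layer index, and scale below are illustrative, not taken from the thread):

```python
import torch

def make_steering_hook(vec: torch.Tensor, scale: float = 1.0):
    """Build a forward hook that adds a steering vector to a module's output."""
    def hook(module, inputs, output):
        # output: (batch, seq, hidden); returning a tensor replaces the output
        return output + scale * vec.to(device=output.device, dtype=output.dtype)
    return hook

# Hypothetical wiring for a Llama-style model; remove the handle after eval:
# handle = model.model.layers[10].self_attn.o_proj.register_forward_hook(
#     make_steering_hook(steering_vec))
# ... run the harness ...
# handle.remove()
```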
GPU MODE ▷ #general (21 messages🔥):
PyTorch Conference, CUDA data pipelines, MLPerf benchmarks, Hardware optimizations, NVIDIA Hopper Architecture
- Triton Conference announced for 2025: A member shared a link to the Triton Conference 2025, which focuses on the Triton programming language and related topics.
- Cool CUDA Pipeline Projects Beckon: A member inquired about cool toy projects to explore CUDA-enabled data pipelines, specifically looking for well-defined problems without logistical issues.
- Another member suggested mixing DALI with cuVS for an optimal setup and also wondered about MLPerf-like standards or benchmarks for data pipelines and processing.
- NVIDIA Hardware Optimizations Uncovered: A member sought resources for understanding hardware-specific optimizations, particularly for the H100 GPU.
- Another member shared links to microbenchmarking papers dissecting the NVIDIA Hopper, Turing T4, and Volta GPU architectures (Hopper paper, Turing T4 paper, Volta paper).
- NVIDIA Ampere Architecture Explored: In a discussion about GPU-specific features, a member shared a link to an NVIDIA on-demand session about the Ampere architecture (Ampere session).
- Blackwell Architecture Tech Overview Teased: During the discussion about hardware optimizations, a user mentioned that for Blackwell, they only knew of a simple tech overview released by NVIDIA, but considered it pretty good.
GPU MODE ▷ #triton (3 messages):
Microsoft Teams Meeting, Meeting Details
- Microsoft Teams Meeting Scheduled: A meeting is starting in 5 minutes on Microsoft Teams, and a user shared a join link.
- The message included a meeting ID (283 039 414 385 5) and passcode (XW6c3ZC2).
- Dial-in Details Provided: Dial-in details were shared for the Microsoft Teams meeting, including a Vancouver local number (+1 778-800-9740,,819312747#) and a phone conference ID (819 312 747#).
- Information was also given for joining on a video conferencing device using a tenant key ([email protected]) and video ID (118 771 827 4).
GPU MODE ▷ #cuda (5 messages):
Intra-device Parallelization, CUDA-level FSDP, Register and Shared Memory Usage, NVCC Half-Precision Optimization
- Intra-Device Parallelization: FSDP at CUDA Level: A member inquired about a name for parallelizing the load of weights for subsequent layers alongside computations of the current layer, similar to FSDP, but at the CUDA level within a single device, potentially using memcpy_async.
- They linked the CUDA documentation on collectives and an NVIDIA blog post on data movement to illustrate the concept; a stream-level sketch of the overlap appears at the end of this section.
- Deep Dive into Register and Shared Memory: A member sought clarification on how register and shared memory usage function on an SM (Streaming Multiprocessor), questioning if developers can explicitly control or subdivide these resources to pack more values by reducing precision.
- They specifically asked if half-precision types (16-bit) could be used such that two values occupy a single 32-bit register, or if this is solely managed by the compiler and hardware using intrinsics like __half2.
- NVCC: Vectorizing Halves to Save Registers?: A member inquired whether nvcc automatically converts two half-precision floating-point numbers into a vector of half to save registers.
- This optimization would potentially improve memory usage by compacting data structures.
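The overlap idea from the intra-device parallelization question can be sketched at the framework level with a side CUDA stream in PyTorch (a sketch of the pattern, not the raw memcpy_async mechanics; assumes weights staged in pinned host memory so the copies actually overlap):

```python
import torch

copy_stream = torch.cuda.Stream()

def forward_with_prefetch(layers, x, host_weights):
    """Compute layer i while prefetching layer i+1's weights host->device."""
    for i, layer in enumerate(layers):
        if i + 1 < len(layers):
            with torch.cuda.stream(copy_stream):
                # non_blocking copies only overlap when host_weights[i+1] is pinned
                layers[i + 1].weight.data.copy_(host_weights[i + 1], non_blocking=True)
        x = layer(x)  # compute runs on the default stream
        torch.cuda.current_stream().wait_stream(copy_stream)  # weights ready for i+1
    return x
```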
GPU MODE ▷ #torch (12 messages🔥):
Torch.compile and Triton Kernels, Kernel Fusion with Torch.compile, Triton OP Registration
- Torch.compile Doesn't Fuse Into Triton Kernels?: A member inquired whether `torch.compile` fuses surrounding code into custom Triton kernels, questioning the fusion capabilities around specialized ops.
- Another member responded that it depends on the captured graph and suggested using `TORCH_LOGS="output_code"` to inspect the generated code, but ultimately confirmed that torch.compile does not fuse operations into user-defined triton kernels.
- Triton OP Registration Inquiry: During a discussion about `torch.compile` behavior, a member asked if a kernel should be registered with `triton_op` and `wrap_triton` for fusion to occur.
- A Gist was shared to test kernel fusion, but it was noted that relying too much on the compiler for fusions is not advisable, as the example is numerically unstable for large MNK.
- Fusion Barrier Forms in Torch.compile: A member suggested that `torch.compile` creates a fusion barrier before and after a specialized op with its own Triton kernel, resulting in multiple kernel launches.
- Even when manual fusion is possible, the discussion leaned towards the compiler not automatically fusing simple primitive ops surrounding specialized ops with Triton kernels. A tiny repro of the inspection workflow appears below.
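A sketch of that inspection workflow (the body is ordinary elementwise code that Inductor will fuse into one generated kernel, in contrast to a user-defined Triton kernel):

```python
import torch

@torch.compile
def f(x):
    # a plain elementwise chain: Inductor fuses this into a single kernel
    return torch.relu(x * 2.0) + 1.0

x = torch.randn(1024, device="cuda")
f(x)
# Run as: TORCH_LOGS="output_code" python repro.py
# to dump the generated Triton code and see where the fusion boundaries fall.
```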
GPU MODE ▷ #jobs (1 messages):
Sony Computer Vision Job Posting
- Sony Eyes Computer Vision Whiz: Sony is hiring for a Computer Vision role, as advertised in this LinkedIn post.
- AI Engineer Needed: The job description calls for someone to help develop AI.
GPU MODE ▷ #torchao (20 messages🔥):
TorchAO Installation, cu128 image, torch2.8, MXFP8 Training
- TorchAO Nightly Builds Mismatch Torch Versions: Members reported `torchao` nightly builds breaking due to a mismatch between the Torch version they were built against (2.9) and the Torch version members were trying to import them with (2.8), and suggest checking issue #2919.
- The fix was to use `pip install torchao==0.13.0 --extra-index-url https://download.pytorch.org/whl/test/cu128`.
- Troubles Installing TorchAO 0.13.0 Prerelease: A member encountered an `ImportError` (cannot import name 'mxfp8_cuda' from 'torchao.prototype') when trying to import `NVFP4InferenceConfig` from `torchao.prototype.mx_formats` in the 0.13.0 prerelease, and determined that installing from source worked.
- The root cause was a missing build flag for sm100, and a fix is in the works via PR #2933 and issue #2932.
- NVFP4 Inference Users Unblocked: A fix is incoming in the short term for NVFP4 inference in version 0.13.0.
- It was reported that users will be unblocked by the short term fix as the kernel in question is only used for MXFP8 training.
GPU MODE ▷ #irl-meetup (1 messages):
Project Contributions
- Suggest Project Contribution Channel: A member suggested that others look at a specific project contribution channel, <#1373414141427191809>.
GPU MODE ▷ #intel (1 messages):
Gaudi 2, Gaudi performance
- Gaudi 2 still champ!: A member said that Gaudi 2 is still a great product, especially regarding performance.
- Gaudi Expert Available: A member who works on Gaudi offered to answer questions about it.
GPU MODE ▷ #self-promotion (6 messages):
AI-Generated Metal Kernels, PyTorch to Low-Level Kernels, MPS Eager & torch.compile Backend, Kernel LLM Generation, BackendBench Correctness Checking
- AI Codegen Zaps Speedup to Metal Kernels: A team achieved a 1.87X speedup by going straight from PyTorch to low-level Metal kernels with AI codegen, described in their blog post.
- Sharing Generated Kernels for Scrutiny: A member requested a folder with all the generated kernels; another member, who maintains the MPS eager and torch.compile backend, offered to share the kernels and timing results, inviting feedback on potential correctness issues.
- He also mentioned his work on kernel LLM generation to support all PyTorch operators.
- BackendBench Probes Correctness: A member noted potential correctness issues and suggested using BackendBench for more thorough checking than KernelBench.
- The team responded that they used KernelBench but were excited about the general direction.
- Suspicions Swirl around Speedup Claims: Some are skeptical about the claims of 1000x speed up, suggesting it might stem from a lack of synchronization at the end of the benchmark (see the blog).
- The team was asked to propose a PR: any perf gains will be beneficial for PyTorch users.
- Bypassing CPP Binding for Kernel Invocation: A member pointed out that a cpp binding is no longer needed, as one can use torch.mps.compile_shader to directly invoke kernels.
- They also suggested submitting a PR with the kernels, as any performance gains would benefit PyTorch users.
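A minimal sketch of that binding-free path (assumptions: the torch.mps.compile_shader usage pattern from recent PyTorch releases and a trivial add_one kernel; the API is experimental, so check your PyTorch version):

```python
import torch

# Compile a Metal shader at runtime: no cpp extension or custom build step.
lib = torch.mps.compile_shader("""
kernel void add_one(device float *x, uint idx [[thread_position_in_grid]]) {
    x[idx] += 1.0;
}
""")

x = torch.zeros(16, device="mps")
lib.add_one(x)  # dispatches one thread per element of x
print(x)        # all ones if the kernel ran
```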
GPU MODE ▷ #thunderkittens (1 messages):
B200 attention kernel
- Seeking functional B200 attention kernel: A member inquired about a B200 attention kernel they wanted to test, but found it broken on the main branch.
- They asked if there was a specific branch or patch available to try it out.
- B200 Kernel Troubles: A user reported encountering issues with the B200 attention kernel on the main branch.
- They are seeking a working version, either as a separate branch or a patch.
GPU MODE ▷ #submissions (11 messages🔥):
MI300x8, amd-all2all leaderboard
- MI300x8 All2All Records Crushed: A member achieved first place on the `amd-all2all` leaderboard for MI300x8 with submission id `34854` at 1361 µs.
- MI300x8 Race Tightens: Multiple submissions were made to the `amd-all2all` leaderboard for MI300x8, with times ranging from 2.55 ms to 22.0 ms.
- MI300x8 Leaderboard Competition: One user secured third place on MI300x8 with 2.57 ms on the `amd-all2all` leaderboard.
GPU MODE ▷ #factorio-learning-env (7 messages):
Game Feedback, Request for Guidance
- Positive Game Feedback Provided: A player indicated they enjoyed the game after playing it for 3 hours.
- They simply stated, "nice game."
- Player Seeks Guidance: A player is looking for advice on what to do next within the game.
- They are seeking direction after already spending a significant amount of time playing.
GPU MODE ▷ #amd-competition (17 messages🔥):
Submitting HIP Kernels, iris library, UI exit code info
- HIP kernels are welcome!: The submitted solution can include Python files, and HIP kernels (with an additional build script) that will be wrapped by a Python API, as exemplified by the reference kernels.
- Amd person to add iris library!: The iris library might be added to the environment and a member has already forwarded the request to the AMD infra manager.
- UI will get exit code: The UI will be updated to provide more info on exit codes.
- How can I access the hardware for the competition?: Members can make submissions without SSH access; see the docs page.
GPU MODE ▷ #cutlass (4 messages):
CUTLASS 2.x interfaces, Hopper F8 performance, kind::mxf4nvf4.block_scale vs kind::mxf8f6f4.block_scale, Github bug reports
- CUTLASS 2.x deprecated for better interfaces: Members noted that CUTLASS 2.x interfaces are largely not used anymore and that 3.x and especially 4.x have much better docs.
- One user stated that version 4.x is 4x faster compared to hopper f8.
- mxf4nvf4 smokes mxf8f6f4: `kind::mxf4nvf4.block_scale` is 4x, but the question was related to doing mxfp4 via `kind::mxf8f6f4.block_scale`.
- One member asked whether `mxf4nvf4` is 2x faster than `mxf8f6f4` for the exact same mxfp4 inputs, "or am I missing something?"
- Github issues bug reports: A member requested that the user file a bug on Github issues.
- No specific reason was given.
GPU MODE ▷ #multi-gpu (20 messages🔥):
Multi-GPU Development, Distributed Kernels, AMD Challenge 2025, NVLink vs PCIe, Fused Kernels
- Multi-GPU Nirvana: Cloud vs Local Setup: Discussion revolves around setting up a multi-GPU development environment, with the cloud (e.g., Google Cloud's N1 series with 4x Tesla T4 GPUs via Google Cloud Compute Docs) being preferred over a local setup to avoid compatibility issues.
- The goal is to develop multi-GPU algorithms without immediately needing top-tier performance, focusing on understanding tools for mapping mathematical algorithms to hardware, and accessing machines via SSH from a Macbook.
- NVLink vs PCIe: A Tale of Two Interconnects: NVLink and PCIe are logically similar (memory accessed via loads/stores and dereferencing pointers) but have different features; a significant NVLink-specific feature is Multicast memory.
- The user emphasized focusing on single node setups to exclude network concerns, noting that they are interested in NVLink/NVSwitch but PCIe is acceptable in the short term.
- Fused Kernels: The Holy Grail of Multi-GPU: The user expresses interest in implementing distributed fused kernels to fully utilize multiple GPUs, particularly for large matrix multiplications.
- This involves combining computation and communication within the same kernel, differentiating it from separately handling kernels and communication, with an example of fusing a matrix multiplication with an AllGather or AllReduce operation.
- NCCL/NVSHMEM APIs: Abstraction vs Fine-Grained Control: Implementing toy versions of DP/TP/PP/Zero can be done from scratch with P2P loads/stores or using NCCL/NVSHMEM APIs, depending on the desired level of control and tolerance for library calls.
- The choice depends on how fine-grained the work needs to be and how tolerant the implementation is of library calls; NCCL knows how to select kernels and settings based on device connections, simplifying the user interface to `ncclSend` and `ncclRecv`. A baseline sketch of the compute-plus-communication pattern follows below.
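As a baseline for the fused-kernel discussion, the unfused compute-then-communicate pattern looks like the sketch below (illustrative names; torch.distributed over NCCL is the high-level stand-in for ncclSend/ncclRecv-style calls, and an initialized process group is assumed):

```python
import torch
import torch.distributed as dist

def row_parallel_matmul(x: torch.Tensor, w_shard: torch.Tensor) -> torch.Tensor:
    """Unfused baseline: one compute kernel, then a separate AllReduce kernel."""
    partial = x @ w_shard      # each rank multiplies against its weight shard
    dist.all_reduce(partial)   # separate NCCL kernel sums the partial results
    return partial             # a fused kernel would overlap these two steps

# Point-to-point is similarly thin over NCCL: dist.send(t, dst=1) / dist.recv(t, src=0)
# wrap ncclSend/ncclRecv once the "nccl" process group is initialized.
```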
GPU MODE ▷ #low-bit-training (1 messages):
MXFP8 pre-training, TorchAO MXFP8, Crusoe B200 Cluster
- MXFP8 Recipes Unveiled for LLM Pre-training: A new paper, Recipes for Pre-training LLMs with MXFP8, has been published, providing guidance on using MXFP8 for pre-training Large Language Models.
- TorchAO MXFP8 and Crusoe B200 speed up pre-training: PyTorch announced accelerated 2K-scale pre-training by up to 1.28x with TorchAO MXFP8 and TorchTitan on the Crusoe B200 Cluster.
HuggingFace ▷ #general (93 messages🔥🔥):
Deepseek API, Llama 3.2 vision 11B on Macbook Pro M4, Quantized Models, HF Spaces, Python Learning
- Deepseek API Found, but Slow: A member found a free Deepseek API, noting it's a little slow but useful.
- They expressed satisfaction because it's free.
- M4 Macbook Pro Fails to Run Llama 3.2 Vision Model: A user with a Macbook Pro M4 with 24 GB RAM reported failing to run Llama 3.2 vision 11B, with the system using 20 GB of memory without output.
- Another user suggested it might be offloaded to swap memory and recommended trying quantized versions or lower context length such as Q4.
- Anthropic Grew Quickly: In response to this tweet, members noted that Anthropic tripled in size in approximately 5 months, growing from ~60B to 180B.
- Another user jokingly said "yk what else is huge".
- Anthropic Settled Copyright Case: Members discussed that Anthropic settled their copyright case out of court and they're going to announce the terms publicly soon-ish.
- While the settlement amount is still unknown, one member mentioned that the announcement of this investment and the settlement of this case are clearly related, really fascinating.
- Chinese AI Models Less Gaslighting: A member noted that Chinese AI models tend to gaslight less, pointing to Qwen as an example.
- Another member said they are a big fan of Qwen for providing them with ideas of what can be wrong, and that it follows the format well.
HuggingFace ▷ #today-im-learning (3 messages):
link request, language studies
- Link request goes unanswered: Two members requested a link from another member without specifying the content of the link.
- The request was not fulfilled within the given message history.
- English is the only language: A member asked another member if they studied Japanese.
- The other member responded that they only study English.
HuggingFace ▷ #i-made-this (2 messages):
Datatune Agents release, DeepResearch AI Agents, Token Optimization
- Datatune Agents Released: A new release of Datatune Agents enables data transformations at a per-row level, preserving contextual understanding using natural language prompts.
- Key features include row-level map() and filter() operations, Dask DataFrames support for scalability, and compatibility with multiple LLM backends like OpenAI, Azure, and Ollama via LiteLLM.
- DeepResearch AI Agents Treasure Hunt: A new post was published about DeepResearch AI Agents and how they dive deep into research papers ensuring broad coverage, balancing depth and breadth.
- The agents code is available on GitHub, and the author is seeking feedback from the community.
- Datatune optimizes Tokens & Cost: Datatune gives explicit control over which columns are sent to the LLM, reducing token usage and API cost.
- This is achieved through `input_fields` to send only relevant columns, automatic batching, metadata handling, and support for setting tokens-per-minute and requests-per-minute limits, defaulting to known model limits like GPT-3.5.
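As a rough illustration of that column control (hypothetical API shape pieced together from this summary; the helper and parameter names are NOT verified against Datatune's actual API, so consult its docs for the real signatures):

```python
# Hypothetical sketch only: datatune_map and its parameters are inferred from
# the summary above (row-level map/filter, input_fields, LiteLLM backends).
import dask.dataframe as dd

df = dd.read_csv("products.csv")

transformed = datatune_map(                # hypothetical helper
    df,
    prompt="Extract the brand name from the description",
    input_fields=["description"],          # only this column is sent to the LLM
    model="gpt-3.5-turbo",                 # routed via a LiteLLM-style backend
)
```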
HuggingFace ▷ #computer-vision (2 messages):
Detectron2 setup, Automated test cases
- Detectron2 Setup Sought: A member asked for help setting up Detectron2 on their local PC and converting it into a wheel format.
- No solutions were offered in the provided messages.
- Computer Use Functionality Explored: A member inquired about experiences using computer use functionality for discovering and automating test cases.
- They specifically asked about any limitations encountered during the process.
HuggingFace ▷ #NLP (1 messages):
cakiki: <@596574356327628850> please don't cross-post
Yannick Kilcher ▷ #general (66 messages🔥🔥):
Automated weapons, Public transportation in the US, Drones as a cheap attacking option, DeepMind's potential with massive funding, Quantum Physics of AGI
- Automated weapons debate rages!: Members discuss the ethics and practicalities of automated weapons, with some arguing that they could reduce harm compared to human soldiers, while others fear human rights abuses.
- Some argue the fear is governments abusing them for human rights violations, which is already happening with bombs, nukes and drones.
- American Public Transit: a Missed Opportunity?: The members discuss how public transportation in the US is unsafe and humiliating, highlighting a missed opportunity to reduce accidents and improve urban mobility.
- The conversation suggests that if humans could only drive when in the proper state of mind, accidents could be reduced by 90%.
- Drone swarms as a security threat: Members discuss the potential for drones combined with trucks to be used as cheap attacking options, necessitating a security framework to deal with non-state actors.
- One suggests governments focus on banning chemicals required to make drones.
- Waymo Driving the Subscription Economy: Members discuss autonomous driving subscription-style payments, like for Waymo and envision hybrid approaches.
- One member stated, "I am still sometimes dreaming of a hybrid that can provide the best of both worlds. I think that is where most of the tech bros are coming from that just end up reinventing trains or busses every few months."
- Quantum Physics: the key to AGI?: Members shared a YouTube video discussing whether existing AI architectures can achieve AGI/ASI from a quantum physicist POV.
- A member stated, "wonder where could DeepMind get to if they threw 7 trillion dollars onto this problem."
Yannick Kilcher ▷ #paper-discussion (16 messages🔥):
Mamba Weakness, Online learning, Neuromorphic architecture, Bad ML learning resources
- Mamba's Fixed Transition Matrix Faces Flak: A member critiqued Mamba's fixed transition matrix, noting that it cannot replicate true state machines and may not preserve contextually relevant information, referencing the paper The Illusion of State in State-Space Models.
- They suggested cures such as adding a nonlinearity between state transitions or making state transition matrices dependent on the input, as done in Liquid Structural State-Space Models.
- AIâs Achilles Heel: Absence of Online Learning: A member claimed that the single biggest issue in AI is lack of online learning.
- Neuroscience and Neuromorphic Architecture Navigated: In response to a discussion about general intelligence, a member suggested exploring neuroscience and neuromorphic architecture via Artem Kirsanov's YouTube channel and deepsouth.org.au.
- ML Learning Resources Mocked as Mostly Mediocre: One member thinks most ML learning resources are just as bad because they are a product of the time when nobody knew better.
- They compared the quality of early ML resources to all the terrible php tutorials or terrible c learning books etc.
Yannick Kilcher ▷ #ml-news (3 messages):
Ladybird Browser, FOSS browser alternative to Chrome
- Ladybird: Chrome Alternative in the Works: A new FOSS browser called Ladybird is in development as a potential alternative to Chrome.
- Currently available for Linux and Mac OS, a member speculates that a Windows port may be developed if it gains popularity.
- FOSS Ethos Drives New Browser: The development of the Ladybird browser is driven by a commitment to the principles of Free and Open Source Software (FOSS).
- This ensures transparency, community involvement, and the freedom to modify and distribute the software, differentiating it from proprietary browsers.
Latent Space ▷ #ai-general-chat (59 messages🔥🔥):
Agentic Design Patterns, Claude Code, AI Compute Arms Race, Open Source Deep Research Agent, Exa Series B
- Google Engineer Drops Agentic Design Patterns Tome: A Google engineer released a 400-page draft of Agentic Design Patterns covering advanced prompting, multi-agent systems, tool use, and MCP, available on Google Docs and for pre-order as a Springer edition.
- The community shared links to the doc, NotebookLM, and Amazon pre-order, but some noted the doc's editing access wasn't disabled, leading to concerns about alterations.
- Claude Code Declared "AI DOS" Moment: Nikunj Kothari argues that Claude Code is a watershed moment, like DOS was for PCs, because it collapses technical barriers and lets non-coders build software by imagination alone, as noted in this tweet.
- Commenters debated whether weâre still in a command-line era, how creatives can harness it, and if the real bottleneck is now imagination rather than coding skill.
- Compute Arms Race Sparks Debate on Efficiency: Discussion highlights massive compute spending by OpenAI & Anthropic ($13B pre-pays to secure GPUs/energy), while observers question diminishing returns and unsustainable power/water usage, stemming from this X post.
- The thread swings between doomsayers predicting a funding crash and optimists betting on small-model efficiency or breakthrough algorithms to obsolete the mega-cluster strategy.
- Open-Source Recipe Trains Deep Research Agent for Pennies: Kyle Corbitt shares a recipe using open-source tools that lets developers train a Qwen-2.5 14B model to surpass Sonnet-4 on the DeepResearch benchmark in just 30 H200 hours (~$350), based on this tweet.
- The process includes SFT for basic skills, GRPO for utilization, and benchmark evaluation, producing a model competitive with Gemini 2.5 Pro, OpenAI Deep Research, and Claude Research.
- Exa raises $85M Series B at $700M Valuation: Exa announced an $85M Series B raise at a $700M valuation led by Benchmark, positioning itself as the search engine for AI according to this tweet.
- Harmonic's system flagged the round two weeks in advance, prompting discussion about turning deal flow alerts into a product.
Latent Space ▷ #genmedia-creative-ai (5 messages):
AI-generated worlds, Immersive storytelling, Higgsfield platform, Future of Sci-Fi
- AI-Generated Worlds Spark Excitement: Justine Moore shared an AI-generated world created by aim_not_here using Higgsfield, igniting excited discussion around this emerging immersive storytelling format.
- Commenters praised it as a window into creators' minds and predicted major sci-fi innovations ahead.
- Fiction is discovered!: Following the release of the AI-generated world, some commenters remarked on tech bros rediscovering fiction.
Nous Research AI ▷ #announcements (1 messages):
Husky Hold'em Bench, OS pokerbots eval, Claude 4 Sonnet, Hermes 4 405B
- Husky Hold'em Bench Debuts as First OS Pokerbots Eval: The Husky Hold'em Bench has been introduced as the first OS pokerbots eval, challenging models to implement policies in Python under time and memory constraints, as documented on huskybench.com.
- Claude 4 Sonnet Wins Pokerbots Comp: Claude 4 Sonnet leads the competition with a 57.9% average profit over 5k+ games, outperforming other models in a 6-player round-robin format.
- Opus came in second (31.9%) and Gemini trailed in third place (31.0%).
- Hermes 4 405B is Leading Open Model: The leading open model currently is Hermes 4 405B at -12.41% according to this tweet.
Nous Research AI ▷ #general (37 messages🔥):
Hermes 4 vs other models, SillyTavern and Hermes, Prompt Compliance, LLM game benchmarks
- Hermes 4, Next-Gen Reasoning Champ!: Hermes 4 is the next generation of Hermes trained on top of Qwen3-14B, with training highlights including a newly synthesized post-training corpus emphasizing verified reasoning traces, massive improvements in math, code, STEM, logic, creativity, and format-faithful outputs, while preserving general assistant quality and broadly neutral alignment.
- Training highlights include a dataset size increase from 1M samples and 1.2B tokens to ~5M samples / ~60B tokens and a hybrid reasoning mode with explicit think segments.
- SillyTavern Savvy with Hermes: Members discussed leveraging SillyTavern for roleplay and vibes, noting its surprising math and coding capabilities.
- It was recommended that because Hermes-4-14B is based on Qwen-3, the sampler settings should be similar, using temp: 0.6, temp-k: 20, temp-p: 85 for Thinking-Mode and temp: 0.7, temp-k: 20-40, temp-p: 95 for Instruct-Mode; additionally, use ChatML for 14B and Llama 3 Instruct for 70B and 405B. These presets are sketched at the end of this section.
- Podcast Ponders Prompt Compliance: A member suggested a podcast topic on how Hermes 4's internal dialog deals with prompt compliance, particularly breaking down problems like "develop a super villain character" that most LLMs refuse.
- Another member suggested contacting a specific user who did all the CoT explorations in the paper.
- LLMs Ace Game Benchmarks?: Members inquired about the state of LLM game benchmarks for chess/go/connect4/shogi/xiangqi elo.
- One member shared a link to a leaderboard at TextArena.ai that includes chess benchmarks.
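For reference, those Hermes sampler presets expressed as request parameters (assumption: the thread's "temp-k"/"temp-p" read as top-k/top-p, against an OpenAI-compatible endpoint):

```python
# Hermes-4-14B sampler presets from the discussion above
thinking = {"temperature": 0.6, "top_k": 20, "top_p": 0.85}  # Thinking-Mode
instruct = {"temperature": 0.7, "top_k": 40, "top_p": 0.95}  # Instruct-Mode (top_k 20-40)
```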
Nous Research AI ▷ #research-papers (1 messages):
Generative UI, AI-First Interaction Patterns, Transformation Design, Business Applications
- Generative UI/AI Research Doc Needed: A member requested assistance with writing a research document and case study on Generative UI and AI-first interaction patterns.
- Help requested on Generative UI and AI: A member needed help getting started with a research document about Generative UI and AI.
Nous Research AI ▷ #research-papers (1 messages):
Generative UI Research, AI-First Interaction Patterns, Transformation Design Case Study, Business Impact of Generative UI
- Generative UI Research Document Needed: A member is writing a research document and case study on Generative UI and AI-first interaction patterns.
- They need help getting started and understanding transformation design and the business implications.
- Call for Help on Generative UI and AI-First Interactions: A member is seeking assistance with their research document and case study focused on Generative UI, AI-first interaction patterns, transformation design, and business impact.
- They are looking for guidance to kickstart their project and gain a better understanding of the subject matter.
DSPy ▷ #general (34 messages🔥):
DSPy Data Leakage Concerns, DSPy Data Splits Clarification (train/val/dev/test), MLflow Integration with DSPy, Context Compression Experiments, Improving dspy.Image Reliability
- DSPy Data Leakage worries defused!: A user raised concerns about potential test set data leakage in DSPy when using a trainset multiple times, suggesting that papers using DSPy in this way might be invalidated.
- However, it was clarified that optimizers in DSPy use a training set for training and a validation set for validation, with testing done on a completely separate test set, thus mitigating the data leakage risk.
- Data Splits Demystified: Train, Val, Dev, Test: It was clarified that DSPy uses four data splits: train (for building few-shot examples or instructions), val (for outer loop validation and selection), dev (for human iteration), and test (for pure evaluation once).
- The discussion emphasized that the multi-step plots (with curves) are based on the valset, while the final reported results are based on the testset, to avoid leakage/overfitting; a split sketch appears at the end of this section.
- MLflow & DSPy: A Budding Bromance: A user inquired about integrating MLflow with DSPy to capture prompts, referencing MLflow's prompt registry features.
- The user noted the existence of mlflow.dspy and planned to experiment and report back.
- Context Compression Craze!: A member shared a link to Context Compression Prompt Experiments.
- dspy.Image needs Reliability Refinement: A user posted a task to help investigate and improve the reliability of `dspy.Image` for some providers in this discord thread.
- The discussion followed with someone sharing an attached image.
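A minimal sketch of the four-way split discipline described above (assumptions: examples, program, and my_metric are the user's own; MIPROv2 stands in for whichever optimizer is used):

```python
import dspy

train, val, dev, test = (examples[:200], examples[200:300],
                         examples[300:350], examples[350:])

# train drives few-shot/instruction search; val drives outer-loop selection
optimizer = dspy.MIPROv2(metric=my_metric, auto="light")
compiled = optimizer.compile(program, trainset=train, valset=val)

# iterate by hand against dev, then report the test score exactly once
evaluate = dspy.Evaluate(devset=test, metric=my_metric)
print(evaluate(compiled))
```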
DSPy ▷ #examples (2 messages):
DSPy for Tax Automation, Amazon Purchase Extraction
- DSPy Automates Amazon Tax Data Extraction: A member used DSPy, attachments, and MLflow to extract data from Amazon purchases for tax purposes.
- The system identified items like "Sinnvolle Lückenfüller für soziales Lernen (Buch)" and calculated a total of EUR 104,55.
- Codex Powers Flawless Automation Workflow: A member used Codex to generate code for extracting Amazon purchase data and automatically renaming invoice files.
- The workflow outputs data into a .csv file with item names, total after tax, and a suggested filename like lehrmaterial-und-trinkflasche-bundle.
Moonshot AI (Kimi K-2) ▷ #announcements (1 messages):
Kimi Voucher Giveaway, New Kimi Coding Model, Scammer Alert
- Kimi Kicks off Voucher Giveaway!: The Moonshot Team announced a giveaway of 20 × $20 API vouchers for the community to test their new model, which has been juiced up with some crazy coding powers.
- To participate, users should jump into the #giveaway channel and react with the emoji before 8AM Beijing Time to enter the raffle.
- Exclusive Model Access via Voucher: The announcement emphasizes that only those with a voucher can access and test the latest model from Kimi.
- The team urged users to stay tuned for more updates, suggesting further developments and opportunities related to the model.
- Heads up for Kimi Scammers!: A warning was issued regarding scammers, advising users that legitimate Kimi team members will have a yellow role color in the server.
- The announcement explicitly states, "If it ain't yellow, don't trust it", cautioning users to verify the authenticity of any direct messages received.
Moonshot AI (Kimi K-2) ▷ #general-chat (31 messages🔥):
Kimi K2 Model performance, Slide Generation feature, Kimi K2 turbo Coder Pro plan, Model releases for VRAM
- Kimi K2 is top-tier for end-user interactions: A user stated that Kimi is the best model for end-user-facing interactions because it's good at poring over the fine print, finding issues, and having its own opinion.
- The user thinks Moonshot should dominate in this domain and that Kimi is great at UX stuff and PowerPoint features.
- Moonshotâs Slide Generation Feature impresses: A user mentioned using the recently released slide generation feature and praised the coding enhancements, looking forward to seeing it enable even more professional task handling.
- They said this update in particular delivers exactly the coding enhancements they were hoping for.
- Request for Kimi K2 turbo Coder Pro: A user suggested a Kimi K2 turbo Coder Pro plan and added it as a product idea.
- Another user replied that Kimi should just make it a unified plan.
- Hopes for Model Releases for VRAM: A user inquired about plans to release models that can fit into 128 GB of (V)RAM and 24 GB of (V)RAM, such as 100-200b models like gpt-oss-120b and 30b models like gpt-oss-20b.
Modular (Mojo 🔥) ▷ #mojo (5 messages):
Mojo SIMD, Rust AVX, Standard Library net/http module, community driven HTTP library, lightbug_http
- Mojo SIMD is FUN! Rust AVX not so much: A member said he's loving how Mojo makes SIMD actually fun: "manual AVX in Rust makes me burn more brain cells than I should!"
- He asked whether something like a standard `net/http`-style module might become part of the stdlib.
- Modular favors a lean standard library: The general consensus is to keep the standard library very lean for the most part; per lightbug_http, there is a community-driven effort to build an HTTP library.
- It's fairly limited for the time being due to the lack of manual thread management and Mojo-native cryptography libraries required to implement TLS support.
- Mojo Powers Fast Binary Search Engine: A member reports building a tiny binary search engine that crunches ~50k queries/sec over 2M docs on a single core by parallelizing across SIMD lanes.
- He looks forward to HTTP support to turn it into a search-as-you-type.
Modular (Mojo 🔥) ▷ #max (13 messages🔥):
Mojo GPU on GTX 1080, Max backend for Torch, Turing minimum architecture, Reaching out to Modular team
- GTX 1080 can Mojo đ„ now!: A member confirmed that Mojo GPU functions are running correctly on their GTX 1080 after a patch went into the latest nightly.
- They will land a separate internal patch today to add changelog entries and listed support for Pascal GPUs, along with `sm_60` support for the Tesla P100.
- Max Backend for Torch gains steam!: A member is in the process of getting more time approved to work on the max backend for torch full time.
- They hope to discuss the engineering soundness with Modular team members, aiming for `torch.ones((2, 2), device="max_device")` to work on more GPUs than what is currently available with the latest CUDA.
- Turing Architecture limit found in Mojo!: A member noted that there will be an error about Turing being the minimum-supported architecture if you try to build a graph on `sm_61` GPUs.
- Their patch should lower that limit when it goes in today, so users may need to wait until the next nightly for basic graph use on these GPUs; amusingly, that may give broader compute capability than PyTorch, which errors out with "PyTorch no longer supports this GPU because it is too old. The minimum cuda capability supported by this library is 7.5."
- Discord is best way to reach Modular Team: A member advised that pinging them directly on Discord is the most reliable way to reach them.
- They noted their email inbox is flooded, also pointing out a new Modular team member as an excellent person to reach out to if anything falls through the cracks.
Manus.im Discord ▷ #general (7 messages):
Website deployment on basic plan, Grok as a Tool vs. Agent, Comparison of Grok and Manus
- Basic Plan Website Deployment Debacle?: A user inquired whether the basic plan allows for permanent website deployment.
- Another user succinctly responded, "it does not".
- Grokâs True Identity Revealed!: A user pointed out that Grok is a tool, not an agent.
- They emphasized this distinction in response to conversational context.
- Grok vs. Manus: A Non-Existent Comparison?: A user clarified they were not comparing Grok with Manus at all.
- This suggests a potential misunderstanding or off-topic tangent within the conversation.
tinygrad (George Hotz) ▷ #general (6 messages):
GPU Crash with HIPC Code, Multi-GPU Training Distress, Pyrender Testing, Uops Test Updates
- Dual 7900 XTX Cards cause GPU Crash: A member reported experiencing sudden crashes when reaching peak performance on dual 7900 XTX cards using HIPC code, specifically on the supported kernel 6.8 as mentioned in the ROCm site.
- They expressed distress over the multi-GPU training issues and asked how to keep the GPU from crashing.
- Pyrender Testing request: A member asked if anyone is willing to test pyrender on the kernel dataset.
- No additional information was provided.
- test_linearizer_dumb Updated, other uops should follow: A member shared an update to `test_linearizer_dumb` (link to GitHub PR) and suggested updating other uops tests to match the new format.
- They claim the new format is more readable and updatable, and offered to fix the uops tests later.
aider (Paul Gauthier) ▷ #general (4 messages):
Codex vs. Aider, OpenAI KYC, GPT-5 Streaming
- GPT-5 KYC requirements cripple streaming: OpenAI requires KYC verification to use its image model and GPT-5 streaming features.
- It's possible to use GPT-5 without KYC, but only without streaming enabled; a sketch appears at the end of this section.
- Codex: Aider's Thinking Clone?: A user expressed frustration with GPT-5's processing time for simple requests and missing thinking streaming.
- Another member asked what is liked about Codex better than Aider, mentioning that Claude Code was originally designed to clone Aider.
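On the KYC point above, the workaround amounts to leaving streaming off; a minimal sketch against the standard chat completions API (model name as referenced in the thread):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "hello"}],
    stream=False,  # streaming GPT-5 responses requires KYC verification
)
print(resp.choices[0].message.content)
```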